Temporal Dynamics of Trend in Relative Humidity with RH-SARIMA Model

This study examines the relative humidity (RH) recorded at 13 stations across Peninsular Malaysia for the period 1968 to 2009. To characterise the trend, the Mann-Kendall (MK) trend test was applied to the RH series of the 13 selected stations; it indicated a decreasing trend at every station except one. RH prediction is an important problem in climate change studies because it estimates future trends from past values. The main goal of this paper is to build a model of the RH data and use it to predict future trends. Among the most effective and prominent approaches for analysing time series data are the methods introduced by Box and Jenkins. In this study we applied the Box-Jenkins methodology to build an RH Seasonal Autoregressive Integrated Moving Average (RH-SARIMA) model for monthly RH data. An RH-SARIMA model was developed for each station, and these models were used to forecast RH for the following 30 months. The results will help decision makers establish priorities regarding the impact of climate change over Peninsular Malaysia.


Introduction
Humidity plays an important role in our daily life. It affects human comfort, and the temperature perceived by humans depends largely on atmospheric moisture content. In crop production, relative humidity (RH) directly influences the water relations of plants and indirectly affects leaf growth, photosynthesis, pollination, the occurrence of diseases and economic yield. Humidity is a very important environmental element that must be controlled for healthy plants and to meet connoisseur-grade requirements.
Excessive or insufficient humidity may lead to several problems. High humidity is a risk to human health and can cause dangerous effects such as heat cramps, heat syncope (fainting), heat exhaustion and heat stroke. High humidity also causes problems for plants, since plants need to lose water vapor from leaf surfaces during photosynthesis and transpiration. Maintaining correct humidity levels is therefore important. The recommended RH level is between 35% and 45%; this range provides the best comfort for well-being. Air humidity conditions, among other factors, may affect, for example, human comfort, weather-related mortality (e.g. Kalkstein, 1991; Saez et al., 2000) and air-pollution-induced respiratory diseases (e.g. Makra et al., 2008). The climate-modifying effect of urbanization is also evident in air humidity (Potchter et al., 2003). Nevertheless, air humidity, especially relative humidity (RH), still receives less attention in urban climate research than other parameters such as temperature, precipitation and wind speed.
Over the years there has been increasing concern about whether relative humidity is rising or falling as a result of climate change. Trend extraction is one of the major tasks of time series analysis. The trend of a time series is considered a smooth additive component that contains information about global change. Studies have been carried out in different parts of the world. For example, Gaffen and Rebecca (1999) showed that US surface specific humidity increased by several percent per decade, consistent with upward temperature trends. Paltridge et al. (2008) found that, at and above 850 hPa, relative humidity has decreased over the last three or four decades, while surface and atmospheric temperatures have increased globally at each of the standard pressure heights from 1000 to 300 hPa. Pierce et al. (2013) found that the interior western US experienced a declining trend in relative humidity of about 0.1 to 0.6 percent per decade.
Several studies indicate that the most widely used method for detecting trend is the nonparametric Mann-Kendall (MK) trend test. Mann originally derived the test in 1945, and Kendall subsequently derived the test statistic, commonly known as Kendall's tau, in 1975. It has been found to be an excellent tool for trend detection in different applications (Lettenmaier et al., 1994; Burn and Hag-Elnur, 2002). The Mann-Kendall test is used for most hydrological data because of two advantages: the data need not conform to any particular distribution, and the test has low sensitivity to abrupt breaks caused by inhomogeneous time series (Tabari et al., 2011).
Time series analysis methods determine future trends from past values. Because a time series method requires only historical data, it is widely used to develop predictive models. Predictions essentially provide future values of the time series for a specific variable. Time series prediction methods analyse historical data under the assumption that past patterns can be used to predict future data points (Fadhilah and Ibrahim, 2012). Among studies that predicted RH data, Sarraf et al. (2011) developed a forecasting model for monthly relative humidity data at Ahwaz Station, Iran, based on the Box-Jenkins algorithm and applied it to the agricultural year 2010-2011. Mustafaraj et al. (2011) developed a non-linear neural network prediction model, NNRAX, to predict room temperature and relative humidity for an open office. Liu and Meehan (2013) investigated the effect of relative humidity on squeal and friction creep curves and found that the lateral adhesion ratio decreases slightly as relative humidity increases. In a study of the variability and forecasting of relative humidity in Bangladesh, Syeda (2012) reported that the climate of Bangladesh has been changing in terms of average, minimum, maximum and range of relative humidity.
However, in Peninsular Malaysia little or nothing has been done to examine the dynamics of relative humidity. This paper explores, models and predicts the trend in relative humidity (RH) datasets across Peninsular Malaysia. This will help authorities understand the dynamics of the humidity pattern in Malaysia.

Data and Study Area
Malaysia has an average relative humidity of 70% to 90% per year, with a monthly average of 3% to 15% in any region of the country, which makes the weather hot and humid. Situated between one and six degrees north latitude, Malaysia has an equatorial climate with uniformly high temperatures, high humidity, relatively light winds and abundant rainfall throughout the year. This paper focuses on trend detection using the non-parametric Mann-Kendall (MK) trend test and on a time-series model based on the Box-Jenkins methodology to predict the trend in the relative humidity data of selected locations in Peninsular Malaysia.

Mann Kendall Trend Test
Trend analysis is a method for spotting a pattern or trend in a set of data. The Mann-Kendall test is a statistical test widely used for the analysis of trend in climatological and hydrological time series (Neha, 2012). The power of the Mann-Kendall test depends on the distribution type and on the shape parameter of the probability distribution, increasing with the coefficient of skewness (Onoz and Bayazit, 2003). Regarding the significance of Mann-Kendall test results, J. Danneberg (2012) claimed that although autocorrelation can influence the significance of the results, its effect is very small and can be disregarded.
The Mann-Kendall S statistic is computed as follows:

$$S = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \operatorname{sign}(T_j - T_i), \qquad \operatorname{sign}(T_j - T_i) = \begin{cases} 1 & \text{if } T_j - T_i > 0 \\ 0 & \text{if } T_j - T_i = 0 \\ -1 & \text{if } T_j - T_i < 0 \end{cases}$$

where $T_j$ and $T_i$ are the annual values in years $j$ and $i$, $j > i$, respectively. If $n < 10$, the value of $|S|$ is compared directly to the theoretical distribution of S derived by Mann (1945) and Kendall (1975). The two-tailed test is used. At a given probability level, $H_0$ is rejected in favor of $H_1$ if the absolute value of S equals or exceeds a specified value $S_{\alpha/2}$, where $S_{\alpha/2}$ is the smallest S that has a probability of less than $\alpha/2$ of appearing when there is no trend. A positive (negative) value of S indicates an upward (downward) trend. For $n \geq 10$, the statistic S is approximately normally distributed with mean

$$E(S) = 0$$

and variance

$$\sigma^2 = \frac{n(n-1)(2n+5) - \sum_{i} t_i \, i(i-1)(2i+5)}{18}$$

in which $t_i$ denotes the number of ties of extent $i$. The summation term in the numerator is used only if the data series contains tied values. The standard test statistic $Z_s$ is calculated as follows:

$$Z_s = \begin{cases} \dfrac{S - 1}{\sigma} & \text{if } S > 0 \\ 0 & \text{if } S = 0 \\ \dfrac{S + 1}{\sigma} & \text{if } S < 0 \end{cases}$$

The test statistic $Z_s$ is used as a measure of the significance of the trend; that is, it is used to test the null hypothesis $H_0$ of no trend. If $|Z_s|$ is greater than $Z_{\alpha/2}$, where $\alpha$ is the chosen significance level (e.g. 5%, with $Z_{0.025} = 1.96$), the null hypothesis is rejected, implying that the trend is significant. The null hypothesis is tested at the 95% confidence level for the relative humidity data of the thirteen stations.
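The test described above can be sketched in a few lines of Python. This is a minimal illustration and not the code used in the study: the function name `mann_kendall` and the sample values are hypothetical, and the two-sided p-value relies on the normal approximation, which the text applies for n ≥ 10.

```python
import math
from collections import Counter


def mann_kendall(values):
    """Return (S, variance of S, Z, two-sided p-value) for a sequence of observations."""
    n = len(values)

    # S: sum of sign(T_j - T_i) over all pairs with j > i.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)

    # Tie correction: for each tie extent, count how many groups have that extent.
    # Untied values form groups of extent 1 and contribute zero.
    groups_by_extent = Counter(Counter(values).values())
    tie_term = sum(
        count * extent * (extent - 1) * (2 * extent + 5)
        for extent, count in groups_by_extent.items()
    )
    variance = (n * (n - 1) * (2 * n + 5) - tie_term) / 18

    # Continuity-corrected standard statistic Z.
    if s > 0:
        z = (s - 1) / math.sqrt(variance)
    elif s < 0:
        z = (s + 1) / math.sqrt(variance)
    else:
        z = 0.0

    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return s, variance, z, p_value


# Hypothetical annual mean RH values (%), for illustration only.
annual_rh = [84.2, 83.9, 84.0, 83.5, 83.1, 83.3, 82.8, 82.6, 82.9, 82.1, 81.8, 81.9]
s, variance, z, p_value = mann_kendall(annual_rh)
print(f"S = {s}, Z = {z:.2f}, p = {p_value:.4f}")
```

A negative S with |Z| above 1.96 would correspond to a significant decreasing trend at the 5% level.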

RH Modelling and Forecasting
Forecasting time series data is an important component of operations research because these data often provide the foundation for decision models. Time series analysis provides tools for selecting a model that can be used to forecast future events. Modeling the time series is a statistical problem. Forecasts are used in computational procedures to estimate the parameters of a model being used to allocate limited resources or to describe random processes such as those mentioned above. Time series models assume that observations vary according to some probability distribution about an underlying function of time.

Box-Jenkins Methodology
In time series analysis, the Box-Jenkins method, named after the statisticians George Box and Gwilym Jenkins, applies autoregressive moving average (ARMA) or autoregressive integrated moving average (ARIMA) models to find the best fit of a time-series model to past values of a time series. The method requires the series to be stationary, with constant mean and constant variance (Mahipan et al., 2013).
The Box-Jenkins method uses an iterative three-stage modeling approach:
1. Model identification and model selection: making sure that the variables are stationary, identifying seasonality, and using plots of the autocorrelation and partial autocorrelation functions to decide which components should be used in the model.
2. Parameter estimation: using computational algorithms to arrive at coefficients that best fit the selected ARIMA model.
3. Model checking: testing whether the estimated model conforms to the specifications of a stationary univariate process.

SARIMA Model
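A time series is said to be seasonal if it tends to exhibit periodic behavior after a certain time interval (Fadhilah and Ibrahim, 2012). Ordinary ARIMA models cannot really cope with seasonal behavior; they only model time series with trends. Seasonal ARIMA models are formed by including additional seasonal terms in the ARIMA model and are defined by seven parameters. These seasonal terms are intended to capture the behavior of the seasonal part of the series, which could otherwise mislead the order selection for the non-seasonal component (Sanna, Abdou and Leo, 2014). The seasonal ARIMA model, denoted ARIMA(p, d, q)(P, D, Q)s, is given as:

$$\phi_p(B)\,\Phi_P(B^s)\,(1 - B)^d (1 - B^s)^D\, y_t = \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t$$

where $B$ is the backshift operator, $\phi_p(B) = 1 - \phi_1 B - \phi_2 B^2 - \dots - \phi_p B^p$ is the autoregressive part of order p, $\Phi_P(B^s) = 1 - \Phi_1 B^s - \Phi_2 B^{2s} - \dots - \Phi_P B^{Ps}$ is the seasonal autoregressive part of order P, $\theta_q(B) = 1 - \theta_1 B - \dots - \theta_q B^q$ is the moving average part of order q, $\Theta_Q(B^s) = 1 - \Theta_1 B^s - \dots - \Theta_Q B^{Qs}$ is the seasonal moving average part of order Q, $(1 - B)^d$ and $(1 - B^s)^D$ are the non-seasonal and seasonal differencing operators, $s$ is the seasonal period and $\varepsilon_t$ is the error term.

Model adequacy is assessed with the Ljung-Box test, which is commonly used in ARIMA modelling. It is applied to the residuals of a fitted ARIMA model, not the original series; in such applications the hypothesis actually being tested is that the residuals from the ARIMA model have no autocorrelation. Equivalently, it performs a lack-of-fit hypothesis test for model misspecification, based on the Q statistic given as:

$$Q = N(N + 2) \sum_{j=1}^{L} \frac{\hat{\rho}_j^{\,2}}{N - j}$$

where $N$ is the sample size, $L$ is the number of autocorrelation lags included in the statistic, and $\hat{\rho}_j^{\,2}$ is the squared sample autocorrelation at lag $j$. Under the null hypothesis of no serial correlation, the Q statistic is asymptotically chi-square distributed. A p-value above 0.05 means that the null hypothesis of model adequacy is not rejected at the 0.05 significance level (Ibrahim & Fadhilah, 2013).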
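As a rough illustration of the Q statistic, the sketch below computes Q = N(N + 2) Σ ρ̂_j² / (N − j) over lags j = 1, ..., L for a list of residuals. The function names and the sample residuals are hypothetical, and the p-value lookup is left out because the Python standard library has no chi-square distribution.

```python
def sample_autocorrelations(series, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    c0 = sum((x - mean) ** 2 for x in series)
    return [
        sum((series[t] - mean) * (series[t + k] - mean) for t in range(n - k)) / c0
        for k in range(1, max_lag + 1)
    ]


def ljung_box_q(residuals, lags):
    """Ljung-Box Q statistic over the first `lags` autocorrelations."""
    n = len(residuals)
    rho = sample_autocorrelations(residuals, lags)
    return n * (n + 2) * sum(r * r / (n - j) for j, r in enumerate(rho, start=1))


# Hypothetical residuals that alternate in sign, i.e. strongly autocorrelated.
residuals = [1, -1, 1, -1, 1, -1, 1, -1]
print(ljung_box_q(residuals, lags=1))
```

The resulting Q would then be compared with a chi-square critical value (3.841 at the 0.05 level for one degree of freedom), or converted to a p-value with a statistics library.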

Results and Discussion
The time series analyzed in this section correspond to the monthly observations of the long-term relative humidity datasets for 13 stations across Peninsular Malaysia. The time plots of all 13 stations are shown in Figure 1. Table 1 shows the descriptive statistics of the average relative humidity data for Peninsular Malaysia; the values can be considered high, which makes the weather hot and wet throughout the year. Most of the stations show negative skewness, an indication of steady conditions.

Trend Analysis
The Mann-Kendall test evaluates the statistical hypothesis that there is an upward or downward trend at a specified probability level. A positive S score indicates a possible upward trend and a negative S score a possible downward trend (Salmi et al. 2002; Luo et al. 2008). The presence of a statistically significant trend is evaluated using the tau (τ) value: a positive (negative) value of τ indicates an upward (downward) trend (Drapela, 2011). The statistical tests determine the probability value (p-value) of the Mann-Kendall statistic and the slope of the trend line; the smaller the p-value, the greater the weight of evidence against H0. Based on the Mann-Kendall test of the long-term humidity datasets, 12 stations show a decreasing trend while only one station shows an increasing trend, as shown in Table 2.
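To make the role of τ concrete, the sketch below computes Kendall's tau as S divided by the number of pairs, n(n − 1)/2. This is the simple "tau-a" form, chosen here as an assumption because the text does not say which variant was used; the function name and sample values are hypothetical.

```python
def kendall_tau_a(values):
    """Kendall's tau-a of a series against time: S / (n * (n - 1) / 2)."""
    n = len(values)
    s = sum(
        (values[j] > values[i]) - (values[j] < values[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    return s / (n * (n - 1) / 2)


# Hypothetical annual RH values (%): mostly falling, so tau should be negative.
annual_rh = [84.0, 83.6, 83.8, 83.1, 82.7, 82.9, 82.0]
print(kendall_tau_a(annual_rh))
```

Values of τ near −1 or +1 indicate a nearly monotonic downward or upward trend, while values near 0 indicate no consistent direction.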

RH Time Series Model Building
The RH time series models based on the Box-Jenkins methodology are only applicable to stationary time series. Identifying an appropriate Box-Jenkins model for a particular RH dataset therefore first requires a check for stationarity. Each RH dataset was examined to find the most appropriate class of Box-Jenkins models by selecting the order of consecutive non-seasonal and seasonal differencing required to make the series stationary. In general, if the ACF of the RH time series either cuts off or dies down fairly quickly, the series should be considered stationary. On the other hand, the series may be considered non-stationary if the ACF plot shows spikes outside the 95% confidence interval. The ACF plots in Figures 2 and 3 show a strong periodic pattern at the seasonal lags. Clearly, the data show seasonal behaviour, which requires seasonal differencing. After stabilizing the data, a parsimonious RH seasonal ARIMA model was developed for each RH dataset. The models and their parameters are summarized in Table 3; one of the fitted models, for example, has the form ARIMA(2,1,1)(1,1,1)s.
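The stationarity check described above can be illustrated with a short sketch: difference the series at a seasonal lag, compute the sample ACF, and flag lags whose autocorrelation lies outside the approximate 95% bounds of ±1.96/√n. The function names, the synthetic 12-month pattern and the choice of lag 12 for monthly data are assumptions made for illustration.

```python
import math


def difference(series, lag=1):
    """Lag-`lag` difference: lag=1 is ordinary, lag=12 is seasonal for monthly data."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def sample_acf(series, max_lag):
    """Sample autocorrelations at lags 1..max_lag (all zero for a constant series)."""
    n = len(series)
    mean = sum(series) / n
    c0 = sum((x - mean) ** 2 for x in series)
    if c0 == 0:
        return [0.0] * max_lag
    return [
        sum((series[t] - mean) * (series[t + k] - mean) for t in range(n - k)) / c0
        for k in range(1, max_lag + 1)
    ]


def significant_lags(series, max_lag):
    """Lags whose autocorrelation falls outside the approximate 95% bounds."""
    bound = 1.96 / math.sqrt(len(series))
    acf = sample_acf(series, max_lag)
    return [k for k, r in enumerate(acf, start=1) if abs(r) > bound]


# Synthetic monthly series: a fixed 12-month pattern repeated for four years.
pattern = [0, 1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
monthly = pattern * 4

print(significant_lags(monthly, 12))                  # raw series
print(significant_lags(difference(monthly, 12), 12))  # after seasonal differencing
```

On real RH data the differenced series would not be exactly constant, so the second call would typically still flag a few lags; those remaining spikes guide the choice of the AR and MA orders.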

Diagnostic Results for the Fitted RH-SARIMA Models
After an adequate model has been chosen, its accuracy is assessed with three diagnostics: the standardized residuals, the ACF of the residuals and the p-values of the Ljung-Box (LB) statistic.
There are three main conditions for an accurate model (Fadhilah and Ibrahim, 2012):
(a) The standardized residuals must be stationary, with variance near zero, fluctuating evenly about the horizontal axis.
(b) There are no spikes in the ACF of the residuals.
(c) The LB p-values must be above 0.05.
A good model is one that fulfils all three conditions. All the models passed the diagnostic checking, as shown in Figure 4.

Forecasting with RH-SARIMA Model
After the model adequacy was checked, the ability of the model to forecast the RH time series data was tested. Past observations were used to predict the behavior of the data for the following 30 months, which further supports the validity of the model. To check the forecasting accuracy, the forecast values were compared with the actual values for the following 24 months. The RMSE was 3.44, and the model can be considered adequate, with a forecasting accuracy of 97.14%. However, the SARIMA methodology has certain limitations: it requires a large number of observations for model identification and estimation, and differencing the series may reduce the available information set. On the other hand, it is parsimonious with respect to the coefficients and good at providing unconditional forecasts (Sanna, Abdou and Leo, 2014). Figure 5 gives the forecasts from all the fitted models; the forecasts follow the recent trend in the data, which is considered a good forecast.
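The accuracy measures mentioned above can be sketched as follows. RMSE is standard; the text does not state how the percentage forecasting accuracy was computed, so expressing it as 100 minus the mean absolute percentage error (MAPE) is an assumption made here for illustration. The function names and sample values are hypothetical.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error between observed and forecast values."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical monthly RH values (%) and forecasts, for illustration only.
actual = [82.0, 84.5, 81.3, 79.8]
forecast = [80.5, 85.0, 83.0, 78.9]

print(f"RMSE = {rmse(actual, forecast):.2f}")
# Assumed definition of accuracy: 100 - MAPE.
print(f"Accuracy = {100 - mape(actual, forecast):.2f}%")
```

Because RH values are far from zero, MAPE is well defined here; for series that can approach zero a scale-based error measure would be a safer choice.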

Conclusion
In this research, the dynamics of the RH data for 13 stations across Peninsular Malaysia have been studied using the Mann-Kendall test and the seasonal ARIMA model. The study shows that the relative humidity data are influenced by seasonal behaviour. The relative humidity data were modelled using the Box-Jenkins approach, and an appropriate model was built for each dataset. The models were able to forecast future observations and show a decreasing trend for most of the stations. This may be attributed to the unpredictable weather changes of recent years, such as prolonged dry seasons, lack of rain and El Niño phenomena, because relative humidity falls as temperature rises: warmer air can hold more water vapour, so the same moisture content corresponds to a lower relative humidity. To improve the forecasting ability, future work will concentrate on the residuals of the models to see whether a heteroscedastic model should be incorporated to deal with the risk of volatility persistence. It is hoped that this information can help decision makers establish strategies for proper planning of agriculture, industry, building and health quality.

Figure 2. Autocorrelation functions of RH data
Figure 3. Autocorrelation functions of RH first difference data

Figure 4. Diagnostic checking
Figure 5. Trend forecast with fitted RH-SARIMA models

Table 1. Summary of descriptive statistics

Table 2. Trend Analysis results for all stations

Table 3. Summary of RH-SARIMA Models