Seasonal Modelling of Fourier Series with Linear Trend

This work was motivated by the need to model a periodic time series with a linear trend. A Fourier series representation of the linearly detrended series is proposed. In this representation, the time series $X_t$ is expressed as the sum of a linear trend component and a linear combination of $s$ orthogonal trigonometric functions, where $s$ is the number of seasons. The method was applied to rainfall data and the proposed model was found to give a good fit. A comparative study was carried out with the complete Fourier representation. Diagnostic checks revealed that the proposed method performs better than the pure Fourier approach.


Introduction
The usual autoregressive integrated moving average (ARIMA) models developed by Box and Jenkins (1970) have been extensively used in modelling linear time series. The ARIMA models assume that the current observation depends on weighted previous observations, weighted previous random shocks and the current shock. However, most time series arising in nature are not linear; rather, they are periodic or seasonal with a linear trend. Seasonal time series contain a seasonal phenomenon that repeats itself after a regular period of time. Such phenomena stem from factors such as weather, which affects many business and economic activities, cultural events and graduation ceremonies. Series with a seasonal pattern cannot be adequately represented by ARIMA models. To analyze such series, Wold (1974) arranged the series in a two-dimensional table according to the season, and the totals and averages were computed. In the Wold (1974) representation, a time series is thought to consist of trend-cycle, seasonal and irregular components. To estimate these components, several decompositions are usually involved. Box, Jenkins and Reinsel (2008) extended the Box and Jenkins (1970) ARIMA models to include a seasonal part, giving the seasonal autoregressive integrated moving average (SARIMA) models. Despite these efforts, the models do not adequately represent most periodic series.
A better procedure extensively used for modelling periodic time series is Fourier analysis. This method represents the time series by a set of elementary functions, called a basis, such that all functions under study can be written as linear combinations of the elementary functions in the basis. These elementary functions involve the sine and cosine functions or complex exponentials. The Fourier series approach describes the fluctuation of a time series in terms of sinusoidal behaviour at various frequencies. Despite the wide acceptance of the method, the Fourier approach still suffers some setbacks. One major problem is the cumbersomeness of the Fourier representation and the non-inclusion of a trend component. As will be seen in the methodology, representing the time series becomes very inconvenient if all the terms required in the Fourier series are included. This does not suit a series with a large sample size, because writing out all the terms consumes several pages and is tedious for both the researcher and the reader. Hence, there is a need to reduce the number of terms in the Fourier expression and give a summarized representation that adequately describes the time series. This is the intent of this work. As stated earlier, seasonal variations in time series can be caused by climatic factors, and we use rainfall data in our illustration.
[...] ARIMA component, the combined model was found to be adequate and forecasts were generated. Cenis (1989) studied temperature in solarized soil using Fourier analysis. He obtained the daily maximum and minimum temperatures at two depths for three summer periods. He used the values to fit sinusoidal equations, which accounted for 93% of the variation. The variation and the hourly mean differences between the measured temperatures were calculated. The analysis gave an overall encouraging result. Serangelo, Ferrari and De Luca (2011) applied a non-homogeneous Poisson process to examine the seasonal effects of daily rainfall. The modelling process involved partitioning the observed daily rainfall data into calibration periods for the estimation of parameters. Though the validation period for checking the occurrence process changed, the model, which was applied to a set of rain gauges placed in different geographical areas, was shown to provide a good fit. Falahah and Suorapto (2010) carried out research on rainfall data using the analytic factor method. The data were obtained from 50 weather stations for a period of 30 years. The result was plotted on pattern factors to reveal the dominant factor for each region and inspection period. The method explained the factors that influence rainfall in Indonesia and the reasons for relatively higher humidity in one area than in another. Necholas, Mahmood and Hazan (2013) modelled rainfall amounts for agricultural planning using gamma distribution models. Daily rainfall data from two stations with different mean annual rainfalls were analyzed. Generalized linear models were used to fit smooth regression curves. The mean amount of rain per rainy day was computed using the estimates of the model parameters for each day of the season. The adequacy of the fitted model was checked by the analysis of deviance residuals and was found to be satisfactory. The Fourier approach was employed for a comparative study. It was found that, though reasonable results were obtained, Fourier analysis was time-consuming and tedious. However, the Fourier series was found suitable for fitting the gamma distribution to determine the mean rain per rainy day. Zakaria (2013) conducted a study on periodic and stochastic modelling of monthly rainfall, and the periodicities were determined. Stochastic components were estimated using the autoregressive model approach. Residuals obtained from the model were shown to follow a white noise process, thus indicating the adequacy of the fitted model. Beatrice, Nasser, Afshar, Selaman and Fahmi (2014) analyzed data from eight rain gauge stations. Annual rainfall data for 27 years were computed with the Fourier series equation. The result was compared with that obtained from harmonic series models. Both models were found capable of describing the rainfall pattern and provided a reasonable relationship between the simulated and the observed data.
Akpanta, Okorie and Okoye (2015) adopted SARIMA modelling of the frequency approach in analyzing monthly rainfall data in Umuahia. A probability time series approach was considered. The plot of the original data showed seasonality, which was removed by differencing. After subjecting the model to diagnostic checks, SARIMA$(0,0,0)(0,1,1)_{12}$ was found to fit the data well and was used for prediction.

Methodology
In this method, a periodic time series is first examined to determine whether it contains a linear trend. Visual inspection of the raw data plot can reveal this pattern. Assuming a linear trend is detected, a linear regression model of the form
$$X_t = \beta_0 + \beta_1 t + e_t \qquad (1)$$
is first fitted to the data, where $X_t$ is the observed time series, $t$ is the time point ($t = 1, 2, \ldots, n$), $n$ is the number of observations, $\beta_0$ and $\beta_1$ are the regression parameters, and $e_t$ is the error component.
Fitting model (1) to the data $X_t$, we obtain an estimate of the error component, which can be tested for randomness or white noise.
After obtaining the trend equation (i.e. $\hat{X}_t = \hat{\beta}_0 + \hat{\beta}_1 t$), the main series $X_t$ is detrended by the expression
$$y_t = X_t - \hat{X}_t \qquad (2)$$
The resulting series $y_t$ is then used to fit the seasonal model using the Fourier representation.
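The trend-fitting and detrending step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `detrend_linear` is hypothetical, ordinary least squares via `numpy.polyfit` stands in for the Minitab regression, and the series is synthetic rather than the rainfall data.

```python
import numpy as np

def detrend_linear(x):
    """Fit X_t = b0 + b1*t by least squares; return (b0, b1, y) with y = X - fitted."""
    x = np.asarray(x, dtype=float)
    t = np.arange(1, len(x) + 1)           # time points t = 1, ..., n
    b1, b0 = np.polyfit(t, x, 1)           # polyfit returns the slope first
    y = x - (b0 + b1 * t)                  # detrended series
    return b0, b1, y

# Synthetic monthly series (an assumption, not the Calabar data):
# linear trend plus a 12-month sinusoid.
t = np.arange(1, 121)
x = 5.0 + 0.1 * t + 3.0 * np.sin(2 * np.pi * t / 12)
b0, b1, y = detrend_linear(x)
```

Because the fit includes an intercept, the detrended series has zero mean and is uncorrelated with the time index, which is what the later Fourier step presupposes.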

Fourier Series Representation of the Time Series $y_t$
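Given a time series of $n$ observations, the Fourier representation is the set of $n$ orthogonal trigonometric functions shown below:
$$y_t = a_0 + \sum_{i=1}^{q} \left[ a_i \cos(2\pi f_i t) + b_i \sin(2\pi f_i t) \right] + e_t \qquad (3)$$
where $q = n/2$, $e_t \sim N(0, \sigma^2)$, the period is $P_i = n/i$, and $f_i = i/n$ is the $i$th harmonic of the fundamental frequency $1/n$.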
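To make the orthogonality claim concrete, the following sketch builds the cosine and sine regressors at the harmonic frequencies $i/n$ and checks that their Gram matrix is diagonal. The helper name `fourier_design` is hypothetical, and the check is restricted to harmonics below $n/2$ (at $i = n/2$ the sine column is identically zero for even $n$).

```python
import numpy as np

def fourier_design(n, q):
    """Return cos and sin regressors at f_i = i/n for i = 1..q and t = 1..n."""
    t = np.arange(1, n + 1)
    i = np.arange(1, q + 1)
    angles = 2 * np.pi * np.outer(t, i) / n    # shape (n, q)
    return np.cos(angles), np.sin(angles)

C, S = fourier_design(n=120, q=12)
X = np.hstack([C, S])
G = X.T @ X    # Gram matrix: off-diagonals vanish, diagonal equals n/2
```

With orthogonal regressors, each coefficient can be estimated independently of the others, so dropping harmonics does not change the estimates of those retained.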

The Periodogram
The periodogram is defined as the function of intensities $I(f_i)$ at frequency $f_i = i/n$ and is given as
$$I(f_i) = \frac{n}{2}\left(a_i^2 + b_i^2\right), \quad i = 1, 2, \ldots, q.$$
The periodogram plot shows the intensities against the frequencies or periods. The periodogram $I(f_i)$ is simply the sum of squares associated with the pair of coefficients $(a_i, b_i)$ and hence with the frequency $f_i$ or period $P_i$. That is,
$$\sum_{t=1}^{n} (y_t - \bar{y})^2 = \sum_{i=1}^{n/2} I(f_i).$$
In the context at hand, the periodogram is used to determine the seasonality or periodicity of a time series. This is usually indicated by the largest peak in the periodogram plot.
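A minimal sketch of the periodogram computation follows, assuming the usual least-squares estimates $a_i = (2/n)\sum y_t \cos(2\pi f_i t)$ and $b_i = (2/n)\sum y_t \sin(2\pi f_i t)$, with the standard special case $I(0.5) = n a_q^2$ when $n$ is even. The function name `periodogram` and the synthetic series are illustrative assumptions; the paper's own computation was done in gretl.

```python
import numpy as np

def periodogram(y):
    """Intensities I(f_i) at f_i = i/n for i = 1..q, where q = n // 2."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    q = n // 2
    t = np.arange(1, n + 1)
    freqs = np.arange(1, q + 1) / n
    intensity = np.empty(q)
    for k, f in enumerate(freqs):
        c = np.cos(2 * np.pi * f * t)
        s = np.sin(2 * np.pi * f * t)
        if n % 2 == 0 and k == q - 1:
            # Frequency 0.5: the sine term vanishes, so only a_q remains.
            a = (y @ c) / n
            intensity[k] = n * a ** 2
        else:
            a = 2 * (y @ c) / n
            b = 2 * (y @ s) / n
            intensity[k] = n / 2 * (a ** 2 + b ** 2)
    return freqs, intensity

# Synthetic detrended series: 12-month cycle plus noise (not the rainfall data).
rng = np.random.default_rng(0)
t = np.arange(1, 121)
y = 3.0 * np.sin(2 * np.pi * t / 12) + rng.normal(size=120)
freqs, I = periodogram(y)
period = 1 / freqs[np.argmax(I)]    # period at the largest peak
```

The intensities add up to the total sum of squares about the mean, which is a convenient sanity check on any implementation.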

The Spectrum
The sample spectrum is obtained by allowing the frequency $f$ to vary continuously in the range 0 to 0.5 cycle, so that the periodogram can be re-defined as
$$I(f) = \frac{n}{2}\left(a_f^2 + b_f^2\right), \quad 0 \le f \le 0.5.$$
The function $I(f)$ is called the spectrum.

Autocorrelation Function
This is the plot of the autocorrelation at lag $k$ ($\rho_k$) versus $k$.

Spectral Density Function
Spectral density is the Fourier transform of the autocorrelation function and is estimated by
$$g(f) = 2\left[1 + 2\sum_{k=1}^{\infty} \rho_k \cos(2\pi f k)\right], \quad 0 \le f \le 0.5,$$
where $\rho_k$ is the autocorrelation at lag $k$. The spectral density performs the same function as the periodogram. The period or seasonality of a time series is obtained at the frequency where the spectral density is maximum.
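The link between the autocorrelations and the spectral density can be illustrated with a truncated cosine sum over sample autocorrelations. This sketch assumes the standard form $2[1 + 2\sum_k r_k \cos(2\pi f k)]$ cut off at a chosen maximum lag; the helper names and the truncation at 36 lags are assumptions, and no smoothing window is applied, so the estimate is crude.

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    c0 = d @ d
    return np.array([(d[:-k] @ d[k:]) / c0 for k in range(1, max_lag + 1)])

def spectral_density(y, freqs, max_lag):
    """Truncated estimate 2 * [1 + 2 * sum_k r_k cos(2*pi*f*k)] at each frequency."""
    r = sample_acf(y, max_lag)
    k = np.arange(1, max_lag + 1)
    return 2 * (1 + 2 * np.cos(2 * np.pi * np.outer(freqs, k)) @ r)

# Synthetic series with a 12-month cycle (not the rainfall data).
t = np.arange(1, 121)
y = 3.0 * np.sin(2 * np.pi * t / 12)
freqs = np.arange(1, 61) / 120
g = spectral_density(y, freqs, max_lag=36)
f_peak = freqs[np.argmax(g)]    # frequency at which the estimate is largest
```

For this series the maximum falls at $f = 1/12$, i.e. a 12-month period, matching what the periodogram would report.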

White Noise Process
A process $\{e_t\}$ is said to be a white noise process with mean 0 and variance $\sigma_e^2$, written $\{e_t\} \sim WN(0, \sigma_e^2)$, if it is a sequence of uncorrelated random variables from a fixed distribution.

The Seasonal Fourier Representation
Rather than fitting the entire Fourier series expression in equation (3), we fit only the Fourier terms up to the season detected by the periodogram. That is, suppose the season determined by the periodogram in the detrended series $y_t$ is $s$; then equation (3) reduces to
$$y_t = a_0 + \sum_{i=1}^{s} \left[ a_i \cos(2\pi f_i t) + b_i \sin(2\pi f_i t) \right] + e_t \qquad (5)$$
with fitted values
$$\hat{y}_t = \hat{a}_0 + \sum_{i=1}^{s} \left[ \hat{a}_i \cos(2\pi f_i t) + \hat{b}_i \sin(2\pi f_i t) \right] \qquad (6)$$
and
$$\hat{e}_t = y_t - \hat{y}_t \qquad (7)$$
Comparatively, expression (5) is less cumbersome to analyze than the complete Fourier form in equation (3). The model (5) can be fitted to any periodic or seasonal data, and the estimated residuals $\hat{e}_t$ obtained from (7) can be tested for white noise to determine whether the model is adequate.
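A least-squares fit of the reduced representation might look like the sketch below. It assumes harmonics $f_i = i/n$ for $i = 1, \ldots, s$ with $s < n/2$, and uses `numpy.linalg.lstsq` in place of the regression routine of a statistics package; the function name `fit_seasonal_fourier` and the synthetic series are hypothetical.

```python
import numpy as np

def fit_seasonal_fourier(y, s):
    """Regress y_t on a constant and the first s harmonic pairs (requires s < n/2)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    t = np.arange(1, n + 1)
    cols = [np.ones(n)]
    for i in range(1, s + 1):
        w = 2 * np.pi * i / n              # angular frequency 2*pi*f_i
        cols += [np.cos(w * t), np.sin(w * t)]
    X = np.column_stack(cols)              # columns: 1, cos_1, sin_1, ..., cos_s, sin_s
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    return coef, fitted, y - fitted

# Synthetic detrended series with a pure 12-month cycle (harmonic i = 10 when n = 120).
t = np.arange(1, 121)
y = 1.5 * np.cos(2 * np.pi * t / 12) + 3.0 * np.sin(2 * np.pi * t / 12)
coef, fitted, resid = fit_seasonal_fourier(y, s=12)
```

Here `coef[0]` is the constant, and the pair for harmonic $i$ sits at positions $2i - 1$ and $2i$; the residuals returned are the quantities one would then test for white noise.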

Data Analysis and Result
The data used for this work are the average monthly rainfall data ($X_t$) in Calabar, Nigeria between 2005-2015 (Source: www.cbn.gov.ng), and the analysis is carried out using Minitab and gretl software.

Complete Fourier Series Model
Fitting the full Fourier series in equation (3), where $q = 120/2 = 60$, results in a residual variance of 11.23; the Fourier coefficients are displayed in Appendix C. The residual autocorrelation function is displayed in Figure 5. Clearly, there is a significant spike at lag 12 ($\rho_{12} = -0.36$). This shows that the residuals are correlated (at lag 12) and hence do not follow a white noise process. The plots of the actual and estimated values displayed in Figure 6 show a low correlation between these values. Thus, the full Fourier series, despite its cumbersome nature, does not fit the data adequately.

Seasonality and the Estimated Trend
The raw data plot in Figure 1 clearly shows the existence of seasonality and trend, as indicated by the periodic pattern and the upward movement of the graph. Fitting the trend equation gives the Minitab output in Table 1. The detrended series $y_t$ now becomes our working data.

The Periodogram and Spectral Density of $y_t$
The periodogram analysis was conducted using the gretl software. The periodogram is displayed in Figure 2. The values of the spectral densities, periods and frequencies are also displayed in Appendix A. From the table in Appendix A and Figure 2, it is observed that the periodogram is dominated by a very large peak at scaled frequency $f_i^* = i = 10$ ($f_i = i/n = 10/120 = 0.0833$). This is indicated by the largest spectral density of 16.397, shown in bold figures. This density and frequency correspond to a period or season $s = n/i = 120/10 = 12$ months. This indicates a 12-month cycle.
With $s = 12$, the seasonal Fourier model (5) becomes
$$y_t = a_0 + \sum_{i=1}^{12} \left[ a_i \cos(\omega_i t) + b_i \sin(\omega_i t) \right] + e_t \qquad (9)$$
where $\omega_i = 2\pi f_i$.
Subjecting equation (9) to regression analysis gives the parameter estimates displayed in Appendix B. In Appendix B, some coefficients of the variables are not statistically significant (i.e. their $p$-values are greater than 0.05) and are therefore excluded from the overall equation.
Hence, from (6), the resulting estimated model (10) is obtained. The residual $\hat{e}_t$ of the fitted model is obtained from $\hat{e}_t = y_t - \hat{y}_t$.

Diagnosis
We present here two diagnostic checks to ensure that the proposed model (10) fitted to the data is adequate.

Actual and Estimate Plots
The overlaid plots of the actual values ($y_t$) and the estimated values ($\hat{y}_t$) are displayed in Figure 3. The two superimposed plots move together in the same direction, indicating closeness and strong correlation between the two sets of values. This suggests that the model is adequate.

Residual Variance, Autocorrelation and White Noise
The residual variance of the fitted model is 4.78. This is considerably smaller than the 11.23 obtained from fitting the full Fourier series. The residual autocorrelation function is displayed in Figure 4 and shows no significant autocorrelation in the residuals. That is, the autocorrelations at all lags are within the range $\pm 2/\sqrt{n}$ (as indicated by the two red lines in Figure 4). This means the residuals of the fitted model are not serially correlated. In more precise terms, the residuals follow a white noise process. Hence, the model is adequate.
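The autocorrelation check just described can be sketched as follows: compute the sample autocorrelations of the residuals and flag any lag outside $\pm 2/\sqrt{n}$. The function name `flagged_lags` is hypothetical, and the demonstration uses a deliberately seasonal synthetic series, not the model residuals, to show what a failed check looks like.

```python
import numpy as np

def flagged_lags(e, max_lag):
    """Return lags 1..max_lag whose sample autocorrelation exceeds 2/sqrt(n) in size."""
    e = np.asarray(e, dtype=float)
    n = len(e)
    d = e - e.mean()
    c0 = d @ d
    bound = 2 / np.sqrt(n)
    flagged = []
    for k in range(1, max_lag + 1):
        r_k = (d[:-k] @ d[k:]) / c0        # sample autocorrelation at lag k
        if abs(r_k) > bound:
            flagged.append(k)
    return flagged

# A series with leftover 12-month seasonality is flagged at lag 12.
t = np.arange(1, 121)
e_seasonal = np.sin(2 * np.pi * t / 12)
lags = flagged_lags(e_seasonal, max_lag=24)
```

An empty list is consistent with white-noise residuals, though with many lags an occasional value just outside the bounds can occur by chance.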

Discussion and Conclusion
It has been noted that the Fourier series model can only be applied to periodic series that are stationary in mean. If the series contains a trend, however, a special technique is required for the modelling process. This may explain the discouraging result obtained by Afshar et al. (2014), who applied an ARIMA model to periodic data without considering the trend. Besides, as noted by Necholas et al. (2013), Fourier series modelling is time-consuming and tedious because of the large number of Fourier coefficients involved. In this work, these problems have been addressed. As demonstrated in this method, the trend component of a periodic series is taken care of. Also, setting $q = s$ has reduced the computational burden by 80% and has given an adequate Fourier representation, as confirmed by the diagnostic checks. It is believed that this work has opened another possibility for modelling periodic functions.
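As a rough check on the 80% figure, assuming the burden is measured by the number of harmonics fitted (an interpretation, since the text does not define it), the values $q = 60$ and $s = 12$ from the analysis give

$$1 - \frac{s}{q} = 1 - \frac{12}{60} = 0.80.$$

Counting coefficients instead, $2s + 1 = 25$ against $n = 120$ gives a reduction of about 79%, so the two readings agree closely.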

Figure 1. Raw data plot of the series

Figure 3. Actual and estimate plot of

Table 1. Minitab output for the trend equation
The p-values in Table 1 show that both the constant term and the coefficient of $t$ are significant. Thus, the estimated