Improvement of Regression Forecasting Models

In this paper the authors propose a technique that decreases the average forecast error of regression-based models. The main idea of the method is to use a weighted sum of several regression equations, each of which satisfies the Ordinary Least Squares prerequisites and has independent residuals, instead of a single equation. It is shown that if all requirements of the method are met, the Mean Squared Error can be decreased by almost half using just three equations. The technique also allows deriving equations that contain more predictors than there are observations. Additionally, the method proves to be more consistent over time than any of the regressions used in it, taken separately. It is also illustrated that the proposed method outperforms the regression equation computed with the same independent variables, and thus gives more accurate estimators of the regression coefficients. Empirical results are provided as well.


Introduction
When modeling a social or economic process, a researcher nearly always faces uncertainty about whether the model will work in the future with the same effectiveness, a problem partly highlighted by Stock and Watson (2007), Alhamzawi and Yu (2012), Clark and McCracken (2009), Gneiting (2011) and Giordani, Kohn and van Dijk (2007). In other words, even if a linear model fully satisfies all the OLS (Ordinary Least Squares) prerequisites and has a certain forecast error magnitude, one cannot assert that this magnitude will stay within an acceptable range when the model is used for prediction. Sometimes the forecast error may be several times as large as the original one, and in this case the model is of little use. This happens for the following reasons. Either unaccounted factors change their values so that the coefficient estimators become biased, or accounted factors change the extent of their impact on the output variable. In practice it can also be a combination of both; see, for example, Orphanides and van Norden (2005) and Primiceri (2005). To decrease these errors, a researcher can build a model that takes into account structural breaks and coefficient variability in the regression equation; see, for example, Groen, Paap and Ravazzolo (2009), Sensier and van Dijk (2004), Clark (2011), Jore, Mitchell and Vahey (2010) and Koop and Potter (2007). However, in the authors' opinion, the following problem is still not fully solved. While specifying a regression equation and adjusting it to satisfy all the OLS prerequisites, one may leave out a lot of data that actually affect the model output. That is why we devote this paper to developing a method that can capture more explanatory variables and thus significantly decrease the forecast error without violating the classical approach to regression model specification.

Empirical Background of the Method
Suppose we are building an inflation forecasting model for the US economy. Quarterly CPI (Consumer Price Index) is selected as the output variable. As candidate independent variables we test three lags of the dependent variable and three lags of each of the following quarterly macroeconomic indicators: GDP (Gross Domestic Product), GDI (Gross Domestic Income), Federal Funds Rate, Non-Farm Payrolls, Brent Oil Price, Dow Jones Industrial Average, RGDP (Real Gross Domestic Product), Monetary Base M2, Total Export, Total Import, Money Velocity (calculated as the ratio of GDP to M2) and Employment Rate. Thus, the regression equation can be written as follows:

$$y_t = \alpha_0 + \sum_{i=1}^{3} \alpha_i\, y_{t-i} + \sum_{j=1}^{m} \sum_{i=1}^{3} \beta_{j,i}\, x_{j,t-i} + \varepsilon_t \qquad (1)$$

where $m$ is the number of explanatory variables, $\alpha_i$ and $\beta_{j,i}$ are the coefficients of the lagged CPI and of the lagged explanatory variables respectively, and $\alpha_0$ is the constant term.
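A minimal sketch of how the rows of equation (1) can be assembled, assuming the series are plain Python lists aligned in time, oldest first. The function name `lagged_design` is hypothetical. Only lags 1 to 3 are used, matching the exclusion of zero lags, and the constant term is left to the estimator.

```python
def lagged_design(y, predictors, max_lag=3):
    """Build (targets, rows, names) for equation (1) using lags 1..max_lag only.

    y          -- target series (e.g. quarterly CPI), oldest value first
    predictors -- dict mapping an indicator name to a series aligned with y
    Lag 0 is deliberately excluded; the constant term is not added here.
    """
    names = [f"y(t-{lag})" for lag in range(1, max_lag + 1)]
    for name in predictors:
        names += [f"{name}(t-{lag})" for lag in range(1, max_lag + 1)]
    targets, rows = [], []
    # The first usable target is at index max_lag, so every lag exists.
    for t in range(max_lag, len(y)):
        row = [y[t - lag] for lag in range(1, max_lag + 1)]
        for series in predictors.values():
            row += [series[t - lag] for lag in range(1, max_lag + 1)]
        targets.append(y[t])
        rows.append(row)
    return targets, rows, names


targets, rows, names = lagged_design([1, 2, 3, 4, 5], {"M2": [10, 20, 30, 40, 50]})
print(targets)  # [4, 5]
print(rows[0])  # [3, 2, 1, 30, 20, 10]
```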
Zero lags are not taken into account here because none of the data mentioned above are available at the very beginning of the next time period; they are published only during the quarter. Therefore, a model using them could not produce a forecast before the next quarter starts. To build the model we take 95 observations, beginning from the 1st quarter of 1960. After the first equation is set up, the data frame is shifted forward by one observation and a new equation is computed. This procedure is repeated 70 times. To specify each regression equation, the following constrained optimization problem is used:

$$\min \mathrm{MSE} \quad \text{subject to} \quad p_k \le \alpha,\ \ \mathrm{VIF}_k \le \mathrm{VIF}_{\max}\ \ (k \in 1,\dots,K), \qquad \mathrm{CN} \le \mathrm{CN}_{\max} \qquad (2)$$

where MSE is the Mean Squared Error, $p_k$ is the significance level of the k-th predictor, $k \in 1,\dots,K$, $K$ is the number of selected predictors, $\mathrm{VIF}_k$ is the Variance Inflation Factor of the k-th predictor, CN is the Condition Number, and $\alpha$, $\mathrm{VIF}_{\max}$ and $\mathrm{CN}_{\max}$ are the chosen threshold values.
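The rolling estimation frames and the selection rule (2) can be sketched as follows. This assumes each candidate equation has already been fitted and summarized by its MSE, predictor p-values, VIFs and condition number. The function names and the default thresholds (0.05, 10 and 30, common rules of thumb) are hypothetical; the text does not state the values the authors used.

```python
def rolling_windows(n_windows=70, width=95):
    """(start, end) indices of estimation frames of `width` observations,
    each shifted forward by one observation."""
    return [(start, start + width) for start in range(n_windows)]


def select_equation(candidates, alpha=0.05, vif_max=10.0, cn_max=30.0):
    """Return the feasible candidate with the smallest MSE, or None.

    Each candidate is a dict with keys 'mse', 'pvalues', 'vifs', 'cond_number'.
    The thresholds are assumptions, not values taken from the paper.
    """
    feasible = [
        c for c in candidates
        if all(p <= alpha for p in c["pvalues"])
        and all(v <= vif_max for v in c["vifs"])
        and c["cond_number"] <= cn_max
    ]
    return min(feasible, key=lambda c: c["mse"]) if feasible else None
```

Candidates that violate any constraint are discarded before the MSE comparison, so a lower-MSE equation with an insignificant predictor never wins.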
It is not actually important here which variable selection method is used, whether Bayesian variable selection, Gibbs sampling or a classical approach; see De Mol, Giannone and Reichlin (2008) and Wright (2009). Let us say we simply choose the best-fitting regression specification according to some algorithm. To illustrate the inconsistency of the regression structure, we show how the magnitudes of some coefficients change across these 70 equations. The results are shown in Table 1, where the Y-axis denotes the coefficient magnitude and the X-axis the equation number. If a predictor was not included in an equation, its coefficient is set to zero.

Table 1 panel: Non-farm Payrolls (t-2)
The table shows that some predictors are more or less consistent, such as M2(t-1), Non-farm Payrolls (t-2) and RGDP (t-1), while others are not consistent at all (RGDP (t-3), Velocity (t-2) and Non-farm Payrolls (t-3)). The experiment revealed a large number of structural breaks across the computed equations. With nearly every shift of the data frame a better regression could be built. Moreover, some coefficients even change sign from positive to negative and vice versa. Therefore, we assume that when some factors that are actually important are omitted, the coefficients of the included predictors absorb part of their impact on the regression output; see, for example, Pesaran, Pettenuzzo and Timmermann (2006) and Gneiting and Raftery (2007). For instance, M2(t-1) often acts as a rather significant predictor (its coefficient is relatively high). If M2(t-1) is excluded from an equation because of multicollinearity, this does not mean that it has stopped influencing CPI. Its impact is simply redistributed among the coefficients of the other predictors; see Justiniano and Primiceri (2008), Giordani and Villani (2010) and Jore, Mitchell and Vahey (2010). And if M2(t-1) becomes more volatile during the forecast period, the equation will no longer be consistent, since it does not contain M2(t-1) as an explanatory variable. That is why a researcher should aim to include as many explanatory variables in a model as possible. At the same time, the more predictors one takes, the higher the risk of multicollinearity and of large errors in the coefficient estimators (Primiceri (2005) and Newbold (1997)). Thus, finding the optimal balance is very important.
Moving forward, we noticed that within one data frame several regression equations can be specified that satisfy the constraints in (2). The problem of selecting one of them then reduces to picking the equation with the minimum MSE. However, using that equation for forecasting does not guarantee a better prediction than the other candidate regressions that were discarded, especially if their MSEs differ only slightly. For example, consider three possible regression equations R1, R2 and R3 for the very first data frame. General information about each of them is given in Table 2. All three equations satisfy the constraints in (2), and according to the chosen algorithm we should pick R1 for forecasting. However, when R1 is applied to forecast the dependent variable, its forecast errors are not always smaller than those of R2 or R3. To illustrate this point, we take a data frame of 30 observations, starting from the first forecast value, and compute the MSE of all three regressions. This procedure is repeated 40 times, shifting the frame forward by one observation each time. Thus, we examine how these regression equations would have performed had we derived them back in 1984. The results are shown in Figure 1. The MSEs of R1, R2 and R3 do not stay in the same order as in Table 2 and interweave over time. This underlines the main idea of this section: if several regression equations are of roughly equal quality, there is high uncertainty about which of them to choose for prediction. In the following section we attempt to solve this problem.
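The out-of-sample comparison described above (30-observation frames, shifted 40 times) can be sketched as below, assuming the forecast errors of each regression are available as a list. The helper names are hypothetical. `rank_by_frame` shows, frame by frame, which model currently has the lowest MSE, which is exactly where the "interweaving" seen in Figure 1 would show up.

```python
def rolling_mse(errors, width=30, n_frames=40):
    """MSE of forecast errors over frames of `width` observations,
    shifting the frame forward by one observation `n_frames` times."""
    if len(errors) < width + n_frames - 1:
        raise ValueError("not enough forecast errors for the requested frames")
    return [
        sum(e * e for e in errors[s:s + width]) / width
        for s in range(n_frames)
    ]


def rank_by_frame(mse_by_model):
    """For each frame, model names ordered from lowest to highest MSE."""
    names = list(mse_by_model)
    n_frames = len(next(iter(mse_by_model.values())))
    return [sorted(names, key=lambda m: mse_by_model[m][f]) for f in range(n_frames)]


print(rolling_mse([1, 2, 3, 4], width=2, n_frames=3))  # [2.5, 6.5, 12.5]
print(rank_by_frame({"R1": [1, 3], "R2": [2, 1]}))     # [['R1', 'R2'], ['R2', 'R1']]
```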

The Method
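Suppose we have a target variable $y$ and a relatively large set of independent variables $x_1, \dots, x_n$. Let us also assume that it is possible to derive $l$ regression equations $\hat y_1, \dots, \hat y_l$ within one observation frame, each with residuals $\varepsilon_i \sim N(0; \sigma_i^2)$ and each satisfying the OLS prerequisites with significant predictors. Then we can create a new regression equation $\hat y$ as the weighted sum of the already computed ones:

$$\hat y = \sum_{i=1}^{l} w_i\, \hat y_i, \qquad \sum_{i=1}^{l} w_i = 1 \qquad (3)$$

For a forecast at a new point, the forecast error variance of the combined model is

$$\tilde\sigma^2 = \mathrm{Var}(y_0 - \hat y_0) = \mathrm{Var}\Big(\sum_{i=1}^{l} w_i \varepsilon_{i}\Big) + \mathrm{Var}\Big(\sum_{i=1}^{l} w_i \hat y_{i0}\Big) \qquad (4)$$

From formula 4 one can see that the mean squared error of the combined model consists of the variance of the errors and the variance of the regression line. The variance of the errors is computed as follows:

$$\mathrm{Var}\Big(\sum_{i=1}^{l} w_i \varepsilon_i\Big) = \sum_{i=1}^{l}\sum_{j=1}^{l} w_i w_j\, \sigma_{ij} \qquad (5)$$

where $\sigma_{ij} = \mathrm{Cov}(\varepsilon_i, \varepsilon_j)$ and $\sigma_{ii} = \sigma_i^2$. The variance of the regression line can be presented in the following form:

$$\mathrm{Var}\Big(\sum_{i=1}^{l} w_i \hat y_{i0}\Big) = \sum_{i=1}^{l}\sum_{j=1}^{l} w_i w_j\, \mathrm{Cov}(\hat y_{i0}, \hat y_{j0}) \qquad (6)$$

Hence our task reduces to computing the covariance between two regressions included in the combined model:

$$\mathrm{Cov}(\hat y_{i0}, \hat y_{j0}) = \sigma_{ij}\, x_{i0}^{\top}(X_i^{\top}X_i)^{-1}X_i^{\top}X_j(X_j^{\top}X_j)^{-1}x_{j0} \qquad (7)$$

where $x_{i0}$ is the column vector of the i-th model's predictor values used for forecasting and $X_i$ is the i-th model's design matrix. Thus, the variance of the combined model may be rewritten as in formula 8:

$$\tilde\sigma^2 = \sum_{i=1}^{l}\sum_{j=1}^{l} w_i w_j\, \sigma_{ij}\, c_{ij}, \qquad c_{ij} = 1 + x_{i0}^{\top}(X_i^{\top}X_i)^{-1}X_i^{\top}X_j(X_j^{\top}X_j)^{-1}x_{j0} \qquad (8)$$

To obtain $\tilde\sigma^2$ from formula 8, we need an unbiased estimator of the covariance of the models' residuals $e_i$, computed as follows:

$$\hat s_{ij} = \frac{e_i^{\top} e_j}{d_{ij}}, \qquad d_{ij} = n - k_i - k_j + \mathrm{tr}(H_i H_j), \qquad H_i = X_i(X_i^{\top}X_i)^{-1}X_i^{\top} \qquad (9)$$

where $n$ is the number of observations and $k_i$ is the number of parameters of the i-th model; for $i = j$ the denominator reduces to the familiar $n - k_i$. As the main goal of the method is to create an equation that outperforms all the others computed, we face a minimization problem that can be solved by quadratic programming. In this case the optimization task looks as follows:

$$\min_{w_1,\dots,w_l} \sum_{i=1}^{l}\sum_{j=1}^{l} w_i w_j\, \hat s_{ij}\, c_{ij} \quad \text{subject to} \quad \sum_{i=1}^{l} w_i = 1 \qquad (10)$$

Let us also take into account that if the $\varepsilon_i$ are independent and normally distributed, $\varepsilon_i \sim N(0; \sigma_i^2)$ (which we assume to be true for the residuals of all our models), then the forecast error of the combined model is also normal, $y_0 - \hat y_0 \sim N(0; \tilde\sigma^2)$. Thus, we can derive an interval estimate for the forecast of the combined model.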
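A minimal sketch of the weight computation in problem (10), assuming the only constraint is that the weights sum to one. Under that assumption, and with a positive definite matrix C (entries $\hat s_{ij} c_{ij}$, or just $\hat s_{ij}$ if the regression-line term is ignored), the minimizer of $w^{\top} C w$ has the closed form $w = C^{-1}\mathbf{1} / (\mathbf{1}^{\top} C^{-1}\mathbf{1})$. If extra constraints such as non-negative weights are imposed, a quadratic-programming solver is needed instead. The helper names are hypothetical; pure Python keeps the sketch dependency-free.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def combination_weights(cov):
    """Minimise w' C w subject to sum(w) == 1 (closed form, no sign constraints)."""
    z = solve(cov, [1.0] * len(cov))
    total = sum(z)
    return [zi / total for zi in z]


def combined_variance(w, cov):
    """w' C w, i.e. the double sum in (5) or (8) for a given covariance matrix."""
    k = len(w)
    return sum(w[i] * w[j] * cov[i][j] for i in range(k) for j in range(k))


w = combination_weights([[1.0, 0.0], [0.0, 3.0]])
print(w, combined_variance(w, [[1.0, 0.0], [0.0, 3.0]]))  # ~[0.75, 0.25] ~0.75
```

The less noisy model receives the larger weight, and the combined variance (0.75) is below that of either model alone.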
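Recall that for a linear regression the confidence interval is usually computed with the t-distribution with (n-k-1) degrees of freedom if a constant is present in the equation and with (n-k) degrees of freedom if it is absent:

$$\hat y_0 \pm t_{1-\alpha/2,\,n-k-1}\; s\,\sqrt{1 + x_0^{\top}(X^{\top}X)^{-1}x_0} \qquad (11)$$

where $s$ is an unbiased estimator of the true standard deviation of the residuals and $k$ is the number of predictors. For the newly computed combined model, formula 11 takes the following form:

$$\hat y_0 \pm t_{1-\alpha/2,\,r}\; \tilde s, \qquad \tilde s^2 = \sum_{i=1}^{l}\sum_{j=1}^{l} w_i w_j\, \hat s_{ij}\, c_{ij} \qquad (12)$$

Now our task reduces to finding the number of degrees of freedom $r$. As, approximately,

$$\frac{r\,\tilde s^2}{\tilde\sigma^2} \sim \chi^2_r \qquad (13)$$

we can infer that

$$\mathrm{Var}\Big(\frac{r\,\tilde s^2}{\tilde\sigma^2}\Big) = \frac{r^2\,\mathrm{Var}(\tilde s^2)}{\tilde\sigma^4} \qquad (14)$$

Using the fact that the variance of a chi-square distribution is twice its number of degrees of freedom, we have

$$\frac{r^2\,\mathrm{Var}(\tilde s^2)}{\tilde\sigma^4} = 2r \qquad (15)$$

$$r = \frac{2\tilde\sigma^4}{\mathrm{Var}(\tilde s^2)} \qquad (16)$$

Since $\hat s_{ij} = \hat s_{ji}$ and $c_{ij} = c_{ji}$, the estimator in (12) can be written as

$$\tilde s^2 = \sum_{i=1}^{l} w_i^2\, c_{ii}\, \hat s_{ii} + 2\sum_{i<j} w_i w_j\, c_{ij}\, \hat s_{ij} \qquad (17)$$

and therefore

$$\mathrm{Var}(\tilde s^2) = \sum_{i=1}^{l} w_i^4\, c_{ii}^2\, \mathrm{Var}(\hat s_{ii}) + 4\sum_{i<j} w_i^2 w_j^2\, c_{ij}^2\, \mathrm{Var}(\hat s_{ij}) \qquad (18)$$

as we suppose there are no dependencies among the covariance estimators.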
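To compute the variance of the residual covariance estimators, let us refer to the Wishart distribution, of which we give a quick summary. Suppose $G$ is an $n \times p$ matrix, each row of which is independently drawn from a p-variate normal distribution with zero mean and covariance matrix $V$. The Wishart distribution $W_p(V, n)$ is the probability distribution of the $p \times p$ random matrix $S = G^{\top}G$. The positive integer $n$ is the number of degrees of freedom. If $p = V = 1$, this distribution is a chi-squared distribution with $n$ degrees of freedom.

Thus, approximately, we can write

$$d_{ij}\,\hat s_{ij} \sim S_{ij}, \qquad S \sim W_l(\Sigma, d_{ij}), \quad \Sigma = (\sigma_{ij}) \qquad (19)$$

We also know that the variance of an element of a Wishart-distributed matrix is computed as follows:

$$\mathrm{Var}(S_{ij}) = n\,(v_{ij}^2 + v_{ii}v_{jj}) \qquad (20)$$

Applying formula 19 to formula 20 we obtain

$$\mathrm{Var}(d_{ij}\,\hat s_{ij}) = d_{ij}\,(\sigma_{ij}^2 + \sigma_{ii}\sigma_{jj}) \qquad (21)$$

After some trivial calculations we get

$$\mathrm{Var}(\hat s_{ij}) = \frac{\sigma_{ij}^2 + \sigma_{ii}\sigma_{jj}}{d_{ij}}, \qquad \mathrm{Var}(\hat s_{ii}) = \frac{2\sigma_{ii}^2}{d_{ii}} \qquad (22)$$

Substituting (22) into (18) and then applying the result to formula 16, we obtain the number of degrees of freedom $r$:

$$r = \frac{2\tilde\sigma^4}{\displaystyle\sum_{i=1}^{l} \frac{2 w_i^4 c_{ii}^2 \sigma_{ii}^2}{d_{ii}} + 4\sum_{i<j} \frac{w_i^2 w_j^2 c_{ij}^2 (\sigma_{ij}^2 + \sigma_{ii}\sigma_{jj})}{d_{ij}}} \qquad (23)$$

As we do not know the true variance of the combined model or the true covariances and variances of the models included in it, we substitute them with their unbiased point estimators. In this case we get the final formula for the number of degrees of freedom:

$$\hat r = \frac{2\tilde s^4}{\displaystyle\sum_{i=1}^{l} \frac{2 w_i^4 c_{ii}^2 \hat s_{ii}^2}{d_{ii}} + 4\sum_{i<j} \frac{w_i^2 w_j^2 c_{ij}^2 (\hat s_{ij}^2 + \hat s_{ii}\hat s_{jj})}{d_{ij}}} \qquad (24)$$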
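The element-variance formula for the Wishart distribution, $\mathrm{Var}(S_{ij}) = n(v_{ij}^2 + v_{ii}v_{jj})$, can be checked numerically. The sketch below is a Monte Carlo check for a 2x2 case, assuming a hypothetical scale matrix V and n = 5 degrees of freedom. It draws the rows of G through a Cholesky factor of V and compares sample variances of the entries of $S = G^{\top}G$ with the formula. The function names are hypothetical.

```python
import math
import random
import statistics


def simulate_wishart_entry_variance(v, n, draws=50_000, seed=0):
    """Monte Carlo variances of S11, S12, S22 for S = G'G ~ W_2(V, n)."""
    rng = random.Random(seed)
    # Cholesky factor of the 2x2 scale matrix V
    l11 = math.sqrt(v[0][0])
    l21 = v[0][1] / l11
    l22 = math.sqrt(v[1][1] - l21 ** 2)
    s11, s12, s22 = [], [], []
    for _ in range(draws):
        a = b = c = 0.0
        for _ in range(n):  # n independent rows of G
            z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
            x1, x2 = l11 * z1, l21 * z1 + l22 * z2
            a += x1 * x1
            b += x1 * x2
            c += x2 * x2
        s11.append(a)
        s12.append(b)
        s22.append(c)
    return statistics.variance(s11), statistics.variance(s12), statistics.variance(s22)


def wishart_entry_variance(v, n, i, j):
    """Var(S_ij) = n (v_ij^2 + v_ii v_jj)."""
    return n * (v[i][j] ** 2 + v[i][i] * v[j][j])


V = [[1.0, 0.5], [0.5, 2.0]]
print(simulate_wishart_entry_variance(V, 5))  # close to (10, 11.25, 40)
```

For $i = j$ the formula gives $2 n v_{ii}^2$, the variance of a scaled chi-square with $n$ degrees of freedom, which is the special case quoted in the text.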
Thus, provided the hypothesis of independent residuals is accepted, the proposed method not only significantly decreases the MSE but also narrows its confidence interval. To summarize the calculations and inferences above, we provide a stepwise algorithm for computing the combined regression.
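The degrees-of-freedom estimate is a Satterthwaite-type moment match: the estimated combined variance is treated as a scaled chi-square whose variance is matched to the propagated variance of the covariance estimates. A sketch, assuming the weights, estimated covariances $\hat s_{ij}$, forecast factors $c_{ij} = 1 + x_{i0}^{\top}(X_i^{\top}X_i)^{-1}X_i^{\top}X_j(X_j^{\top}X_j)^{-1}x_{j0}$ and per-pair degrees of freedom $d_{ij}$ are already available as nested lists. The function name `effective_dof` is hypothetical.

```python
def effective_dof(w, s, c, d):
    """Satterthwaite-type degrees of freedom r for the combined model.

    w -- combination weights
    s -- estimated residual covariance matrix, s[i][j] = hat s_ij
    c -- forecast factors c_ij (c[i][j] == c[j][i])
    d -- degrees of freedom d_ij of each covariance estimate
    """
    l = len(w)
    # Estimated forecast-error variance of the combined model
    s2 = sum(w[i] * w[j] * c[i][j] * s[i][j] for i in range(l) for j in range(l))
    var_s2 = 0.0
    for i in range(l):
        # Diagonal terms: Var(s_ii) = 2 s_ii^2 / d_ii
        var_s2 += w[i] ** 4 * c[i][i] ** 2 * 2 * s[i][i] ** 2 / d[i][i]
        for j in range(i + 1, l):
            # Off-diagonal terms appear twice in the double sum, hence the factor 4
            var_s2 += (4 * w[i] ** 2 * w[j] ** 2 * c[i][j] ** 2
                       * (s[i][j] ** 2 + s[i][i] * s[j][j]) / d[i][j])
    return 2 * s2 ** 2 / var_s2


print(effective_dof([1.0], [[2.0]], [[1.0]], [[90]]))  # ~90
```

With a single model the estimate reduces to the usual $n - k$. With three uncorrelated, equal-variance models, all $c_{ij} = 1$ and $d_{ij} = 90$, it also returns about 90 rather than 270, because the estimated cross-covariances add their own sampling noise to the combined variance.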
1. Calculate all possible regression equations that satisfy the OLS prerequisites with significant predictors;
2. Find the weights $w_i$ by solving the optimization problem (10);
3. Compute the regression $\hat y$ by formula 3 using the weights $w_i$ obtained in the previous step;
4. Calculate a point forecast using the regression $\hat y$ from the previous step;
5. Compute the confidence interval for the point forecast by formula 12, with the degrees of freedom computed by formula 24.
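Steps 3 to 5 of the algorithm can be sketched as follows, assuming the individual forecasts $\hat y_{i0}$, the weights and the estimated combined variance $\tilde s^2$ are already computed. The Python standard library has no Student's t quantile, so the critical value for $r$ degrees of freedom is passed in by the caller (for example from `scipy.stats.t.ppf(0.975, r)` if SciPy is available). The function names are hypothetical.

```python
import math


def combined_forecast(forecasts, weights):
    """Steps 3-4: point forecast of the combined model, a weighted sum as in (3)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    return sum(w * f for w, f in zip(weights, forecasts))


def prediction_interval(point, s2, t_crit):
    """Step 5: point +/- t_crit * sqrt(s2), as in (12).

    s2     -- estimated forecast-error variance of the combined model
    t_crit -- (1 - alpha/2) quantile of Student's t with r degrees of freedom,
              supplied by the caller
    """
    half = t_crit * math.sqrt(s2)
    return point - half, point + half


point = combined_forecast([1.0, 2.0, 3.0], [0.5, 0.25, 0.25])
print(point)                                  # 1.75
print(prediction_interval(point, 0.04, 2.0))  # approximately (1.35, 2.15)
```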
The ideal case is several regression equations with equal MSE and independent residuals. Just three regression equations that meet all the requirements allow the MSE of the best-fitting equation to be decreased by about 40%. It is also worth noting that, according to the algorithm above, the proposed method allows deriving an equation that comprises even more explanatory variables than there are observations. A prominent feature of the method is that it is fully automatic and can operate without any intervention from the researcher.
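The arithmetic behind the ideal case can be made explicit with a small sketch. It assumes three independent residual series with equal variance, equal weights of 1/3, and ignores the regression-line term. The combined residual variance then falls to one third of the original, so the residual standard deviation falls by $1 - 1/\sqrt{3} \approx 42\%$. This agrees with the "about 40%" figure if that figure refers to the root of the MSE. The function name is hypothetical.

```python
import math


def equal_weight_reduction(n_models, sigma2=1.0):
    """Ideal case: n independent residual series with equal variance sigma2.

    Returns (combined variance, relative reduction in variance,
             relative reduction in standard deviation) for weights 1/n.
    """
    w = 1.0 / n_models
    combined = n_models * w * w * sigma2  # cross terms vanish by independence
    return combined, 1 - combined / sigma2, 1 - math.sqrt(combined / sigma2)


print(equal_weight_reduction(3))  # approximately (0.333, 0.667, 0.423)
```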

Application to Real Data
In this section we test the method on the same data as in the Empirical Background of the Method section. Table 3 gives the pairwise correlations of the residuals of the chosen regressions R1, R2 and R3. As one can see, the positive correlation among these regression residuals is not very strong, although it is significant. But, as shown below, even with such equations the accuracy of the forecast can be improved substantially. Table 4 reports the Kolmogorov-Smirnov test for normality of the residuals, to make sure that parametric interval estimators of the forecast errors can be used. Table 4 shows that the residuals of the combined regression $\hat y$ (Res) are even closer to normal than those of any of the selected regressions taken separately (Res1, Res2, Res3). Thus, interval estimates for $\hat y$ will also be more reliable, which is another clear advantage of the method. In Figure 2 we compare the residuals of the three regressions already shown in Figure 1 with those of the regression $\hat y$ computed by the proposed method. $\hat y$ remains preferable unless there are drastic shifts in the MSE proportions of the selected regressions. Such a situation can be seen in Figure 2 around the 19th-22nd data frames, where Res3 was much lower than Res1 and Res2, which affected $\hat y$ negatively. Moving forward, to illustrate that the method yields more accurate coefficient estimators, Figure 3 shows the MSE of regression R4, built using OLS and including all the predictors used in $\hat y$. We also provide the MSE of regression R5, which includes all the predictors considered for possible inclusion in the model in the Empirical Background of the Method section. The reason why $\hat y$ is substantially better than R4 and R5 is the following. Because of high multicollinearity there is a great risk of obtaining wrong coefficient estimators that do not properly reflect the true dependencies in the data. Of course, within the data frame used to derive an equation, the more predictors we use, the better the in-sample fit. But as soon as the model is used for real forecasting, it returns much greater errors than an equation with fewer predictors that better satisfies the OLS requirements. The dynamics of the MSE of R5 confirm this statement.
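The normality check in Table 4 rests on the Kolmogorov-Smirnov distance between the empirical CDF of the residuals and a normal CDF. A sketch of the statistic, assuming the mean and standard deviation are estimated from the same residuals. Strictly, that makes it the Lilliefors variant, so standard KS critical values are conservative. The function name is hypothetical; p-values are not computed here.

```python
from statistics import NormalDist, fmean, stdev


def ks_normal_statistic(sample):
    """KS distance between the sample ECDF and a normal CDF whose mean and
    standard deviation are estimated from the same sample."""
    xs = sorted(sample)
    n = len(xs)
    dist = NormalDist(fmean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        f = dist.cdf(x)
        # Compare the fitted CDF with the ECDF just after and just before x
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


near_normal = [NormalDist().inv_cdf((i + 0.5) / 100) for i in range(100)]
bimodal = [-1.0] * 50 + [1.0] * 50
print(ks_normal_statistic(near_normal), ks_normal_statistic(bimodal))  # small vs ~0.34
```

A smaller statistic means the residuals sit closer to the fitted normal distribution, which is the sense in which Res is described as "more normal" than Res1 to Res3.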

Conclusion
In this paper we try to make a small step toward improving existing methods of forecasting economic processes. We believe that the main problem of the methods invented so far is their inability to capture a larger number of predictors. Therefore, a researcher may omit a lot of data that are significant and thus crucial for future predictions. Because of a peculiarity of economic systems, namely a dramatic lack of observations, it is quite complicated to derive a consistent forecasting method without enough statistical data. That is why we consider the proposed method highly relevant today. With it one can capture more explanatory variables and calculate more consistent coefficients. It makes it possible, for example, to include highly multicollinear variables in a model and still obtain significant coefficients, which would not be possible using OLS alone. Additionally, as mentioned before, the method can be programmed to compute forecasts without any outside help. The researcher's task is only to specify which variables to consider for selection, the type of model and the way the data are transformed. Empirical results illustrate that even if the requirements of the method are not fully met, it is still possible to obtain a significantly better regression equation. Taking the above into account, the proposed method can be applied in various econometric models, in particular for government needs. As for future work, we plan to develop a way to apply the method to non-linear models. We also plan to develop a method for integrating equations computed on different data frames as an extension of the present work, and we believe this procedure will help eliminate the problem of data frame selection.

Figure 1. MSE dynamics for R1, R2 and R3

Figure 3. MSE dynamics for R4, R5 and $\hat y$

Table 1. Dynamics of coefficient values

Table 2. Summary for R1, R2 and R3

Table 3. Pearson's correlation for residuals of R1, R2 and R3