Machine Learning in Macro-Economic Series Forecasting

In this paper I conduct a simple experiment in using an artificial neural network (ANN) for time-series forecasting. By combining a first-order Markov switching model with the K-means algorithm, the machine-learning forecast outperforms the benchmark for inflation-rate forecasting. The paper demonstrates the potential of ANN forecasting and suggests directions for future research.


Introduction
Machine learning has only a short history; however, its success in many fields has made it indispensable. Neural networks and deep learning currently provide the best solutions for image recognition, driverless cars, translation, speech recognition, and natural language processing. A deep-learning system has beaten the world's top Go player, even though the number of possible Go board configurations exceeds 10^172, which makes Go arguably one of the most complex games in the world. In this paper, I introduce a machine learning method and apply it to macroeconomic time-series forecasting.

Major statistical models for time-series forecasting include linear models such as ARIMA and non-linear methods such as Bayesian forecasting. These methods have proven efficient and powerful. Forecasts based on economic theory, such as the Phillips curve, term-structure models, and asset pricing models, have also been successful in forecasting inflation. In practice, however, the best forecasts are still the subjective ones that come from surveys such as Blue Chip, the SPF, and the Greenbook, a finding documented by a number of studies (see Ang, Bekaert, and Wei (2007) and Faust and Wright (2013)). For this reason, I use the Survey of Professional Forecasters (SPF) as the main benchmark for the neural network forecasts, and I list some major alternative methods for reference.

1) ARIMA models:
The motivation for the ARMA model derives from a long tradition in rational expectations macroeconomics. For ARMA(p, q) models, the optimal lag length of the autoregressive part is recursively selected using the Schwarz Information Criterion (BIC) or the Akaike Information Criterion (AIC) on the in-sample data. A simple example is the ARMA(1, 1) model:

$$y_t = c + \phi_1 y_{t-1} + \varepsilon_t + \theta_1 \varepsilon_{t-1}$$

For the general ARMA(p, q) model,

$$y_t = c + \sum_{i=1}^{p} \phi_i y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} \tag{1}$$

2) Regime-switching models:
Hamilton (1989) proposes that the parameters of an autoregression can be viewed as the outcome of a discrete-state Markov process. The basic univariate regime-switching model is

$$y_t = c_{S_t} + \phi_{S_t} y_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma^2_{S_t}) \tag{2}$$

The state variable $S_t$ follows a Markov chain with constant transition probabilities $p_{ij} = P(S_t = j \mid S_{t-1} = i)$, and it can be inferred with a Bayesian filtering algorithm (the Hamilton filter).
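To make the filtering step concrete, here is a minimal NumPy sketch of the Hamilton filter for a regime-switching AR(1), $y_t = c_{S_t} + \phi_{S_t} y_{t-1} + \varepsilon_t$ with $\varepsilon_t \sim N(0, \sigma^2_{S_t})$. It assumes the regime parameters and the transition matrix are already known; in practice they would be estimated, for example by maximizing the log-likelihood the filter returns. The function name `hamilton_filter` and the simulated parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def hamilton_filter(y, c, phi, sigma, P, p0):
    """Filtered regime probabilities P(S_t = j | y_0..y_t) for a K-regime
    switching AR(1): y_t = c[s] + phi[s] * y_{t-1} + e_t, e_t ~ N(0, sigma[s]^2).

    P[i, j] = P(S_t = j | S_{t-1} = i); p0 is the distribution of S_0.
    Returns a (T-1, K) array of filtered probabilities for t = 1..T-1
    and the log-likelihood of y_1..y_{T-1} given y_0. Parameters are known here.
    """
    y = np.asarray(y, dtype=float)
    c, phi, sigma = (np.asarray(a, dtype=float) for a in (c, phi, sigma))
    P = np.asarray(P, dtype=float)
    prob = np.asarray(p0, dtype=float)
    filtered = np.zeros((len(y) - 1, len(c)))
    loglik = 0.0
    for t in range(1, len(y)):
        predicted = prob @ P                        # prediction step: P(S_t | y_0..y_{t-1})
        resid = y[t] - c - phi * y[t - 1]           # one residual per regime
        dens = np.exp(-0.5 * (resid / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
        joint = predicted * dens                    # update step (Bayes' rule)
        lik = joint.sum()
        loglik += np.log(lik)
        prob = joint / lik
        filtered[t - 1] = prob
    return filtered, loglik

# Usage: simulate two persistent regimes and check that the filter tracks them.
rng = np.random.default_rng(0)
P = np.array([[0.95, 0.05], [0.10, 0.90]])
c, phi, sigma = [2.0, -1.0], [0.3, 0.3], [0.5, 0.5]
T = 300
s = np.zeros(T, dtype=int)
y = np.zeros(T)
for t in range(1, T):
    s[t] = rng.choice(2, p=P[s[t - 1]])
    y[t] = c[s[t]] + phi[s[t]] * y[t - 1] + sigma[s[t]] * rng.standard_normal()

filtered, loglik = hamilton_filter(y, c, phi, sigma, P, p0=[0.5, 0.5])
accuracy = np.mean(filtered.argmax(axis=1) == s[1:])
print(f"share of periods classified correctly: {accuracy:.2f}")
```

Because the two regime intercepts are far apart relative to the shock size, the filtered probabilities should pick out the true regime in almost every period; with overlapping regimes the probabilities would be less decisive.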
3) Phillips-curve-motivated forecasts:
These methods are specific to inflation forecasting. The Phillips curve has been the canonical economically motivated approach to forecasting inflation since Phillips (1958). The basic idea is that changes in the inflation rate are driven by other variables such as the unemployment rate and the output gap. The basic model (Note 1) is

$$\pi_t - \pi_{t-1} = \alpha + \beta \mu_{t-1} + \varepsilon_t$$

where $\pi_t$ is the inflation rate and $\mu_{t-1}$ is the unemployment rate in period t-1. The model can be extended to include more variables, such as the output gap.
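A Phillips-curve forecast of this kind reduces to an ordinary least-squares regression. The sketch below assumes the simplest form, $\pi_t - \pi_{t-1} = \alpha + \beta \mu_{t-1} + \varepsilon_t$, with a single lag of unemployment; richer specifications add more lags or regressors such as the output gap. The helper names and the synthetic data are illustrative assumptions.

```python
import numpy as np

def fit_phillips_curve(inflation, unemployment):
    """OLS estimates of alpha and beta in
    pi_t - pi_{t-1} = alpha + beta * mu_{t-1} + e_t."""
    pi = np.asarray(inflation, dtype=float)
    mu = np.asarray(unemployment, dtype=float)
    d_pi = pi[1:] - pi[:-1]                              # change in inflation, t = 1..T-1
    X = np.column_stack([np.ones(len(d_pi)), mu[:-1]])   # regressors [1, mu_{t-1}]
    (alpha, beta), *_ = np.linalg.lstsq(X, d_pi, rcond=None)
    return alpha, beta

def forecast_inflation(alpha, beta, pi_last, mu_last):
    """One-step-ahead forecast: pi_{T+1} = pi_T + alpha + beta * mu_T."""
    return pi_last + alpha + beta * mu_last

# Usage on synthetic data (not the paper's data): true alpha = 1.5, beta = -0.3.
rng = np.random.default_rng(0)
T = 400
mu = 5.0 + rng.standard_normal(T)
pi = np.zeros(T)
for t in range(1, T):
    pi[t] = pi[t - 1] + 1.5 - 0.3 * mu[t - 1] + 0.2 * rng.standard_normal()

alpha, beta = fit_phillips_curve(pi, mu)
pi_next = forecast_inflation(alpha, beta, pi[-1], mu[-1])
print(f"alpha = {alpha:.2f}, beta = {beta:.2f}, next inflation = {pi_next:.2f}")
```

A negative estimated beta means that higher unemployment last quarter predicts falling inflation this quarter, which is the usual Phillips-curve trade-off.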
To keep the analysis specific and simple, I limit the forecasting targets of machine learning to real GDP growth and inflation (GDP deflator), the two most important indicators of the macroeconomy. With the K-means-Markov model, the main measure of forecast accuracy, the root mean squared error (RMSE), of the machine-learning inflation forecast is lower than that of the SPF (Note 2) benchmark. Table 1 reports the benchmarks, where H denotes the forecast horizon. For example, H1 is the current-quarter forecast, given that forecasters have the previous quarter's information available.

Stock and Watson (2003) show that aggregate indices of macro series measuring real activity produce better forecasts of inflation and output than individual series. Bernanke, Boivin, and Eliasz (2005) use a balanced panel of 120 macroeconomic time series to study the effect of monetary policy on the economy with factor-augmented vector autoregressions (FAVAR). Their series can be grouped into real output and income, employment, consumption, housing and sales, real inventories, orders and unfilled orders, stock prices, exchange rates, interest rates, money aggregates, price indexes, and average hourly earnings.
Note. Some series are aggregated to quarterly frequency by averaging the available observations.
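The RMSE criterion used to compare the machine-learning forecasts with the SPF benchmark is the square root of the mean squared forecast error. A minimal sketch follows; the function name and the example numbers are illustrative and are not the paper's results.

```python
import math

def rmse(actual, forecast):
    """Root mean squared error: sqrt(mean((actual - forecast)^2))."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("actual and forecast must be non-empty and of equal length")
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))

# Usage with made-up numbers (not the paper's results):
actual = [2.0, 2.5, 1.8, 3.0]
model = [2.2, 2.1, 1.9, 2.6]       # errors -0.2, 0.4, -0.1, 0.4
benchmark = [2.4, 2.0, 2.3, 2.4]   # errors -0.4, 0.5, -0.5, 0.6
print(rmse(actual, model), rmse(actual, benchmark))
```

Because errors are squared before averaging, RMSE penalizes a few large misses more heavily than many small ones, which matters when comparing forecasts across turbulent periods.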

Data Description
To compare with the benchmark, I use quarterly data from the Real-Time Data Research Center of the Federal Reserve Bank of Philadelphia. The sample period is 1968:Q4-2015:Q4. Longer series, such as the quarterly data from the U.S. Bureau of Economic Analysis (BEA) (Note 3), can be used for reference.

K-Means Markov Regime Switching Model
The basic setting of the Markov regime-switching model follows the literature. Let $y_t$ be a sequence of random variables that depends on its own past values, on the state of the previous period $S_{t-1}$, which takes values in a finite set $S = \{1, \dots, k\}$ called the state space, and on a random shock $\varepsilon_t$:

$$y_t = f(y_{t-1}, S_{t-1}) + \varepsilon_t, \qquad p_{ij} = P(S_t = j \mid S_{t-1} = i), \quad i, j \in S \tag{5}$$

Regime transitions may also be time-dependent, depending either on the time already spent in the regime, as in Durland and McCurdy (1994), or on other variables, as proposed by Diebold, Lee, and Weinbach (1994) (Note 4). The key issue in estimating a regime-switching model, however, is specifying the number of regimes. In this paper, I define a state as a combination of the values of all input variables and use K-means clustering to classify the states.
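Once every quarter has been assigned a state (for example by K-means), constant transition probabilities $p_{ij} = P(S_t = j \mid S_{t-1} = i)$ can be estimated by counting observed transitions and normalizing each row, which is the maximum-likelihood estimator for a first-order Markov chain. The sketch below is illustrative: the function name and the optional `smoothing` pseudo-count are assumptions, not part of the paper.

```python
import numpy as np

def estimate_transition_matrix(states, k, smoothing=0.0):
    """Estimate P[i, j] = (# transitions i -> j) / (# transitions out of i).

    `smoothing` adds a pseudo-count to every cell (an assumption, not from
    the paper). Rows with no observed exits fall back to a uniform row.
    """
    counts = np.full((k, k), smoothing, dtype=float)
    for i, j in zip(states[:-1], states[1:]):
        counts[i, j] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.full((k, k), 1.0 / k), where=row_sums > 0)

# Usage: a short labelled path with three states.
states = [0, 0, 1, 1, 1, 2, 0, 0, 1, 2]
P = estimate_transition_matrix(states, k=3)
print(P)

# With more states than the data can support, many rows are never observed.
P_sparse = estimate_transition_matrix([0, 1], k=3)
print(P_sparse)
```

The second call shows why the number of states matters: with few quarters per state, most rows rest on a handful of transitions (or none), so the estimated probabilities become noisy.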

K-Means Methods and On-line Deep Learning
K-means methods have been used across a wide range of application areas, such as image recognition and data classification. The K-means algorithm clusters data by trying to separate samples into K groups of equal variance, minimizing a criterion known as the inertia, or within-cluster sum of squares. The algorithm requires the number of clusters to be specified in advance.
The K-means algorithm divides a set of N samples X into K disjoint clusters C, each described by the mean $\mu_j$ of the samples in the cluster. These means are called the cluster centroids. The K-means algorithm aims to choose centroids that minimize the inertia, or within-cluster sum-of-squares criterion:

$$\sum_{i=1}^{N} \min_{\mu_j \in C} \lVert x_i - \mu_j \rVert^2$$

K-means is equivalent to the expectation-maximization (EM) algorithm with a small, all-equal, diagonal covariance matrix. The inertia criterion has some drawbacks. It assumes that clusters are convex and isotropic, so it responds poorly to elongated clusters such as belt-shaped data. It also suffers from the "curse of dimensionality"; applying a dimensionality reduction algorithm such as Principal Component Analysis before K-means clustering can alleviate this problem. I adopt the k-means++ algorithm developed by Arthur and Vassilvitskii (2007) to solve the state classification problem. The tools I use are scikit-learn (Pedregosa et al., 2011) and the Python language.
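The paper relies on scikit-learn, where the equivalent call is `KMeans(n_clusters=4, init="k-means++", n_init=10)` and the fitted `inertia_` attribute is the criterion above. To show what happens inside, here is a self-contained NumPy sketch of k-means++ seeding followed by Lloyd iterations, with several restarts that keep the lowest-inertia solution. It is an illustration under these assumptions, not scikit-learn's implementation, and the helper names are hypothetical.

```python
import numpy as np

def sq_dists(X, centers):
    """Squared Euclidean distance from every sample to every centroid, shape (n, k)."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)

def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: each new centroid is drawn with probability
    proportional to its squared distance from the nearest existing centroid."""
    centers = X[[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = sq_dists(X, centers).min(axis=1)
        probs = d2 / d2.sum() if d2.sum() > 0 else None   # None -> uniform draw
        centers = np.vstack([centers, X[rng.choice(len(X), p=probs)]])
    return centers

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd iterations from a k-means++ start; returns centroids, labels, inertia."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = kmeans_pp_init(X, k, rng)
    for _ in range(n_iter):
        labels = sq_dists(X, centers).argmin(axis=1)      # assignment step
        new_centers = np.array([                          # update step; empty cluster keeps old centroid
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    d2 = sq_dists(X, centers)
    labels = d2.argmin(axis=1)
    inertia = d2[np.arange(len(X)), labels].sum()         # within-cluster sum of squares
    return centers, labels, inertia

def kmeans_best_of(X, k, n_init=10, seed=0):
    """Run several seeded starts and keep the solution with the lowest inertia."""
    runs = [kmeans(X, k, seed=seed + r) for r in range(n_init)]
    return min(runs, key=lambda run: run[2])

# Usage: four well-separated two-dimensional groups, 50 points each.
rng = np.random.default_rng(42)
means = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0], [5.0, 5.0]])
X = np.vstack([m + 0.3 * rng.standard_normal((50, 2)) for m in means])
centers, labels, inertia = kmeans_best_of(X, k=4)
print(np.bincount(labels, minlength=4), round(float(inertia), 2))
```

Restarting from several seeds guards against a poor initialization, which is the same reason scikit-learn exposes the `n_init` parameter.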

On-Line Learning: Parameters
For step-by-step forecasting, I use the same data as in the previous section. The first N samples (N is a parameter) serve as the initial data; using the K-means method, the transition probability matrix estimated from the clustering, and the ANN, I produce a one-step-ahead forecast. The important training parameters are:
1) Initial sample size N: I set it between 60 and 100 quarters, i.e. 15 to 25 years. The initial sample size can be regarded as the model's "memory" and can also be interpreted as the length of a business cycle.
2) Number of states: I tried values from 4 to 30. Intuitively, the more states we can estimate and classify, the more accurate the forecast can be, since smaller clusters lead to smaller forecasting errors. However, when the number of states is too large, the clustering becomes unstable and each row of the transition probability matrix is estimated from only a few observed transitions, which increases the forecasting error. Choosing an appropriate number of states is therefore critical.
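The expanding-window procedure can be sketched as follows, with `n_init` playing the role of the initial sample size N and `k` the number of states. At each forecast origin the states are re-clustered on data up to the previous quarter and the transition matrix is re-estimated. The paper feeds these into an ANN; that step is not reproduced here. Instead, as a clearly hypothetical stand-in, the forecast is the expected target value in next quarter's state: the transition probabilities out of the current state weighted by each state's historical mean. Plain Lloyd iterations with random initial centroids are used to keep the block short.

```python
import numpy as np

def lloyd_labels(X, k, rng, n_iter=50):
    """Plain Lloyd k-means from random initial centroids; returns state labels."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

def transition_matrix(labels, k):
    """Row-normalized transition counts; rows with no exits fall back to uniform."""
    counts = np.zeros((k, k))
    for i, j in zip(labels[:-1], labels[1:]):
        counts[i, j] += 1
    rows = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, rows, out=np.full((k, k), 1.0 / k), where=rows > 0)

def kmeans_markov_forecasts(X, y, n_init, k, seed=0):
    """Expanding-window one-step-ahead forecasts of y[t] for t = n_init, ..., T-1.

    HYPOTHETICAL forecast rule (an assumption, not the paper's ANN step):
    y_hat[t] = sum_j P[s_{t-1}, j] * (mean of y over past periods in state j),
    where states and P are re-estimated from data up to t-1 only.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    preds = []
    for t in range(n_init, len(y)):
        labels = lloyd_labels(X[:t], k, rng)             # states of periods 0..t-1
        P = transition_matrix(labels, k)
        state_means = np.array([y[:t][labels == j].mean() if np.any(labels == j) else y[:t].mean()
                                for j in range(k)])
        preds.append(P[labels[-1]] @ state_means)       # expected y in next period's state
    return np.array(preds)

# Usage on synthetic two-regime data (not the paper's dataset).
rng = np.random.default_rng(1)
T, n_init, k = 120, 60, 4
regime = np.zeros(T, dtype=int)
for t in range(1, T):
    regime[t] = regime[t - 1] if rng.random() < 0.9 else 1 - regime[t - 1]
y = np.where(regime == 0, 3.0, 0.5) + 0.3 * rng.standard_normal(T)
x2 = np.where(regime == 0, 1.0, -1.0) + 0.3 * rng.standard_normal(T)
X = np.column_stack([y, x2])
preds = kmeans_markov_forecasts(X, y, n_init=n_init, k=k)
print(preds[:5])
```

Only information available at each origin enters the forecast, so the resulting errors `y[n_init:] - preds` are genuine out-of-sample errors that can be scored with RMSE.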
Figure 1 illustrates two-dimensional data, RGDP and PGDP (1947-1969), with k-means clustering into four states. The left panel shows the original data and the right panel shows the clustered data.

The method can also be extended to multi-step forecasting or to a second-order Markov chain model. Future work could study how to improve the ANN performance and include multi-step testing, larger datasets such as financial market data, applications of heterogeneous-beliefs data in the ANN, and further comparisons of different ANN models such as recurrent, deep, and convolutional neural networks.

Figure 2 illustrates an example of the K-means forecast. Note that the forecasts start 101 quarters after 1968 Q4, so the ANN forecast period runs from 1994 Q1 to 2015 Q4.
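Quarter arithmetic of this kind is easy to get off by one. The small sketch below, with hypothetical helper names, checks the dates quoted for the sample split: 101 quarters after 1968 Q4 is 1994 Q1, the forecast window 1994 Q1-2015 Q4 contains 88 quarters, and the full 1968 Q4-2015 Q4 sample contains 189 quarters (101 + 88).

```python
def quarter_index(year, quarter):
    """Map (year, quarter) to a running quarter count."""
    return 4 * year + (quarter - 1)

def add_quarters(year, quarter, n):
    """Return the (year, quarter) that lies n quarters after (year, quarter)."""
    idx = quarter_index(year, quarter) + n
    return idx // 4, idx % 4 + 1

def quarters_inclusive(start, end):
    """Number of quarters from start to end, both included; arguments are (year, quarter)."""
    return quarter_index(*end) - quarter_index(*start) + 1

print(add_quarters(1968, 4, 101))                  # (1994, 1): first forecast quarter
print(quarters_inclusive((1994, 1), (2015, 4)))    # 88: length of the forecast window
print(quarters_inclusive((1968, 4), (2015, 4)))    # 189: full sample
```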

Figure 1. Four-state K-means clustering of two-dimensional data from the BEA

Figure 2. An example of K-means Markov model forecasting

Table 1. Forecast error summary statistics of SPF variables
Note. RGDP and PGDP denote the real GDP growth rate and the inflation rate (GDP price deflator), respectively. H denotes the forecast horizon. The time period is 1985Q1-2014Q4.

Table 2. Data description
I therefore use 11 time series as baseline inputs: the federal funds rate, M2 growth (real M2 money stock), RPFI (real private investment growth), the unemployment rate, weekly working hours, the stock index return, the bond spread, VXO (implied volatility of the stock index return), PCE (percent change from the preceding period in real personal consumption expenditures), the real price-to-dividend ratio, and the cyclically adjusted price-to-earnings ratio. See Table 2 for the full data description.

Table 3. Forecasting performance of the K-means-Markov model
Note. RGDP denotes the real GDP growth rate and PGDP the inflation rate. SPF denotes the Survey of Professional Forecasters benchmark. Kmeans-Markov denotes K-means learning with the Markov switching model; "with disagreement" denotes the model with added inputs extracted from the SPF; States denotes the number of clusters. The forecast period is 1994 Q1 to 2015 Q4.

Table 3 reports the forecasting performance of the first-order Markov regime-switching model with K-means learning, using the same dataset as the previous section. The K-means-Markov model outperforms the SPF in forecasting inflation and comes very close to the SPF in forecasting real GDP growth (Note 5). Overall, the K-means approach performs better than many standard time-series forecasting models. The best one-step-ahead RMSEs of the K-means-Markov model for the inflation rate and real GDP growth are 0.82 and 1.47, respectively, compared with 0.91 and 1.35 for the SPF forecasts, which are produced by professional forecasters, over the period from 1986. Note also that the RMSEs of the SPF forecasts of PGDP and RGDP are 0.87 and 1.44 over 1997 Q1 to 2014 Q4, so the performance of the simple K-means model is indeed close to that of the SPF.
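A common way to summarize such comparisons is the RMSE ratio relative to the benchmark, where values below 1 favor the model. The worked arithmetic below uses only the figures quoted above; because the SPF figures refer to sample windows that differ from the 1994 Q1-2015 Q4 forecast period, the ratios are indicative only.

```latex
\text{relative RMSE} = \frac{\text{RMSE}_{\text{K-means-Markov}}}{\text{RMSE}_{\text{SPF}}}

\text{SPF from 1986:} \quad
\text{PGDP: } \frac{0.82}{0.91} \approx 0.90, \qquad
\text{RGDP: } \frac{1.47}{1.35} \approx 1.09

\text{SPF, 1997 Q1 to 2014 Q4:} \quad
\text{PGDP: } \frac{0.82}{0.87} \approx 0.94, \qquad
\text{RGDP: } \frac{1.47}{1.44} \approx 1.02
```

On both comparisons the model is about 6 to 10 percent more accurate than the SPF for inflation and about 2 to 9 percent less accurate for real GDP growth.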