Justification of Wold’s Theorem and the Unbiasedness of a Stable Vector Autoregressive Time Series Model Forecasts

In this work, the multivariate analogue to the univariate Wold’s theorem for a purely non-deterministic stable vector time series process was presented and justified using the method of undetermined coefficients. By this method, a finite vector autoregressive process of order p [VAR(p)] was represented as an infinite vector moving average (VEMA) process which was found to be the same as the Wold’s representation. Thus, obtaining the properties of a VAR(p) process is equivalent to obtaining the properties of an infinite VEMA process. The proof of the unbiasedness of forecasts followed immediately based on the fact that a stable VAR process can be represented as an infinite VEMA process.


Introduction
Wold's theorem is widely embraced in the theoretical time series framework because of its special representation and simplicity. The representation is a special case of the moving average process (Kenneth, 2014). Its main advantage is that it allows the dynamic evolution of a process to be approximated by a linear model, and because of this feature many researchers regard Wold's theorem as an existence theorem. Wold's theorem is sometimes referred to as Wold's decomposition theorem and is mostly applied in time series analysis. According to Borghers and Wessa (2015), the fundamental justification for time series analysis is due to Wold's decomposition theorem. Kalliianpur (1979) presented a method for constructing Wold's decomposition for multivariate stationary stochastic processes. The method was based on orthogonal decompositions obtained by forming orthogonal projections onto its component processes. However, it was found that the method does not give a complete solution to the Wold decomposition problem.
Papoulis (1985) examined the concepts of predictability and band-limitedness in Wold's decomposition theorem. He considered a real discrete-time wide-sense stationary process with a given autocorrelation and power spectrum. An extreme case of a bilinear process whose spectrum consists of lines only was also considered. It was shown that the values of a bilinear process are linearly dependent. This result was used to prove Wold's decomposition theorem in the context of mean-square estimation.
Jaydeb (2015) obtained a complete description of the class of n-tuples of doubly commuting isometries. In particular, he presented a several-variables analogue of Wold's decomposition for isometries on Hilbert spaces. The main result was a generalization of Slocinski's Wold-type decomposition of a pair of doubly commuting isometries.
De Nicolao and Ferrari-Trecate (2015) considered the Wold decomposition of a discrete-time cyclostationary (CS) process into regular and predictable components. The main result showed that predictable CS processes are linear combinations of sine waves. The frequencies of the sinusoids were found to be associated with the locations on the unit circle of the zeros of the periodic error filter. From a spectral viewpoint, an mth-order predictable CS process was found to exhibit a sequence of spectral lines. For practical prediction, De Nicolao and Ferrari-Trecate (2015) found that the detection of spectral lines can be used to separate the regular and predictable parts of a CS process, in analogy with the deseasonalizing techniques used in the analysis of stationary time series.
Ansley et al. (1976) used Hilbert space methods to develop a rigorous proof that the sum of two uncorrelated moving average (MA) processes of orders $q_1$ and $q_2$ is an MA process of order $q \le \max(q_1, q_2)$. The methods established the existence of suitable random shocks for the summed process, illuminated the relationships between the coefficients of such processes and their random shocks, and provided a means of proving that the random shocks of the summed process are normal when the shocks of the underlying processes are normal. Wold's decomposition was examined in terms of multiple representations of an MA process.
Caines and Gerencser (1991) showed that the transform of the coefficient sequence of the Wold decomposition of any full-rank wide-sense stationary purely non-deterministic stochastic process satisfies two conditions: the transform belongs to a Hardy-type space $H^2$ on its domain, and its inverse belongs to an associated function space on the same domain. It was further shown that all spectral factors satisfying the two conditions are equal up to right multiplication by orthogonal matrices, and that among these, the normalized spectral factors (those taking the value 1 at 0) are equal to the transform of the Wold decomposition. An elementary proof of Youla's theorem was then given, together with a simple proof that the rows of a Cholesky factor of a banded block Toeplitz matrix converge to the coefficients of a stable matrix polynomial.
Olofsson (2004) presented a Wold decomposition of a two-isometric operator on a general Hilbert space. A pure two-isometry was shown to be unitarily equivalent to a shift operator on a Dirichlet space corresponding to a positive operator measure on the unit circle. The result contained a previous result by Ritcher (1998) as well as the von Neumann-Wold decomposition of an isometry. Katsoulis and Kribs (2005) applied the Wold decomposition to the study of row contractions associated with directed graphs. The work extended several fundamental theorems from the case of single-vertex graphs to the general case of countable directed graphs with no sinks. A Szego-type factorization theorem for Cuntz-Krieger-Toeplitz families was proved, which led to information on the structure of the unit ball in a free semigroup algebra. This showed that joint similarity implies joint unitary equivalence for such families. For each graph, Katsoulis and Kribs (2005) proved a generalization of von Neumann's inequality which applies to row contractions of operators on Hilbert space that are related to the graph in a conventional way. The results yielded a functional calculus determined by quiver algebras and free semigroup algebras.
As noted in the review, many researchers have proved and applied Wold's theorem in different frameworks. In this work, however, the intention is to prove the unbiasedness of the forecasts of a stable VAR process, based on the fact that a finite VAR($p$) process is shown to have the same representation as in Wold's theorem.

Methodology
In this work, the underlined letters are used to represent vectors and matrices.

Stationarity
A time series is said to be stationary if its statistical properties, e.g. the mean and variance, are constant through time. For a multivariate process, stationarity of a time series $\underline{Z}_t$ is achieved if $E[\underline{Z}_t] = \underline{\mu}$, a vector of constants.

Backward shift Operator
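The backward shift operator $B$ is defined by
$$B^j \underline{Z}_t = \underline{Z}_{t-j} \qquad (1)$$
A vector white noise process $\underline{a}_t$ satisfies
$$E[\underline{a}_t] = \underline{0}, \quad E[\underline{a}_t \underline{a}_t'] = \underline{\Sigma}_a, \quad E[\underline{a}_t \underline{a}_s'] = \underline{0} \ \text{ for } s \neq t \qquad (2)$$
where $\underline{\Sigma}_a$ is the covariance matrix, which is assumed to be non-singular.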

Vector Autoregressive (VAR) Model
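Several multivariate time series models exist. However, the most commonly used is the vector autoregressive (VAR) model. The VAR model is an independent reduced-form dynamic model in which each endogenous variable is a function of its own past values and the past values of all other endogenous variables. A stable $p$-lagged vector autoregressive [VAR($p$)] model has the form:
$$\underline{Z}_t = \underline{\Phi}_1 \underline{Z}_{t-1} + \underline{\Phi}_2 \underline{Z}_{t-2} + \cdots + \underline{\Phi}_p \underline{Z}_{t-p} + \underline{a}_t \qquad (3)$$
where $\underline{Z}_t = (Z_{1t}, \ldots, Z_{kt})'$ is a $(k \times 1)$ vector of time series variables, the $\underline{\Phi}_j$ are fixed $(k \times k)$ coefficient matrices, and $\underline{a}_t = (a_{1t}, \ldots, a_{kt})'$ is a $(k \times 1)$ vector white noise process or innovation process.
The model (3) can be written explicitly in matrix form as:
$$\begin{pmatrix} Z_{1t} \\ \vdots \\ Z_{kt} \end{pmatrix} = \sum_{j=1}^{p} \begin{pmatrix} \phi_{11,j} & \cdots & \phi_{1k,j} \\ \vdots & \ddots & \vdots \\ \phi_{k1,j} & \cdots & \phi_{kk,j} \end{pmatrix} \begin{pmatrix} Z_{1,t-j} \\ \vdots \\ Z_{k,t-j} \end{pmatrix} + \begin{pmatrix} a_{1t} \\ \vdots \\ a_{kt} \end{pmatrix} \qquad (4)$$
where $\phi_{il,j}$ denotes the $(i, l)$ element of $\underline{\Phi}_j$.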
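As an illustration of the matrix form of the VAR model, the following minimal Python sketch evaluates one time step of a VAR($p$) recursion with no intercept. The function names `mat_vec` and `var_step` and the bivariate coefficient values are hypothetical and chosen only for illustration; they are not taken from this work.

```python
def mat_vec(matrix, vector):
    """Multiply a (k x k) matrix, stored as a list of rows, by a length-k vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def var_step(phis, lagged, shock):
    """One step of a zero-intercept VAR(p): Z_t = sum_j Phi_j Z_{t-j} + a_t.

    phis   : [Phi_1, ..., Phi_p], each a k x k list of rows
    lagged : [Z_{t-1}, ..., Z_{t-p}], each a length-k list
    shock  : a_t, a length-k list
    """
    z_t = list(shock)
    for phi_j, z_lag in zip(phis, lagged):
        contribution = mat_vec(phi_j, z_lag)
        z_t = [z + c for z, c in zip(z_t, contribution)]
    return z_t


# Hypothetical bivariate VAR(1) coefficients (illustrative assumption).
phi_1 = [[0.5, 0.1],
         [0.2, 0.3]]
z_prev = [1.0, 2.0]   # Z_{t-1}
a_t = [0.1, -0.1]     # innovation at time t
z_t = var_step([phi_1], [z_prev], a_t)
```

Each component of `z_t` is a weighted sum of the lagged values of all components plus its own innovation, which is the row-by-row reading of the matrix form.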

The Wold's Decomposition Theorem for a Univariate Time Series
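The Wold decomposition theorem for a univariate time series states that any zero-mean discrete stationary process $\{\tilde{Z}_t\}$ can be expressed as the sum of two uncorrelated components (processes):
$$\tilde{Z}_t = \sum_{j=0}^{\infty} \psi_j a_{t-j} + V_t \qquad (5)$$
where $\tilde{Z}_t = Z_t - \mu$, $\psi_0 = 1$, $\sum_{j=0}^{\infty} \psi_j^2 < \infty$, $\{a_t\} \sim WN(0, \sigma_a^2)$, $\mathrm{Cov}(a_s, V_t) = 0$ for all $s, t \in \mathbb{Z}$, and $\{V_t\}$ is deterministic. The sequences $\{\psi_j\}$, $\{a_t\}$ and $\{V_t\}$ are unique. Thus, we can express
$$\tilde{Z}_t = a_t + \psi_1 a_{t-1} + \psi_2 a_{t-2} + \cdots + V_t \qquad (6)$$
In most cases, $V_t = 0$ for all $t$ and $\tilde{Z}_t$ becomes purely non-deterministic. The Wold decomposition consists of two parts: the deterministic part and the purely non-deterministic part. Since the process is zero-mean stationary, with $V_t = 0$ expression (6) can equivalently be presented as:
$$\tilde{Z}_t = \sum_{j=0}^{\infty} \psi_j a_{t-j}, \qquad \sum_{j=0}^{\infty} \psi_j^2 < \infty \qquad (7)$$
Also, since $a_t$ is a pure random process,
$$E[a_t] = 0 \qquad (8)$$
and
$$E[a_t a_s] = \begin{cases} \sigma_a^2, & s = t \\ 0, & s \neq t \end{cases} \qquad (9)$$
From (7), we obtain the mean,
$$E[\tilde{Z}_t] = \sum_{j=0}^{\infty} \psi_j E[a_{t-j}] = 0$$
For the variance, we have
$$\mathrm{Var}(\tilde{Z}_t) = E[\tilde{Z}_t^2] = \sum_{j=0}^{\infty} \sum_{i=0}^{\infty} \psi_j \psi_i E[a_{t-j} a_{t-i}]$$
From (9), we have
$$\mathrm{Var}(\tilde{Z}_t) = \sigma_a^2 \sum_{j=0}^{\infty} \psi_j^2 < \infty$$
Thus, the variance is finite and not time dependent.
For the autocovariances, we have
$$\gamma_k = E[\tilde{Z}_t \tilde{Z}_{t+k}] = \sigma_a^2 \sum_{j=0}^{\infty} \psi_j \psi_{j+k}$$
Thus, the autocovariances are functions of the lag $k$ only, and the process is covariance stationary.
The autocorrelation function at lag $k$ is given as
$$\rho_k = \frac{\gamma_k}{\gamma_0} = \frac{\sum_{j=0}^{\infty} \psi_j \psi_{j+k}}{\sum_{j=0}^{\infty} \psi_j^2}$$
According to the Wold decomposition theorem, any discrete stationary model (especially ARMA) can be represented on the basis of this decomposition.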
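The moment formulas above can be checked numerically. The following minimal Python sketch assumes, purely for illustration, an AR(1) process with coefficient 0.6 and unit innovation variance, whose Wold weights are the powers of that coefficient. The function name `wold_moments` and the truncation at 200 terms are hypothetical choices, not part of this work.

```python
def wold_moments(psi, sigma2, max_lag):
    """Autocovariances gamma_0..gamma_max_lag from truncated Wold weights.

    Uses gamma_k = sigma2 * sum_j psi_j * psi_{j+k}, with the infinite sum
    truncated at the number of weights supplied.
    """
    n = len(psi)
    gammas = []
    for k in range(max_lag + 1):
        gammas.append(sigma2 * sum(psi[j] * psi[j + k] for j in range(n - k)))
    return gammas


# Illustrative assumption: AR(1) with coefficient phi, so psi_j = phi**j.
phi = 0.6
sigma2 = 1.0
psi = [phi ** j for j in range(200)]

gammas = wold_moments(psi, sigma2, max_lag=3)
rhos = [g / gammas[0] for g in gammas]

# Closed forms for AR(1), used here only as a cross-check.
closed_form_variance = sigma2 / (1.0 - phi ** 2)
closed_form_rhos = [phi ** k for k in range(4)]
```

Under these assumptions the truncated sums agree with the closed-form AR(1) variance and autocorrelations to within the truncation error, and none of the computed quantities depends on the time index.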

Wold's Theorem for the Multivariate Model
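The multivariate analogue to the univariate Wold's theorem is that if $\{\underline{Z}_t\}$ is a purely non-deterministic stationary process with mean vector $\underline{\mu}$, then $\underline{Z}_t - \underline{\mu}$ can be represented as a linear combination of weighted lagged vector white noise processes. That is,
$$\underline{Z}_t - \underline{\mu} = \sum_{j=0}^{\infty} \underline{\Psi}_j \underline{a}_{t-j} \qquad (10)$$
where $\underline{\Psi}_0 = \underline{I}$. Let $\underline{\tilde{Z}}_t = \underline{Z}_t - \underline{\mu}$; then
$$\underline{\tilde{Z}}_t = \sum_{j=0}^{\infty} \underline{\Psi}_j \underline{a}_{t-j} = \underline{\Psi}(B) \underline{a}_t \qquad (11)$$
where $\underline{\Psi}(B) = \sum_{j=0}^{\infty} \underline{\Psi}_j B^j$ is a $(k \times k)$ matrix polynomial in the backward shift operator $B$ such that $B^j \underline{a}_t = \underline{a}_{t-j}$; $\underline{\tilde{Z}}_t$ is a $(k \times 1)$ random vector; the $\underline{\Psi}_j$'s are fixed $(k \times k)$ coefficient matrices; $\underline{a}_{t-j}$ is a $(k \times 1)$ vector white noise process at lag $j$; and $\underline{a}_t$ is a $k$-dimensional white noise or innovation process such that $E[\underline{a}_t] = \underline{0}$, $E[\underline{a}_t \underline{a}_t'] = \underline{\Sigma}_a$ and $E[\underline{a}_t \underline{a}_s'] = \underline{0}$ for $s \neq t$. The covariance matrix $\underline{\Sigma}_a$ is assumed to be non-singular.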

Justification of the Wold's Expression (11)
To justify Wold's theorem, we first define the stationary processes that lead to the result.

Vector Moving Average Model of Order 𝑞
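A vector time series process $\{\underline{Z}_t\}$ is said to be a vector moving average model of order $q$, denoted VEMA($q$), if it can be represented as
$$\underline{Z}_t - \underline{\mu} = \sum_{j=0}^{q} \underline{\Psi}_j \underline{a}_{t-j}, \qquad \underline{\Psi}_0 = \underline{I} \qquad (12)$$
where $\underline{a}_t$ is a vector of white noise processes. It should be noted that expressions (10) and (12) have the same form, except that the VEMA($q$) model is finite while the Wold model is infinite.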

Vector Autoregressive Model of Order 𝑝
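We earlier noted that a stationary VAR($p$) process can be represented as
$$\underline{Z}_t = \underline{\Phi}_1 \underline{Z}_{t-1} + \underline{\Phi}_2 \underline{Z}_{t-2} + \cdots + \underline{\Phi}_p \underline{Z}_{t-p} + \underline{a}_t \qquad (13)$$
Using the backward shift operator, (13) becomes
$$(\underline{I} - \underline{\Phi}_1 B - \underline{\Phi}_2 B^2 - \cdots - \underline{\Phi}_p B^p) \underline{Z}_t = \underline{a}_t \qquad (14)$$
or
$$\underline{\Phi}(B) \underline{Z}_t = \underline{a}_t \qquad (15)$$
where $\underline{\Phi}(B) = \underline{I} - \underline{\Phi}_1 B - \cdots - \underline{\Phi}_p B^p$. Since the process is stable, $\underline{\Phi}(B)$ is invertible, so that
$$\underline{Z}_t = \underline{\Phi}(B)^{-1} \underline{a}_t \qquad (16)$$
that is,
$$\underline{Z}_t = \sum_{j=0}^{\infty} \underline{\Psi}_j \underline{a}_{t-j} \qquad (17)$$
Equation (17) simply shows that a VAR($p$) process can be represented as an infinite VEMA process. Wold's theorem emphasizes that any stationary process can be represented as a linear combination of weighted lagged vector white noise processes. Now, we have a stationary VAR($p$) process represented as (15), which results in equation (16).
From equation (16), we can write
$$\underline{Z}_t = \underline{\Psi}(B) \underline{a}_t \qquad (18)$$
where $\underline{\Psi}(B) = \sum_{i=0}^{\infty} \underline{\Psi}_i B^i$. Substituting equation (13), in the form $\underline{a}_t = \underline{\Phi}(B) \underline{Z}_t$, in (18) gives $\underline{Z}_t = \underline{\Psi}(B) \underline{\Phi}(B) \underline{Z}_t$, so that
$$\underline{I} = (\underline{\Psi}_0 + \underline{\Psi}_1 B + \underline{\Psi}_2 B^2 + \cdots)(\underline{I} - \underline{\Phi}_1 B - \underline{\Phi}_2 B^2 - \cdots - \underline{\Phi}_p B^p) \qquad (19)$$
Thus, given a VAR($p$) process, the $\underline{\Psi}_i$'s ($i = 0, 1, 2, \ldots$) can easily be obtained by equating the coefficients of like powers of $B$ on both sides of (19). Since the $\underline{\Phi}_j$'s are the coefficient matrices of the given VAR($p$) multivariate model, the $\underline{\Psi}_i$'s can easily be obtained by the method of undetermined coefficients. These $\underline{\Psi}_i$'s are the coefficients of an infinite VEMA process represented as:
$$\underline{Z}_t = \sum_{i=0}^{\infty} \underline{\Psi}_i \underline{a}_{t-i} \qquad (20)$$
This simply means that a VAR($p$) process can be converted to an infinite VEMA process, which is nothing else but Wold's representation (10) or (11).
It further implies that obtaining the properties of a VAR($p$) process is equivalent to obtaining the properties of an infinite VEMA process or Wold's representation (10).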
The proof of the unbiasedness of forecasts shall be based on the fact that a stationary VAR process can be represented as an infinite VEMA process.
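The method of undetermined coefficients described in this section can be sketched in a few lines of Python. Equating the coefficients of like powers of the backward shift operator in the identity between the product of the moving average and autoregressive operators and the identity matrix gives $\underline{\Psi}_0 = \underline{I}$ and $\underline{\Psi}_i = \sum_{j=1}^{\min(i,p)} \underline{\Psi}_{i-j} \underline{\Phi}_j$ for $i \ge 1$. The function names and the bivariate VAR(2) coefficients below are hypothetical and serve only as an illustration.

```python
def mat_mul(a, b):
    """Product of two k x k matrices stored as lists of rows."""
    k = len(a)
    return [[sum(a[r][m] * b[m][c] for m in range(k)) for c in range(k)]
            for r in range(k)]


def mat_add(a, b):
    """Element-wise sum of two k x k matrices."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def identity(k):
    """k x k identity matrix."""
    return [[1.0 if r == c else 0.0 for c in range(k)] for r in range(k)]


def psi_weights(phis, n_terms):
    """First n_terms (n_terms >= 1) moving average matrices of a VAR(p).

    phis is [Phi_1, ..., Phi_p]. The recursion is
    Psi_0 = I and Psi_i = sum_{j=1}^{min(i, p)} Psi_{i-j} Phi_j.
    """
    k = len(phis[0])
    p = len(phis)
    psis = [identity(k)]
    for i in range(1, n_terms):
        acc = [[0.0] * k for _ in range(k)]
        for j in range(1, min(i, p) + 1):
            acc = mat_add(acc, mat_mul(psis[i - j], phis[j - 1]))
        psis.append(acc)
    return psis


# Hypothetical bivariate VAR(2) coefficients (illustrative assumption).
phi_1 = [[0.5, 0.1],
         [0.2, 0.3]]
phi_2 = [[0.1, 0.0],
         [0.0, 0.1]]
psis = psi_weights([phi_1, phi_2], n_terms=6)
```

For these assumed coefficients the first weight is the identity, the second equals the first autoregressive matrix, and the third equals the square of the first autoregressive matrix plus the second, which is what equating the coefficients of the first three powers of the operator gives by hand.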

Proof of the Unbiasedness of Forecasts
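Let the forecast origin be $t$ and the lead time be $l$. The major concern is to forecast a vector $\underline{Z}_{t+l}$ ($l \ge 1$) when we are currently at time $t$. By the infinite VEMA representation, $\underline{Z}_{t+l} = \sum_{j=0}^{\infty} \underline{\Psi}_j \underline{a}_{t+l-j}$, and the shocks $\underline{a}_{t+1}, \ldots, \underline{a}_{t+l}$ that occur after the forecast origin have zero expectation.
Thus, the best predictor of $\underline{Z}_{t+l}$ is
$$\underline{\hat{Z}}_t(l) = \sum_{j=l}^{\infty} \underline{\Psi}_j \underline{a}_{t+l-j} \qquad (21)$$
Since $\underline{Z}_{t+l}$ and $\underline{\hat{Z}}_t(l)$ are linear combinations of the $\underline{a}_t$'s, the forecast error $\underline{e}_t(l)$ for lead time $l$ will also be a linear combination of the $\underline{a}_t$'s. That is,
$$\underline{e}_t(l) = \underline{Z}_{t+l} - \underline{\hat{Z}}_t(l) = \sum_{j=0}^{l-1} \underline{\Psi}_j \underline{a}_{t+l-j} \qquad (22)$$
From (21) and (22),
$$E[\underline{e}_t(l)] = \sum_{j=0}^{l-1} \underline{\Psi}_j E[\underline{a}_{t+l-j}] = \underline{0}$$
Since $E[\underline{e}_t(l)] = \underline{0}$, the forecast is unbiased.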
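The argument above can be illustrated numerically. The following minimal Python sketch assumes a bivariate VAR(1) process started from zero with standard normal shocks, so that the moving average weights are powers of the single coefficient matrix and the forecast from the origin is that matrix applied repeatedly to the last observed value. It checks that the forecast error coincides with the weighted sum of the shocks occurring after the origin, and that its average over repeated samples is close to zero. The function name `forecast_error`, the coefficient values, the seed and the number of replications are all hypothetical choices.

```python
import random


def mat_vec(matrix, vector):
    """Multiply a k x k matrix (list of rows) by a length-k vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def forecast_error(phi, shocks, origin, lead):
    """Forecast error of a VAR(1) started at Z_0 = 0, computed two ways.

    shocks[s - 1] holds a_s. Returns (error, wold_error), where error is
    Z_{origin+lead} minus the forecast Phi^lead Z_origin, and wold_error is
    sum_{j=0}^{lead-1} Phi^j a_{origin+lead-j}.
    """
    k = len(phi)
    z = [0.0] * k
    path = [z]  # path[s] = Z_s
    for a_s in shocks:
        z = [x + y for x, y in zip(mat_vec(phi, z), a_s)]
        path.append(z)
    forecast = path[origin]
    for _ in range(lead):
        forecast = mat_vec(phi, forecast)
    error = [x - f for x, f in zip(path[origin + lead], forecast)]
    wold_error = [0.0] * k
    for j in range(lead):
        term = shocks[origin + lead - j - 1]
        for _ in range(j):
            term = mat_vec(phi, term)
        wold_error = [w + t for w, t in zip(wold_error, term)]
    return error, wold_error


random.seed(0)  # fixed seed so the illustration is repeatable
phi = [[0.5, 0.1],
       [0.2, 0.3]]  # hypothetical coefficient matrix
origin, lead, reps = 20, 3, 2000
mean_error = [0.0, 0.0]
max_gap = 0.0
for _ in range(reps):
    shocks = [[random.gauss(0.0, 1.0) for _ in range(2)]
              for _ in range(origin + lead)]
    e, w = forecast_error(phi, shocks, origin, lead)
    max_gap = max(max_gap, max(abs(x - y) for x, y in zip(e, w)))
    mean_error = [m + x / reps for m, x in zip(mean_error, e)]
```

Here `max_gap` measures the largest discrepancy between the two expressions for the forecast error, and `mean_error` is the sample average of the forecast error across replications. The simulation only illustrates the result; the proof itself rests on the zero mean of the shocks.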

Discussion and Conclusion
Since it has been shown in this work that a finite VAR($p$) process has an equivalent Wold representation, obtaining the properties of the Wold process is equivalent to obtaining the properties of the VAR($p$) process that gives rise to it. More precisely, the underlying linear structure in the time domain that generates the VAR process in general is the Wold process. This idea is specific to linear processes only. However, the presence of nonlinear structure in most time-varying quantities cannot be completely ruled out.