Non-Identifiability of Simultaneous Spatial Autoregressive Model and Singularity of Fisher Information Matrix

Yuuki Rikimaru (1, 2) & Ritei Shibata (3)
(1) School of International Liberal Studies, Waseda University, Tokyo, Japan
(2) School of Fundamental Science and Technology, Keio University, Kanagawa, Japan
(3) Department of Mathematics, Keio University, Kanagawa, Japan
Correspondence: Yuuki Rikimaru, School of International Liberal Studies, Waseda University, 1-6-1 Nishi-Waseda, Shinjuku-ku, Tokyo, Japan. E-mail: yuuki@datascience.jp


Introduction
A simultaneous spatial autoregressive model for a weakly stationary random field {X_v ; v ∈ Z^n} with mean 0 and autocovariance function γ_h = E(X_v X_{v+h}), h ∈ Z^n, is a model which satisfies the equation

    P(T_1, ..., T_n) X_v = ε_v,

where {ε_v ; v ∈ Z^n} is a set of uncorrelated random variables with mean 0 and variance σ^2, σ > 0. Here the operator

    P(T_1, ..., T_n) = Σ_{k ∈ K} β_k T_1^{k_1} T_2^{k_2} ... T_n^{k_n}

is an n-dimensional transfer function with real coefficients β_k, k ∈ K, where K is a finite set of points k = (k_1, k_2, ..., k_n) on Z^n with 0 ∈ K, and β_0 = 1. We denote the number of elements of K by m, so that the number of regression parameters is m − 1. The operators T_j, j = 1, ..., n, are shift operators such that T_j X_v = X_{v_1, ..., v_j + 1, ..., v_n}.
We assume the following for the weak stationarity of the simultaneous spatial autoregressive model throughout this paper.

Non-identifiability of Simultaneous Spatial Autoregressive Model
We first note that any polynomial P(z_1, ..., z_n) is decomposable into a product of prime factors h_k(z_1, ..., z_n), k = 1, ..., p, as

    P(z_1, ..., z_n) = Π_{k=1}^{p} h_k(z_1, ..., z_n).

Therefore, there exist 2^p choices, selecting either h_k(z_1, ..., z_n) or its complex conjugate for each k = 1, ..., p, of a transfer function P(z_1, ..., z_n) which leads us to the given spectral density. There is also freedom to add a factor of the form c z_1^{ℓ_1} z_2^{ℓ_2} ... z_n^{ℓ_n} to the transfer function P(z_1, ..., z_n) for any constant c and integers ℓ_1, ℓ_2, ..., ℓ_n, since the constant c can be absorbed into the parameter σ^2.
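The reason why conjugating a factor leaves the spectral density unchanged can be written out in a few lines. The following is a sketch in our own notation (the coefficients c_k of a generic factor h are not symbols used in the paper), under the assumption that the spectral density depends on the transfer function only through its squared modulus on the unit torus.

```latex
% Assumed notation: h is a (Laurent) polynomial with complex coefficients c_k.
\[
  h(z_1,\dots,z_n)=\sum_{k} c_k\, z_1^{k_1}\cdots z_n^{k_n},
  \qquad |z_1|=\dots=|z_n|=1 .
\]
% On the unit torus \bar z_j = z_j^{-1}, so conjugation gives another Laurent polynomial:
\[
  \overline{h(z_1,\dots,z_n)}
  =\sum_{k} \bar c_k\, z_1^{-k_1}\cdots z_n^{-k_n},
  \qquad
  \bigl|\overline{h(z_1,\dots,z_n)}\bigr| = \bigl|h(z_1,\dots,z_n)\bigr| .
\]
% Hence replacing any subset of the prime factors by their conjugates keeps
\[
  |P(z_1,\dots,z_n)|^2=\prod_{k=1}^{p} |h_k(z_1,\dots,z_n)|^2
\]
% unchanged. One-variable illustration with a linear factor:
\[
  \overline{z-\alpha} = z^{-1}-\bar\alpha = z^{-1}\,(1-\bar\alpha z),
  \qquad |z-\alpha| = |1-\bar\alpha z| \quad (|z|=1).
\]
```

The monomial z^{-1} in the last line is an instance of the free factor c z_1^{ℓ_1} ... z_n^{ℓ_n} mentioned above.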
Example 1. Let us consider a simple one-dimensional autoregressive model,

    X_v + β_1 X_{v+1} + β_{−1} X_{v−1} = ε_v.    (3)

Then there exist 2^2 = 4 different choices of transfer function for the spectral density (4), where z = e^{iω} and α_1, α_2 ∈ C are the roots of the polynomial P(z) = z + β_1 z^2 + β_{−1}. In fact, four different transfer functions P_1(z), P_2(z), P_3(z) and P_4(z) exist for the spectral density (4).
It is easy to show that each transfer function has real coefficients, providing us with a model (3) with different coefficients. The variance parameter σ^2 varies from transfer function to transfer function: σ^2 = σ_0^2/|α_1 + α_2|^2 for P_1(z) and P_2(z), and σ^2 = σ_0^2/|1 + α_1 ᾱ_2|^2 for P_3(z) and P_4(z). It is easy to see that P_1(z) and P_2(z) become identical if and only if α_1 and α_2 are real and α_1 α_2 = 1, and that P_3(z) and P_4(z) become identical if and only if α_1 and α_2 are real and α_1 = α_2. By noting Assumption 1, we see that such conditions are summarized as β_1 = β_{−1} with β_1^2 < 1/4, that is, a time reversible simultaneous spatial autoregressive model. However, this does not mean that the transfer function is unique for the spectral density of a time reversible model. The conditions α_1 = α_2 and α_1 α_2 = 1 are not compatible because of Assumption 1. Only two of the four transfer functions become identical, and the two others are not time reversible. We now see that there is no unique model for the given spectral density (4).
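The four choices in Example 1 can be checked numerically. The sketch below is only an illustration under stated assumptions: it handles the case of two distinct real roots, drops the constant factor of the spectral density, and the function names equivalent_models and spectral_density are ours, not the paper's. Each root factor (z − α) is optionally replaced by (1 − αz), which has the same modulus on the unit circle for real α, and the result is rescaled so that the coefficient of z is 1.

```python
import itertools
import numpy as np


def equivalent_models(beta1, beta_m1, sigma2=1.0):
    """Return four (beta_1, beta_{-1}, sigma^2) triples sharing one spectral density.

    Assumption of this sketch: beta1 z^2 + z + beta_m1 has two distinct real roots.
    """
    if beta1 == 0 or 1.0 - 4.0 * beta1 * beta_m1 <= 0.0:
        raise ValueError("this sketch assumes two distinct real roots")
    alphas = np.roots([beta1, 1.0, beta_m1]).real
    models = []
    for flips in itertools.product([False, True], repeat=2):
        poly = np.array([beta1])  # coefficients in descending powers of z
        for alpha, flip in zip(alphas, flips):
            # (z - alpha) or its counterpart (1 - alpha z)
            factor = [-alpha, 1.0] if flip else [1.0, -alpha]
            poly = np.polymul(poly, factor)
        c = poly[1]  # coefficient of z, rescaled to 1 below
        if np.isclose(c, 0.0):
            raise ValueError("cannot normalise: coefficient of z vanishes")
        models.append((poly[0] / c, poly[2] / c, sigma2 / c**2))
    return models


def spectral_density(beta1, beta_m1, sigma2, omega):
    """Spectral density up to a constant factor, with P(z) = z + beta1 z^2 + beta_m1."""
    z = np.exp(1j * omega)
    return sigma2 / np.abs(z + beta1 * z**2 + beta_m1) ** 2


models = equivalent_models(0.2, 0.3)
omega = np.linspace(-np.pi, np.pi, 201)
densities = [spectral_density(b1, bm1, s2, omega) for b1, bm1, s2 in models]
```

With these hypothetical values, the first triple reproduces the input coefficients and the last one swaps β_1 and β_{−1}; the two mixed choices have a different σ^2, in line with the variance formulas above.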

Maximum Likelihood Estimate
It is well known that the exact likelihood of the simultaneous spatial autoregressive model has no closed form in terms of the parameters, even if Gaussianity is assumed. Historically, many approximations of the log-likelihood have been proposed. One such approximation is based on a modified periodogram, proposed by Guyon (1982). However, the estimation procedure is not only computationally expensive but also inaccurate, because it requires multiple integration of the spectral density for each parameter value. In this respect, the approximation recently proposed by Rikimaru & Shibata (2016) is stronger and more straightforward, and is closed in the time domain. They also proved that the parameter estimate which maximises the approximation L_A described below is asymptotically efficient.
Let us assume that the observations {x_v, v ∈ N} are on a rectangular lattice and that the observations are arranged in lexicographic order to make a vector x. By combining the (m − 1)-dimensional regression parameter vector β, whose elements are arranged in lexicographic order of k (≠ 0) ∈ K, with σ, we have the whole parameter vector θ. An approximation L_A of the log-likelihood of θ was proposed by Rikimaru & Shibata (2016). In its definition, the symbol ⊗ is the Kronecker product and α_{n_j} = 1 + 1/n_j, j = 1, ..., n, are shrinkage factors to retain √N consistency.
The matrix W_n is an n × n circulant matrix such that the off-diagonal elements (W_n)_{j, j+1} are all 1 for j = 1, ..., n − 1, (W_n)_{n,1} = 1 and the other elements are all 0. The asymptotic efficiency proved is that the covariance matrix of the parameter estimate converges to the lower bound given by the inverse of the Fisher information matrix I(θ), whose elements are given by (5) (Whittle, 1954; Guyon, 1982; Robinson & Vidal Sanz, 2006), provided that I(θ) is non-singular, which is a key assumption for the proof. It is rather unusual in the ordinary theory of statistics that the Fisher information matrix is singular, but it often happens in the case of the simultaneous spatial autoregressive model. Before investigating when and why it happens, we will see other problems caused by non-identifiability of the model in maximum likelihood estimation through the following example.

Example 2. Consider the four transfer functions in Example 1. The Gaussian likelihood function has the same value for the corresponding four sets of parameter values, since they share the same covariance structure. Therefore the likelihood function always has four maximum points on the parameter space unless some of the four transfer functions are identical. In fact, the result of a numerical experiment demonstrates this. In the experiment, N = 1000 random numbers are generated for {X_v} by using the transfer function P_1(z), and four maximum likelihood estimates are obtained by maximising L_A.

Therefore, although the maximum likelihood estimate is consistent and asymptotically efficient as is proved, there is no globally unique solution. This implies that we always have several different estimates of the parameters, and which one is obtained may depend on the initial values given to the optimisation algorithm. There would be no good way to avoid such a problem in practice, because the problem is not over-parametrisation but non-identifiability of the transfer function for a given spectral density or set of covariances. The only possible remedy would be to restrict our attention to a specific region of the parameter space, which is meaningful for the underlying problem and effective for restricting the transfer function to a unique one. We might have to search for all possible solutions anyway, since it would not be so easy to restrict the region beforehand.
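The equality of the likelihood over equivalent parameter sets can be illustrated with a frequency-domain stand-in. The sketch below uses the Whittle approximation, not the time-domain approximation L_A of Rikimaru & Shibata (2016), and simulates a circular version of model (3) by FFT; the parameter values, the function names and the circular simulation are all assumptions made for the illustration, and the roots are assumed real and distinct.

```python
import numpy as np


def whittle_loglik(x, beta1, beta_m1, sigma2):
    """Whittle approximation to the Gaussian log-likelihood, up to a constant."""
    n = len(x)
    z = np.exp(2j * np.pi * np.arange(n) / n)
    transfer = 1.0 + beta1 * z + beta_m1 / z
    f = sigma2 / (2.0 * np.pi * np.abs(transfer) ** 2)
    periodogram = np.abs(np.fft.fft(x)) ** 2 / (2.0 * np.pi * n)
    return -0.5 * np.sum(np.log(f) + periodogram / f)


def equivalent_parameters(beta1, beta_m1, sigma2):
    """Four parameter sets with the same spectral density (real distinct roots assumed)."""
    a1, a2 = np.roots([beta1, 1.0, beta_m1]).real
    c = beta1 + beta_m1  # normalising constant of the two mixed choices
    return [
        (beta1, beta_m1, sigma2),
        (beta_m1, beta1, sigma2),
        (-beta1 * a2 / c, -beta1 * a1 / c, sigma2 / c**2),
        (-beta1 * a1 / c, -beta1 * a2 / c, sigma2 / c**2),
    ]


# Simulate N = 1000 values from a circular version of model (3) with hypothetical
# coefficients beta_1 = 0.2, beta_{-1} = 0.3 and sigma = 1.
rng = np.random.default_rng(0)
n_obs = 1000
eps = rng.standard_normal(n_obs)
z = np.exp(2j * np.pi * np.arange(n_obs) / n_obs)
x = np.fft.ifft(np.fft.fft(eps) / (1.0 + 0.2 * z + 0.3 / z)).real

thetas = equivalent_parameters(0.2, 0.3, 1.0)
values = [whittle_loglik(x, *theta) for theta in thetas]
```

Because the four parameter sets give the same spectral density, the four entries of values agree up to rounding error, so an optimiser started near any of them has no reason to prefer another.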

Singularity of the Fisher Information Matrix I(θ)
We have seen that several different parameters, θ_1, θ_2, ..., θ_{2^p}, are mapped from a given simultaneous spatial autoregressive spectral density. The problem with the simultaneous spatial autoregressive model lies not only in such non-identifiability but also in the singularity of the Fisher information matrix, which is closely related to the non-identifiability. We concentrate our attention on the singularity of the Fisher information matrix I(θ) in (5), which is also the limit of the normalised negative Hessian matrix of the log-likelihood, (6). The following theorem states that the Fisher information matrix becomes singular if some of the parameters are duplicated.
Theorem 1 The Fisher information matrix I(θ_0) becomes singular when some of the parameters are duplicated for the spectral density identified by θ_0.
The following example illustrates what happens if the Fisher information matrix is singular. The consequence becomes clear once we note that the Hessian matrix of the log-likelihood (6) is then likely to be singular as well.
Example 3. Let us consider the same model as in Example 1. As is already seen, if β_1 = β_{−1} and β_1^2 < 1/4, then the transfer functions P_1 and P_2, or P_3 and P_4, are identical and the Fisher information matrix becomes singular. From the maximum likelihood equation and the resulting asymptotic normality, we can only estimate β_1 + β_{−1} and σ, but not β_1 or β_{−1} individually.
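The singularity in Example 3 can be checked numerically. The sketch below assumes a Whittle-type information matrix, (1/4π) ∫ (∂ log f/∂θ_j)(∂ log f/∂θ_k) dω, which may differ from the paper's expression (5) in normalisation, and it computes only the 2 × 2 block for (β_1, β_{−1}); the function name and the grid size are ours.

```python
import numpy as np


def regression_information_block(beta1, beta_m1, n_grid=4096):
    """Whittle-type information block for (beta_1, beta_{-1}) in model (3).

    Assumed form: (1 / (4 pi)) * integral over [0, 2 pi) of the products of
    d log f / d beta_k, approximated by a Riemann sum on an equispaced grid.
    """
    omega = 2.0 * np.pi * np.arange(n_grid) / n_grid
    z = np.exp(1j * omega)
    p = 1.0 + beta1 * z + beta_m1 / z  # transfer function on the unit circle
    # log f = const - log |P|^2, hence d log f / d beta_k = -2 Re(z^k / P)
    scores = np.stack([
        -2.0 * np.real(z / p),          # k = +1
        -2.0 * np.real(1.0 / (z * p)),  # k = -1
    ])
    # (1 / (4 pi)) * (2 pi / n_grid) * sum = sum / (2 n_grid)
    return scores @ scores.T / (2.0 * n_grid)


singular_block = regression_information_block(0.3, 0.3)  # beta_1 = beta_{-1}
regular_block = regression_information_block(0.2, 0.3)   # beta_1 != beta_{-1}

rank_singular = np.linalg.matrix_rank(singular_block, tol=1e-10)
rank_regular = np.linalg.matrix_rank(regular_block, tol=1e-10)
```

When β_1 = β_{−1}, the transfer function is real on the unit circle, so the two derivatives of log f coincide and the block has rank 1: only the direction β_1 + β_{−1} carries information. For the hypothetical values 0.2 and 0.3 the block has full rank.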

Conditions for Non-Singularity of I(θ)
As is seen from Example 3, singularity of the Fisher information matrix I(θ) causes a more serious problem: non-estimability of individual parameters. It is worth investigating what kind of conditions are necessary for the singularity of I(θ), because the Fisher information matrix is a complicated function of the parameters and it is not feasible to check it as it is. We first derive a simple necessary and sufficient condition directly from the quadratic form (8) of I(θ). Clearly, a necessary and sufficient condition for the non-singularity is that the vector y = (y_1, y_2, ..., y_m) is zero whenever the quadratic form (8) is zero.
Theorem 2 A necessary and sufficient condition for I(θ) to be non-singular is that the partial derivatives of the spectral density with respect to θ_j, j = 1, 2, ..., m, are linearly independent.
Corollary 1 A sufficient condition for non-singularity of I(θ) is that ∂γ_k/∂θ_j, j = 1, 2, ..., m, are linearly independent for some k.
Proof. The quadratic form of I(θ) can be written in terms of Σ and its derivatives (Rikimaru & Shibata, 2016), by noting that ∂Σ^{−1}/∂θ_p = −Σ^{−1}(∂Σ/∂θ_p)Σ^{−1}. Since the eigenvalues of Σ^{−1} are bounded away from 0, it is enough to examine the elements of the matrix Σ_{j=1}^{m} y_j ∂Σ/∂θ_j.
Example 4. Consider a 2-dimensional model,

    X_{v_1, v_2} + β_{10} X_{v_1+1, v_2} + β_{−10} X_{v_1−1, v_2} + β_{01} X_{v_1, v_2+1} + β_{0−1} X_{v_1, v_2−1} = ε_{v_1, v_2}.

There exist only two transfer functions, P(z_1, z_2) and P(z_1^{−1}, z_2^{−1}), for the spectral density of this model. This is because P(z_1, z_2) is a prime factor. In fact, there exist no polynomials Q_1(z_1, z_2) and Q_2(z_1, z_2) of at most order 2 with respect to z_1 and z_2 such that P(z_1, z_2) = Q_1(z_1, z_2) Q_2(z_1, z_2). Therefore, P(z_1, z_2) is not decomposable into a product of transfer functions which are compatible with the underlying model. It is clear that P(z_1, z_2) and P(z_1^{−1}, z_2^{−1}) are identical if and only if β_{10} = β_{−10} and β_{01} = β_{0−1}. The singularity of I(θ) follows from Theorem 1 as well as from Corollary 2 in this case.
A practical procedure to check whether the Fisher information matrix is singular is through a matrix B. In its definition, we should choose either ℓ_j or −ℓ_j in L since β_{ℓ_j} = β_{−ℓ_j}. Let k_i, i = 1, ..., m, be the indices in K arranged in lexicographic order, and set β_k = 0 for k ∉ K as a convention. Note that it is always true that L > m − 1.
Theorem 3 A necessary and sufficient condition for the non-singularity of I(θ) is that B is of full rank.
Proof. We may restrict our attention to the non-singularity of the first (m − 1) × (m − 1) submatrix of I(θ), since the off-diagonal elements of the last row and column are all 0 and the diagonal element is (2/σ)^2. By setting y_m = 0 in (8), a necessary and sufficient condition for the non-singularity of I(θ) is now that the vanishing of the resulting quadratic form implies y_j = 0, j = 1, 2, ..., m − 1. This completes the proof.
Example 5. The matrix B for the model in Example 1 fails to be of full rank if and only if β_1 = β_{−1}, since 1 + β_1 + β_{−1} ≠ 0 and 1 − β_1 − β_{−1} ≠ 0 from Assumption 1. Thus, we see that the condition β_1 = β_{−1} with β_1^2 < 1/4 is a necessary and sufficient condition not only for some of the transfer functions being identical, but also for the singularity of the Fisher information matrix in this example.

Unilateral Simultaneous Spatial Autoregressive Model
It is usually taken for granted that a unilateral simultaneous spatial autoregressive model, including the AR model in time series, is always identifiable and that its Fisher information matrix I(θ) is non-singular. However, it is worth proving this in the framework of the simultaneous spatial autoregressive model. It then becomes clearer that the problems we have discussed are due to the lack of unilaterality of the general simultaneous spatial autoregressive model.
Theorem 4 A unilateral simultaneous spatial autoregressive model is always identifiable and its Fisher information matrix I(θ) is always non-singular.
Proof. It is only possible to choose h_k(z_1, z_2, ..., z_n), k = 1, 2, ..., p, to find a transfer function P(z_1, z_2, ..., z_n) for the spectral density (2). Any other choice contradicts the unilaterality of the model. Therefore the unilateral simultaneous spatial autoregressive model is always unique for a given spectral density. On the other hand, the quadratic form (8) can then be rewritten in terms of a function Y(z_1, ..., z_n) and y_m, and it is zero if and only if Y(z_1, ..., z_n) = 0 and y_m = 0. This proves the non-singularity of I(θ).

Concluding Remarks
We have shown that the simultaneous spatial autoregressive model is non-identifiable from the covariance structure or the spectral density. Several different sets of regression parameters, with different standard deviations of the disturbance, are mapped from a single spectral density. Therefore, we have to be careful about estimation of parameters based on the second moments, for example, estimation by the Gaussian maximum likelihood principle. There could be many other estimates even if an estimate has been obtained by giving an initial value to an optimisation algorithm. A practical procedure would be to find all estimates and pick the one which is most meaningful for the underlying phenomena. This non-identifiability of the model has already been mentioned in the context of the two-sided moving average model (Rosenblatt, 1980). The cure he proposed is to employ the bispectrum, which can be applied to this model, too. We leave it for future investigation, together with an investigation of the type of parameter mapping from the spectral density.
Another problem we have investigated in this paper is the possible singularity of the Fisher information matrix, in which case not all parameters are estimable. Theorem 1 demonstrates that it happens when some of the parameters mapped from a spectral density are duplicated. Non-identifiability of the simultaneous spatial autoregressive model leads us not only to multiple estimates of parameters but also to non-estimable parameters. We need to check for such a singularity before estimation. Otherwise, we may face non-convergence of the optimisation algorithm or instability of the estimate. The several types of conditions given in Section 4 would be useful for the check. Many open problems are left, for example, the converse of Theorem 1, or any type of necessary and sufficient condition for the non-singularity other than that given in Theorem 3.