Statistical Inference on Semiparametric Spatial Additive Model

There has been growing interest in using nonparametric and semiparametric modelling techniques for the analysis of spatial data, because of their power in extracting the underlying local patterns in the data. In this study, motivated by the Boston house price data, we apply a semiparametric spatial additive model to incorporate spatial effects into regression models. For this semiparametric model, we develop a linear hypothesis test of the parametric coefficients as well as a test for the existence of the spatial effects. For the problem of variable selection, the adaptive Lasso method is applied. Monte Carlo simulation studies are conducted to illustrate the finite-sample performance of the proposed inference procedures. Finally, an application to the Boston housing data is presented.


Introduction
The Boston house price data of Harrison and Rubinfeld (1978) and Gilley and Pace (1996) are frequently used in the literature to illustrate new statistical methods. The data set consists of the median value of owner-occupied homes in 506 census tracts in the Boston Standard Metropolitan Statistical Area in 1970, together with 13 accompanying sociodemographic and related variables, and is available through the R package spdep.
For the Boston house price data, to capture the "large-scale" locational effects between the response variable and the 13 associated covariates, Pace and Gilley (1997) proposed the following linear regression model:

y_i = Σ_{j=1}^{13} β_j x_{ij} + β_14 u_i v_i + β_15 u_i + β_16 v_i + β_17 u_i² + β_18 v_i² + ε_i, i = 1, 2, · · · , 506,

where (u_i, v_i) are the latitude (LAT) and longitude (LON) of the ith observation. However, the quadratic expression in latitude and longitude is sometimes not adequate for the real locational effects. To solve this problem, we apply the following semiparametric spatial additive model to fit the data set:

y_i = f(u_i) + g(v_i) + x_i^T β + ε_i, i = 1, 2, · · · , n,     (2)

where y_i and x_i = (x_{i1}, x_{i2}, · · · , x_{ip})^T are observations of the response and associated explanatory variables at location (u_i, v_i) in the studied geographical region, β = (β_1, β_2, · · · , β_p)^T is a vector of p-dimensional unknown parameters, and ε_i is the model error with mean zero and variance σ². Here f(·) and g(·) are unknown smooth functions of latitude and longitude, respectively. For the identifiability of the nonparametric functions, we assume that E f(u) = 0 and E g(v) = 0. Without loss of generality, we also assume that both y_i and x_i have been centered about their respective means. Model (2) can be regarded as an extension of the following semiparametric spatial model of McMillen (2012):

y_i = m(u_i, v_i) + x_i^T β + ε_i,

where m(u, v) is an unknown nonparametric function of latitude and longitude. Compared with spatial autoregressive models, the spatial error model and other standard spatial econometric models, model (2) does not require an n × n spatial weight matrix W, and has greater flexibility in dealing with spatial effects.
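As a quick illustration of the structure of model (2), the following sketch simulates data from it. The particular functions f and g, the coefficient values, and the covariate distribution are illustrative assumptions for this sketch, not choices made in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 506, 2

# Hypothetical spatial coordinates and covariates (not the Boston data).
u = rng.uniform(0.0, 1.0, n)                 # latitude-like coordinate
v = rng.uniform(0.0, 1.0, n)                 # longitude-like coordinate
X = rng.normal(size=(n, p))
beta = np.array([1.0, 2.0])

# Smooth spatial effects, centered to satisfy the identifiability
# conditions E f(u) = 0 and E g(v) = 0.
f = np.sin(2.0 * np.pi * u); f -= f.mean()
g = np.cos(2.0 * np.pi * v); g -= g.mean()

eps = rng.normal(0.0, 0.5, n)                # model error with variance 0.25
y = f + g + X @ beta + eps                   # model (2)
```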
It is remarked that model (2) is a special case of the partially linear additive models in the statistical literature, which have been studied by Li (2000), Jiang et al. (2007), Liang et al. (2008), Liu et al. (2011) and Wei and Liu (2012); more details can be found in Li and Racine (2007). In this paper, we focus on statistical inference for model (2), including testing and variable selection.
The rest of this paper is organized as follows. We introduce the profile least-squares estimation approach in Section 2. Testing for the parametric and nonparametric components is discussed in Section 3. Section 4 studies the variable selection procedure. Simulations are conducted in Section 5 to illustrate the performance of the proposed approaches. As an application example, the Boston house price data are analyzed by the proposed methods in Section 6. A discussion is given in Section 7.

Profile Least-Squares Estimation
As the basis for the testing and variable selection problems of model (2), the estimation of model (2) is first briefly described in this section. Many methods can be applied to estimate model (2); we adopt the profile least-squares approach of Liang et al. (2008) and Wei and Liu (2012) for its simplicity.
Model (2) can be written as

y_i − x_i^T β = f(u_i) + g(v_i) + ε_i, i = 1, 2, · · · , n.     (3)

Suppose the parametric component β and the nonparametric function g are known; then model (2) becomes the following standard nonparametric regression model:

y_i − x_i^T β − g(v_i) = f(u_i) + ε_i.     (4)

We apply the local linear approach to estimate the nonparametric function f(·). Assume that f has a continuous second derivative; then, for any u_i in a neighbourhood of a fixed u, the Taylor expansion gives f(u_i) ≈ f(u) + f′(u)(u_i − u) ≡ a + b(u_i − u). Then we can estimate f(u) and f′(u) by the following weighted local least-squares problem:

min_{a,b} Σ_{i=1}^n [y_i − x_i^T β − g(v_i) − a − b(u_i − u)]² K_{h_1}(u_i − u),     (5)

with kernel function K_{h_1}(·) = K(·/h_1)/h_1 and bandwidth h_1. By solving problem (5), we obtain the initial estimator of f(u) as

f̃(u) = e_1^T (D_u^T K_u D_u)^{−1} D_u^T K_u (Y − Xβ − g),     (6)

with e_1 = (1, 0)^T, K_u = diag{K_{h_1}(u_1 − u), K_{h_1}(u_2 − u), · · · , K_{h_1}(u_n − u)}, D_u the n × 2 matrix whose ith row is (1, u_i − u), Y = (y_1, · · · , y_n)^T, X = (x_1, · · · , x_n)^T and g = (g(v_1), · · · , g(v_n))^T. Taking u to be each of u_1, u_2, · · · , u_n, together with the identifiability condition Σ_{i=1}^n f(u_i) = 0, we obtain the estimator of f as

f̂ = S_u^* (Y − Xβ − g),     (7)

with S_u^* = (I_n − 11^T/n) S_u, where S_u is the n × n local linear smoother matrix whose ith row is e_1^T (D_{u_i}^T K_{u_i} D_{u_i})^{−1} D_{u_i}^T K_{u_i}. Similarly, the estimator of g is

ĝ = S_v^* (Y − Xβ − f),     (8)

where S_v^* is defined analogously with bandwidth h_2. Combining (7) and (8), we have the following backfitting estimating equation system:

f̂ = S_u^* (Y − Xβ − ĝ),  ĝ = S_v^* (Y − Xβ − f̂).     (9)

By Hastie and Tibshirani (1990) and Opsomer and Ruppert (1997), the estimators of f and g can be obtained as

f̂ = W_f (Y − Xβ),  ĝ = W_g (Y − Xβ),     (10)

with W_f = (I_n − S_u^* S_v^*)^{−1} S_u^* (I_n − S_v^*) and W_g = (I_n − S_v^* S_u^*)^{−1} S_v^* (I_n − S_u^*). Replacing f and g in (3) by their estimators f̂ and ĝ, respectively, we obtain a synthetic linear regression model

(I_n − S) Y = (I_n − S) Xβ + ε,  S = W_f + W_g.     (11)

The profile least-squares estimator of β is then obtained by applying the least-squares approach to model (11):

β̂ = (X̃^T X̃)^{−1} X̃^T (I_n − S) Y,  X̃ = (I_n − S) X.     (12)

Furthermore, we define the final estimators of f and g as

f̂ = W_f (Y − Xβ̂),  ĝ = W_g (Y − Xβ̂).     (13)

By (12) and (13), we have

Ŷ = LY,  ε̂ = (I_n − L) Y,     (14)

with Ŷ = (ŷ_1, ŷ_2, · · · , ŷ_n)^T, ε̂ = (ε̂_1, ε̂_2, · · · , ε̂_n)^T, and L = S + X̃(X̃^T X̃)^{−1} X̃^T (I_n − S).
According to the results in Hastie and Tibshirani (1990, Section 3.4.3), we can use the cross-validation (CV) technique to choose the bandwidths. The bandwidths are chosen to minimize

CV(h) = (1/n) Σ_{i=1}^n [ε̂_i / (1 − l_ii)]²,

where h = (h_1, h_2)^T, and ε̂_i and l_ii (i = 1, 2, · · · , n) are respectively the residuals and the diagonal elements of the matrix L shown in (14).
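The CV criterion can be computed directly from the smoother matrix L; a minimal sketch (the function name is ours):

```python
import numpy as np

def cv_score(L, y):
    """CV(h) = n^{-1} * sum_i [eps_hat_i / (1 - l_ii)]^2, where
    eps_hat = (I - L) y and l_ii are the diagonal elements of L."""
    resid = y - L @ y            # residuals eps_hat as in (14)
    lii = np.diag(L)
    return np.mean((resid / (1.0 - lii)) ** 2)
```

In practice one would rebuild L for each candidate bandwidth pair h = (h_1, h_2) on a grid and keep the minimizer of cv_score.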

Hypothesis Tests
Based on the estimation method described in the previous section, we propose two hypothesis tests. The first one is for general hypothesis testing of regression parameters, and the second one is for testing the nonparametric functions.

Testing for the Parametric Components
We consider the following linear hypothesis:

H_0: Aβ = 0  versus  H_1: Aβ ≠ 0,     (15)

where A is a k × p matrix of known constants and 0 is a k-vector of zeros. We also assume that rank(A) = k.
In contrast to the generalized likelihood ratio test of Wei and Liu (2012), we develop a generalized F-test statistic for the testing problem (15). To construct the test statistic, we first estimate model (2) under the null hypothesis Aβ = 0.
Applying the Lagrange multiplier technique to the linear model (11), the restricted profile least-squares estimator of β is obtained as

β̂_0 = β̂ − (X̃^T X̃)^{−1} A^T [A(X̃^T X̃)^{−1} A^T]^{−1} Aβ̂.     (16)

Moreover, the restricted estimators of f and g can be defined as

f̂_0 = W_f (Y − Xβ̂_0),  ĝ_0 = W_g (Y − Xβ̂_0).     (17)

Therefore, we can obtain the residual sum of squares under H_0 as

RSS(H_0) = Y^T (I_n − L_0)^T (I_n − L_0) Y,     (18)

where L_0 is the hat matrix of the restricted fit, defined analogously to L in (14) with β̂ replaced by β̂_0. On the other hand, based on equation (14), the residual sum of squares under the alternative hypothesis is

RSS(H_1) = Y^T M_1 Y,     (19)

where M_1 = (I_n − L)^T (I_n − L).
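The Lagrange-multiplier correction above can be checked numerically. In this sketch the design matrix plays the role of X̃ and is purely synthetic; the point is only that the corrected estimator satisfies the linear restriction exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 4
Xt = rng.normal(size=(n, p))              # stands in for X tilde = (I - S) X
yt = rng.normal(size=n)                   # stands in for (I - S) Y
A = np.array([[1.0, -1.0, 0.0, 0.0]])     # H0: beta_1 - beta_2 = 0 (k = 1)

G = np.linalg.inv(Xt.T @ Xt)
beta_hat = G @ Xt.T @ yt                                      # unrestricted estimator
corr = G @ A.T @ np.linalg.solve(A @ G @ A.T, A @ beta_hat)   # Lagrange correction
beta_0 = beta_hat - corr                                      # restricted estimator
```

By construction A @ beta_0 = Aβ̂ − A G A^T (A G A^T)^{−1} Aβ̂ = 0, so the restriction holds exactly up to floating-point error.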
By simple calculation, we have

RSS(H_0) − RSS(H_1) = Y^T M_0 Y,     (20)

where M_0 = (I_n − L_0)^T (I_n − L_0) − M_1, with L_0 the hat matrix of the restricted fit. Obviously, if H_0 is true, there will be little difference between RSS(H_0) and RSS(H_1). Under the alternative hypothesis, there should be a significant difference between them, and RSS(H_0) should become systematically larger than RSS(H_1). Using this fact, we construct a generalized F-test statistic as

F_1 = [Y^T M_0 Y / ν_1] / [Y^T M_1 Y / δ_1],     (21)

where ν_1 = tr(M_0) and δ_1 = tr(M_1).
Since M_0 and M_1 are not idempotent matrices, the exact null distribution of the test statistic F_1 is complicated. However, by the results of Cleveland and Devlin (1988), if the model error is normally distributed, the null distribution of F_1 can be approximated by an F-distribution with degrees of freedom (r_1, r_2), where r_1 = ν_1²/ν_2, r_2 = δ_1²/δ_2, ν_2 = tr(M_0²) and δ_2 = tr(M_1²). Applications of this approximation procedure can be found in Leung et al. (2000a, 2000b) and Fotheringham et al. (2002).
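To make the degrees-of-freedom approximation concrete, the following sketch computes F_1 and (r_1, r_2) for a simple nested parametric fit, where the hat matrices are exact projections and the formulas reduce to the classical F-test (r_1 = k, r_2 = n − p). In the semiparametric setting, L and L_0 from Sections 2-3 would replace the projection hat matrices.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 80
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.5, 1.0, 0.0]) + rng.normal(size=n)

def hat(Z):
    """Least-squares hat (projection) matrix of design Z."""
    return Z @ np.linalg.solve(Z.T @ Z, Z.T)

L0 = hat(X[:, :2])                       # fit under H0 (last coefficient set to 0)
L = hat(X)                               # fit under H1
I = np.eye(n)
M1 = (I - L).T @ (I - L)                 # RSS(H1) = y^T M1 y
M0 = (I - L0).T @ (I - L0) - M1          # RSS(H0) - RSS(H1) = y^T M0 y

nu1, nu2 = np.trace(M0), np.trace(M0 @ M0)
d1, d2 = np.trace(M1), np.trace(M1 @ M1)
F1 = (y @ M0 @ y / nu1) / (y @ M1 @ y / d1)
r1, r2 = nu1**2 / nu2, d1**2 / d2        # approximate degrees of freedom
```

Because L0 and L are projections here, r_1 = 1 (one restriction) and r_2 = n − 3 = 77, matching the exact F-test.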

Testing for the Existence of Spatial Effects
For the semiparametric spatial additive model (2), an important question is to test for the existence of the spatial effects. This leads to the following testing problem:

H_0: f(u) = g(v) = 0  versus  H_1: f(u) ≠ 0 or g(v) ≠ 0.     (22)

If the null hypothesis H_0 is true, model (2) reduces to the following standard linear regression model:

y_i = x_i^T β + ε_i, i = 1, 2, · · · , n.     (23)

Then the residual sum of squares under H_0, based on the least-squares approach, is

RSS(H_0) = Y^T N_0 Y,     (24)

with N_0 = I_n − X(X^T X)^{−1} X^T. On the other hand, by (19), the residual sum of squares under the alternative hypothesis H_1 is Y^T M_1 Y. Similar to the construction of F_1, we define the following test statistic:

F_2 = [Y^T (N_0 − M_1) Y / τ_1] / [Y^T M_1 Y / δ_1],     (25)

where τ_1 = tr(N_0 − M_1).
Then, according to Cleveland and Devlin (1988), the null distribution of the test statistic F_2 can be approximated by an F-distribution with degrees of freedom (r_1^*, r_2), where r_1^* = τ_1²/τ_2 with τ_2 = tr[(N_0 − M_1)²].

Variable Selection by ALASSO Method
As with other types of regression models, an important problem in using model (2) is variable selection. In practice, many variables can be introduced in the initial analysis, and deciding which covariates to keep in the final statistical model has always been a tricky task in data analysis. For the standard linear regression model, stepwise regression methods can be applied directly with statistical software. However, traditional variable selection methods suffer from several drawbacks, the most severe of which is the lack of stability, as pointed out by Breiman (1996). To solve this problem, several penalized methods for significant variable selection have been proposed in recent decades. Examples include the bridge regression of Frank and Friedman (1993), the least absolute shrinkage and selection operator (LASSO) of Tibshirani (1996), the smoothly clipped absolute deviation (SCAD) of Fan and Li (2001), the elastic net penalty of Zou and Hastie (2005) and the adaptive Lasso (ALASSO) of Zou (2006). For model (2), Liu et al. (2011) developed a variable selection procedure to identify significant linear components using the SCAD penalty based on the polynomial spline method. In the following, we consider the problem of simultaneous variable selection and estimation in model (2) by the ALASSO method. Firstly, according to Tibshirani (1996), the Lasso estimator of the coefficient β in model (2) is obtained by minimizing the residual sum of squares

Q(β) = Σ_{i=1}^n [y_i − f(u_i) − g(v_i) − x_i^T β]²     (26)

with respect to β subject to the constraint Σ_{j=1}^p |β_j| ≤ s, where s is a tuning parameter. Equivalently, the Lasso estimator of β can be defined as

β̂_Lasso = argmin_β { Q(β) + λ Σ_{j=1}^p |β_j| },     (27)

where λ is a tuning parameter depending on s. The purpose of the penalty λ Σ_{j=1}^p |β_j| is to shrink some of the coefficients to exactly zero. This makes the LASSO a simultaneous estimation and variable selection procedure.
To eliminate the unknown functions f(·) and g(·) in Q(β), we replace f(u_i) and g(v_i) in (26) by the estimators f̂(u_i) and ĝ(v_i) defined in Section 2, which yields

Q̃(β) = Σ_{i=1}^n [y_i − f̂(u_i) − ĝ(v_i) − x_i^T β]².     (28)

As noted by Zou (2006), the Lasso estimator cannot achieve the oracle property. To solve this problem, following Zou (2006), the ALASSO estimator of β is defined as

β̂_ALasso = argmin_β { Q̃(β) + λ Σ_{j=1}^p |β_j| / |β̃_j|^γ },     (29)

where β̃ = (β̃_1, β̃_2, · · · , β̃_p)^T is a consistent estimator of β and γ > 0 is a constant. One possible choice for β̃ is the profile least-squares estimator β̂ in (12). The LARS algorithm, available through an R package, can be used to obtain β̂_ALasso.
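The ALASSO criterion (29) can be minimized by coordinate descent; the following is a self-contained numpy sketch rather than the LARS implementation used in the paper, and the tuning value, data, and function names are illustrative. Here y is taken to already have the fitted nonparametric parts f̂(u_i) + ĝ(v_i) removed.

```python
import numpy as np

def soft(z, t):
    """Soft-thresholding operator."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def adaptive_lasso(X, y, lam, beta_init, gamma=1.0, n_sweeps=200):
    """Coordinate descent for min_b ||y - X b||^2 + lam * sum_j w_j |b_j|,
    with adaptive weights w_j = 1 / |beta_init_j|^gamma as in Zou (2006)."""
    w = 1.0 / (np.abs(beta_init) ** gamma + 1e-12)
    b = beta_init.astype(float).copy()
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        for j in range(X.shape[1]):
            r = y - X @ b + X[:, j] * b[j]        # partial residual without feature j
            b[j] = soft(X[:, j] @ r, lam * w[j] / 2.0) / col_sq[j]
    return b

# Illustrative check on synthetic data with a sparse truth.
rng = np.random.default_rng(3)
n, p = 200, 8
X = rng.normal(size=(n, p))
beta_true = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])
y = X @ beta_true + rng.normal(size=n)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)      # consistent initial estimator
beta_al = adaptive_lasso(X, y, lam=30.0, beta_init=beta_ols)
```

The adaptive weights penalize coefficients with small initial estimates heavily, so the truly zero coefficients are set exactly to zero while the large ones are barely shrunk.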

Simulation Studies
In this section, some simulations are conducted to examine the performance of the proposed procedures.

Spatial Layout and Design of the Experiments
In spatial analysis, a regular lattice has quite a strong background in applications. In our simulations, the observations are collected from a uniform two-dimensional grid consisting of m × m lattice points with unit distance between any two neighboring points along the horizontal and vertical axes. These m² points are arranged in an orthogonal coordinate system such that the spatial coordinates for observing the data are

(u_i, v_i) = (mod(i − 1, m), [(i − 1)/m]), i = 1, 2, · · · , m²,

where mod(i − 1, m) is the remainder of i − 1 divided by m and [(i − 1)/m] is the integer part of (i − 1)/m. The data are generated from the semiparametric spatial additive model (2), and three types of models are considered: Models 1 and 2 are true spatial additive models, while Model 3 does not have a true additive structure.
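The lattice coordinates above can be generated in a few lines (the function name is ours):

```python
import numpy as np

def lattice_coords(m):
    """Spatial coordinates on an m x m unit lattice:
    (u_i, v_i) = (mod(i-1, m), floor((i-1)/m)) for i = 1, ..., m^2."""
    i = np.arange(1, m * m + 1)
    u = (i - 1) % m          # remainder of i-1 divided by m
    v = (i - 1) // m         # integer part of (i-1)/m
    return u, v

u, v = lattice_coords(3)
```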
To gain an idea of the effect of the error distribution on our results, we take four different types of error distribution, whose scales are adjusted so that they all have a common variance σ² = 0.25; the first is ε_i ∼ N(0, 0.5²). The Epanechnikov kernel K(x) = 0.75(1 − x²)I(|x| ≤ 1) is used in our simulations. Furthermore, we use the CV technique of Section 2 to choose the bandwidths.

The Finite Sample Performance of the Profile Least-Squares Estimation
Let β_1 = 1, β_2 = 2, and lattice size m = 8, 10, 12. For each setting, the simulation was repeated N = 1000 times. By the profile least-squares approach of Section 2, we obtain the estimators of β_1 and β_2. We assess the performance of the estimators by the sample mean (Mean), sample standard deviation (SD) and sample mean squared error (MSE). The simulation results are presented in Tables 1-3. We find that all estimators of the parametric components are close to the true values. As the sample size increases, the biases, sample standard deviations and sample mean squared errors of the estimators decrease.

To study the finite-sample performance of the test statistic F_1 of Section 3.1, for Models 1-3 we consider the testing problem H_0: β_1 − β_2 = 0, with β_1 = 1 and β_2 = 1 − c, where c = 0, ±0.2, ±0.4, ±0.6, ±0.8. For each given value of c and each type of error distribution, 1000 replications with m = 10 were run and the rejection rate under the significance level α = 0.05 was computed as the simulated power of the proposed test procedure. The results are shown in Tables 4-6. We can see that the rejection frequencies (estimated sizes) under H_0 are all reasonably close to the nominal significance level 0.05. Under the alternative hypothesis, the rejection rate is very robust to the type of error distribution, and increases rapidly as the alternative hypothesis deviates from the null hypothesis.

To study the finite-sample performance of the test statistic F_2 of Section 3.2, we consider the testing problem H_0: f(u_i) = g(v_i) = 0 for Models 5 and 6. In addition, 1000 realizations were generated with m = 10 to calculate the size and power of F_2; the results are reported in Tables 4-6. We can see that the rejection frequencies (estimated sizes) under H_0 are all close to the nominal level 0.05. On the other hand, the rejection rate increases rapidly as the alternative hypothesis deviates from the null hypothesis.
To assess the accuracy of the proposed estimators of the parametric components β = (β_1, β_2, · · · , β_8)^T, a criterion for measuring the "goodness" of an estimator is needed. For this purpose, we use the mean squared error (MSE) criterion, defined as MSE = E‖β̂ − β‖². For comparison with the ALASSO, we also evaluate the MSE of the LASSO, ORACLE and FULL model estimators. The ORACLE estimator is the profile least-squares estimator defined in (12) based on the true model that contains none of the covariates with zero coefficients, while the FULL model estimator is the profile least-squares estimator defined in (12) based on the full model that contains all of the covariates. The ORACLE estimator is expected to perform best, since it is based on the true model, which is unknown in practice, and thus serves as a benchmark for comparison. All our simulations are based on 1000 replications. Table 10 reports the MSE values of the various estimators.
From Table 10, we can see that both the Lasso and adaptive Lasso procedures significantly improve the MSE over the full model for both Model 7 and Model 8. As expected, the ORACLE estimator performs best among all estimators. Finally, increasing the sample size improves the estimation accuracy of all estimation procedures. Table 11 presents the average numbers of "correct" and "incorrect" zero estimates for the ALASSO and LASSO based on 1000 replications. In Table 11, the columns labelled "C" give the average number of the five zero coefficients correctly set to 0, and the columns labelled "I" give the average number of the three nonzero coefficients incorrectly set to 0. We observe from the table that in all cases the ALASSO provides a more accurate number of correct zeros than the LASSO.

Real Data Analysis
We now illustrate the proposed testing procedures by analyzing the well-known Boston housing price data. The dependent variable is the log of the median value (in 1000 USD) of owner-occupied homes in each census tract. To model the spatial effects more flexibly, we apply the following semiparametric spatial additive model:

y_i = f(u_i) + g(v_i) + Σ_{j=1}^{13} β_j x_{ij} + β_14 u_i v_i + ε_i, i = 1, 2, · · · , 506.

We assume that both the response variable and the covariates have been centered about their respective means. We apply the profile least-squares approach of Section 2 and both the ALASSO and LASSO procedures of Section 4 to estimate the coefficients β_j, j = 1, 2, · · · , 14; the results are presented in Table 12.
From Table 12, we can see that higher CRIM, NOX 2 , DIS, TAX, PTRATIO, and LSTAT are all expected to lead to lower housing prices. On the other hand, larger ZN, RM 2 , RAD and B are all expected to lead to higher housing prices. Overall, these estimates are consistent with our expectations. ZN is found to be insignificant in determining housing prices by the results of ALASSO.

Discussion
In this article, we propose a new semiparametric spatial additive model to incorporate spatial effects into regression models. The spatial effects of latitude and longitude are modelled as nonparametric smooth functions, whereas their interaction effect is modelled with a parametric structure. How to analyse the interaction effect by nonparametric or semiparametric methods is an important open problem. On the other hand, our study focuses only on spatial cross-sectional data. A natural extension is the following semiparametric spatio-temporal additive model, obtained by incorporating temporal effects into model (2):

y_i = f(u_i) + g(v_i) + m(t_i) + x_i^T β + ε_i, i = 1, 2, · · · , n,

where m(·) is an unknown smooth function of time. The estimation, testing and variable selection procedures of this paper can be applied to this model; this is an important area of future research.

Conflicts of Interest
The authors declare no conflict of interest.