On Comparison of Local Polynomial Regression Estimators for P = 0 and P = 1 in a Model Based Framework

This article discusses the local polynomial regression estimator for P = 0 and the local polynomial regression estimator for P = 1 in a finite population. The performance criterion exploited in this study focuses on the efficiency of the finite population total estimators. Further, the discussion explores analytical comparisons between the two estimators with respect to asymptotic relative efficiency. In particular, asymptotic properties of the local polynomial regression estimator of finite population total for P = 0 are derived in a model based framework. The results of the local polynomial regression estimator for P = 0 are compared with those of the local polynomial regression estimator for P = 1 studied by Kikechi et al (2018). Variance comparisons are made using the local polynomial regression estimator $\bar{T}_0$ for P = 0 and the local polynomial regression estimator $\bar{T}_1$ for P = 1, which indicate that the estimators are asymptotically equivalently efficient. Simulation experiments carried out show that the local polynomial regression estimator $\bar{T}_1$ outperforms the local polynomial regression estimator $\bar{T}_0$ in the linear, quadratic and bump populations.


Introduction
The theory of sample surveys involves principles and methods of collecting and analyzing data from a finite population of units and then making inferences about finite population parameters on the basis of information obtained from the sample. For some early work on survey sampling theory, see Royall (1970a), Royall (1970b), Royall (1971), Smith (1976) and Pfeffermann (1993). In this study, an estimator of the finite population total is developed and its properties derived using the local polynomial regression procedure. Local polynomial regression is a nonparametric technique that generalizes kernel regression and is used for smoothing scatter plots and modeling functions. Conventionally, when P = 0 the method is referred to as local constant regression, when P = 1 as local linear regression and when P ≥ 2 as local polynomial regression, where P is the order of the local polynomial being fit. In local polynomial regression, a low order weighted least squares regression is fit at each point of interest $x$, using data from some neighborhood around $x$ (see Cleveland (1979) and Cleveland and Devlin (1988)).
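As an illustration of the local fits described above, the following minimal Python sketch evaluates a kernel-weighted least squares fit of order 0 (local constant) and order 1 (local linear) at a single point. The function names, the Epanechnikov kernel on (-1, 1) and the toy data are assumptions made for illustration and are not taken from the paper.

```python
def epanechnikov(u):
    """Epanechnikov kernel supported on (-1, 1); an illustrative choice."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def local_fit(x0, xs, ys, h, order, kernel=epanechnikov):
    """Kernel-weighted least squares fit at x0 (hypothetical helper).

    order=0 fits a local constant and order=1 a local line; the fitted
    value at x0 is returned in both cases.
    """
    if order not in (0, 1):
        raise ValueError("this sketch only covers orders 0 and 1")
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        d = x - x0
        w = kernel(d / h)          # weight shrinks as x moves away from x0
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * y
        t1 += w * d * y
    if s0 == 0.0:
        raise ValueError("no observations fall inside the bandwidth")
    if order == 0:
        return t0 / s0             # kernel-weighted average of y
    denom = s0 * s2 - s1 * s1
    if abs(denom) < 1e-12:
        raise ValueError("too few distinct x values inside the bandwidth")
    return (s2 * t0 - s1 * t1) / denom   # intercept of the local line


if __name__ == "__main__":
    # Toy data lying exactly on the line y = 1 + 2x.
    xs = [i / 20 for i in range(21)]
    ys = [1.0 + 2.0 * x for x in xs]
    for x0 in (0.0, 0.5):
        print(x0, local_fit(x0, xs, ys, 0.3, 0), local_fit(x0, xs, ys, 0.3, 1))
```

On data that lie exactly on a line, the order 1 fit reproduces the line even at the boundary point, whereas the order 0 fit at the boundary is pulled towards the interior values.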
Once a modeling approach is undertaken, finite population estimation problems have the special feature that the unknown quantities are realized values of random variables, so the basic problem is similar to a prediction problem. In order to estimate $m(x)$ at a given point $x$, the association between the predictor variable and the response variable is explored. This methodology was introduced by Stone (1977). It has also been studied by Fan (1993), Fan and Gijbels (1996), Breidt and Opsomer (2000) and Kikechi et al (2017). As in Stone (1977), the main aim of this procedure is to quantify the contribution of the covariate to the response per unit value of the covariate in order to summarize the association between the two variables, to predict the mean response for a given value of the covariate and to extrapolate the results beyond the range of the observed covariate values. A weight $K\left(\frac{x_i - x}{h}\right)$ is attached to each point $(x_i, y_i)$, where $h$ is the size of the local neighbourhood and $K(\cdot)$ is a unimodal non-negative function. On the other hand, inferences may explore properties of the process that generates the population values (Montanari and Ranalli (2003)). It is assumed that the finite population has been generated by a superpopulation model ξ, and it is of interest to estimate the population parameters. The superpopulation model can be applied to predict the unobserved values of the survey variable after obtaining estimates of the model parameters using the known auxiliary information $x_i$, $i = 1, 2, \ldots, N$ (see Montanari and Ranalli (2005) and Rueda and Sanchez-Borrego (2009)).
The nonparametric approach does not restrict the functional form of the distribution, nor does it specify stochastic properties such as the expectation, variance and covariance. Rather, it leaves them to cover broad classes of models, thus allowing for more robust inference than that obtained under the parametric approach. Using the model ξ, the nonparametric estimator of the total has been derived by Nadaraya (1964), Watson (1964), Priestley and Chao (1972), Gasser and Muller (1979), Dorfman (1992), Chambers et al (1993) and Odhiambo and Mwalili (2000). Dorfman (1992) proved the asymptotic unbiasedness and MSE consistency of this estimator. The estimator, however, suffers from the sparse sample problem, and more work is needed to develop a technique that overcomes it. This is where the local polynomial procedure comes in. See Kikechi et al (2017) and Kikechi et al (2018).
Local polynomial regression is one of the most successfully applied design adaptive nonparametric regression methods. This estimation procedure is an attractive choice due to its flexibility and asymptotic performance. Having a local model (rather than just a point estimate) enables derivation of response adaptive methods for bandwidth and polynomial order selection in a straightforward manner. The procedure also has the advantage of eliminating design bias and alleviating boundary bias. Furthermore, the method adapts well to random, fixed, highly clustered and nearly uniform designs. The weighted least squares principle employed in the local polynomial approximation approach opens the way to a wealth of statistical knowledge, thus providing easy computations and generalizations. See Fan (1992), Fan (1993), Ruppert and Wand (1994) and Fan and Gijbels (1996) among others. Kikechi et al (2018) employ a superpopulation approach to estimate the finite population total using the procedure of local linear regression. Explicitly, the authors derive robustness properties of the local linear regression estimator and carry out simulation experiments on the performance of this estimator in comparison with other estimators that exist in the literature. Results indicate that the local linear regression estimator is more efficient and performs better than the Horvitz-Thompson (1952) and Dorfman (1992) estimators, regardless of whether the model is correctly specified or misspecified. In this paper, the local polynomial regression estimator of finite population total for P = 0 is studied and its asymptotic properties derived. Analytical comparisons between this estimator and the local polynomial regression estimator for P = 1 studied by Kikechi et al (2018) indicate that the estimators are asymptotically equivalently efficient. Simulation experiments, however, indicate that the local polynomial regression estimator $\bar{T}_1$ is superior to and dominates the local polynomial regression estimator $\bar{T}_0$ in the linear, quadratic and bump populations.

Method of Constructing the Local Polynomial Regression Estimator $\bar{T}$ for P = 0
The superpopulation model considered for estimating the finite population total expresses each survey value as a mean function of the auxiliary variable plus a random error. Specific assumptions are placed on the model and on the properties of the error for the nonparametric regression estimation of $m(x)$; in particular, the functions $m(x)$ and $\sigma^2(x)$ are assumed to be smooth and strictly positive.
The derivation starts from the Taylor series expansion of $m(x)$, which is written in a general form involving two constants, where the sample point $x_i$ lies in the interval $[x - h, x + h]$. The constants are solved for using the least squares procedure: the error term is made the subject of the formula, both sides are squared, the squares are summed over all possible sample values and the weights are applied, which gives a weighted least squares problem.
Differentiating the weighted sum of squares with respect to the first constant and equating to zero gives equation (9), from which equation (11) follows. Similarly, differentiating with respect to the second constant and equating to zero leads to equation (14). Multiplying equation (11) and equation (14) by the kernel-weighted sums of order 2 and order 1 respectively gives equations (15) and (16). Subtracting equation (16) from equation (15) and making the first constant the subject of the formula gives its estimate. Similarly, multiplying equation (11) and equation (14) by the kernel-weighted sums of order 1 and order 0 respectively gives equations (19) and (20). Subtracting equation (20) from equation (19) and making the second constant the subject of the formula gives its estimate.
The estimator of the mean function then follows from equation (5). If the second constant (the coefficient of the linear term) is taken to be a pre-assigned constant and the value assigned is zero, the local polynomial regression estimator for P = 0 is obtained.
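The weighted least squares steps described above can be sketched as follows. The notation is assumed for illustration and may differ from the numbered equations of the original derivation: $K$ is the kernel, $h$ the bandwidth, $s$ the sample, and $\alpha$ and $\beta$ denote the local intercept and slope at the point $x$.

```latex
% Assumed notation: K kernel, h bandwidth, s sample,
% alpha and beta the local intercept and slope at the point x.
\begin{align*}
Q(\alpha,\beta) &= \sum_{i \in s} K\!\left(\frac{x_i - x}{h}\right)
                   \bigl\{y_i - \alpha - \beta (x_i - x)\bigr\}^2, \\
s_j(x) &= \sum_{i \in s} K\!\left(\frac{x_i - x}{h}\right)(x_i - x)^j, \qquad
t_j(x) = \sum_{i \in s} K\!\left(\frac{x_i - x}{h}\right)(x_i - x)^j\, y_i, \\
\frac{\partial Q}{\partial \alpha} = 0 &\;\Rightarrow\;
  t_0(x) = \alpha\, s_0(x) + \beta\, s_1(x), \\
\frac{\partial Q}{\partial \beta} = 0 &\;\Rightarrow\;
  t_1(x) = \alpha\, s_1(x) + \beta\, s_2(x), \\
\hat{\alpha} &= \frac{s_2(x)\, t_0(x) - s_1(x)\, t_1(x)}{s_0(x)\, s_2(x) - s_1(x)^2}, \qquad
\hat{\beta} = \frac{s_0(x)\, t_1(x) - s_1(x)\, t_0(x)}{s_0(x)\, s_2(x) - s_1(x)^2}, \\
\beta \equiv 0 &\;\Rightarrow\;
  \hat{\alpha} = \frac{t_0(x)}{s_0(x)}
  = \frac{\sum_{i \in s} K\!\left(\frac{x_i - x}{h}\right) y_i}
         {\sum_{i \in s} K\!\left(\frac{x_i - x}{h}\right)}.
\end{align*}
```

With the slope fixed at zero the fit reduces to a kernel-weighted average of the sample values, which is the local constant (P = 0) case. Estimating the slope from the data gives the local linear (P = 1) case.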

Properties of the Local Polynomial Regression Estimator $\bar{T}$ for P = 0
In deriving the properties of the local polynomial regression estimator, the following assumptions are made according to Ruppert and Wand (1994):
(i) The variables $x_i$ lie in the interval (0, 1).
(iii) The kernel $K(\cdot)$ is symmetric and supported on (−1, 1). Also $K(\cdot)$ is bounded and continuous.
(iv) The bandwidth $h$ is a sequence of values which depend on the sample size $n$, satisfying $h \to 0$ and $nh \to \infty$ as $n \to \infty$.
(v) The point $x$ at which the estimation is taking place satisfies $h < x < 1 - h$.
The conditions imposed by Fan (1993) are used only for convenience in the technical arguments and can thus be relaxed.

The Expectation of the Local Polynomial Regression Estimator $\bar{T}$ for P = 0
The expectation of $\bar{T}$ for P = 0 is derived using a Taylor series expansion of the mean function together with Theorem 3 in Fan and Gijbels (1996).

The Variance of the Local Polynomial Regression Estimator $\bar{T}$ for P = 0
The variance of the local polynomial regression estimator $\bar{T}$ is estimated using the variance of the error, so $Var(\bar{T} - T)$ is derived first. The asymptotic expression for the variance of $\bar{T}$ is then obtained using the results already derived for $\bar{m}(x)$.

The MSE of the Local Polynomial Regression Estimator $\bar{T}$ for P = 0
The asymptotic expression for the MSE of the local polynomial regression estimator $\bar{T}$ is obtained by applying Theorem I in Fan (1993) under condition (ii). Note that the corresponding results for the local polynomial regression estimator of finite population total $\bar{T}$ for P = 1 have been derived by Kikechi et al (2018).

The Asymptotic Relative Efficiency
The relative efficiency of two procedures is the ratio of their efficiencies, but it is often possible to use the asymptotic relative efficiency, defined as the limit of the relative efficiencies as the sample size grows, as the principal measure of comparison. Let $\bar{T}_0$ be the local polynomial regression estimator of finite population total for P = 0 and $\bar{T}_1$ be the local polynomial regression estimator of finite population total for P = 1 as studied by Kikechi et al (2018).
If $\bar{T}_0$ and $\bar{T}_1$ are both unbiased estimators of $T$, then the relative efficiency of $\bar{T}_0$ to $\bar{T}_1$ is given by the ratio of their variances. If $\bar{T}_0$ and $\bar{T}_1$ are both asymptotically unbiased estimators of $T$, then the asymptotic relative efficiency of $\bar{T}_0$ to $\bar{T}_1$ is given by the limit of this ratio as the sample size grows. The comparison therefore uses the variance of each estimator of the finite population total together with its asymptotic expression: those for $\bar{T}_0$ are derived in this paper and those for $\bar{T}_1$ are given by Kikechi et al (2018). Note that in Kikechi et al (2017) the expressions for $\bar{m}(x)$ under P = 0 and P = 1 are of the same form. Thus the asymptotic relative efficiency of the local polynomial regression estimator $\bar{T}_0$ to the local polynomial regression estimator $\bar{T}_1$ derived by Kikechi et al (2018) converges to 1.
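The variance comparison can be illustrated with the standard pointwise results for kernel smoothers at an interior point (see, for example, Fan (1993) and Ruppert and Wand (1994)). This is a sketch in assumed notation, not the paper's own expressions for the totals: $\hat{m}_0$ and $\hat{m}_1$ denote the local constant and local linear fits, $f$ the design density, $n$ the sample size and $R(K) = \int K(u)^2\,du$.

```latex
% Textbook leading terms at an interior point x (assumed notation):
% f is the design density, R(K) = \int K(u)^2 du, n the sample size.
\begin{align*}
\operatorname{Var}\{\hat{m}_0(x)\} &\approx \frac{R(K)\,\sigma^2(x)}{n h f(x)}, \qquad
\operatorname{Var}\{\hat{m}_1(x)\} \approx \frac{R(K)\,\sigma^2(x)}{n h f(x)}, \\
\frac{\operatorname{Var}\{\hat{m}_1(x)\}}{\operatorname{Var}\{\hat{m}_0(x)\}}
  &\longrightarrow 1 \qquad (n \to \infty,\; h \to 0,\; nh \to \infty).
\end{align*}
```

Because the leading variance terms coincide, the limiting ratio is 1 whichever estimator is placed in the numerator. Equal leading variance terms do not rule out finite sample differences arising from bias, for instance near the boundaries of the covariate range.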

Description of the Data Sets
In this section, simulation experiments are carried out to evaluate the performance of the estimators. The data are generated from a regression model, and the data sets are obtained by simulation using specific mean functions for the linear, quadratic and bump populations respectively. The $x_i$'s are generated as independent and identically distributed (iid) uniform (0, 1) random variables. The errors are assumed to be independent and identically distributed (iid) random variables with mean 0 and constant variance. The analysis and comparison in terms of performance is based on the local polynomial regression estimator $\bar{T}_0$ and the local polynomial regression estimator $\bar{T}_1$. The Epanechnikov kernel is used for kernel smoothing on each of the populations due to its simplicity and ease of computation using well designed computer programs, and is defined as,

$$K(u) = \frac{3}{4\sqrt{5}}\left(1 - \frac{1}{5}u^2\right), \quad |u| < \sqrt{5} \qquad (49)$$

The bandwidths are data driven and are determined by the least squares cross validation method. For each of the three artificial populations of size 200, samples are generated by simple random sampling without replacement using sample size n = 60. For each combination of mean function, standard deviation and bandwidth, 500 replicate samples are selected and the estimators calculated.
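The simulation design described above can be sketched in Python as follows. Several ingredients are assumptions made for illustration and are not taken from the paper: the three mean functions (forms of the kind used in the nonparametric survey literature), normally distributed errors, a fixed bandwidth in place of least squares cross validation, and the prediction form of the total estimator (sample sum plus predicted values for the non-sampled units). All function names are hypothetical.

```python
import math
import random

SQRT5 = math.sqrt(5.0)

# Assumed mean functions; the paper's exact forms are not reproduced here.
MEAN_FUNCTIONS = {
    "linear": lambda x: 1.0 + 2.0 * (x - 0.5),
    "quadratic": lambda x: 1.0 + 2.0 * (x - 0.5) ** 2,
    "bump": lambda x: 1.0 + 2.0 * (x - 0.5) + math.exp(-200.0 * (x - 0.5) ** 2),
}


def epanechnikov(u):
    """Epanechnikov kernel on (-sqrt(5), sqrt(5)), as in equation (49)."""
    return 3.0 / (4.0 * SQRT5) * (1.0 - u * u / 5.0) if abs(u) < SQRT5 else 0.0


def make_population(name, size, sigma, rng):
    """Draw x ~ U(0, 1) and y = m(x) + e, with e ~ N(0, sigma^2) (assumed)."""
    m = MEAN_FUNCTIONS[name]
    xs = [rng.random() for _ in range(size)]
    ys = [m(x) + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


def predict(x0, sx, sy, h, order):
    """Local constant (order 0) or local linear (order 1) prediction at x0."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(sx, sy):
        d = x - x0
        w = epanechnikov(d / h)
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * y
        t1 += w * d * y
    if s0 == 0.0:
        return sum(sy) / len(sy)   # fallback: no sample point in the window
    denom = s0 * s2 - s1 * s1
    if order == 0 or abs(denom) < 1e-12:
        return t0 / s0             # local constant fit (also used as fallback)
    return (s2 * t0 - s1 * t1) / denom


def estimate_total(xs, ys, sample, h, order):
    """Sample sum of y plus predicted values for the non-sampled units."""
    chosen = set(sample)
    sx = [xs[i] for i in sample]
    sy = [ys[i] for i in sample]
    rest = sum(predict(xs[j], sx, sy, h, order)
               for j in range(len(xs)) if j not in chosen)
    return sum(sy) + rest


def simulate(name, size=200, n=60, sigma=0.1, h=0.1, reps=500, seed=0):
    """Return {order: (bias, mse)} over repeated SRSWOR samples."""
    rng = random.Random(seed)
    xs, ys = make_population(name, size, sigma, rng)
    true_total = sum(ys)
    errors = {0: [], 1: []}
    for _ in range(reps):
        sample = rng.sample(range(size), n)   # simple random sampling without replacement
        for order in errors:
            errors[order].append(estimate_total(xs, ys, sample, h, order) - true_total)
    return {k: (sum(e) / reps, sum(v * v for v in e) / reps)
            for k, e in errors.items()}


if __name__ == "__main__":
    for name in MEAN_FUNCTIONS:
        print(name, simulate(name, reps=20))
```

The defaults mirror the population size of 200, the sample size of 60 and the 500 replicates stated in the text. The bandwidth `h` and the error standard deviation `sigma` are placeholders that would be replaced by the cross validated bandwidth and the error settings of the study.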

Results
The results of the bias and mean squared error (MSE) for the local polynomial regression estimator $\bar{T}_0$ for P = 0 and the local polynomial regression estimator $\bar{T}_1$ for P = 1 in the linear, quadratic and bump populations are provided in the table below.

Discussion
In estimating $\bar{m}(x)$ for the local polynomial regression estimator $\bar{T}_0$, the slope coefficient of the local fit has been assumed to be a pre-assigned constant, and in particular the value assigned is zero. It has therefore been shown in section 2 that the estimator $\bar{m}(x)$ is biased, leading to a biased estimation of the finite population total. On the other hand, when estimating $\bar{m}(x)$ for the local polynomial regression estimator $\bar{T}_1$, the value of this coefficient is not pre-assigned but is determined by the data provided, thus minimizing the bias. With regard to asymptotic relative efficiency, there is no difference in the performance of the local polynomial regression estimator $\bar{T}_0$ studied in this paper and the local polynomial regression estimator $\bar{T}_1$ studied by Kikechi et al (2018). This is because the ratio of their variances converges to 1 as n becomes large (see equation (44)), which implies that the estimators are asymptotically equivalently efficient. However, the simulation experiments show that the biases and MSEs computed in table 2 for the local polynomial regression estimator $\bar{T}_1$ are smaller than those for $\bar{T}_0$ in all three populations. The results therefore indicate that the local polynomial regression estimator $\bar{T}_1$ is superior to and dominates the local polynomial regression estimator $\bar{T}_0$ for the linear, quadratic and bump populations.

Conclusion
In this article the local polynomial regression estimators $\bar{T}_0$ and $\bar{T}_1$ of finite population totals have been studied in a model based framework. Analytically, variance comparisons are explored using the local polynomial regression estimator $\bar{T}_0$ for P = 0 and the local polynomial regression estimator $\bar{T}_1$ for P = 1, and the results indicate that the estimators are asymptotically equivalently efficient. Simulation experiments carried out in terms of the biases and MSEs show that the local polynomial regression estimator $\bar{T}_1$ outperforms the local polynomial regression estimator $\bar{T}_0$ in all three artificial populations, and therefore $\bar{T}_1$ is the more efficient estimator.
Vol. 7, No. 4;