D-optimal Design in a Linear Model with Different Heteroscedasticity Structures

In this paper, we develop D-optimal designs for a linear model with two explanatory variables in the presence of heteroscedasticity. A sequential method was adopted to obtain the D-optimal designs. Two heteroscedasticity structures drawn from the literature were used, and the optimal designs were found to take the extreme values of the design region. The results from simulated data were confirmed with real-life data on the kinematic viscosity of a lubricant, in stokes, as a function of temperature and pressure, as discussed in Linssen (1975). The relative efficiency of other designs with respect to the D-optimal designs was determined. Three correction methods based on weighted least squares were adopted for the heteroscedasticity problem, and the method labelled HCW1 was found to perform best.


Introduction
Experimentation is the process of planning a study to meet specified objectives, and it constitutes a foundation of the empirical sciences (Zhu, 2012). A major advantage of experimentation is the ability to control the experimental conditions and to determine which variables to include in a study (Fackle-Fornius, 2008). Since experimental design principles were introduced in the first half of the 1930s, optimal experimental designs have gained attention and become useful tools for researchers in many fields (Atkinson and Donev, 1992; Atkinson, 1996; Atkinson, Donev and Tobias, 2007; Berger and Wong, 2009). Of the various design criteria, D-optimality has been the most frequently used, and it often performs better than other criteria (Zocchi and Atkinson, 1999; Atkinson et al., 2007). D-optimality has therefore become one of the most popular criteria: a D-optimal design minimizes the generalized variance of the parameter estimates. Under a statistical model such as a regression model, a D-optimal design minimizes $|(X'X)^{-1}|$, the determinant of the dispersion matrix, or equivalently maximizes $|X'X|$, the determinant of the information matrix. One of the important assumptions of the standard regression model $y = X\beta + \varepsilon$ is that the variance of the error (disturbance) term $\varepsilon$ is constant across observations, $E(\varepsilon_i^2) = \sigma^2$, $i = 1, 2, \dots, n$; this is referred to as homoscedasticity. In real-life situations, however, this assumption is often violated and the error variances differ. The condition in which the error terms have unequal variances across observations, $E(\varepsilon_i^2) = \sigma_i^2$, $i = 1, 2, \dots, n$, is termed heteroscedasticity (Lambert, 2013; Knaub, 2017). Heteroscedasticity, often described as a "problem" to be "solved" or "corrected", is the change in the variance of the predicted y across different values of the independent variables (Knaub, 2011, 2017).
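To make the D-criterion concrete, the short Python sketch below computes $|X'X|$ and $|(X'X)^{-1}|$ for two hypothetical four-run designs on $[-1, 1]^2$ under the two-variable model with an intercept, in the homoscedastic case. The designs and the helper names `det3` and `info_matrix` are illustrative assumptions, not taken from the paper.

```python
def det3(a):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def info_matrix(points):
    """X'X for the model matrix whose rows are (1, x1, x2)."""
    rows = [(1.0, float(x1), float(x2)) for x1, x2 in points]
    return [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]


# Two hypothetical four-run designs on the square [-1, 1]^2.
corners = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
with_centre = [(-1, -1), (1, -1), (-1, 1), (0, 0)]

for name, pts in (("corners", corners), ("three corners + centre", with_centre)):
    d = det3(info_matrix(pts))
    # Larger |X'X| means smaller generalized variance |(X'X)^-1| = 1/|X'X|.
    print(f"{name}: |X'X| = {d:.0f}, |(X'X)^-1| = {1 / d:.6f}")
```

The four corners give $|X'X| = 64$ against 24 for the design that replaces a corner with the centre point, so the corner design has the smaller generalized variance and is preferred under D-optimality.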
The aim of this research is to examine D-optimal designs under different heteroscedasticity structures. The objectives are to construct D-optimal designs with different heteroscedasticity structures; to obtain the relative efficiencies of other designs with respect to the D-optimal design; to determine the heteroscedasticity correction measure that produces the most efficient D-optimal design under each structure; to determine the relative efficiencies of the parameters of the D-optimal design model; and to establish the best heteroscedasticity correction measure for the most efficient parameter estimation under a D-optimal design.
Yan and Raymond (2001) presented D-optimal designs for logistic regression models with two variables. Jafari (2013) found a locally D-optimal design for a logit model in a discrete choice experiment, where respondents choose among many alternative sets, using D-optimality to combine attribute levels into alternatives. Jafari et al. (2014) worked on D-optimal designs for a logistic regression model with three independent variables; they obtained locally D-optimal designs for several specific cases, presented designs with different design points and assessed their optimality over the parameter space. Jafari and Maram (2015) explored Bayesian D-optimal designs for a logistic regression model with an exponentially distributed random intercept; the Bayesian D-optimal design is obtained by maximizing the Bayesian D-optimality criterion, a function of the quasi-information matrix that depends on the unknown parameters of the model.
López-Fidalgo and Garcet-Rodríguez (2004) considered the problem of constructing optimal designs for regression models when the design space is a product space and some of the variables are not under the control of the practitioner. Zhide and Douglas (2004) found locally D-optimal designs for multistage models and heteroscedastic polynomial regression models, considering the construction of locally D-optimal designs for a non-linear, multistage model in which a binary response variable is observed. Gaviria and López-Ríos (2014) compared two methodologies for locally D-optimal designs with heteroscedasticity and found that under both methods the optimal design points take the extreme values. These studies focused on constructing optimal designs for different models under particular assumptions on the explanatory variables. This study examines the construction of D-optimal designs in a linear model with two explanatory variables when the model exhibits heteroscedasticity. Different variance structures are used, and their effects on the optimal design are determined.

Simulation Study
Consider the linear regression model
$y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$, (2.1)
where $\varepsilon_i$ is a stochastic error term assumed to be normally distributed with mean zero and variance $\sigma^2$, i.e. $\varepsilon_i \sim N(0, \sigma^2)$. $X_1$ and $X_2$ are fixed independent variables, $y$ is the dependent variable, and $\beta_0$, $\beta_1$, $\beta_2$ are parameters whose values are known in the simulation. The independent variables were generated as correlated normal random variables, where K is the correlation between the explanatory variables and $Z_1$ and $Z_2$ are independent standard normal variables with mean zero and unit variance. The response variable was then obtained from equation (2.1), with the error variance following one of two structures:
Structure 1: $\varepsilon_i \sim N(0, \sigma^2 X_i^2)$ (Park, 1966; White, 1980; Gujarati et al., 2012);
Structure 2: $\varepsilon_i$ normal with mean zero and a variance that is a function of $\sigma^2 X_i^2$ (Box and Hill, 1974; Harvey, 1976).
Each simulation was replicated 1000 times at eight sample sizes: 10, 20, 30, 40, 50, 100, 250 and 500.
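A minimal sketch of how one simulated sample could be generated. It assumes the common construction $X_1 = Z_1$, $X_2 = K Z_1 + \sqrt{1-K^2}\,Z_2$ (which gives correlation K), that the variance depends on $X_1$, and, for structure 2, a hypothetical exponential variance $\exp(\sigma^2 X_1^2)$ chosen only for illustration; the parameter values are placeholders, not the paper's.

```python
import math
import random

SAMPLE_SIZES = (10, 20, 30, 40, 50, 100, 250, 500)  # as in the study
REPLICATIONS = 1000                                  # as in the study


def simulate(n, k, beta=(1.0, 1.0, 1.0), sigma=1.0, structure=1, rng=None):
    """Generate one heteroscedastic sample of (x1, x2, y) from model (2.1).

    Assumptions (not from the paper): x2 = k*z1 + sqrt(1 - k^2)*z2 so that
    corr(x1, x2) = k; the variance depends on x1; structure 2 uses exp(.).
    """
    if rng is None:
        rng = random.Random()
    sample = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1 = z1
        x2 = k * z1 + math.sqrt(1.0 - k * k) * z2
        if structure == 1:
            var = sigma ** 2 * x1 ** 2             # sigma^2 X^2
        else:
            var = math.exp(sigma ** 2 * x1 ** 2)   # hypothetical exponential form
        y = beta[0] + beta[1] * x1 + beta[2] * x2 + rng.gauss(0.0, math.sqrt(var))
        sample.append((x1, x2, y))
    return sample


def sample_corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)
```

Looping `simulate` over `SAMPLE_SIZES` with `REPLICATIONS` draws each mirrors the layout of the simulation (1000 replications at eight sample sizes); the analysis applied to each sample is not shown here.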

Construction of D-optimal Design
There are several methods for determining an optimal design, including algorithmic, sequential, analytical, numerical and graphical methods, used separately or in combination. No method is generally preferable; the choice depends on the problem at hand. This research adopts the sequential method, with which the D-optimal design was obtained for the model under each variance structure of the error term. For model (2.1), the number of parameters is p = 3, so the vector of partial derivatives is $f'(x) = (1, x_1, x_2)$ (2.5), and the information matrix $X'X$ is formed from the current design matrix. The procedure requires a sufficient number of observations to ensure that the inverse $(X'X)^{-1}$ exists. A simple condition that guarantees this is that the number N of distinct design points be greater than or equal to the number of parameters, that is, $N \ge p$. The design points are selected within the range $-1 \le x_j \le 1$, $j = 1, 2$. The largest standardized variance $d(x, \xi)$ is found at $x_1 = 1$, $x_2 = -1$, so this point was added to the design matrix $X_3$, giving the design matrix $X_4$ (2.8). The iteration continued until the condition for an optimal design was reached. The maximum value of $d(x, \xi_N)$ decreases as N increases; according to the general equivalence theorem (Kiefer and Wolfowitz, 1960), a design $\xi^*$ is D-optimal if and only if $d(x, \xi^*) \le p$ for all x in the design region.
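The sequential procedure can be sketched as a Wynn-type algorithm: starting from a nonsingular design, repeatedly add the candidate point with the largest standardized variance $d(x, \xi_N) = N\lambda(x) f'(x) M_N^{-1} f(x)$, where $M_N = \sum_i \lambda(x_i) f(x_i) f'(x_i)$, and stop once $\max_x d(x, \xi_N) \le p + \text{tol}$, in line with the equivalence theorem. The efficiency function $\lambda(x) = 1/\sigma^2(x)$, the 0.1 candidate grid, the tolerance, the iteration cap and the starting points are assumptions of this sketch, not values from the paper.

```python
import itertools

P = 3  # parameters in y = b0 + b1*x1 + b2*x2 + e


def f(x):
    """Regression vector f(x) = (1, x1, x2), as in (2.5)."""
    return (1.0, float(x[0]), float(x[1]))


def mat_inv(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("information matrix is singular; use more distinct points")
        aug[col], aug[piv] = aug[piv], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def add_point(m, x, lam):
    """Add lam(x) * f(x) f(x)' to the unnormalized information matrix m, in place."""
    fx, w = f(x), lam(x)
    for r in range(P):
        for c in range(P):
            m[r][c] += w * fx[r] * fx[c]


def std_variance(x, m_inv, n, lam):
    """Standardized variance d(x, xi_N) = N * lam(x) * f(x)' M_N^{-1} f(x)."""
    fx = f(x)
    return n * lam(x) * sum(fx[r] * m_inv[r][c] * fx[c] for r in range(P) for c in range(P))


def sequential_d_optimal(start, lam=lambda x: 1.0, step=0.1, tol=0.05, max_iter=2000):
    """Add the grid point with the largest d(x, xi_N) until max d <= p + tol.

    Returns the list of design points (with repeats) and the final max d.
    """
    k = round(2 / step)
    levels = [round(-1 + i * step, 10) for i in range(k + 1)]
    grid = list(itertools.product(levels, repeat=2))  # candidates in [-1, 1]^2
    design = [tuple(map(float, x)) for x in start]
    m = [[0.0] * P for _ in range(P)]
    for x in design:
        add_point(m, x, lam)
    iterations = 0
    while True:
        m_inv = mat_inv(m)
        n = len(design)
        best = max(grid, key=lambda x: std_variance(x, m_inv, n, lam))
        d_max = std_variance(best, m_inv, n, lam)
        if d_max <= P + tol or iterations >= max_iter:
            return design, d_max
        design.append(best)
        add_point(m, best, lam)
        iterations += 1
```

With `lam` left at its default (no heteroscedasticity) and the start `[(-1, -1), (-1, 1), (1, 1)]`, the first point added is (1, -1), and the four corners then satisfy the stopping rule with $\max d = 3 = p$. Passing a non-constant `lam` makes the counts at the design points unequal; `collections.Counter(design)` then gives the allocation of experimental units.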

Relative Efficiencies of D-optimal to Other Designs
The efficiency of a design $\xi$ relative to the D-optimal design $\xi^*$ is
$\text{D-eff}(\xi) = \left( \frac{|M(\xi)|}{|M(\xi^*)|} \right)^{1/p}$,
where p is the number of parameters of the model and $M(\xi)$ denotes the information matrix of a design $\xi$ other than the D-optimal design. Relative efficiencies of the parameter estimates under the D-optimal and non-optimal designs were also computed to confirm the D-optimal design points. For all the structures, the design points were obtained together with their probabilities, the number of iterations and the standardized variance.
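The D-efficiency $\left(|M(\xi)|/|M(\xi^*)|\right)^{1/p}$ can be computed directly from normalized information matrices. The sketch below compares a hypothetical equally weighted $3 \times 3$ grid design with the equally weighted four-corner design, which is D-optimal for the homoscedastic two-variable model on $[-1, 1]^2$; heteroscedastic weights $\lambda(x)$ are omitted for simplicity, and the helper names are illustrative.

```python
def det3(a):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def normalized_info(points, weights):
    """M(xi) = sum_i w_i f(x_i) f(x_i)' with f(x) = (1, x1, x2)."""
    m = [[0.0] * 3 for _ in range(3)]
    for (x1, x2), w in zip(points, weights):
        fx = (1.0, float(x1), float(x2))
        for i in range(3):
            for j in range(3):
                m[i][j] += w * fx[i] * fx[j]
    return m


def d_efficiency(design, optimal, p=3):
    """D-efficiency (|M(xi)| / |M(xi*)|)^(1/p); each design is (points, weights)."""
    return (det3(normalized_info(*design)) / det3(normalized_info(*optimal))) ** (1.0 / p)


corners = ([(-1, -1), (-1, 1), (1, -1), (1, 1)], [0.25] * 4)
grid3 = ([(a, b) for a in (-1, 0, 1) for b in (-1, 0, 1)], [1 / 9] * 9)
print(round(d_efficiency(grid3, corners), 4))  # 0.7631
```

The grid design has $|M| = 4/9$ against 1 for the corners, so its efficiency is $(4/9)^{1/3} \approx 0.763$: roughly 31% more runs would be needed to match the precision of the D-optimal design.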

Most Efficient Correction Method
The best correction method among HCW1, HCW2 and HCW3 was determined by calculating the variances of the probabilities of the D-optimal designs, taking the design points as $x$ and the probabilities as $p(x)$. The minimum variances were identified for each structure across all sample sizes, and the method with the highest number of minima was chosen as the most efficient.
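One reading of this selection rule, sketched in Python: for each combination of structure and sample size, pick the method with the smallest variance, then choose the method that wins most often. The input layout (a dictionary keyed by `(structure, n)`) and the tie-breaking behaviour of `Counter.most_common` are assumptions of the sketch; the variances themselves would come from the procedure above.

```python
from collections import Counter


def best_correction(variances):
    """Pick the method with the most minimum-variance wins across (structure, n) cells.

    variances: {(structure, n): {"HCW1": v1, "HCW2": v2, "HCW3": v3}}
    Returns (best_method, wins_per_method).
    """
    wins = Counter()
    for by_method in variances.values():
        # The method with the smallest variance in this cell gets one win.
        wins[min(by_method, key=by_method.get)] += 1
    best, _ = wins.most_common(1)[0]
    return best, dict(wins)
```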

Real Life Application
The construction of D-optimal designs in the presence of heteroscedasticity for model (2.1) was applied to real-life data: secondary data on the kinematic viscosity of a lubricant, in stokes, as a function of temperature (°C) and pressure in atmospheres (atm), as discussed in Linssen (1975), where y is the predicted ln(viscosity), $x_1$ is temperature and $x_2$ is pressure.
In this work, D-optimal designs under two different heteroscedasticity structures were constructed both when there is no heteroscedasticity (No H) and when there is heteroscedasticity (HR). In general, the D-optimal designs take the extreme values of the explanatory variables, with the experimental units allocated almost uniformly across the design points. Table 3.1 presents the D-optimal designs with and without heteroscedasticity for the error structures. Without heteroscedasticity, the D-optimal designs for the two structures are the same, because the error terms then have equal variance. Although the model has three parameters, the optimal designs consist of four points, which are the extreme points of the regression range. From the table, the D-optimal design for the first structure is
$\xi^* = \left\{ \begin{array}{cccc} (-1,-1) & (-1,1) & (1,-1) & (1,1) \\ 0.24138 & 0.25862 & 0.25000 & 0.25000 \end{array} \right\}$ (3.1)
so that if there are 116 experimental units, 28 should be allocated to $x_1 = -1$, $x_2 = -1$ and 30 to $x_1 = -1$, $x_2 = 1$. Likewise, 29 should be allocated to each of $x_1 = 1$, $x_2 = -1$ and $x_1 = 1$, $x_2 = 1$.
For the second structure, the D-optimal design is
$\xi^* = \left\{ \begin{array}{cccc} (-1,-1) & (-1,1) & (1,-1) & (1,1) \\ 0.23656 & 0.25806 & 0.25806 & 0.24731 \end{array} \right\}$,
so that if there are 93 experimental units, 22 should be allocated to $x_1 = -1$, $x_2 = -1$, 24 to each of $x_1 = -1$, $x_2 = 1$ and $x_1 = 1$, $x_2 = -1$, and 23 to $x_1 = 1$, $x_2 = 1$. These results confirm that the D-optimal design for the real-life data agrees with the result from the simulated data: the design points take the extreme values of the design region.
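The allocations in these designs are the design weights multiplied by the number of experimental units (for example 28/116 = 0.24138 and 22/93 = 0.23656). Going the other way, from rounded weights to integer counts, simple rounding need not sum to the total. The sketch below uses a largest-remainder rule, which is an assumption and not the paper's procedure (there the counts come directly from the sequential iterations), and recovers the counts reported for both structures.

```python
import math


def allocate(weights, n_units):
    """Round design weights to integer counts summing to n_units (largest-remainder rule)."""
    raw = [w * n_units for w in weights]
    counts = [math.floor(r) for r in raw]
    remaining = n_units - sum(counts)
    if remaining < 0:
        raise ValueError("weights sum to more than 1")
    # Give the leftover units to the points with the largest fractional parts.
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[:remaining]:
        counts[i] += 1
    return counts


# Design points (-1,-1), (-1,1), (1,-1), (1,1), in that order.
print(allocate([0.24138, 0.25862, 0.25000, 0.25000], 116))  # [28, 30, 29, 29]
print(allocate([0.23656, 0.25806, 0.25806, 0.24731], 93))   # [22, 24, 24, 23]
```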
The relative efficiencies of the D-optimal design with respect to other, non-optimal designs, obtained with the same construction method starting from the four-point design matrix $X_4$, were computed for both structures. Table 3.3 shows that the other designs have efficiencies close to that of the D-optimal design, especially designs whose points are close to the D-optimal design points; the closer the D-efficiency is to one, the better the design. The number of iterations for the D-optimal design is 116 for the first structure and 93 for the second. The next table presents the D-efficiency for the real-life data.
To determine the best correction method, the variances of the probabilities at the design points of the D-optimal design were calculated for different sample sizes, and the method with the minimum variance was chosen as best. Table 3.5 presents the variances of the design points.

Conclusion
In this study, D-optimal designs were constructed in the presence of heteroscedasticity for two different structures and compared with the case of no heteroscedasticity in the data.
In general, the D-optimal designs take the extreme values of the explanatory variables, with the experimental units allocated almost uniformly across the design points; that is, the designs use the lowest and highest values of the explanatory variables to obtain the most informative response. To verify these findings, a set of real-life (secondary) data was used, and the D-optimal design points were the same as those obtained from the simulated data.
The relative efficiencies of other designs under the different heteroscedasticity structures were computed to assess the strength of the D-optimal design. The best correction method was also determined by comparing the variances of the selected correction methods across sample sizes for all the structures used in the study. The correction method HCW1, obtained by regressing the squared residuals $\hat{e}_i^2$ on a linear combination of $X_1$ and $X_2$, had the minimum variance and performed better than the remaining two.
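The HCW1 correction regresses the squared residuals on a linear combination of $X_1$ and $X_2$. A sketch of a feasible weighted least-squares correction in that spirit, under stated assumptions: fit OLS, regress the squared residuals on $(1, X_1, X_2)$, use the fitted values as variance estimates (floored to stay positive) and refit by WLS with weights $1/\hat\sigma_i^2$. The exact forms of HCW1, HCW2 and HCW3 are not reproduced here, so the function name `hcw1`, the floor and the generated test data are hypothetical.

```python
import random


def solve(a, b):
    """Solve the square system a @ beta = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(v)] for row, v in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        pivot = m[col][col]
        m[col] = [v / pivot for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]


def wls(X, y, w):
    """Weighted least squares: solve (X'WX) beta = X'Wy; w = 1 gives OLS."""
    p = len(X[0])
    xtwx = [[sum(wi * xi[r] * xi[c] for xi, wi in zip(X, w)) for c in range(p)] for r in range(p)]
    xtwy = [sum(wi * xi[r] * yi for xi, yi, wi in zip(X, y, w)) for r in range(p)]
    return solve(xtwx, xtwy)


def fitted(X, beta):
    return [sum(b * xij for b, xij in zip(beta, xi)) for xi in X]


def hcw1(X, y, floor_frac=0.05):
    """Hypothetical HCW1-style FWLS: regress e^2 on (1, x1, x2), weight by 1/fitted variance."""
    ones = [1.0] * len(y)
    beta_ols = wls(X, y, ones)
    e2 = [(yi - fi) ** 2 for yi, fi in zip(y, fitted(X, beta_ols))]
    gamma = wls(X, e2, ones)                 # auxiliary regression of e^2 on the design columns
    floor = floor_frac * sum(e2) / len(e2)   # keep variance estimates positive
    var_hat = [max(floor, v) for v in fitted(X, gamma)]
    return wls(X, y, [1.0 / v for v in var_hat])


def make_data(n, seed):
    """Hypothetical heteroscedastic data with beta = (1, 2, -1) and variance linear in x."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        sd = (1.0 + 0.5 * x1 + 0.2 * x2) ** 0.5
        X.append((1.0, x1, x2))
        y.append(1.0 + 2.0 * x1 - 1.0 * x2 + rng.gauss(0.0, sd))
    return X, y


X, y = make_data(5000, seed=2024)
beta_hat = hcw1(X, y)
print([round(b, 2) for b in beta_hat])
```

Flooring the fitted variances is one simple way to avoid negative or near-zero estimates from the linear auxiliary regression; other choices, such as modelling the log of the squared residuals, are also common.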