Multiple Imputation to Correct for Nonresponse Bias: Application in Non-Communicable Disease Risk Factors Survey

Background: This study was carried out to use multiple imputation (MI) in order to correct for the potential nonresponse bias in measurements related to variable fasting blood glucose (FBS) in non-communicable disease risk factors survey conducted in Iran in 2007. Methods: Five multiple imputation methods as bootstrap expectation maximization, multivariate normal regression, univariate linear regression, MI by chained equation, and predictive mean matching were applied to impute variable fasting blood sugar. To make FBS consistent with normality assumption natural logarithm (Ln) and Box-Cox (BC) transformations were used prior to imputation. Measurements from which we intended to remove nonresponse bias included mean of FBS and percentage of those with high FBS. Results: For mean of FBS results didn’t considerably change after applying MI methods. Regarding the prevalence of high blood sugar all methods on original scale tended to increase the estimates except for predictive mean matching that along with all methods on BC or Ln transformed data didn’t change the results. Conclusions: FBS-related measurements didn’t change after applying different MI methods. It seems that nonresponse bias was not an important challenge regarding these measurements. However use of MI methods resulted in more efficient estimations. Further studies are encouraged on accuracy of MI methods in these settings.


Introduction
According to the fact sheets of World Health Organization, updated in January 2015, diabetes account for 1.5 million deaths annually while more than 80% of these deaths happen in low-and middle-income countries (World Health Organization [WHO], 2014).
Provision of accurate information on prevalence of diabetes and its major risk factors will enable us to understand the importance of burden which it imposes on societies and can also be regarded as a necessary part to devise and monitor preventive and controlling programs.
One of the best tools to reach this purpose is STEPS study, devised by WHO, which is a nationwide population based cross-sectional survey to gather information on non-communicable disease risk factors as well as diabetes.
One important issue about STEPS studies is that usually a fraction of sample don't take part in the whole study or some part of it known as nonresponse that can threaten the results of the study by reducing the sample size and in some cases biasing descriptive and analytic statistics.
For example there have been about 17% nonresponses in Fasting Blood Glucose measurement in almost all rounds of these surveys in Iran while no advanced strategies has been applied to treat possible nonresponse bias (Asgari, Mirzazadeh, & Heidarian, 2009;Esteghamati et al., 2008).
There are a wide range of methods to deal with nonresponses (Little & Rubin, 1989). The most approved method is multiple imputation (MI) proposed by Rubin which has now stablished its status in many area of research (Sterne et al., 2009).
The present study aimed to use MI method to correct for potential bias in FBS-related measurements due to nonresponses in Iran's latest published STEPS study conducted by Iran's Ministry of Health in 2007.
Five imputation methods were applied to assess the sensitivity of results to the imputation approach and two transformations were used to make variable FBS consistent with normality assumption.

Subjects and Sample Size
We used data from Iran' STEPS study conducted in 2007 in which 30000 of Iranian people aged 15 to 64 years had been recruited. Data were collected through a questionnaire and by taking a fasting blood sample. We excluded pregnant women and also people aged under 25 since they were not subjected to the biochemical measurements. Finally a data set of 23663 observations and 39 variables was used for the present study. Although detailed information on this survey and its results can be found elsewhere (Asgari, Aghajani, Haghazali, & Heidarian, 2009;Asgari, Mirzazadeh, et al., 2009), some related aspects are described in the following sections.

Survey Setting
To fully grasp survey setting, it is helpful to get familiar with the sampling frame. As it was intended to have survey results represent the whole country in age group 15 to 64 years, determined sample were equally distributed over all 30 provinces, 5 ten-year age groups, and 2 sexes. So 100 people for each sex-age group were selected from each province.
To select 1000 people in each province 50 households were randomly chosen using postal codes. Each selected household and its neighborhood constituted a cluster. Then starting from chosen household, data collectors proceeded in a pre-specified direction to fill in the study questionnaire for 2 individuals in each sex-age group (20 in each cluster). Provinces and clusters were then used respectively for strata and sampling unit in the survey setting. Linearized was set as the type of standard error. Survey setting and weights used in this study was the same as defined by survey team.

Survey Weights
For each sex-age group-province stratum, base weight was its respective number of people in the target population. To account for differences in the size of strata due to nonresponses or possible errors, the adjusted weights were defined as base weights divided by the existing size of the strata. This approach of adjusting weights was also seen in other studies (Schenker et al., 2006;Taylor et al., 2002).

Nonresponse Rate
In the dataset we received the rate of missing data ranged from about 0.01% to 17.80%. The target variable, FBS, contained 17.47% missing values. Only 35.8% of variables were fully observed and 69.2% of observation had all variables observed.

Mechanism of Missing Data
There are three mechanisms responsible for missing data: 1) missing completely at random under which the probability of being missing depends neither to the values of other variables nor to the values of missing information, 2) missing at random in which probability of being missing depends to other variables but conditioning on which is independent from the value of missing information, 3) missing not at random that the likelihood of being missing for a given value depends to its value after conditioning on other variables (Little & Rubin, 2002;Rubin, 1976Rubin, , 1987. With real data sets in which true values of missing data are unknown and all variables related to missingness are not captured it is impossible to prove which mechanism has produced missing data (Joseph L. Schafer, & Olsen, 1998). So we proceeded with no assumption about missing data mechanism.

MI Procedures
produces M data sets that are complete regarding the imputed variables. Then each data set is analyzed separately using standard statistical analysis and finally these results are combined according to Rubin's combination rule defined in formulas 1 to 4 (Rubin, 1987). There can be found a wide range of methods and softwares (Horton & Lipsitz, 2001) for proper (Rubin, 1987)  In this study five MI methods were selected as: 1) bootstrap expectation maximization algorithm (BEM) which uses expectation maximization algorithm introduced by Dempster (Dempster, Laird, & Rubin, 1977;Joseph L. Schafer, 1997) to estimate the posterior distribution of incomplete data and then a bootstrap approach to take draws from this posterior distribution (Honaker, King, & Blackwell, 2011). In this study BEM algorithm was performed by R based AMELIA II package version 1.7.2. As this program allows to simultaneously impute more than one variables without discrimination between dependent and independent variables, all biochemical variables were included in the imputation process too. Imputed data set was saved in Stata 11 format for subsequent analysis, 2) multiple imputation by chained equation (MICE), also known as switching regression or sequential regression multivariate imputation (Kennickell, 1991), is based on fully conditional specification models. It doesn't depend on multivariate normality assumption (Raghunathan, Lepkowski, Van Hoewyk, & Solenberger, 2001) and can impute any type of variables (White, Royston, & Wood, 2011). Here we used Stata implementation of this procedure through ice command with the default number of 10 cycles (StataCorp. 2009. Stata 11 Multiple-Imputation Reference Manual. College Station). Like BEM, biochemical variables were included in the imputation process. Information on how to install and use MICE program can be found elsewhere , 3) multivariate normal regression (MVN) with which missing values are imputed by taking values from posterior distribution estimated by Markov Chain Monte Carlo algorithm (Lee & Carlin, 2010; Joseph L. Schafer, 1997). It is based on the multivariate normality assumption but will give valid results even when this assumption, like this study, is violated (Lee & Carlin, 2010;Joseph L. Schafer, 1997). With MVN it is possible to define several response variables but predictors should be completely observed. For MVN and two next methods we used MI system implemented in Stata version 11.2 (StataCorp. 2009. Stata 11 Multiple-Imputation Reference Manual. College Station) and biochemical variables were excluded from imputation process, 4) univariate linear regression (UVR) which uses a posterior predictive distribution derived from a normal linear univariate regression model. The imputation model comprises an imputing variable as a response and a number of predictors with no missing value (StataCorp. 2009. Stata 11 Multiple-Imputation Reference Manual. College Station), and 5) predictive mean matching (PMM) that is a semi-parametric version of the UVR with the same procedure but instead of replacing missing values with what the model predicts, PMM takes draws randomly from a set of observed values whose predictive means are in a pre-specified distance of that of missing values (StataCorp. 2009. Stata 11 Multiple-Imputation Reference Manual. College Station).
Although some experts advised that unless nonresponse rate is not unusually high more than five to ten imputation has no additional gain in efficiency (Rubin, 1987;J. L. Schafer, 1999), in complex surveys more number of imputation is encouraged to yield valid estimates (Wang & Robins, 1998). So the number of imputation was set to 20, as suggested by Graham (2009).
To make the heavily positively skewed FBS consistent with normality assumption natural logarithm (Ln) and Box-Cox (BC) transformations were used prior to imputation. For convenience we used O, Ln, or BC plus the abbreviation for MI methods to refer to the method and the scale of FBS. For example O-PMM refers to predictive mean matching method applied to original scale of FBS.

Variables Used in MI Models
As it is suggested variables correlated with participation or FBS's values (Horton & Lipsitz, 2001) besides variables associated to sampling and weights (Kim, Michael Brick, Fuller, & Kalton, 2006;Schenker et al., 2006) were used in MI models.
To assess which variable were related to participation (Davey, Shanahan, & Schafer, 2001)  variable was made which took 1 for those who were missing for all biochemical measurements and zero for those who had at least one of them observed. Then we used this variable as an outcome in a logistic regression model and all other variables as predictors. Variables with P value less than 0.2 were selected for MI model.
To find out which variables were related to FBS's values, we fitted a linear regression for FBS on all other variables in the data set. Then P value less than 0.2 was used as a criterion for inclusion in MI model. Survey setting was regarded in both logistic and linear regression modes (Schenker et al., 2006).
Totally 23 variables were selected for imputation of FBS as three nominal variables for province, fastening seat belt and wearing crash helmet habits; eight binary variables for residential area, sex, physical activity, currently use of tobacco, history of diabetes in person and their family, taking anti-diabetic medication (two variables); twelve continuous variables for age, sedentary life style, height, weight, waist circumference, systolic and diastolic blood pressure, total cholesterol, HDL cholesterol, triglyceride; and two variables for weights and sampling unit.
Twelve of these variables contained some amounts of missing values. Although, except for three biochemical variables, the most amount of missing data was 0.15%, we couldn't use them in univariate imputation models.
To not having to set aside these variables with few amount of missing data, we used single imputation to impute them prior to imputation of FBS. For this purpose MICE method was used and all variables except for biochemical measurements that had large amount of missing data were included in the imputation models. The imputed data set was used for imputation of FBS.

Statistical Analysis
Statistical analysis falls into two categories: first, measurements related to FBS from which we intended to remove nonresponse bias including: 1) Mean of fasting blood sugar, 2) proportion with FBS equal to or greater than 110 mg/dl but less than 126 mg/dl called impaired fasting sugar (IFG), 3) proportion with FBS equal to or greater than 126 mg/dl called high blood glucose (HBG). Those taking anti-diabetic medication were excluded from calculations of mean.
All calculations were performed using Stata's "mi estimate: svy linearized" command in which analysis and combination steps were integrated and survey setting is taken in to account. Table 1 through Table 3 display point estimates with 95% CI, estimated standard error, RE, FMI, and RVI for mean of FBS, prevalence of IFG and prevalence of HBG respectively for available case analysis (AC), adjusting weights (AW), and imputation strategies. The results from AW and AC were almost the same for these measurements. Table 1 shows that using MI and transformations didn't significantly change the mean of FBS. While BC-MVN had the highest relative efficiency and lowest fraction of missing information and relative variance increase, O-MICE had the lowest relative efficiency and highest fraction of missing information and relative variance increase.

Results
The ratio of standard error before imputation to standard error after it is another important metric that is expected to be more than one meaning that use of MI results in smaller standard error. This ratio exceeded one only for BC-MVN, BC-BEM, BC-UVR, and Ln-UVR.  Table 2 indicates that prevalence of IFG became significantly higher after applying all imputation methods except for PMM which gave similar results to those of AC. Estimation of prevalence of IFG was highly influenced by transformations as using BC transformation resulted in a reduction in it for all imputation methods and using natural logarithm did so for just UVR. Regarding RE, FMI, and RVI; Ln-PMM and Ln-MICE had the best and worst performance respectively. All MI models had the ratio of standard error before imputation to standard error after it less than one that was not a desired quality.   Table 3 shows that all MI methods, except for PMM, tended to increase the prevalence of HBG. The impact of transformation was more noticeable here as both transformations caused a decrease in the results. O-PMM had the highest relative efficiency and lowest fraction of missing information and relative variance increase while O-BEM had the lowest relative efficiency and highest fraction of missing information and relative variance increase. LN-UVR, Ln-PMM and Box-Cox transformation with all MI models had the ratio of standard error before imputation to standard error after it equal to one.

Discussion
In this study we applied five imputation strategies across two transformations to impute variable FBS in NCD's risk factor survey data in order to correct for potential nonresponse bias in estimation of FBS-related measurements. The choice of MI methods was based on type and distribution of imputing variable, subsequent analyses, and software considerations.
The results from MI's were compared with those of available case analysis and adjusting weights methods which were of similar results as was the case in another study that had the same approach for adjusting weight as we did. They reasoned that this similarity is because of the fact that few variables were used to define strata (Schenker et al., 2006).
There was also no significant difference among the results of different imputation models that was in line with other works (Lee & Carlin, 2010;Lin, 2010;Taylor et al., 2002). Surprisingly we also found no significant difference in the mean of FBS after applying imputation models. Considering the fact that there were some variables related to both participation and FBS's values we would expect that the study results could have suffered from a kind of selection bias. The justification could be that the selection bias in FBS could have been compensated by the presence of some variables related to participation in the same direction and related to FBS in the opposite directions and vice versa. So if there was one variable that caused an increase in the mean of FBS by increasing nonresponse probability there was another variable that caused a decrease in the mean of FBS by increasing nonresponse probability.
Considering prevalence of IFG and HBG, we saw a similarity between PMM and AC which can be explained by PMM strategy for replacing missing values which produces a distribution similar to that of observed values on the part of missing data. Other MI methods tended to increase regarding measurements. Performances of MI's in estimations of prevalence of IFG and HBG were highly influenced by transformations as they, especially Box-Cox, pull the right tail of distribution towards the mean and therefore the imputed values are placed nearer to the mean rather than far from it. This is why the estimated prevalence of HBG is more affected by transformation because, in comparison to IFG, further part of the distribution contributes to it. So as it is said that PMM is a reliable approach when data is heavily skewed and measurement of interest is proportion (Lee & Carlin, 2010) we concluded that there was no serious bias in the estimation of prevalence of IFG and HBG in the regarding data set. However PMM method on Ln transformed FBS produced more efficient estimates.
The ratio of standard error before imputation to standard error after it was less than one for some models meaning that they tended to increase the variance. The large variance of imputed FBS could be explained by low correlation among variables used in MI models. It has been shown that variance of MI will be higher when the correlation among variables is less than 0.28 and there is a high rate of missing data ( Regarding quantities used for evaluation of imputation models, BC-MVN was among top five imputation models for all measurements implying that MVN on Box-Cox transformed data outperformed other approaches (Lee & Carlin, 2010). Besides FMI was substantially less than nonresponse rate in case of BC-MVN indicating this approach was highly predictive (Lewis et al., 2014). The superior performance of MVN might not be attributed to more number of variables in MI models since the same variables were used in MICE too. In addition it is shown that the number of variables has little effect on the performance of MI especially when the correlation among them is low (Lin, 2010) like what we saw in this study.
The advantage of present study was the use of multiple imputation in STEPS data in order to remove the serious doubt which nonresponses would have cast on FBS-related measurements. We applied a wide range of methods to assess the sensitivity of results to the imputation strategy.
The main limitation of present study was that true values of missing information were unknown so we couldn't assess the accuracy of MI models and as a results the finding of present study couldn't be generalized to other biochemical measurements or surveys of other years. There also was no information of unit nonresponse (those who refused to take part in the whole survey) rate and the argument that all selected people has agreed to take part in the first two steps because information was collected by interview needs to be put in perspective as it was seen in Flint Men's Health Study that response rate was 86% for interview step (Taylor et al., 2002). We suggest that at least some useful information be recorded for nonresponses like sex, approximate age, district, and so on.
We also had no additional information about item nonresponses (those who refused to take part in a part of survey) like the number of attempts made for invitation. In a study that used MI to correct for nonresponse bias the number of invitations was incorporated in MI process to improve the results (Kmetic, Joseph, Berger, & Tenenhouse, 2002).

Conclusion
Application of multiple imputation methods didn't change the FBS-related finding of NCD risk factors survey conducted in Iran in 2007 so it is concluded that nonresponse bias was not an important challenge regarding these measurements. However the benefit of applying MI method, especially MVN, in this setting was reaching more efficient estimations by retrieving some of missing information. Further studies are needed to assess the accuracy of MI approaches and also on how to improve MI performance in similar settings.
Health to those who bothered the demanding task of data collection under an unbelievably difficult condition. There was neither conflict of interest nor ethical problem regarding the present study. The data set we received contained no confidential information.