A New Method for Logistic Model Assessment

It is well known that the logistic model plays an important role for the analysis of binary outcomes. Most of the existing methods for the assessment of logistic models are constructed based on the distance between the observed and the predicted outcomes. We consider a new method from a different perspective by assessing the distance between two consistent estimators developed under the same logistic model form. The proposed tests are easy to implement and are applicable to both prospective and case-control studies.


Introduction
The logistic regression model has become one of the most widely used statistical tools in the analysis of binary outcome data arising from either prospective studies or case-control studies (e.g., Prentice & Pyke, 1979;Breslow & Day, 1980;Hosmer, Lemeshow, & Sturdivant, 2013).Let X be a p × 1 vector of covariates and Y be a binary outcome taking value 1 or 0, where p is a positive integer representing the dimension of the covariates.The logistic model specifies the following relationship between Y and X: where β 0 is a scalar parameter and β is a p × 1 vector of parameters.The interpretation of β is the vector of log odds ratios of outcome Y associated with one unit increase in the vector of covariates X.The parameter β 0 can be estimated for prospective studies, but is not estimable in case-control studies (Prentice & Pyke, 1979;Yi, 2017, Ch.7).The inference on β enables investigators to quantify the effects of covariates X on the outcome Y.
The estimation results and interpretations obtained from model (1) can be misleading when model (1) does not reflect the true relationship between X and Y.For example, if the working model (1) fails to include important risk factors for the outcome, or the true link function connecting X and Y is not logit, fitting (1) to the data often leads to invalid inferential results.Therefore, it is essential to assess the appropriateness of the working model (1) before making conclusion based on the model fitting.
There has been extensive research on developing reliable tests for the assessment of logistic models; see Ch.5 of Hosmer et al. (2013) for a comprehensive review.It is common to assess a working model by examining whether the model fits the data well.Most existing tests assess the model by examining summary measures of the distance between the observed and the predicted outcomes.For example, Hosmer and Lemeshow (1980) proposed the so-called Hosmer-Lemeshow goodness-of-fit test based on grouping estimated probabilities.Tsiatis (1980) discussed a score test using the partition of covariates.Osius and Rojek (1992) considered a large sample normal approximation to the Pearson chi-square statistic.Su and Wei (1991) developed a test using cumulative sums of residuals.le Cessie andvan Houwelingen (1991, 1995) constructed tests using smoothed residuals.Stukel (1988) proposed a generalized logistic model by introducing additional parameters for an expanded class of models.A test was constructed based on the hypothesis that the additional parameters are equal to zero to shrink the class of models to the desired logistic model.In the context of case-control studies, Qin and Zhang (1997) pointed out that model ( 1) is equivalent to a two-sample semiparametric model and proposed a goodnessof-fit test by comparing the observed distribution of covariates and the expected counterpart under the assumed model.
In this paper, we propose a new method for the assessment of logistic models.The idea stems from a perspective different from existing methods.We construct the tests by comparing the maximum likelihood estimator for the parameter in (1) and a consistent estimator obtained from a weighted estimating equation, where the weight is a function of covariates and parameters.The intuition behind this approach is as follows.Since the two estimators are both consistent under the same assumption of model ( 1), a significant distance between them implies that the assumed model (1) may not hold.Our idea is relevant to Shih (1998) who developed a goodness-of-fit test for the Clayton copula model based on the difference between the unweighted and weighted estimators for the association parameter with the weight depending on the observed data only.In comparison, the weight in our paper also depends on the parameter and is attached to the estimating equation rather than the estimator.
Many existing methods are based on grouping of covariates, or estimated probabilities, which may be unideal in some cases.For instance, it has been documented that test statistics based on grouping can be sensitive to the choice of groups, i.e., test results obtained from different grouping plans can be inconsistent (e.g., D. Hosmer, T. Hosmer, le Cessie, & Lemeshow, 1997).To overcome this shortcoming, we propose tests that are free of grouping, conceptually straightforward and easy to implement.Our methods are applicable to handle both prospective data and case-control data.In our simulations, the proposed tests present similar performance to the widely used Hosmer-Lemeshow (HL) test.In some scenarios, the proposed tests achieve a higher power than the HL test.
The rest of this paper is organized as follows.Section 2 proposes the new assessment method for the logistic regression model, presents theoretical results and describes the test procedures.Section 3 performs simulation studies to assess the performance of the proposed tests in comparison with the HL test.Section 4 illustrates the proposed tests using two examples.Section 5 closes the paper with a discussion.

The Proposed Tests
Shih (1998) constructed a simple test to assess the goodness-of-fit for the Clayton copula model by examining the difference between the unweighted and weighted estimators for the association parameter.This test is justified by the fact that if the Clayton copula model holds, then the resulting two estimators should be consistent, thus yielding that the difference converges to zero as sample size goes to infinity.Adapting this scheme, we develop new tests for the assessment of the logistic model.

Unweighted and Weighted Estimating Equations
Suppose we observe a sample of n subjects.Let X i and Y i denote the observed covariates and outcome, respectively, for subject i, where Y i is a binary outcome taking value 1 or 0. Suppose the relationship between X i and Y i is characterized by model ( 1).The maximum likelihood estimator ( β0 , βT ) T of (β 0 , β T ) T is obtained by solving the likelihood score equations for (β 0 , β T ) T .It is known that the maximum likelihood estimator ( β0 , βT ) T is a consistent, asymptotically normal and efficient estimator for (β 0 , β T ) T when model ( 1) is true and certain regularity conditions (e.g., Lehmann & Casella, 1998, Ch.6) hold.Huang andWang (1999, 2001) considered the weighted version of estimating equations (2): where w(β 0 + β T X i ) is a weight function which depends on the covariates and the model parameter.Given a positive weight function w(•), a consistent estimator of (β 0 , β T ) T can be obtained by solving (3) for (β 0 , β T ) T by the fact that the expectation of the left hand side of (3) is zero (e.g., Yi, 2017, Ch.2).In particular, Huang and Wang (2001) chose weight functions and respectively, obtained the following estimating equations: and Let ( β0− , βT − ) T be the resulting estimator of (β 0 , β T ) T obtained by solving (4) for (β 0 , β T ) T , and ( β0+ , βT + ) T be the resulting estimator of (β 0 , β T ) T obtained by solving (5) for (β 0 , β T ) T .Under the assumed logistic model (1) and regularity conditions C1 and C2 of Huang and Wang (2001), both ( β0− , βT − ) T and ( β0+ , βT + ) T are consistent and asymptotically normal estimators of (β 0 , β T ) T (Huang & Wang, 2001).

The Proposed Test Statistics
We comment that the consistency of ( β0 , βT ) T , ( β0− , βT − ) T and ( β0+ , βT + ) T is basically ensured by the unbiasedness of the corresponding estimating functions in (2), ( 4) and (5), i.e., these estimating functions have a zero expectation (e.g., Yi, 2017, Ch.2).The zero mean property of the estimating functions in (2) and ( 3) is guaranteed because the conditional mean of Y i given X i is correctly specified as which is described by model (1).The involvement of the terms (1, X i ) T and w(β 0 + β T X i ) in ( 2) and (3) do not alter the zero mean property of the estimating functions in (2) and (3).
The consistency of three estimators ( β0 , βT ) T , ( β0− , βT − ) T and ( β0+ , βT + ) T requires that model (1) holds.When the logistic model (1) does not reflect the truth, we may expect to observe significant difference between the maximum likelihood estimator ( β0 , βT ) T and the weighted estimating equation based estimator ( β0− , βT − ) T , or significant difference between ( β0 , βT ) T and ( β0+ , βT + ) T .In practice, the estimation of parameter β is often of interest since β provides the odds ratio interpretations.Moreover, for case-control studies, the scalar parameter β 0 , which is related to P(Y = 1) for the population of interest, cannot be estimated.Prentice and Pyke (1979) showed that valid odds ratio estimators can be obtained by fitting logistic model (1) to the case-control data as if the data were collected from a prospective study.Therefore, here we focus on the log odds ratio parameters β and construct two tests based on the distance between β and β− , and the distance between β and β+ , respectively.
Under (1) and suitable regularity conditions, β, β− and β+ are consistent and asymptotically normal estimators of β.Thus, , where V − and V + are positive definite covariance matrices.The analytic forms of V − and V + can be obtained based on the sandwich type formula (Stefanski & Boos, 2002), and the estimates of V − and V + are available based on the empirical estimates of the corresponding sandwich terms.
The following Theorem reports the two proposed tests.
Theorem: Suppose the logistic model ( 1) and suitable regularity conditions hold.
Proof.Since V − and V + are positive definite, they are invertible with rank p.Let V −1 − be the inverse matrix of V − and let V −1 + be the inverse matrix of V + .By the theory of quadratic forms (e.g., Rao, 1973, Ch.3), Since n Σ− is a consistent estimator of V − and n Σ+ is a consistent estimator of V + , the results in the theorem are immediate.
Remark 1: When the singularity problem of Σ− or Σ+ occurs, we describe two strategies.The first strategy is to simply consider subvectors of β − β− and β − β+ such that the resulting covariance matrices are nonsingular.Then the degrees of freedom in (a) and (b) should be modified to be the length of the subvectors.The second strategy is to use the generalized inverse matrix (Rao, 1973, Ch.1) to get around the singularity.The resulting test is still valid, with the degrees of freedom in (a) and (b) replaced by the rank of V − and V + , respectively.The use of generalized inverse matrix for handling singularity was considered by Lin and Wei (1991) in the context of information matrix tests for Cox regression models.
Remark 2: Since the derivation of the analytic forms of the estimators Σ− and Σ+ needs the second derivatives of the likelihood, and the derivatives of the estimating equations, the implementation is complicated.In real applications, we use nonparametric bootstrap method (Efron, 1982) to obtain these two covariance matrices Σ− and Σ+ for convenience.

Test Procedures
The theorem proposes two tests for assessing the model form (1). We now further elaborate on the detail of the test procedure by using the test statistic in Theorem (a); the test based on Theorem (b) can be conducted in the same manner.
Step 1: Obtain β and β− by solving estimating equations ( 2) and ( 4), respectively.In particular, β can be easily obtained using available software packages for logistic models.
Step 2: Obtain Σ− using the nonparametric bootstrap method (Efron, 1982).Specifically, we generate B bootstrap samples by resampling the original data at the individual level with replacement, where B is a user-specified positive integer.
Step 3: Calculate the observed value for the test statistic The observed p-value is obtained as P(χ 2 p > T 1 ), where χ 2 p represents a random variable following a χ 2 distribution with degree of freedom p.For a given significance level α, if the observed p-value is less than α, the test suggests that logistic model (1) may not be appropriate, and the null hypothesis that the model (1) holds is then in doubt.

Simulation Studies
We conduct simulation studies to assess the finite sample performance of the proposed tests.The widely used HL test (Hosmer & Lemeshow, 1980) is included in the simulation studies as the benchmark.The HL test is grouping-based method and the number of groups is user-specified.We adopt the commonly used strategy and take 10 groups in our simulations.We use 1000 bootstrap replicates for the proposed tests to estimate the covariance matrices.The sample size is set to be 500 or 1000.For each scenario, 1000 replications are run.For the ith subject, let covariates X i = (X 1i , X 2i ) T where X 1i is drawn from a standard normal distribution, and X 2i is generated from a uniform distribution ranging from 0 to 1.The outcome is generated from the model: where γ is a parameter for the extra quadratic and interaction terms.The working model we use to fit the simulated data is from (1), logit where no extra quadratic and interaction terms are included.

Assessment of Test Level
When γ = 0 in ( 6), the logistic model form (1) correctly reflects the truth.For this case, we examine the estimated significance levels (i.e., the probability of rejecting model (1) when model ( 1) is really true) of the proposed tests and the HL test, and report the test results in Table 1 for the significance level α = 0.01, 0.05 and 0.1.The estimated significance levels for all three tests are reasonably close to the nominal significance levels.

Assessment of Test Power
When γ 0 in (6), model (1) does not reflect the truth.By setting γ = −0.2 or −0.5, we examine the estimated powers (i.e., the probability of rejecting model (1) when model ( 1) is really not true) of the proposed tests and the HL test at the significance level α = 0.05.Table 2 summarizes the empirical results.The estimated powers are significantly larger with γ = −0.5 than those with γ = −0.2, because of the farther departure from the null model.The first proposed test performs similar to the HL test.The second proposed test achieves a significantly higher power than the other two tests when γ = −0.2.As anticipated, the achieved power gets larger as the sample size increases.Now we assess the power of the three methods when the true link function is not logit.To simulate data, we modify (6) by specifying γ = 0 and replacing the logit link by one of the two link functions: probit and complementary log-log (Clog-log).We still use model ( 1) to fit the simulated data.Table 3 summarizes the estimated powers of the three tests at the significance level α = 0.05.Compared with the complementary log-log link, testing the probit link leads to lower powers as expected, since the probit model is close to the logistic model, and the difference is hard to determine (e.g., Dobson, 2002, Section 7.3).When n = 500, the first proposed test and the HL test show similar performance, with higher achieved powers than the second proposed test.Interestingly, when n = 1000, the second proposed test shows the highest achieved powers.The estimated powers increase as the sample size increases.

Table 1 .
Estimated significance levels for the Hosmer-Lemeshow test and proposed tests at the significance level α = 0.01, 0.05, and 0.1.

Table 2 .
Estimated powers for the Hosmer-Lemeshow test and proposed tests at the significance level α = 0.05 when the working model lacks quadratic and interaction terms.

Table 3 .
Hosmer et al. (2013) the Hosmer-Lemeshow test and proposed tests at the significance level α = 0.05 when the working model misspecifies the link function.We use two examples to illustrate the proposed tests.The HL test is conducted as the benchmark, where we adopt the commonly used 10 groups.One thousand bootstrap replicates are used for the proposed tests.Example 1.Hosmer et al. (2013)considered a dataset of 100 subjects with age regarded as the covariate and the presence or absence of coronary heart disease taken as the binary outcome.It is of interest to assess model (1) which is used to feature the relationship between the status of coronary heart disease and age.We apply the two proposed tests and the HL test to assess model (1).The observed test statistic for the first proposed test is 0.010 with 1 degree of freedom, and the observed p-value equals 0.920.The observed test statistic for the second proposed test is 0.002 with 1 degree of freedom, and the observed p-value equals to 0.969.The HL test gives the observed test statistic 2.224 with 8 degrees of freedom, and the observed p-value equals 0.973.The three tests produce fairly similar results of p-value and they all suggest no evidence against the use of logistic model (1).Example 2.Hosmer et al. (2013)discussed a low birth weight study.The dataset includes 189 births to women observed in the obstetrics clinic.The binary outcome is the indicator of low birth weight (< 2500 grams).Consider weight of mother at last menstrual period, age of mother and smoking status during pregnancy as covariates.Our goal is to assess the appropriateness of model (1).The observed test statistic for the first proposed test is 1.314 with 3 degrees of freedom, yielding the observed p-value 0.726.The observed test statistic for the second proposed test is 1.913 with 3 degrees of freedom, giving the observed p-value 0.591.The HL test gives the observed test statistic 7.225 with 8 degrees of freedom,