Asymptotic Efficiency of an Exponential Cure Model When Cure Information Is Partially Known

Cure models are widely used to analyze failure time data in which some individuals eventually experience the event of interest and others never do. In many studies, however, diagnostic procedures are available that provide further information about whether a subject is cured. Wu et al. (2014) proposed a method, called the extended cure model, that incorporates such additional diagnostic information on cured status into the classical cure model analysis. Through extensive simulations, they demonstrated that extended cure models provide more efficient and less biased estimates, and that higher efficiency and smaller bias are associated with higher sensitivity and specificity of the diagnostic procedure. In this paper, we provide theoretical justification of this positive association for some special cases. More specifically, we show that the maximum likelihood estimators (MLEs) of the parameters of an extended exponential cure model are asymptotically more efficient than the MLEs of the corresponding classical exponential cure model.


Introduction
When there is evidence of long-term survivors, cure models are often used to model the survival curve. Let $T$ be a non-negative random variable for the failure time, $x$ and $z$ the covariate vectors, $\pi(z)$ the uncured probability for a subject, and $f(t|x,z)$ and $S(t|x,z)$ the probability density function (pdf) and the survival function of $T$, respectively. Denote $f_u(t|x)$ and $S_u(t|x)$ as the pdf and the survival function for uncured subjects, respectively. The cure model can be written as a mixture model in terms of the pdf, $f(t|x,z) = \pi(z) f_u(t|x)$, or in terms of the survival function:

$$S(t|x,z) = \pi(z) S_u(t|x) + 1 - \pi(z). \tag{1}$$

Cure models have been extensively studied in the literature. Conventionally, $\pi(z)$ is called the "incidence" part, and $f_u(t|x)$ is referred to as the "latency" part. Logistic regression is commonly used to model the "incidence" part, although other links or non-linear regression methods could be used. The "latency" part can be modeled parametrically, semi-parametrically, or non-parametrically. In the parametric approach, the following distributions have been commonly used: Exponential (Jones et al., 1981; Goldman, 1984; Ghitany & Maller, 1992); Weibull (Farewell, 1982, 1986); Lognormal (Boag, 1949; Gamel et al., 1990); Gompertz (Gordon, 1990a, 1990b; Cantor & Shuster, 1992); Extended generalized gamma (EGG) (Yamaguchi, 1992); and Generalized F (GF) distributions (Peng et al., 1998). In the non-parametric approach, the Kaplan-Meier estimation method is used without adjusting for covariates, as in Taylor (1995). In the semi-parametric approach, some authors used the Cox proportional hazards (PH) model (Kuk & Chen, 1992; Peng & Dear, 2000; Sy & Taylor, 2000), and some used accelerated failure time (AFT) models (Li & Taylor, 2002; Zhang & Peng, 2007). In general, parametric cure models can achieve the greatest efficiency in estimation if the distributional assumptions are satisfied. In practice, however, it can be challenging to verify these assumptions. Although semi-parametric models do not require a distributional assumption, they may lose efficiency in estimation compared to a parametric model when a distribution can be correctly identified.
All cure modeling to date assumes that cured and uncured subjects cannot be distinguished in the censored subset. However, medical diagnostic procedures are available in many studies to provide further information about whether a subject is cured. For instance, closure of the growth plate can serve as an indicator of cure in the study of bone injury in pediatric patients (Leary et al., 2009; Wu et al., 2014). The diagnostic procedures are likely associated with a certain degree of accuracy in terms of sensitivity and specificity, because it can be difficult to completely separate cured and uncured subjects in the censored subset. Motivated by a clinical study, Wu et al. (2014) extended the classical cure models to incorporate the additional diagnostic information about cured status. Through extensive simulations, they demonstrated that the extended cure models provide more efficient and less biased estimates, and that the higher efficiency and smaller bias are associated with higher sensitivity and specificity of the diagnostic procedures.
In this paper, we provide theoretical justification of how such additional diagnostic information can improve the asymptotic efficiency of model parameter estimators, as compared to the classical cure model approach. Specifically, we justify the positive association between the sensitivity and specificity of the diagnostic procedure and the asymptotic efficiency of the maximum likelihood estimators (MLEs) of the extended exponential cure model of Wu et al. (2014) in a few special cases.
In Section 2, the formulation of a cure model incorporating additional cure information (called the extended cure model) is provided. In Section 3, the asymptotic efficiency of the MLEs of the parameters for an extended exponential cure model and the asymptotic relative efficiency (ARE) of these MLEs with respect to the MLEs for the traditional exponential cure model are systematically studied under some special cases. Discussion is given in Section 4.

Extended Cure Models
Extended cure models have been introduced by Wu et al. (2014).
Let $O_1 = \{(t_i, \delta_i, x_i, z_i), i = 1, \ldots, n\}$ be a data set. Here $t_i$ is the observed survival time of subject $i$, $\delta_i$ is the censoring indicator, equal to 1 if $t_i$ is uncensored (i.e., observed) and 0 otherwise, and $x_i$ and $z_i$ are two covariate vectors. Let $\beta$ and $\gamma$ be the parameter vectors related to $x_i$ and $z_i$, respectively, and $\theta_1 = (\beta', \gamma')'$. If the cure model in (1) is used for modeling the data set $O_1$, the observed likelihood can be written as:

$$L_1(\theta_1; O_1) = \prod_{i=1}^{n} \left[\pi(z_i) f_u(t_i|x_i)\right]^{\delta_i} \left[1 - \pi(z_i) + \pi(z_i) S_u(t_i|x_i)\right]^{1-\delta_i}. \tag{2}$$

Assume that for censored subjects, their diagnostic results $d_i$ are also observed, where $d_i$ is 1 if subject $i$ is diagnosed as cured and 0 if diagnosed as uncured. A diagnostic procedure is usually associated with a certain sensitivity and specificity. Sensitivity measures the proportion of actual positives that are correctly identified (e.g., the percentage of sick people who are correctly identified as sick). Specificity measures the proportion of actual negatives that are correctly identified (e.g., the percentage of healthy people who are correctly identified as healthy). Suppose that the diagnostic procedure results are independent of the failure times, i.e., $d_i$ is independent of $t_i$, and the diagnostic procedure has a sensitivity of $p_0$ and a specificity of $1 - p_1$; that is, $p_0$ is the probability that a cured subject is diagnosed as cured, and $p_1$ is the probability that an uncured subject is diagnosed as cured. We have $p_0 \ge p_1$ for a validated diagnostic procedure. Although $p_0$ and $p_1$ might be modeled, for simplicity they are assumed not to depend on any covariates.
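To make the roles of the sensitivity and specificity concrete, the following is a minimal simulation sketch rather than part of the original analysis. It assumes an intercept-only exponential cure model and one common fixed censoring time. The function name `simulate_cure_data`, the censoring scheme, the code `-1` for "no diagnosis recorded", and the reading of p0 as P(d = 1 | cured) and p1 as P(d = 1 | uncured) are illustrative assumptions.

```python
import numpy as np


def simulate_cure_data(n, gamma0, beta0, p0, p1, censor_time, seed=0):
    """Simulate intercept-only exponential cure data with diagnostic results.

    Assumptions (for illustration only): no covariates, a common fixed
    censoring time, pi = expit(gamma0), and hazard = exp(beta0).
    """
    rng = np.random.default_rng(seed)
    pi = 1.0 / (1.0 + np.exp(-gamma0))           # uncured probability
    hazard = np.exp(beta0)                       # hazard of the uncured
    uncured = rng.random(n) < pi
    # Cured subjects never fail, so their latent failure time is infinite.
    latent = np.where(uncured, rng.exponential(1.0 / hazard, size=n), np.inf)
    t = np.minimum(latent, censor_time)          # observed time
    delta = (latent <= censor_time).astype(int)  # 1 = event observed
    # The diagnosis depends on cure status only, not on the failure time:
    # P(d = 1 | cured) = p0 and P(d = 1 | uncured) = p1.
    d = (rng.random(n) < np.where(uncured, p1, p0)).astype(int)
    d = np.where(delta == 0, d, -1)              # -1: event seen, no diagnosis used
    return t, delta, d, uncured
```

With a large sample, the fraction of censored cured subjects diagnosed as cured should be close to p0, and the corresponding fraction among censored uncured subjects should be close to p1.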
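Let $O_2$ denote the data set $O_1$ augmented with the diagnostic results $d_i$ of the censored subjects, and let $\theta_2 = (\theta_1', p_0, p_1)'$. For uncensored individuals ($\delta_i = 1$), the contribution to the likelihood is the same as that in (2); for censored individuals ($\delta_i = 0$), under the assumption that $d_i$ and $t_i$ are independent, the contribution is $\pi(z_i) S_u(t_i|x_i)\, p_1^{d_i}(1-p_1)^{1-d_i}$ if they are uncured, and $(1-\pi(z_i))\, p_0^{d_i}(1-p_0)^{1-d_i}$ if they are cured. A cure model that incorporates this additional diagnostic information will be called an extended cure model. The observed likelihood for the extended cure model is as follows:

$$L_2(\theta_2; O_2) = \prod_{i=1}^{n} \left[\pi(z_i) f_u(t_i|x_i)\right]^{\delta_i} \left[(1-\pi(z_i))\, p_0^{d_i}(1-p_0)^{1-d_i} + \pi(z_i) S_u(t_i|x_i)\, p_1^{d_i}(1-p_1)^{1-d_i}\right]^{1-\delta_i}. \tag{3}$$

Because the diagnostic procedure results may not be available for all the censored subjects, let $\eta_i = 1$ if the diagnostic result of subject $i$ is available, and $\eta_i = 0$ otherwise. We can then write the observed likelihood for the extended cure model when cure information is partially known as follows:

$$L_3(\theta_2) = \prod_{i=1}^{n} \left[\pi(z_i) f_u(t_i|x_i)\right]^{\delta_i} \left\{\left[(1-\pi(z_i))\, p_0^{d_i}(1-p_0)^{1-d_i} + \pi(z_i) S_u(t_i|x_i)\, p_1^{d_i}(1-p_1)^{1-d_i}\right]^{\eta_i} \left[1 - \pi(z_i) + \pi(z_i) S_u(t_i|x_i)\right]^{1-\eta_i}\right\}^{1-\delta_i}. \tag{4}$$

It is noted that (4) reduces to (2), except for a constant multiplier, when $p_0 = p_1$. In other words, if sensitivity equals 1 − specificity, the likelihood functions with and without the diagnostic information are equivalent. In practice, we want both sensitivity and specificity to be high, so that $p_0 > p_1$.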
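The likelihoods with and without diagnostic information can be sketched numerically as follows. This is an illustration under stated assumptions, not the authors' implementation: the latency is taken to be exponential with hazard exp(β'x_i), the incidence is logistic, and the function names `classical_loglik` and `extended_loglik` are hypothetical. The sketch makes it easy to check that, when p0 = p1, the two log-likelihoods differ only by a constant that does not depend on the model parameters.

```python
import numpy as np


def _model_pieces(gamma, beta, t, z, x):
    """Return pi_i, log f_u(t_i|x_i), and S_u(t_i|x_i) for exponential latency."""
    pi = 1.0 / (1.0 + np.exp(-(z @ gamma)))  # logistic incidence (uncured prob.)
    h = np.exp(x @ beta)                     # exponential hazard of the uncured
    return pi, np.log(h) - h * t, np.exp(-h * t)


def classical_loglik(gamma, beta, t, delta, z, x):
    """Log-likelihood of the classical mixture cure model."""
    pi, log_fu, s_u = _model_pieces(gamma, beta, t, z, x)
    censored = 1.0 - pi + pi * s_u
    return np.sum(delta * (np.log(pi) + log_fu) + (1 - delta) * np.log(censored))


def extended_loglik(gamma, beta, p0, p1, t, delta, d, eta, z, x):
    """Log-likelihood when diagnoses d are available only where eta == 1."""
    pi, log_fu, s_u = _model_pieces(gamma, beta, t, z, x)
    a = np.where(d == 1, p0, 1.0 - p0)       # P(d_i | cured)
    b = np.where(d == 1, p1, 1.0 - p1)       # P(d_i | uncured)
    with_diag = (1.0 - pi) * a + pi * s_u * b
    without_diag = 1.0 - pi + pi * s_u
    censored = np.where(eta == 1, with_diag, without_diag)
    return np.sum(delta * (np.log(pi) + log_fu) + (1 - delta) * np.log(censored))
```

Evaluating both functions at two different parameter values with p0 = p1 should give the same difference each time, namely the sum of the log diagnosis probabilities over censored subjects with an available diagnosis.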
As in the literature, one can use logistic regression, other link functions, or nonlinear regression to model the "incidence" part $\pi(z)$. Parametric, semiparametric (PH or AFT), or nonparametric methods can be used to model the "latency" part $S_u(t|x)$. An expectation-maximization (EM) algorithm can be used to estimate the model parameters in (4). The details of the EM procedure can be found in Wu et al. (2014). In this paper, we focus on the asymptotic efficiency of the MLEs of the parameters in the extended exponential cure model with the observed likelihood in Equation (3).

Asymptotic Efficiency of Maximum Likelihood Estimation for Extended Exponential Cure Models
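In this section, we show for several special cases that the asymptotic efficiencies of the MLEs for an extended exponential cure model are positively associated with the sensitivity and the specificity of the diagnostic procedure, and that these MLEs are asymptotically more efficient than the MLEs for the corresponding classical cure model. Assume that the logit link is used for the incidence part, the exponential distribution is used for the latency part, and $p_0$ and $p_1$ are known. Specifically, the assumptions are stated as follows:
• $\pi(z_i) = e^{\gamma' z_i}/(1 + e^{\gamma' z_i})$, where $z_i = (1, z_{i1}, \ldots, z_{ik})'$ and $\gamma = (\gamma_0, \gamma_1, \ldots, \gamma_k)'$ is a $(k+1)$ parameter vector,
• $S_u(t|x_i) = e^{-h(x_i) t}$ with $h(x_i) = e^{\beta' x_i}$, where $x_i = (1, x_{i1}, \ldots, x_{im})'$ and $\beta = (\beta_0, \beta_1, \ldots, \beta_m)'$ is an $(m+1)$ parameter vector, and
• $p_0$ and $p_1$ are known with $p_0 \ge p_1$ for a valid diagnostic procedure.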
Proposition 1 Denote $V^D_\gamma$ as the asymptotic variance of the MLE of $\gamma$ when the diagnostic procedure is used, and $V^N_\gamma$ as the asymptotic variance of the MLE of $\gamma$ when no diagnostic procedure is used. Let $V^D_\beta$ be the asymptotic variance of the MLE of $\beta$ when the diagnostic procedure is used, and $V^N_\beta$ the asymptotic variance of the MLE of $\beta$ when no diagnostic procedure is used. The following results are true:
(1) When sensitivity and specificity are both 100%, i.e., $p_0 = 1$, $p_1 = 0$, all diagonal entries of $V^D_\gamma$ and $V^D_\beta$ are less than or equal to the corresponding diagonal entries of $V^N_\gamma$ and $V^N_\beta$. This implies that the estimators of $\gamma$ and $\beta$ are more efficient when diagnostic information is included.
(2) When $k = 0$, $m = 0$, i.e., $\gamma = (\gamma_0)$, $\beta = (\beta_0)$, $V^D_\gamma$ and $V^D_\beta$ are less than or equal to $V^N_\gamma$ and $V^N_\beta$, respectively. This implies that the estimators of $\gamma$ and $\beta$ are more efficient when diagnostic information is included. Furthermore, the asymptotic variance decreases as the sensitivity or specificity increases.
(3) When $k = 0$, $m = 1$, i.e., $\gamma = (\gamma_0)$, $\beta = (\beta_0, \beta_1)'$, and $x_{i1}$ is a binary variable with values of 0 and 1, the asymptotic variances of the MLEs of $\gamma_0$ and $\beta_0$ are smaller when the diagnostic procedure is used. This implies that the estimators of $\gamma_0$ and $\beta_0$ are more efficient when diagnostic information is included. Furthermore, the asymptotic variance decreases as the sensitivity or specificity increases.
(4) When $k = 1$, $m = 0$, i.e., $\gamma = (\gamma_0, \gamma_1)'$, $\beta = (\beta_0)$, and $z_{i1}$ is a binary variable with values of 0 and 1, the asymptotic variances of the MLEs of $\gamma_0$ and $\beta_0$ are smaller when the diagnostic procedure is used. This implies that the estimators of $\gamma_0$ and $\beta_0$ are more efficient when diagnostic information is included. Furthermore, the asymptotic variance decreases as the sensitivity or specificity increases.
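For intuition about Case (2), the per-subject expected information for (γ0, β0) can be written in closed form under one extra assumption that is not part of the proposition: every subject is followed up to a common fixed censoring time c. The sketch below is illustrative only, and its function names are hypothetical. It sums the outer products of the score over the observed-event region and over the two censored outcomes d = 0 and d = 1, and reads off the asymptotic variances as the diagonal of the inverse. Setting p0 = p1 = 0.5 makes the diagnosis uninformative and so recovers the classical model.

```python
import numpy as np


def fisher_information(gamma0, beta0, p0, p1, c):
    """Per-subject expected information for (gamma0, beta0), censoring at c."""
    pi = 1.0 / (1.0 + np.exp(-gamma0))   # uncured probability
    H = np.exp(beta0) * c                # cumulative hazard at the censoring time
    eH = np.exp(-H)                      # S_u(c)
    # Observed events on (0, c): density pi*h*exp(-h t), score (1 - pi, 1 - h t).
    info = pi * np.array([
        [(1 - pi) ** 2 * (1 - eH), (1 - pi) * H * eH],
        [(1 - pi) * H * eH, 1 - (H ** 2 + 1) * eH],
    ])
    # Censored subjects with diagnosis d = 0 and d = 1:
    # a = P(d | cured), b = P(d | uncured), q = P(censored, d).
    for a, b in ((1 - p0, 1 - p1), (p0, p1)):
        q = (1 - pi) * a + pi * eH * b
        if q > 0:
            grad = np.array([pi * (1 - pi) * (eH * b - a), -pi * b * H * eH])
            info += np.outer(grad, grad) / q
    return info


def asymptotic_variances(gamma0, beta0, p0, p1, c):
    """Diagonal of the inverse information: variances of (gamma0, beta0) MLEs."""
    return np.diag(np.linalg.inv(fisher_information(gamma0, beta0, p0, p1, c)))
```

Comparing `asymptotic_variances(0.0, 0.0, 0.9, 0.1, 2.0)` with `asymptotic_variances(0.0, 0.0, 0.5, 0.5, 2.0)` shows smaller variances for both parameters when the diagnosis is informative, in line with the statement of Case (2) under this assumed censoring scheme.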
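The proposition will be proved based on several lemmas. For convenience, for all the derivations in this section, denote $\pi_i = \pi(z_i)$ and $h_i = h(x_i)$. The observed likelihood for the extended exponential cure model according to (3) can be written as:

$$L = \prod_{i=1}^{n} \left[\pi_i h_i e^{-h_i t_i}\right]^{\delta_i} \left[(1-\pi_i)\, p_0^{d_i}(1-p_0)^{1-d_i} + \pi_i e^{-h_i t_i}\, p_1^{d_i}(1-p_1)^{1-d_i}\right]^{1-\delta_i},$$

which implies that the observed log-likelihood is:

$$\ell = \sum_{i=1}^{n} \left\{\delta_i \left(\log \pi_i + \log h_i - h_i t_i\right) + (1-\delta_i) \log\left[(1-\pi_i)\, p_0^{d_i}(1-p_0)^{1-d_i} + \pi_i e^{-h_i t_i}\, p_1^{d_i}(1-p_1)^{1-d_i}\right]\right\}.$$

The score functions with respect to $\gamma$ and $\beta$ are given in (7) and (8). After introducing shorthand notation, including the quantity $v_i$ used below, (7) and (8) can be simplified. The entries $I_{11}$, $I_{22}$, and $I_{12}$ of the observed information matrix are given in (9), (10), and (11), respectively.
For any $\gamma_m$ and $\gamma_n$, and for observation $i$, because $\pi_i = e^{\gamma' z_i}/(1 + e^{\gamma' z_i})$, the first-order partial derivatives of $\pi_i$ are

$$\frac{\partial \pi_i}{\partial \gamma_m} = \pi_i (1-\pi_i) z_{im}, \qquad \frac{\partial \pi_i}{\partial \gamma_n} = \pi_i (1-\pi_i) z_{in},$$

and the second-order partial derivatives of $\pi_i$ are

$$\frac{\partial^2 \pi_i}{\partial \gamma_m \partial \gamma_n} = \pi_i (1-\pi_i)(1-2\pi_i) z_{im} z_{in}.$$

Similarly, for any $\beta_m$ and $\beta_n$, and for observation $i$, the first-order partial derivatives of $h_i = e^{\beta' x_i}$ are

$$\frac{\partial h_i}{\partial \beta_m} = h_i x_{im}, \qquad \frac{\partial h_i}{\partial \beta_n} = h_i x_{in},$$

and the second-order partial derivatives of $h_i$ are

$$\frac{\partial^2 h_i}{\partial \beta_m \partial \beta_n} = h_i x_{im} x_{in}.$$

Using these derivatives, $I_{11}$ in (9) and $I_{22}$ in (10) can be rewritten as (16) and (17). Similarly, if no diagnostic information is used, we only need to set $v_i = 1$ or $p_0 = p_1 = 0.5$ in (16), (17), and (11) to obtain the corresponding entries $J_{11}$, $J_{22}$, and $J_{12}$. To obtain the information matrix, we take the expectation of $I_{rs}$ and $J_{rs}$, $r, s = 1, 2$, with respect to $O = \{T, V\}$, and let $\Delta^{(i)}_{rs}$ denote the difference between the expectations of the $i$th summands of $I_{rs}$ and $J_{rs}$. We have the following results.
By plugging (20) and (21) into (19), and proceeding similarly for the remaining term, we obtain (22) and (23), from which the result can be shown.
Lemma 3 Denote $I^{(i)}_{11}$ and $J^{(i)}_{11}$ as the $i$th summands of $I_{11}$ and $J_{11}$, respectively. Then $\Delta^{(i)}_{11}$ admits a closed-form expression that involves $\phi_i(p_0, p_1)$.
Proof. First, $\Delta^{(i)}_{11}$ is expressed as a sum of terms. By using the expressions of $P(\delta_i = 0, d_i = 0 | t_i)$ and $P(\delta_i = 0, d_i = 1 | t_i)$ in (20) and (21), respectively, the third term of this sum simplifies to an expression that does not depend on $v_i$. The expression for $\Delta^{(i)}_{11}$ then follows by applying (20) and (21) again.
Lemma 4 Denote $I^{(i)}_{22}$ and $J^{(i)}_{22}$ as the $i$th summands of $I_{22}$ and $J_{22}$, respectively. Then $\Delta^{(i)}_{22}$ admits a closed-form expression that involves $\phi_i(p_0, p_1)$.
Proof. $\Delta^{(i)}_{22}$ is expressed as a sum of terms. By using (20) and (21), the third term of this sum simplifies to an expression that does not depend on $v_i$. The expression for $\Delta^{(i)}_{22}$ then follows from (20) and (21).
Because the expressions of $\Delta^{(i)}_{12}$, $\Delta^{(i)}_{11}$, and $\Delta^{(i)}_{22}$ all involve $\phi_i(p_0, p_1)$, to prove Proposition 1, we need the following lemma regarding $\phi_i(p_0, p_1)$.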

Lemma 5 For the function $\phi_i(p_0, p_1)$, if $0 \le p_1 \le p_0 \le 1$, then for any $i$, $\phi_i(p_0, p_1)$ is an increasing function of $p_0$ and a decreasing function of $p_1$.
Proof. Holding $p_0$ fixed, $\phi_i(p_0, p_1)$ can be rewritten so that its dependence on $p_1$ is explicit. Because $p_0 \ge p_1$, a smaller $p_1$ leads to a larger $p_0 - p_1$, together with monotone changes in the other terms of this expression. All these lead to a larger $\phi_i(p_0, p_1)$. Holding $p_1$ fixed, $\phi_i(p_0, p_1)$ can be rewritten analogously in terms of $p_0$.
With the differences for each entry of the information matrix computed in Lemmas 2-4, and the property of these differences established in Lemma 5, we are ready to prove Proposition 1.
Proof of Case 1. For any $(k+1)$-dimensional vectors $u$ and $w$, a non-negative constant $c$, and a $(k+1) \times (k+1)$ positive definite matrix $A$, we have

$$w' (A + c\, u u')^{-1} w \le w' A^{-1} w.$$

By adding $\Delta^{(i)}_{11}$ one at a time, for any $u$, we have $u' V^D_\gamma u \le u' V^N_\gamma u$. By taking $u_i$ with $u_{ij} = 0$ if $j \ne i$ and $u_{ij} = 1$ if $j = i$, we can conclude that all the diagonal entries of $V^D_\gamma$ are less than or equal to the corresponding diagonal entries of $V^N_\gamma$. Because smaller diagonal entries indicate higher efficiency, the estimator of $\gamma$ with diagnostic information included is more efficient than that without diagnostic information included.
Similarly, it can be shown that all the diagonal entries of $V^D_\beta$ are less than or equal to the corresponding diagonal entries of $V^N_\beta$ and, hence, the estimator of $\beta$ with diagnostic information included is more efficient than that without diagnostic information included.
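The argument for Case 1 rests on a linear algebra fact: adding a positive semi-definite term to a positive definite matrix cannot increase any diagonal entry of its inverse. The following is a minimal numerical sketch of that fact for a rank-one term c u u'. The matrix, the vector, the constant, and the function name are arbitrary illustrative choices, not quantities from the proof. The gap between the two inverses is also compared with the Sherman-Morrison formula, which shows directly that the gap is positive semi-definite.

```python
import numpy as np


def inverse_after_rank_one_update(A, u, c):
    """Return A^{-1} and (A + c u u')^{-1} for positive definite A and c >= 0."""
    A_inv = np.linalg.inv(A)
    updated_inv = np.linalg.inv(A + c * np.outer(u, u))
    return A_inv, updated_inv


rng = np.random.default_rng(0)
M = rng.normal(size=(3, 3))
A = M @ M.T + 3.0 * np.eye(3)   # an arbitrary positive definite matrix
u = rng.normal(size=3)
c = 0.7

A_inv, updated_inv = inverse_after_rank_one_update(A, u, c)
gap = A_inv - updated_inv       # should be positive semi-definite

# Sherman-Morrison: A^{-1} - (A + c u u')^{-1}
#   = c (A^{-1} u)(A^{-1} u)' / (1 + c u' A^{-1} u)
Au = A_inv @ u
expected_gap = c * np.outer(Au, Au) / (1.0 + c * (u @ Au))
```

Because the gap is a non-negative multiple of an outer product, every quadratic form w' gap w is non-negative, and in particular each diagonal entry of the updated inverse is no larger than the corresponding entry of the original inverse.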
Proof of Case 2. Because $E_O(a^{(1)}_{11} + d^{(1)}_{11} \phi_1(p_0, p_1))$ and $E_O(a^{(1)}_{22} + d^{(1)}_{22} \phi_1(p_0, p_1))$ are increasing functions of $p_0$ and decreasing functions of $p_1$ through their dependence on $\phi_1(p_0, p_1)$, and $E^2_O(a^{(1)}_{12} - d^{(1)}_{12} \phi_1(p_0, p_1))$ is a decreasing function of $p_0$ and an increasing function of $p_1$ through its dependence on $\phi_1(p_0, p_1)$, $(V^D_\gamma)^{-1}$ is an increasing function of $p_0$ and a decreasing function of $p_1$, i.e., an increasing function of $p_0$ (sensitivity) and $1 - p_1$ (specificity). A larger $(V^D_\gamma)^{-1}$ leads to a smaller $V^D_\gamma$. Thus the efficiency of the estimator of $\gamma$ increases as either specificity or sensitivity increases, and the estimator of $\gamma$ with diagnostic information included is more efficient than that without diagnostic information included.
Similarly, $(V^D_\beta)^{-1}$ is also an increasing function of $p_0$ and a decreasing function of $p_1$, i.e., an increasing function of $p_0$ (sensitivity) and $1 - p_1$ (specificity). A larger $(V^D_\beta)^{-1}$ leads to a smaller $V^D_\beta$. Therefore, the efficiency of the estimator of $\beta$ increases as either specificity or sensitivity increases, and the estimator of $\beta$ with diagnostic information included is more efficient than that without diagnostic information included.
Proof of Case 3. $\partial \pi_i / \partial \gamma$ is the same for all subjects when $\gamma = (\gamma_0)$, so we can denote it as an unknown constant $C_{\gamma_0}$. For $\beta = (\beta_0, \beta_1)'$ and $x_{i1}$ being a binary variable with values of 0 and 1, $\partial h_i / \partial \beta$ can be expressed as follows:

$$\frac{\partial h_i}{\partial \beta} = h_i x_i = \begin{cases} (e^{\beta_0},\, 0)', & x_{i1} = 0, \\ (e^{\beta_0 + \beta_1},\, e^{\beta_0 + \beta_1})', & x_{i1} = 1. \end{cases}$$

For $j = 0, 1$, define the quantities $b_{11j}$, $b_{12j}$, and $b_{22j}$. Assume that there are $n_0$ observations with $x_{i1} = 0$, and $n_1$ observations with $x_{i1} = 1$. By the independent and identically distributed (i.i.d.) property when the covariates are the same, we have (29), (30), and (31), from which the inverse of $V^D_\gamma$ is obtained. A larger $(V^D_\gamma)^{-1}$ leads to a smaller $V^D_\gamma$. Consequently, the efficiency of the estimator of $\gamma$ increases as either specificity or sensitivity increases, and the estimator of $\gamma$ with diagnostic information included is more efficient than that without diagnostic information included.
It follows from the i.i.d. property when the covariates are the same that (32), (33), and (34) hold. From (32), (33), and (34), $c_{111}$, $c_{110}$, $c_{220}$, and $c_{221}$ are increasing functions of $p_0$ and decreasing functions of $p_1$ through their dependence on $\phi_i(p_0, p_1)$, and $c_{121}$ and $c_{120}$ are decreasing functions of $p_0$ and increasing functions of $p_1$ through their dependence on $\phi_i(p_0, p_1)$. Hence $(V^D_\beta)^{-1}$ is an increasing function of $p_0$ and a decreasing function of $p_1$, i.e., an increasing function of $p_0$ (sensitivity) and $1 - p_1$ (specificity). A larger $(V^D_\beta)^{-1}$ leads to a smaller $V^D_\beta$. Consequently, the efficiency of the estimator of $\beta$ increases as either specificity or sensitivity increases, and the estimator of $\beta$ with diagnostic information included is more efficient than that without diagnostic information included.
Because $c_{111}$, $c_{110}$, $c_{220}$, and $c_{221}$ are increasing functions of $p_0$ and decreasing functions of $p_1$ through their dependence on $\phi_i(p_0, p_1)$, and $c_{121}$ and $c_{120}$ are decreasing functions of $p_0$ and increasing functions of $p_1$ through their dependence on $\phi_i(p_0, p_1)$, we can infer that $(V^D_{\gamma_0})^{-1}$ is an increasing function of $p_0$ and a decreasing function of $p_1$. It turns out that $V^D_{\gamma_0}$ is a decreasing function of $p_0$ and an increasing function of $p_1$, i.e., a decreasing function of $p_0$ (sensitivity) and $1 - p_1$ (specificity). Therefore, the efficiency of the estimator of $\gamma_0$ increases as either specificity or sensitivity increases. Because the estimator of $\gamma_0$ without diagnostic information included corresponds to the case where sensitivity is the same as 1 − specificity ($p_0 = p_1$), the estimator of $\gamma_0$ with diagnostic information included (with $p_0 > p_1$) is more efficient than that without diagnostic information included.

Summary and Discussion
An extended cure model that incorporates additional diagnostic information about cured status is useful for modeling failure time data in which some individuals eventually experience the event of interest and others never do, provided that diagnostic information is available. In this paper, we have shown theoretically that the MLEs of the parameters in the extended exponential cure model are asymptotically more efficient than the MLEs of those in the classical exponential cure model. Specifically, we showed for some special cases that the asymptotic efficiency increases as the sensitivity and the specificity of the diagnostic procedure increase. In conclusion, based on the results provided in this paper, we recommend that additional cure information, even if only partially available, be incorporated into the model. We also recommend that investigators devise diagnostic procedures for cure and collect available cure information when designing and conducting studies.

Because $b_{111}$, $b_{110}$, $b_{220}$, and $b_{221}$ are increasing functions of $p_0$ and decreasing functions of $p_1$ through their dependence on $\phi_i(p_0, p_1)$, and $b_{121}$ and $b_{120}$ are decreasing functions of $p_0$ and increasing functions of $p_1$ through their dependence on $\phi_i(p_0, p_1)$, $(V^D_\gamma)^{-1}$ is an increasing function of $p_0$ and a decreasing function of $p_1$, i.e., an increasing function of $p_0$ (sensitivity) and $1 - p_1$ (specificity). Larger $(V^D_\gamma)^{-1}$