Gradient and Likelihood Ratio Tests in Cure Rate Models

In some survival studies part of the population may be no longer subject to the event of interest. The called cure rate models take this fact into account. They have been extensively studied for several authors who have proposed extensions and applications in real lifetime data. Classic large sample tests are usually considered in these applications, especially the likelihood ratio. Recently a new test called gradient test has been proposed. The gradient statistic shares the same asymptotic properties with the classic likelihood ratio and does not involve knowledge of the information matrix, which can be an advantage in survival models. Some simulation studies have been carried out to explore the behavior of the gradient test in finite samples and compare it with the classic tests in different models. However little is known about the properties of these large sample tests in finite sample for cure rate models. In this work we performed a simulation study based on the promotion time model with Weibull distribution, to assess the performance of likelihood ratio and gradient tests in finite samples. An application is presented to illustrate the results.


Introduction
Cure rate models have been extensively studied in the literature for data sets where the event of interest may not occur for part of the population studied.That is, part of the population studied will never experience the event of interest, being that recurrence of a disease, product consumption or many other situations.An early approach developed in Boag (1949) and Berkson and Gage (1952), considers a mixture of two distributions, one representing the survival time of the individuals in risk, and a degenerated one, allowing infinite time for some fraction of population considered cured.This model is known as the standard mixture model and the book by Maller and Zhou (1996) presents an up to date review of the main results on the subject.
Alternatively Yakovlev et al. (1993) and Chen et al. (1999) introduce a class of models involving a competitive risk type structure.Applications to cancer clinical trials have been specially successful.They are used for modeling time-to-event data for several types of cancer, including breast cancer, leukemia, prostate cancer and many others.Such models have been discussed in the statistical literature by many authors.In Tsodikov et al. (2003) this model is refered as bounded cumulative hazard model.They provided an overview of the development of this cure rate model from both the frequentist and Bayesian perspectives.In Yin and Ibrahim (2005) this model is refered as promotion time model.A unified approach that includes standard mixture model and promotion time model as special cases, is pursued in Rodrigues et al. (2009).
In the present paper, the interest lies in testing hypotheses.The commonly used large sample tests are based on the likelihood ratio statistics (Wilks, 1938), Wald (Wald, 1943) or Rao score (Rao, 1948) tests.Particularly, the likelihood ratio is usually considered in applications to test parameters in cure rate survival models.There are hardly any works about finite-sample performance of likelihood ratio under this models.We can mention just Sposto et al. (1992), which present a small (100 samples) and restrict simulation study to compared the likelihood ratio, Wald, and score tests based on a mixture model.Although it is known the liberal tendency of the likelihood ratio test when the sample is not large, they found that the tests keep their asymptotic properties even in small samples, for some specific situations.
Recently the gradient test was proposed in Terrell (2002).As well as the usual classical statistics, the gradient statistic has asymptotically chi-square distribution.This new statistic was obtained from Rao score and Wald modified statistics (Hayakawa and Puri, 1985).A comparison of local power properties of the gradient test with classical tests was studied in Lemonte and Ferrari (2012) and no uniform superiority was found.Some simulation studies have been conducted (Lemonte andFerrari, 2011a, 2012;Ferrari and Pinheiro, 2014) in order to explore the characteristics of this new statistic and compare the competing tests to different models.Because statistical gradient do not need the computation of the information matrix (neither observed nor expected), it may be advantageous in problems involving censored samples, which are often observed in survival studies.Among the studies that consider the statistical gradient in survival models with censored data, we can cite Lemonte and Ferrari (2011b), that consider samples with right censoring of type II to test hypotheses about the two parameters of the Birnbaum-Saunders distribution, and Medeiros et al. (2014) that compare the performance of the gradient and likelihood ratio tests in accelerated failure time models under random censoring.The book by (Lemonte, 2016) provides a broad survey about results of gradient test in literature.There are no studies involving the gradient test with cure fraction, with or without the presence of covariates.In this paper we study the performance of the likelihood ratio and the gradient tests via simulation study, to test coefficients related to cure rate parameter in the Weibull promotion time cure model.
The paper is organized as follows.The unified approach for cure rate model is described in detail in Section 2. Section 3 briefly describes the likelihood ratio and gradient tests, and presents the resulting tests to the model based on marginal likelihood obtained after eliminating the latent variables.Section 4 presents a simulation study.In Section 5 we illustrate our results with a real data set about the time to pediatric leukemia recurrence.Some conclusions are reported in Section 6 and some basic results are presented in an Appendix.

Cure Fraction Model: Unified Approach
Survival models with cure fraction are models that consider cured (or immune) a fraction of the population.The occurrence of a high percentage of censoring at the end of the study in a sufficient follow-up time is an indication of cure fraction in population (Maller and Zhou, 1996).Considering the unified model, suppose we have n individuals and that for each individual (i = 1, . . ., n) it is associated a (latent) random variable M i , representing the number of causes or risk factors competing for the occurrence of the event of interest, with probability function p θ (m) = P θ (M i = m).Given M i = m, suppose also that the random variables Z i1 , Z i2 , . . ., Z im , are independent and identically distributed (i.i.d.), representing (unobserved) time-to-event for the i-th individual, due to j-th cause ( j = 1, . . ., M i ), with common distribution function F(z|λ) and a survival function S (z|λ) = 1 − F(z|λ), where λ is a vector of parameters and lim t →∞ S (t|λ) = 0. Let T i be an observable random variable representing the time until the occurrence of the event, defined as T i = min{Z i0 , Z i1 , . . ., Z iM i } where the sequence Z i1 , Z i2 , . . .does not depend on M i .Besides, Z i0 is set so that P(Z i0 = ∞) = 1.This assumption permits the occurrence of an infinite lifetime in the immune individuals, because when M i = 0 there are no causes or risks for the occurrence of the event.
The common survival function for T i is given by since Hence S p (t) is an improper survival function, i.e., lim t →∞ S p (t) > 0. This survival function can be interpreted as an infinite linear combination of Lehmann type II distributions (Rodrigues et al., 2011;Alexandre et al., 2012).The proportion of cured individuals (cure fraction) is given by lim t →∞ S p (t) = p θ (0).From (1) we can obtain the sub-density function for the random variables T i as

Likelihood for Unified Model
Furthermore, consider that for i = 1, . . ., n, Y i = min{T i , C i } is the observable lifetime for individual i, where C i is right censoring time (random and uninformative) independent of T i , and let δ i be the censoring indicator, with . ., x ip ) ⊤ the vector of associated covariates.
Thus, the vector of unknown parameters in the model is given by ϕ = ( β ⊤ , λ ⊤ ) ⊤ and, to use a better notation, we consider p θ i (m i ) = p(m i |β, x i ).After some algebraic manipulations it can be shown that the likelihood for the complete data D c is given by (2) Note that the likelihood (2) is not observable, since it depends on the latent variables.In practice, a marginal likelihood is used.It is obtained by summing overall possible values for the variables M i , i = 1, . . ., n, and given below in (3).For details see Appendix.
Therefore, the logarithm of the marginal likelihood function is given for

Likelihood Ratio and Gradient Tests
Let ℓ(ϕ) = log L(ϕ) be a log-likelihood function of ϕ a p-vector of unkown parameter, and define , where the dimensions of ϕ 1 and ϕ 2 , are q and p − q respectively, we have a corresponding partition The likelihood (S LR ) and gradient (S G ) statistics for testing the composite hypothesis are respectively given by where ϕ 10 is a specified vector, ϕ = ( ϕ 1 , ϕ 2 ) ⊤ is the (unrestricted) maximum likelihood estimators of ϕ and φ = ( φ1 , φ2 ) ⊤ denote the (restricted) maximum likelihood estimators of ϕ under H 0 hypothesis.Asymptotically, S LR and S G have a central chi-square distribution with q degrees of freedom under H 0 , and general conditions of regularity.The null hypothesis is rejected for a fixed nominal level α, if the test statistic exceeds the upper 100(1 − α)% quantile of the chi-square distribution.

Tests for Weibull Promotion Time Model
When each random variable M i follows a Poisson distribution with parameter θ, the unified model comes down to promotion time model (Yakovlev et al., 1993;Chen et al., 1999).From (1) we have thus the cure fraction induced by this model is p θ (0) = exp (−θ), and the probability density function is At the promotion time Weibull model, it is assumed that failure times of susceptible individuals follow a Weibull distribution.Here we use the parametrization for Weibull given in Fonseca et al. (2011), where the probability density function and survival are given by where ρ > 0 and γ ∈ ℜ.Thus the functions S p (t) and f p (t) are given by and Now, consider the existence of heterogeneity in the population so that each random variable M i follows a Poisson distribution with parameter θ i .The relation often used between the parameter θ i and the covariates in the promotion time model is given by θ i = exp ) , where β and x i are defined as before.In this case the cure fraction is related to the covariates through the expression Considering a sample of n individuals and denoting by ϕ = ( β ⊤ , λ ⊤ ) ⊤ , the parameter vector where λ is the vector of parameter of Weibull model.The logarithm of the marginal likelihood function for this model is obtained substituting ( 6) and ( 7) in (4) and including covariates through the relation given above, then we have For computational reasons, we consider in (9) a reparametrization ρ = e ρ * , obtaining ρ * ∈ ℜ. Denoting λ ⊤ = (γ, ρ * ) and through the derivative with respect to the parameter vector ϕ, we get the score vector, which can be written as where X i is the matrix , ) y e ρ * i log(y i ) and ) y e ρ * i .
Now consider we want to test only a partition ϕ 1 with dimension q of the vector ϕ.The likelihood ratio and gradient statistics for testing the composite hypothesis, as in (5) and considering ϕ 10 = 0, are given by ) ⊤ are, respectively, the unrestricted and the restricted maximum likelihood estimator of ϕ = (ϕ 1 , ϕ 2 ) ⊤ , under the null hypothesis.

Simulation Study
A simulation study was conducted to investigate the finite sample performance of the likelihood ratio and gradient tests to test parameters for a survival model with cure fraction.The simulation results are based on R software (R Development Core Team, 2010) which use the routine optim to maximize the likelihood function through the optimization algorithms BFGS (Broyden, Fletcher, Goldfarb and Shanno).The model considered was the Weibull promotion time.Relating the reparametrization used in (10) for ρ = e ρ * , with the default in the software R for parameters of Weibull distribution we found ρ * = log(a) and γ = −a log(b), where a > 0 and b > 0 are shape and scale parameters, respectively.
To assess the effect of number of nuisance parameters in performance of tests, we consider cases with three, four and five covariates (p = 3, p = 4 and p = 5), generated from Bernoulli distributions.That is, for l = 1, . . ., p, each covariate x l is drawn from a Bernoulli (ν l ) where the probabilities of success ν 1 , . . ., ν 5 are set as 0.49, 0.50, 0.51, 0.52 and 0.53, respectively.We consider samples of size n = 30 and 100.For each individual, values for M were generated as a random sample of Poisson distribution with mean θ i = exp ) . The values used for the vector β were chosen so that, when combined with the covariates, the average of cure fractions p θ i (0) = exp(−θ i ), i = 1, . . ., n, were around 10%, 20% or 30%.The values specified for the vector β are given in Table 1.
We also considered three nominal levels (α = 1%, 5% and 10%).The tests were performed and the values of the statistics were compared with the respective quantiles of the chi-square distribution, i.e., 6.635, 3.841 and 2.706 to test the null hypothesis H 00 : β 1 = 0 (q = 1) or 9.210, 5.991 and 4.605 to test the hypothesis H 01 : Under each combination of parameter configuration we ran 10,000 simulations, and calculated the proportion of times that the hypotheses H 00 and H 01 were rejected .

Results
In Tables 2 and 3 we have estimated null rejection rates of the likelihood ratio and the gradient tests based of the null hypothesis H 00 : β 1 = 0 (q = 1) and H 01 : β 1 = β 2 = 0 (q = 2) in each considered situation, for samples of size n = 30 and 100, respectively.It shows that for all considered cases, the null rejection rates of the tests exceed the corresponding nominal level.This is in agreement with liberal characteristic of the likelihood ratio test in small samples and shows the same trend of the gradient test.
For n = 30 the tests get worse (rejection rate gets away from nominal level) when p (number of parameters) increases, and when we increase the number of tested parameters (q = 1 to q = 2).This fact is accentuated especially when we have 30% of censoring.For n = 100, there are not significative change in the performance of tests with increasing of p, or q, even in the presence of censoring.In general, we note that with the increase of sample size the tests become better (the null rejection rates approach the nominal level), regardless of the existence of cure fraction.
When we compare the performance of the likelihood ratio and gradient tests, we noticed they have equivalent results in almost all cases.There are just slight differences when the sample is very small (n = 30).Specifically, the gradient statistic presents mild advantage in some cases with uncensored samples while the likelihood ratio statistic is a little better in cases with censorship where the dimension of the vector parameters tested is lower.
We now consider a brief simulation study to investigate the finite-sample power properties of the tests.To make power comparisons, we must ensure that the test has the same (correct) size under the null hypothesis.As we have seen in our simulations that the likelihood ratio and gradient tests have different sizes, we used 100,000 Monte Carlo simulated samples, drawn under the null hypothesis, to estimate the correct critical value of each test for the fixed nominal level.
Here we considered the tests for n = 30, p = 3, q = 1, 30% of censoring and 30% cure fraction under null hypothesis.We computed the rejection rates under the alternative hypothesis H 01 : β 1 = w, for values of w belonging to the set [−3, 3].As a result (Figure 1) we see that no test seems uniformly more powerful than the other.
case Figure 1.Power of likelihood and gradient tests for n = 30, p = 3, q = 1, 30% of censoring and 30% cure fraction under null hypothesis.

Application
To illustrate the model and the tests presented in this paper, we made an application in a data set about time of relapse free survival of 103 Brazilian children under 15 years, with acute lymphoblastic leukemia (ALL).These children were followed from 1988 to 1992 in some health institutions organized within a cooperative group for the treatment of acute leukemia in the state of Minas Gerais, Brazil.
These data are available in Colosimo and Giolo (2006).Viana et al. (1994) had described the study and analyzed the data using the Cox regression model (Cox, 1972), in order to evaluate the effect of factors on the hazard of recurrence in previously treated children.
At the end of the study, 39 children experienced the event and 64 censored (62% of censorship).Kaplan-Meier estimates of the survival function are shown in Figure 2, and show that in the last year of follow-up, the estimated survival curve apparently stabilizes at some positive value.Although follow-up seems insufficient to notice the occurrence of cured in study, there is a vast literature in the medical area about long-term survivors of pediatric leukemia (see for example Sala et al. (2004), Pui et al. (2003) and Neglia et al. (1991)).
Figure 2. Kaplan-Meier estimates to of the data related to survival to acute lymphoblastic leukemia treatment.
To investigate the differences between subgroups in the data with respect to proportion of individuals who are long-term survivors, we consider a cure rate model associated with 5 factors of 2 levels: number of white cell count at diagnosis (White = 1 if this number is greater than 75, 000 by mm 3 and White = 0 otherwise); standardized age (Age = 1 if the index is greater than −2 and Age = 0 otherwise); standard weight for age and sex (Weight = 1 if the index is greater than −2 and Weight = 0 otherwise); positive periodic acid Schiff (PAS ) reaction in the lymphoblasts (Pas = 1 for more than 5% of Pas positive marrow lymphoblasts and Pas = 0 otherwise) and cytoplasmic vacuolation (Vac = 1 if more than 10% of vacuolated blasts were present, and Vac = 0 otherwise).
We consider that the lifetimes for susceptible individuals Z il , follow a Weibull(ρ, γ) distributions, i = 1, . . ., 103 and l = 1, . . ., M i .We fit this model and apply the likelihood ratio and gradient tests to evaluate the effect of each factor in the cure rate.
The results are shown in Table 4.At the 5% significance level, we note that for both tests the factors White, Pas and Vac are significant to explain the cure fraction.For the Age factor, the p-values obtained for the two tests are slightly greater than 5% so this is not significant at this level.However for the Weight factor there is a divergence between the tests: the likelihood ratio test indicates that the Weight factor is significant (p-value = 0.0419) while the gradient test concludes that it is not significant (p-value = 0.0616).One might argue that, according to the simulation results (see the most similar case in Table 3, line 18) the calculated p-values will always underestimate the true p-values (which would be calculated from the exact statistics distributions).This behavior can lead to undue rejection of hypotheses.Thus, based on this argument we do not reject the null hypothesis and we assume that the Weight factor is not significant in this model.
Thus, the results of fit for the final model are given in Table 5.Thus in this study the group with the highest estimated cure rate (70.5%) is formed by children with lower white cell count at diagnosis (White = 0), with negative cytoplasmic vacuolation (Vac = 0) and PAS positive reaction (Pas = 1).
Note that although Viana et al. (1994) found that malnutrition (measured through the Weight) is the most significant adverse factor affecting time to remission, here we find that it is not significant to explain cure rate.In fact, according to Sala et al. (2004) there is no consensus about the relationship between poor nutritional status and the poor prospect for survival.Besides, covariates do not need exert the same effects on the cure fraction and the time to remission for susceptible individuals.

Concluding Remarks
In this work we compare via simulation, the perfomance of likelihood ratio and gradient tests to test regression coefficients related with cure fraction in Weibull promotion time model.We note that null rejection rates of the tests exceed the corresponding nominal level for small and moderate samples.This well-known liberal tendency of the likelihood ratio test, was also observed to the gradient test, which showed similar size distortions.Additionally, we note that this size distortion increases with the presence of censorship and with the increases of number of tested parameter as well as with the number of the nuisance parameters.This oversized behavior of the tests indicates that the true distributions of the likelihood ratio and gradient statistic have heavier right tail than the chi-square in small and moderate-sized samples.In applications this can lead to undue rejection of hypotheses since the calculated p-values (based in chi-square approximation) will in general underestimate the true p-values.The power simulation study suggest that no test seems uniformly most powerful than other when we use estimated correct critical values.Overall, we understand that the gradient statistic is equivalent to the likelihood ratio one, to test coefficients of this model.
Although the Wald and score tests shares the same asymptotic properties with the likelihood ratio and Gradient tests, they were not included in our simulation study because they require the computation of the Fisher information matrix, which cannot be obtained for the cure fractions models considered here.One could argue that the Fisher information should be replaced by the observed information matrix.We noticed, however, that in small and moderate-sized samples the observed information produced negative standard errors for a non-negligible proportion of the simulated censored samples.This is a problem to be investigated in a future study.
Due to the size distortions of the tests in small samples, an important subject of study is to obtain inferential improvements, like the Bartlett correction (Bartlett , 1937) or the Skovgaard's adjustment (Skovgaard , 1996).However the presence of censorship and cure fraction in cure rate models can make cumbersome or impossible the analytic derivation of corrections.Thus, another topic for future research will be investigate the use of a bootstrap Bartlett adjustment for the log-likelihood ratio statistic (Rocke , 1989) and bootstrap adjustment for gradient statistic.Furthermore we wish to study associated tests to models with cure rate in the presence of covariates associated with the lifetime of susceptible individuals.

Complete and Marginal Likelihood
Here we present details to obtain the likelihood for the complete data (2) and the marginal likelihood function given in (3).We consider the same notations used in Section 2.1 but, for simplicity, we get x i = 1 (no covariates and θ i = θ).Besides, for a single individual i, we denote the complete data by D c i = (y i , δ i , m i ) and the data without the latent variables by D i = (y i , δ i ) .
The likelihood for the complete data D c can be represented as follows Based on classical results in survival analysis, it can be shown that the conditional likelihood function of (Y i , δ i ) given the latent variable M i , for a single individual i, is given by Now, the conditional survival and density functions of T i given the latent variable M i can be obtained, respectively, as Using the above results in (12), we get Thus, the likelihood for the complete data is The likelihood with respect to the marginal distribution of the (Y i , δ i ), denoted by L * , can be obtained by summing overall possible values for the variables M i , that is Considering separately the cases δ i = 0 and δ i = 1, we have Thus, the total marginal likelihood is given by

Table 3 .
Null rejection rates of the likelihood ratio (S LR ) and the gradient (S G ) tests based of the null hypothesis H 00 :

Table 4 .
Estimates and tests for data on pediatric leukemia

Table 5 .
Estimates for final model -pediatric leukemia data.

Table 6 .
Cure rate for pediatric leukemia data.