Time Scales in Epidemiological Analysis: An Empirical Comparison

The Cox proportional hazards model is routinely used to analyze time-to-event data. Using this model requires a unique, well-defined time scale. Observation time is most often used as the time scale in both clinical and observational studies. More recently, following a suggestion that it may be a more appropriate scale, chronological age has begun to appear as the time scale in some reports. There appears to be no general consensus about which time scale is appropriate for a given analysis. It has been suggested that the two time scales give similar results if the baseline hazard is exponential, or if age at entry is independent of the covariates in the model. In this report we compare the results obtained with the two time scales empirically. We use a large collection of data sets to examine the relationship between systolic blood pressure and coronary heart disease death (CHD death). We demonstrate in this empirical example that the two time scales can lead to different results even when these two conditions appear to hold.


Introduction
Survival models are used extensively in epidemiological studies. Various models are available to analyze time-to-event data, but the most frequently used is the semi-parametric Cox proportional hazards (PH) model (Cox 1972). The PH model is widely used with both experimental (clinical trial) and observational data. It provides a semi-parametric method for analyzing the association between a set of risk factors and the time to occurrence of an outcome. The model makes no assumptions about the nature or shape of the underlying survival distribution, but assumes a parametric form for the effect of the predictors on the hazard. In many situations we are most interested in the estimates of the model parameters, since, if the model is correct, they measure the strength of the association between a characteristic and the time to event. Most statistical software provides procedures for estimating these parameters. Current literature that uses the PH model to analyze such associations in observational studies contains a mixture of analyses on two different time scales: time-on-study and chronological age. For example, there are two widely used sets of models for predicting cardiovascular disease. The models widely used in the United States were derived from the Framingham Heart Study (Wilson et al. 1998) using time-on-study as the time scale. In contrast, the models widely used in Europe (Conroy et al. 2003) were developed using age as the time scale. Similarly, there has been long-standing interest in estimating the effects of obesity on mortality, and two papers have addressed this question using different time scales. In 1999, Allison et al. published estimates of the number of deaths in the U.S. attributable to obesity, using time-on-study as the time scale. In 2007, Flegal et al. published a similar analysis, based on slightly different data, but used chronological age as the time scale. A natural question is whether at least part of the difference between the results of these analyses is due to the different time scales chosen for the proportional hazards models.
An extreme example was provided by Cheung et al. (2003). They demonstrated that PH models using different time scales can yield contradictory results, with estimated parameters of opposite sign depending on the time scale. They examined women in the Surveillance, Epidemiology, and End Results (SEER) program who were diagnosed with Stage I breast cancer. When time-on-study was used as the time scale, a younger age at diagnosis was associated with lower mortality. When chronological age was used as the time scale, the opposite effect was found.
In this paper we provide an extensive empirical example suggesting that applying the two time scales to the same data can produce significantly different results. The models can disagree even when the empirical baseline hazard appears to be exponential, or when the observed correlation between the covariate and age at entry is low. In some cases the estimated coefficients have opposite signs, although the opposite-signed estimates are not statistically significant.

Background
In 1997, Korn et al. pointed out that most published analyses of the National Health and Nutrition Examination Survey I Epidemiologic Follow-up Study used time-on-study as the time scale in the PH model. They suggested that chronological age might be a more appropriate time scale for some observational studies. The two scales differ in their choice of origin. With chronological age as the time scale, the origin is the date of birth. With time-on-study, the origin is the date of study entry, for example the date of diagnosis or the date of randomization.
The mathematical formulation of the Cox model is similar for the time-on-study and chronological age time scales, but the implicit estimation mechanism differs (Thiebaut and Benichou 2004). In both formulations, time is used only to order the event times. This ordering defines the set of individuals at risk at a particular time, the risk set, and time scales that produce equivalent risk sets produce equivalent results. The difference between the two scales is that with observation time the risk sets are nested: at any time $t$, the risk set $R_t$ is contained in the risk set $R_{t^*}$ for any $t^* < t$. Using age does not necessarily produce nested risk sets. At a given age, some subjects may already have experienced the event while others are not yet under study at that age, so subjects keep entering and leaving the risk set as age increases. In their paper, Korn et al. (1997) analyzed National Health and Nutrition Examination Survey I Epidemiologic Follow-up Study (NHEFS) data on women and compared the log hazard ratios estimated under three different models. They found no differences in the estimated log hazard ratios between the first two models, but reported that the third differed from the first two. On this basis they suggested that the most appropriate time scale might be chronological age. Korn et al. (1997) also proposed two conditions under which analyses on the time-on-study and chronological age time scales are equivalent. The first is that the baseline hazard $\lambda_0$, as a function of age, is exponential. The second is that age at entry is independent of the covariates in the model. In a simulation study, it was observed that the two time-scale models yielded similar regression coefficients when the cumulative hazard function was exponential. The same study found that the models could nonetheless differ significantly even when the covariate of interest was independent of baseline age. Pencina et al. (2007) found that when the correlation between age at entry and the risk factor is zero, the biases of the time-on-study and chronological age models are very close. However, these two simulation studies are inconclusive about which time scale is best, and the two papers are not consistent in their recommendations.
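The nesting property of risk sets can be checked directly on a toy cohort. The Python sketch below is illustrative only. It uses a counting-process convention: a subject is at risk at $u$ if the start of its interval is strictly before $u$ and its exit is at or after $u$. The `Subject` fields and the three-person cohort are hypothetical, not data from this study.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    a0: float     # age at study entry (years)
    t: float      # follow-up until event or censoring (years)
    event: bool   # True if the event of interest occurred


def risk_set_time_on_study(cohort, u):
    """Subjects still under follow-up at time u since entry (interval (0, t])."""
    return {i for i, s in enumerate(cohort) if 0 < u <= s.t}


def risk_set_age(cohort, age):
    """Subjects under observation at a given age, with left truncation (interval (a0, a0 + t])."""
    return {i for i, s in enumerate(cohort) if s.a0 < age <= s.a0 + s.t}


# Hypothetical three-person cohort
cohort = [Subject(40, 10, True), Subject(55, 5, False), Subject(60, 20, True)]

# Time on study: the later risk set is contained in the earlier one
print(risk_set_time_on_study(cohort, 5), risk_set_time_on_study(cohort, 10))
# Age scale: the risk set at age 62 is not contained in the risk set at age 48
print(risk_set_age(cohort, 48), risk_set_age(cohort, 62))
```

The subject who enters at 60 is absent from the age-48 risk set but present at 62. The time-on-study scale cannot produce this pattern, because every subject starts at time zero.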
In this paper, we examine this issue using a large collection of empirical datasets.

Data
In order to examine this issue, we fitted PH models on the two time scales using a large collection of data sets (54 cohorts).

Statistical Methods
We restrict our attention to the proportional hazards (PH) model because it is the most widely used method in epidemiological studies. We also focus on a single covariate, systolic blood pressure (sbp). This lets us examine whether the results are consistent across time scales and whether they agree with the published relationship between sbp and coronary heart disease (CHD) death, the event of interest.
The PH model assumes that the hazard (rather than the survival time) is a function of the independent variables (covariates), and that the covariates act multiplicatively on the hazard. The model specifies that the hazard function associated with the covariate vector $Z$ satisfies
$\lambda(t \mid Z) = \lambda_0(t)\exp(\beta^{\top} Z)$,
where $\lambda_0(t)$ is the unspecified baseline hazard (the hazard when all covariates are zero) and $\beta$ is a vector of unknown constants, the parameters of interest. Estimates of the parameters, and inferences about them, are based on maximum partial likelihood (Cox 1972). Their asymptotic properties are justified using martingale and counting process theory (Andersen and Gill 1982). In the PH model, time is used to order the events and to determine the risk set of subjects still being followed when each event occurs. It is not used directly in estimating the covariate coefficients.
Therefore, the different time scales in PH models lead to different estimates only if they differ in the ordering of the times to event or right censoring. Chronological age and time on study will in general produce different ordering of times to event and right censoring.
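The claim that time enters the PH model only through the ordering of events can be illustrated with the log partial likelihood, $\ell(\beta) = \sum_{i:\,\delta_i = 1} \bigl[\beta z_i - \log \sum_{j \in R(u_i)} \exp(\beta z_j)\bigr]$, where $\delta_i = 1$ marks an event at time $u_i$. The minimal Python sketch below assumes a single covariate, Breslow handling of ties, and hypothetical data. Squaring the follow-up times is a strictly increasing transformation and leaves $\ell(\beta)$ unchanged. Moving to the left-truncated age scale changes the risk sets, and hence $\ell(\beta)$.

```python
import math


def log_partial_likelihood(beta, start, stop, event, z):
    """Cox log partial likelihood for one covariate (Breslow ties).
    Subject j is in the risk set at time u if start[j] < u <= stop[j]."""
    ll = 0.0
    for i, d in enumerate(event):
        if not d:
            continue
        u = stop[i]
        risk = [j for j in range(len(stop)) if start[j] < u <= stop[j]]
        ll += beta * z[i] - math.log(sum(math.exp(beta * z[j]) for j in risk))
    return ll


# Hypothetical toy data: follow-up time t, event indicator, sbp z, entry age a0
t = [2.0, 5.0, 3.0, 8.0, 6.0]
event = [1, 0, 1, 1, 0]
z = [140.0, 120.0, 160.0, 130.0, 150.0]
a0 = [50.0, 45.0, 60.0, 40.0, 55.0]
beta = 0.02
zeros = [0.0] * len(t)

ll_time = log_partial_likelihood(beta, zeros, t, event, z)
# Strictly increasing transform of time: same ordering, same risk sets
ll_squared = log_partial_likelihood(beta, zeros, [x ** 2 for x in t], event, z)
# Left-truncated age scale: interval (a0, a0 + t], different risk sets
ll_age = log_partial_likelihood(beta, a0, [a + x for a, x in zip(a0, t)], event, z)

print(ll_time, ll_squared, ll_age)
```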
Let $a_0$ be the age at which an individual enters the study, and let $t$ be the length of time the individual is followed until he or she experiences the event of interest or leaves the study. Let $a = a_0 + t$ be the individual's age at the event or censoring. We focus on the following three models in this paper:

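M1:
Time-on-study as the time scale, with age at entry included as a covariate:
$\lambda(t \mid z, a_0) = \lambda_0(t)\exp(\beta z + \gamma a_0)$,
where $\beta$ is the coefficient of $z$, the sbp, and $\gamma$ is the coefficient of $a_0$, the baseline age.

M2:
Age as the time scale, without adjustment:
$\lambda(a \mid z) = \lambda_0(a)\exp(\beta z)$.

M3:
Age as the time scale, with left truncation at the entry age:
$\lambda(a \mid z, a_0) = \lambda_0(a \mid a_0)\exp(\beta z)$, for $a > a_0$,
where $\lambda_0(a \mid a_0)$ is the baseline hazard conditional on entry age.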
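The Python sketch below makes the three specifications concrete by fitting M1, M2 and M3 with a small Newton-Raphson Cox routine. It is a teaching sketch on a hypothetical eight-person cohort, not the software used in this study. The routine handles ties with the Breslow method and uses step-halving. Its risk sets follow the counting-process (start, stop] convention, so M3's left truncation is handled by the interval start. We assume that M2's "without adjustment" means every subject is treated as at risk from birth (start = 0). Covariates are centred for numerical stability, which changes the baseline hazard but not the coefficients. All function names are ours.

```python
import math


def at_risk(start, stop, u):
    # Counting-process risk set: subject j is at risk at u if start[j] < u <= stop[j]
    return [j for j in range(len(stop)) if start[j] < u <= stop[j]]


def log_pl(beta, start, stop, event, X):
    """Cox log partial likelihood with Breslow handling of ties."""
    lp = [sum(b * x for b, x in zip(beta, row)) for row in X]
    return sum(lp[i] - math.log(sum(math.exp(lp[j]) for j in at_risk(start, stop, stop[i])))
               for i in range(len(stop)) if event[i])


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_cox(start, stop, event, X, iters=50, tol=1e-10):
    """Newton-Raphson maximisation of log_pl, with step-halving."""
    n, p = len(stop), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        lp = [sum(b * x for b, x in zip(beta, row)) for row in X]
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for i in range(n):
            if not event[i]:
                continue
            risk = at_risk(start, stop, stop[i])
            w = [math.exp(lp[j]) for j in risk]
            s0 = sum(w)
            mean = [sum(wj * X[j][k] for wj, j in zip(w, risk)) / s0 for k in range(p)]
            for k in range(p):
                grad[k] += X[i][k] - mean[k]
                for m in range(p):
                    second = sum(wj * X[j][k] * X[j][m] for wj, j in zip(w, risk)) / s0
                    hess[k][m] -= second - mean[k] * mean[m]
        step = solve(hess, [-g for g in grad])  # Newton step solves H step = -grad
        ll_old = log_pl(beta, start, stop, event, X)
        for _ in range(30):  # halve the step until log_pl does not decrease
            new = [b + s for b, s in zip(beta, step)]
            if log_pl(new, start, stop, event, X) >= ll_old - 1e-12:
                break
            step = [s / 2 for s in step]
        beta = new
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Hypothetical toy cohort (not data from this study)
a0 = [50, 52, 48, 55, 51, 49, 53, 44]          # age at entry
t = [6, 5, 11, 3, 9, 4, 8, 10]                 # years of follow-up
event = [1, 1, 0, 1, 0, 1, 1, 0]               # 1 = CHD death, 0 = censored
sbp = [150, 125, 120, 160, 130, 140, 135, 145]


def centred(v):
    mu = sum(v) / len(v)
    return [x - mu for x in v]


z, a0c = centred(sbp), centred(a0)
exit_age = [a + d for a, d in zip(a0, t)]      # a = a0 + t
zeros = [0] * len(t)

models = {
    "M1": (zeros, t, [[zi, ai] for zi, ai in zip(z, a0c)]),  # time on study, adjusted for a0
    "M2": (zeros, exit_age, [[zi] for zi in z]),             # age scale, at risk from birth
    "M3": (a0, exit_age, [[zi] for zi in z]),                # age scale, left-truncated at a0
}
fits = {name: fit_cox(st, sp, event, X) for name, (st, sp, X) in models.items()}
for name, beta in fits.items():
    print(name, "sbp coefficient:", round(beta[0], 4))
```

In practice one would use an established implementation that accepts (start, stop] data. This routine only shows how the choice of interval start and stop defines each model.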
Using sbp as the risk factor, we fitted the three Cox PH models M1, M2 and M3 separately to each of the 54 cohorts.
There are several possible methods for comparing the coefficients estimated from the fitted models. We used the bootstrap to determine whether the coefficients from each pair of models differed significantly. For each pair, one thousand bootstrap replications were used to estimate the standard error of the difference between the two coefficients. The significance of the difference between the coefficients from models $i$ and $j$ was assessed assuming normality of the differences:
$Z = (\hat\beta_i - \hat\beta_j) / \widehat{\mathrm{SE}}_{\mathrm{boot}}(\hat\beta_i - \hat\beta_j)$,
compared with the standard normal distribution.
For each of the 54 cohorts, we estimated the cumulative baseline hazard function as a function of age using the Breslow method (Cox 1972, with discussion). We plotted these estimates against age to determine the shape of the baseline hazard. We also plotted the log of the estimates against age as a check of exponentiality (Tableman and Kim 2004). Two examples are shown in Figure 4.
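The bootstrap comparison of coefficients can be sketched as follows. The function resamples subjects with replacement and recomputes both estimators on each resample. It then forms the normal-theory statistic $Z$, the observed difference divided by the bootstrap standard error of the difference, and a two-sided p-value. In the study, the two estimators would be the sbp coefficients from two of M1 to M3 fitted to the same resample. Here, purely to keep the sketch self-contained, they are replaced by the mean and median of a made-up list of sbp values. The function name `bootstrap_diff_test` and the fixed seed are our own choices.

```python
import math
import random
import statistics


def bootstrap_diff_test(data, est_a, est_b, reps=1000, seed=2024):
    """Bootstrap z-test for the difference between two estimators on the same data.

    Whole subjects are resampled with replacement and both estimators are
    recomputed on each resample, so the SE reflects their correlation."""
    rng = random.Random(seed)
    diff = est_a(data) - est_b(data)
    boot = []
    for _ in range(reps):
        sample = [rng.choice(data) for _ in data]
        boot.append(est_a(sample) - est_b(sample))
    se = statistics.stdev(boot)
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value, 2 * (1 - Phi(|z|))
    return diff, se, z, p


# Hypothetical sbp readings (mmHg); mean vs. median stand in for two models' coefficients
sbp = [118, 122, 131, 135, 140, 142, 150, 158, 165, 171]
diff, se, z, p = bootstrap_diff_test(sbp, statistics.mean, statistics.median)
print(f"difference={diff:.3f}  bootstrap SE={se:.3f}  z={z:.2f}  p={p:.3f}")
```

Resampling whole subjects, rather than bootstrapping each model separately, is what allows the standard error of the difference to account for the two estimates being computed on the same data.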
We calculated correlation coefficients between sbp and age at entry to examine how strongly they were associated. This empirical correlation and whether the hazard function appeared to be exponential were used to determine if, at least approximately, the conditions from Korn et al. (1997) were met.
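The two checks of the Korn et al. (1997) conditions can be sketched as follows. The first is the Pearson correlation between sbp and age at entry. The second is a Breslow estimate of the cumulative baseline hazard on the age scale with left truncation. If the baseline hazard is exponential in age, the logarithm of this estimate should be roughly linear in age. Summarising that linearity by the $R^2$ of a straight-line fit is our assumption; the study judged exponentiality from plots. The function names and the toy cohort are hypothetical. The estimate with `beta=0` reduces to Nelson-Aalen, and a fitted coefficient can be passed in instead.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def breslow_cumhaz(entry, exit_age, event, z, beta=0.0):
    """Breslow estimate of the cumulative baseline hazard H0(age), with left truncation
    (subject j is at risk at age u if entry[j] < u <= exit_age[j]).
    Returns a list of (age, H0(age)) pairs at each distinct event age."""
    n = len(exit_age)
    H, curve = 0.0, []
    for u in sorted({exit_age[i] for i in range(n) if event[i]}):
        deaths = sum(1 for i in range(n) if event[i] and exit_age[i] == u)
        denom = sum(math.exp(beta * z[j]) for j in range(n) if entry[j] < u <= exit_age[j])
        H += deaths / denom
        curve.append((u, H))
    return curve


def log_linearity_r2(curve):
    """R^2 of a least-squares line of log H0 on age; near 1 is consistent with an exponential-in-age hazard."""
    ages = [u for u, _ in curve]
    logs = [math.log(h) for _, h in curve]
    return pearson(ages, logs) ** 2


# Hypothetical cohort: entry age, exit age, event indicator, sbp
a0 = [50, 52, 48, 55, 51, 49, 53, 44]
exit_age = [56, 57, 59, 58, 60, 53, 61, 54]
event = [1, 1, 0, 1, 0, 1, 1, 0]
sbp = [150, 125, 120, 160, 130, 140, 135, 145]

print("corr(sbp, entry age) =", round(pearson(sbp, a0), 3))
print("R^2 of log H0 vs age =", round(log_linearity_r2(breslow_cumhaz(a0, exit_age, event, sbp)), 3))
```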
All analyses were performed with Stata version 9 (2005) or R version 2.15 (2008).

Results
The correlation and exponentiality criteria divide the 54 cohorts into groups (Table 1). When the correlation is moderately strong (0.3 to 0.4), 12 cases are significant and 4 cases are non-significant. Both Korn et al. (1997) and Pencina et al. (2007) suggest that the models should be equivalent under independence, that is, at zero correlation. However, their arguments do not extend to low or moderate correlations. At correlations between 0.1 and 0.2 (second-to-last row of the table), the models are significantly different in all 6 cases. These results indicate that the models can perform differently even when there is very little association between the risk factor and baseline age. This is the closest pair of models in terms of predictor significance among the three pairwise comparisons we made.
Figure 4: Two examples of cumulative baseline hazard and corresponding log cumulative hazard plots. Example 1 represents an exponential case and Example 2 a non-exponential case.
Next, we estimated the cumulative baseline hazard function as a function of age using Breslow's method (1972) and plotted it against age. We also plotted the log of the estimated cumulative hazard against age to see whether it appears linear. In 18 cohorts the hazard appears close to exponential in form, while the remaining 36 plots look non-exponential. We also noted that in some cohorts, the time-on-study and chronological age models indicate associations in opposite directions between the risk factor and the hazard of disease. In five cohorts the coefficient from the unadjusted age time-scale model (M2) is negative, and in one cohort the coefficient from the left-truncated age time-scale model (M3) is negative. These results agree with those reported by Cheung et al. (2003). It is widely accepted that elevated systolic blood pressure increases the risk of CHD death. However, we observed that the age time-scale models do not always detect this, and sometimes they erroneously suggest that sbp has a beneficial effect on CHD.
(Table 3 is at the end of the paper.)

Discussion
In this report we compared the results of PH models using two different time scales in 54 different datasets. We found that using unadjusted age as the time scale yields significantly different coefficients even when the correlation between the covariate and baseline age is very low. This also holds when the cumulative baseline hazard appears to be exponential. We also found that the estimates when …
Using real datasets to assess the different models empirically, as in this paper, is extremely valuable. A shortcoming of this approach is that the comparisons rest on a data-generating mechanism that is unknown to us. Chalise et al. (2013) carried out extensive simulation studies of the robustness of using one time scale when the other is actually the correct one. They generated data according to a specified model and compared the fitted models against it with respect to bias, mean square error and a measure of predictive discrimination. Two scenarios were simulated, each with one of the two time scales as the correct one. When time-on-study was the correct time scale, the time-on-study models performed better on all three measures. When age was the correct time scale, both time-scale models performed approximately equally well. These simulations suggest that time-on-study models are robust to misspecification of the underlying time scale. Time-on-study models may therefore be preferable when the true time scale is uncertain.
In some situations there may be multiple plausible time scales. For example, automobile warranties usually use two time scales, calendar time and cumulative mileage. In studies of skin cancer among occupationally exposed workers, cumulative radiation exposure may provide a better time scale than age or time on study. Some investigators have examined methods for deriving an optimal time scale. Farewell and Cox (1979) and Oakes (1995) suggested a time scale that combines two or more time scales so that the resulting scale accounts for as much of the variation as possible. Duchesne and Lawless (2000) introduced the concept of an ideal time scale. Their work, however, focuses on usage variables (e.g., mileage) or exposure variables (e.g., asbestos exposure) used as time scales. It could be adapted to epidemiological analyses in some instances, but it does not solve the problems inherent in comparing time scales that differ only in their origins.
A unique well-defined time scale is indispensable for event history analysis.
Given the lack of an agreed-upon definition of an optimal, or even a correct, time scale, robustness may be the only practical criterion on which to base the decision. In the meantime, our opinion, based on these results, is that the time-on-study time scale is usually appropriate. It answers the conditional question that is the primary focus of epidemiological studies: given what we measure at baseline, what is the probability of future events?

Conflict of Interest:
The authors have declared no conflict of interest.