Bayesian Inference in a Joint Model for Longitudinal and Time to Event Data with Gompertz Baseline Hazards

Longitudinal and time-to-event data are frequently encountered in medical studies, and clinicians are often interested in how longitudinal outcomes influence the time to an event of interest. To study the association between longitudinal and time-to-event data, joint modeling approaches have been found to be the most appropriate techniques. These approaches involve a choice of distribution for the survival times, for which most authors prefer the exponential or Weibull distribution; however, these distributions have shortcomings. In this paper, we propose an alternative joint modeling approach under a Bayesian perspective, assuming that the survival times follow a Gompertz distribution. One advantage of the Gompertz distribution is that its cumulative distribution function has a closed-form solution and it accommodates time-varying covariates. A Bayesian approach through a Gibbs sampling procedure was developed for parameter estimation and inference. We evaluate the finite-sample performance of the joint model through an extensive simulation study and apply the model to a real dataset to determine the association between markers (tumor sizes) and time to death among cancer patients without recurrence. Our analysis suggests that the proposed joint modeling approach performs well in terms of parameter estimation when the correlation between random intercepts and slopes is taken into account.


Introduction
In many clinical studies, longitudinal outcomes are collected alongside time-to-event data. For example, we collect repeated patient information such as height, weight, blood pressure and many other variables over time, while also being interested in the time to an event of interest (e.g., death). For such data, the focus is always on how the longitudinal outcomes influence the time to the event. Approaches for analyzing these two processes separately have been extensively discussed in the literature. The most common methods for analyzing longitudinal outcomes are the random effects model proposed by Laird and Ware (1982), the linear mixed-effects model of Verbeke (1997) and generalized linear models (Liang & Zeger, 1986; Zeger et al., 1988). On the other hand, the event-time process is analyzed using Cox proportional hazards models (Hougaard, 2012). However, separate analysis of the longitudinal and time-to-event processes has received much criticism, as it yields inefficient and biased results. Therefore, it is very important to take both processes into account in the analysis.
Over the past two decades, extensive efforts have been devoted to statistical methods that simultaneously analyze longitudinal outcomes and time-to-event data. These methods offer great advantages over analyzing each process separately. A comprehensive overview of joint modeling for longitudinal and time-to-event data was given by Tsiatis & Davidian (2004); more basic concepts and methods for joint models can be found in Ibrahim et al. (2010). In all these reviews, the two processes are jointly modeled by linking them through a common latent structure; that is, the longitudinal and survival processes share the same random effects. Extensions to earlier developments of the joint modeling framework have been proposed, and more recent developments with applications include Sweeting & Thompson (2011), Rizopoulos et al. (2012), McCrink et al. (2013) and Yang et al. (2016). Some authors have already proposed joint models for multiple longitudinal outcomes and repeated events of different outcomes (Chi & Ibrahim, 2006; Musero et al., 2015; Huang & ...). Previous work on joint models for longitudinal outcomes and time-to-event data with time-varying covariates considered linear mixed-effects models for the longitudinal process and a proportional hazards model with a specified baseline function for the time-to-event process (Yu et al., 2006; Song et al., 2002).
The most frequently used baseline functions for proportional hazards models assume that the survival data follow an exponential or Weibull distribution. Some authors have suggested that the proportional hazards model with a Weibull baseline function may be very flexible when time-dependent covariates are included in the model (Casellas, 2007). However, generating survival data from such a model with time-varying covariates can be too complex. According to Austin (2013), one of the disadvantages of the Weibull proportional hazards model with time-dependent covariates is that its cumulative distribution function cannot be derived in closed form; therefore, to compute the cumulative incidence and the hazard function, one must resort to numerical integration. On the other hand, the proportional hazards model with an exponential baseline function also has disadvantages, as it assumes a constant hazard. For these reasons, researchers wishing to generate survival times that involve time-dependent covariates are advised to consider the Gompertz distribution for event-time data (Austin, 2012).
Proportional hazards models with a baseline function from the Gompertz distribution seem more flexible, as one can compute the survival function without heavy programming. The Gompertz distribution was first introduced by Gompertz (1825). It is described as one of the fundamental mathematical models that accurately represent the survival function based on the laws of mortality; that is, the force of mortality tends to increase exponentially over time. It has been extensively used as a growth model in many cancer assessments. For these reasons, the Gompertz distribution plays an important role in modeling human mortality. Its earlier applications can be found in Ahuja & Nash (1967).
To the best of our knowledge, there has been no joint model for longitudinal outcomes and time-to-event data in which the proportional hazards model is assumed to have Gompertz distribution properties. Therefore, in this article, we propose such a model with time-varying covariates under the framework of Bayesian inference. Our joint model consists of two sub-models linked together by common random effects. The first sub-model is a linear mixed-effects model for the true and unobserved markers; the interrelationship between measurements and subject-specific effects is accounted for by a random intercept and slope. The second sub-model is the proportional hazards model with a Gompertz baseline function. To estimate the parameters of the joint model, we use both the R and WinBUGS (Ntzoufras, 2011) software packages. The rest of this article is organized as follows. In section 2, we present the notation and formulation of the joint model. Section 3 presents estimation and inference, including the joint likelihood and Bayesian parameter estimation procedures such as prior and joint posterior specifications and model selection. In section 4, we present simulation studies to assess the performance of the model. Section 5 presents the application of the model to a real dataset. The last section presents a discussion.

Longitudinal Sub-Model
Suppose we have observations for $n$ subjects under study. Let $y_i(t_{ij})$ denote the observed longitudinal outcome for subject $i$ at time $t_{ij}$, and let $y_i$ be the corresponding $n_i \times 1$ column vector of random variables over the time points $t_{i1}, t_{i2}, \ldots, t_{in_i}$, where $j = 1, 2, \ldots, n_i$.
Here $n_i$ represents the number of repeated measurements for subject $i$, which varies among subjects. In practice, we may have missing observations for some subjects, because some subjects may decide to drop out of the study for reasons unrelated to the occurrence of the event of interest. Therefore, in this study, we assume that missing values in the longitudinal measurement trajectories are missing independently of the unobserved measurements.

mas.ccsenet.org Modern Applied Science Vol. 12, No. 9; 2018
To analyze the longitudinal process, we define the distribution of $y_{ij}$ by a linear mixed-effects (LME) model,

$$y_{ij} = \mu^*_i(t_{ij}) + \epsilon_{ij},$$

where $\mu^*_i(t_{ij}) = x_i(t_{ij})'\beta_L + \eta_i(t_{ij})$ is the true value of $y_{ij}$; $x_i$ is the $(n_i \times p)$ design matrix of fixed effects, which includes possible time-dependent covariates; $\beta_L$ is the corresponding $(p \times 1)$ column vector of fixed-effect coefficients $(\beta_{L,s})$; $\eta_i(t_{ij}) = z_i(t_{ij})'w_i$, where $z_i$ denotes the $(q \times 1)$ design matrix for the random effects and $w_i \sim \mathrm{MVN}(0, A)$; and $\epsilon_i$ is the $(n_i \times 1)$ column vector of residuals, representing the part of $y_{ij}$ not accounted for by $x_i(t_{ij})'\beta_L + \eta_i(t_{ij})$, with $\epsilon_{ij} \sim N(0, \sigma^2 I_{n_i})$. It should be noted that $A$ is the variance-covariance matrix capturing the correlation among the repeated measurements within a subject. The residuals and the random effects $w_i$ are independent of each other and independent across subjects. With a random intercept and slope, $w_i = (w_{i0}, w_{i1})'$, the variance-covariance matrix is

$$A = \begin{pmatrix} \sigma^2_{w_{i0}} & \rho\,\sigma_{w_{i0}}\sigma_{w_{i1}} \\ \rho\,\sigma_{w_{i0}}\sigma_{w_{i1}} & \sigma^2_{w_{i1}} \end{pmatrix},$$

with $\rho = 0$ in the uncorrelated case.
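As a minimal sketch, the longitudinal sub-model above can be simulated in a few lines. The code and variable names are ours, and the parameter values are illustrative choices, not estimates from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

n, n_i = 100, 5                          # subjects and visits per subject
t = np.tile(np.linspace(0.0, 0.6, n_i), (n, 1))      # visit times t_ij
beta_L = np.array([3.0, 0.5, 5.0])       # fixed effects: intercept, x_i, time
sigma2 = 0.5                             # residual variance sigma^2
A = np.array([[0.75**2, -0.02],          # covariance of (w_i0, w_i1)
              [-0.02,   0.15**2]])

x = rng.normal(12.0, 2.0, size=n)                    # a baseline covariate
w = rng.multivariate_normal(np.zeros(2), A, size=n)  # w_i = (w_i0, w_i1)

# true trajectory mu*_i(t_ij) = x'beta_L + w_i0 + w_i1*t, then add N(0, sigma^2) noise
mu = beta_L[0] + beta_L[1]*x[:, None] + beta_L[2]*t + w[:, :1] + w[:, 1:]*t
y = mu + rng.normal(0.0, np.sqrt(sigma2), size=(n, n_i))
```

Each row of `y` is one subject's repeated measurements; the shared draw `w` is what later links this sub-model to the survival sub-model.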

Time to Event Sub-Model
We assume a proportional hazards model with a Gompertz baseline function. Let

$$h_i(t) = \lambda_0(t)\exp\left(v_i'\xi + \psi\mu^*_i(t)\right)$$

denote the hazard function for subject $i$ at time $t$, where $\lambda_0(t) = \alpha\exp(\gamma t)$ is the baseline function with $\alpha > 0$ and $\gamma$ taking any real value. When $\gamma > 0$ or $\gamma < 0$, the hazard function is monotone (increasing or decreasing, respectively); when $\gamma = 0$, the hazard function is equivalent to that of an exponential distribution. Here $v_i$ represents the vector of prognostic factors with associated coefficient vector $\xi$, and $\psi$ quantifies the strength of the association between the two processes. Furthermore, for a fixed marker value $\mu_i$, the cumulative hazard (incidence) function is defined as

$$\Lambda_i(t) = \int_0^t \alpha\exp(\gamma s)\exp\left(v_i'\xi + \psi\mu_i\right)ds. \qquad (3)$$

Equation (3) can be simplified as

$$\Lambda_i(t) = \frac{\alpha}{\gamma}\exp\left(v_i'\xi + \psi\mu_i\right)\left(\exp(\gamma t) - 1\right). \qquad (4)$$

From equation (4), we deduce that the inverse cumulative hazard function is

$$\Lambda_i^{-1}(u) = \frac{1}{\gamma}\log\!\left(1 + \frac{\gamma u}{\alpha\exp(v_i'\xi + \psi\mu_i)}\right),$$

and the individual survival function is expressed as

$$S_i(t) = \exp\left(-\Lambda_i(t)\right).$$

Consequently, the individual event times can be generated as

$$T_i = \frac{1}{\gamma}\log\!\left(1 - \frac{\gamma\log U}{\alpha\exp(v_i'\xi + \psi\mu_i)}\right),$$

where $T_i$ stands for the survival time for subject $i$ and $U \sim \mathrm{Uniform}(0, 1)$.
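The closed-form hazard, cumulative hazard and inverse-transform sampler described above can be sketched as follows. The function names are ours, and the linear predictor `lp` (standing in for $v_i'\xi + \psi\mu_i$) is assumed constant over time in this sketch.

```python
import numpy as np

def gompertz_hazard(t, alpha, gamma, lp=0.0):
    """h(t) = alpha * exp(gamma*t) * exp(lp)."""
    return alpha * np.exp(gamma * t) * np.exp(lp)

def gompertz_cumhaz(t, alpha, gamma, lp=0.0):
    """H(t) = (alpha/gamma) * exp(lp) * (exp(gamma*t) - 1), for gamma != 0."""
    return (alpha / gamma) * np.exp(lp) * (np.exp(gamma * t) - 1.0)

def gompertz_sim_time(u, alpha, gamma, lp=0.0):
    """Invert S(T) = u, i.e. H(T) = -log(u):
    T = (1/gamma) * log(1 - gamma*log(u) / (alpha*exp(lp)))."""
    return np.log(1.0 - gamma * np.log(u) / (alpha * np.exp(lp))) / gamma

# draw a few event times via the inverse transform
rng = np.random.default_rng(1)
u = rng.uniform(size=5)
times = gompertz_sim_time(u, alpha=0.02, gamma=1.2)
```

By construction, plugging a simulated time back into the cumulative hazard recovers $-\log U$, which is a convenient correctness check for the sampler.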
The distribution function of the time to event based on the proportional hazards function is

$$F_i(t) = 1 - \exp\left(-\Lambda_i(t)\right),$$

and the probability density function of the time to event is

$$f_i(t) = h_i(t)\exp\left(-\Lambda_i(t)\right). \qquad (7)$$


Estimation and Inferences

The Joint Likelihood Function
Let $C_i$ be the censoring time for subject $i$. Then $T_i = \min(T^*_i, C_i)$ denotes the $i$th subject's observation time, where $T^*_i$ is the true event time for that subject. Furthermore, we denote $\delta_i = I(T_i = T^*_i)$, where $I$ is the indicator function; it follows that $\delta_i$ equals 1 if $T_i$ is an event time and 0 if $T_i$ is right-censored for subject $i$. Let $\varphi = (\beta_L, \alpha, \psi, \gamma, A, \sigma^2, \xi)$ denote the vector of all parameters to be estimated, and recall that $y_{ij}$ is the longitudinal outcome for subject $i$ measured at time $t_{ij}$ $(j = 1, 2, \ldots, n_i)$. Given the subject-specific random effects, the two processes are assumed independent of each other. Hence, the joint likelihood function of the longitudinal and time-to-event sub-models based on all observed data is

$$L(\varphi \mid D_{obs}) = \prod_{i=1}^{n} \int \left[\prod_{j=1}^{n_i} f(y_{ij} \mid w_i; \varphi)\right] f(T_i, \delta_i \mid w_i; \varphi)\, f(w_i; \varphi)\, dw_i, \qquad (11)$$

where $f(y_{ij} \mid w_i; \varphi)$ is the probability density function of the longitudinal outcomes conditional on the random effects $w_i$, $f(w_i; \varphi)$ is the probability density function of the random effects with $q$ the dimension of the covariance matrix $A$, and the likelihood of the survival sub-part, $f(T_i, \delta_i \mid w_i; \varphi) = h_i(T_i)^{\delta_i}\exp\left(-\Lambda_i(T_i)\right)$, is a version of equation (7) in section 2.
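Conditional on the random effects, each subject's contribution to the joint likelihood factorizes into a longitudinal part and a survival part. A sketch of this contribution, assuming a trajectory linear in time so that the Gompertz cumulative hazard stays in closed form (the function, its arguments and the linear-trajectory assumption are ours):

```python
import numpy as np

def cond_loglik_i(y, t_obs, T, delta, a, b, alpha, gamma, psi, lp_s, sigma2):
    """Log-likelihood contribution of subject i, conditional on w_i.

    The true trajectory is taken linear in time, mu*_i(t) = a + b*t,
    with a and b absorbing both the fixed effects and the random effects w_i;
    lp_s is the survival covariate term v_i' xi.
    """
    # Longitudinal part: y_ij | w_i ~ N(mu*_i(t_ij), sigma^2)
    resid = y - (a + b * t_obs)
    ll_long = -0.5 * np.sum(np.log(2.0 * np.pi * sigma2) + resid**2 / sigma2)

    # Survival part: h_i(t) = alpha*exp(gamma*t)*exp(lp_s + psi*mu*_i(t))
    # collapses to a Gompertz hazard a_t*exp(g_t*t) when mu* is linear in t.
    a_t = alpha * np.exp(lp_s + psi * a)
    g_t = gamma + psi * b
    cumhaz = a_t / g_t * (np.exp(g_t * T) - 1.0)
    ll_surv = delta * np.log(a_t * np.exp(g_t * T)) - cumhaz
    return ll_long + ll_surv
```

Integrating this quantity over the distribution of $w_i$, as in equation (11), is what makes direct maximum likelihood cumbersome and motivates the Bayesian approach of the next subsection.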

Bayesian Inferences
The joint likelihood function (11) cannot be maximized with standard maximum likelihood procedures, because it involves integrating the longitudinal and survival components over the subject-specific random effects $w_i$, which requires heavy programming. A good way to overcome this difficulty is to use a Bayesian approach. Faucett and Thomas (1996) proposed a Bayesian Markov chain Monte Carlo (MCMC) approach that simulates samples from the posterior distribution and estimates all unknown parameters using a non-informative prior $p(\varphi)$.
Combining equation (11) with $p(\varphi)$, we can define the joint posterior of $\varphi$ as the product of the likelihood of the observed data and the priors,

$$p(\varphi \mid D_{obs}) \propto L(\varphi \mid D_{obs})\, p(\varphi),$$

where, since the observed data are fixed, the normalizing constant does not depend on $\varphi$. Due to computational complexity, the log joint posterior distribution is preferable. Hence,

$$\log p(\varphi \mid D_{obs}) = \log L(\varphi \mid D_{obs}) + \log p(\varphi) + \text{constant}. \qquad (17)$$

Full Conditional Distributions
In order to implement the Bayesian procedure, we need the full conditional distribution of each unknown parameter of the model. The Gibbs sampler can then be used to generate MCMC samples from the joint posterior density $p(\varphi \mid D_{obs})$: in the Bayesian framework, this procedure iteratively samples each component from its full conditional distribution with the remaining components fixed at their current values. Each full conditional is proportional to the product of the joint likelihood (11), evaluated given the random effects, and the corresponding prior. Hence, the full conditional distribution of the coefficients $\beta_L$ of the linear mixed-effects model is $p(\beta_L \mid \cdot) \propto L(\varphi \mid w, D_{obs})\, N(\beta_L; \mu_{\beta_L}, \tau_{\beta_L})$, where $\mu_{\beta_L}$ and $\tau_{\beta_L}$ are the parameters of the prior of $\beta_L$.

The full conditional distribution of $\xi$ is $p(\xi \mid \cdot) \propto L(\varphi \mid w, D_{obs})\, N(\xi; \mu_\xi, \tau_\xi)$, where $\mu_\xi$ and $\tau_\xi$ are the parameters of the independent normal prior of $\xi$.

The full conditional distribution of $\gamma$ takes the form $p(\gamma \mid \cdot) \propto L(\varphi \mid w, D_{obs})\, N(\gamma; \mu_\gamma, \tau_\gamma)$, where $\mu_\gamma$ and $\tau_\gamma$ are the parameters of the independent normal prior of $\gamma$.

The full conditional distribution of the shape parameter $\alpha$ in the survival sub-model is $p(\alpha \mid \cdot) \propto L(\varphi \mid w, D_{obs})\, \mathrm{Gamma}(\alpha; a_0, b_0)$, where $a_0$ and $b_0$ are the parameters of the independent gamma prior of $\alpha$.

The full conditional distribution of the association parameter $\psi$ between the two processes is $p(\psi \mid \cdot) \propto L(\varphi \mid w, D_{obs})\, N(\psi; \mu_\psi, \tau_\psi)$, where $\mu_\psi$ and $\tau_\psi$ are the specified parameters of the independent normal prior of $\psi$.

The full conditional distribution of the inverse variance $1/\sigma^2$ takes the form $p(1/\sigma^2 \mid \cdot) \propto L(\varphi \mid w, D_{obs})\, \mathrm{Gamma}(1/\sigma^2; a_0, b_0)$, where $a_0$ and $b_0$ are the parameters of the inverse-gamma prior of $\sigma^2$.

The full conditional distribution of the inverse variance-covariance matrix $A^{-1}$ takes the form $p(A^{-1} \mid \cdot) \propto \left[\prod_{i=1}^n f(w_i; A)\right] \mathrm{Wishart}(A^{-1}; \nu_0, A_0)$, where $\nu_0$ and $A_0$ are the parameters of the inverse-Wishart prior of $A$.

More fully simplified conditional distributions can be derived in the same way as in Faucett & Thomas (1996) and Zhang et al. (2017).

Model Selection
In this paper, we propose two forms of the random-effects term in the joint model: Case I, with correlated random intercepts and slopes, and Case II, with uncorrelated random intercepts and slopes. For model selection, we consider the deviance information criterion (DIC) (Spiegelhalter et al., 2002), defined as a Bayesian analogue of the well-known Akaike information criterion (AIC) and Bayesian information criterion (BIC). The appeal of the DIC is that it uses the posterior distribution, which allows it to take into account the prior information of both the longitudinal and time-to-event sub-models. Consider $\varphi$, the collection of parameters in the joint model, and $D_{obs} = \{y_{ij}, T^*_i, \delta_i, x_i, v_i\}$, the observed data. We specify the deviance function as

$$D(\varphi) = -2\sum_{i=1}^{n} \log f(y_{ij}, T^*_i, \delta_i, x_i, v_i; \varphi),$$

where the log-likelihood is that of equation (11). Define $\bar D = E\left[D(\varphi) \mid D_{obs}\right]$, the posterior expectation of the deviance, and $\bar\varphi = E\left[\varphi \mid D_{obs}\right]$, the posterior means of the parameters. Following Spiegelhalter et al. (2002), the difference between the two measures, $p_D = \bar D - D(\bar\varphi)$, can be interpreted as the posterior estimate of the effective number of parameters, and it measures the complexity of the model. Adding it to the posterior mean deviance gives a measure of fit penalized for complexity:

$$\mathrm{DIC} = \bar D + p_D = D(\bar\varphi) + 2p_D.$$

Based on the DIC, the model with the smallest DIC value is considered the one that would best predict a replicated dataset with the same structure as the currently observed dataset. However, as noted by Geedipally et al. (2014), $D(\bar\varphi)$ decreases as the number of parameters in the model increases, while $p_D$ compensates for this effect by favoring models with a smaller number of parameters. Therefore, it is important to note that the way the model is parameterized will influence the resulting DIC values.
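Once the per-draw deviances are available from the MCMC output, the DIC computation above is a few lines. A minimal sketch with toy numbers (the deviance values are invented for illustration, not results from the paper):

```python
import numpy as np

def dic(deviance_samples, deviance_at_mean):
    """DIC = D_bar + p_D, with p_D = D_bar - D(phi_bar).

    deviance_samples : deviance D(phi) evaluated at each MCMC draw of phi
    deviance_at_mean : deviance evaluated at the posterior mean phi_bar
    """
    d_bar = np.mean(deviance_samples)       # posterior mean deviance
    p_d = d_bar - deviance_at_mean          # effective number of parameters
    return d_bar + p_d, p_d

# toy example: four deviance draws and the deviance at the posterior mean
samples = np.array([210.0, 205.0, 208.0, 212.0])
dic_value, p_d = dic(samples, 204.0)
```

The smaller `dic_value` is across candidate models, the better the penalized fit, mirroring the selection rule described above.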

Simulation Studies
In this section, we perform two sets of simulation studies using Markov chain Monte Carlo (MCMC) methods in order to assess the performance of the proposed methodology.

Simulation Study 1
The data are generated from a joint model with a single longitudinal variable and a time-to-event variable. Each subject is expected to have $n_i = 5$ biomarker values, recorded at baseline and at 4 subsequent visits scheduled at equally spaced time points $t_{ij} = \{0.00, 0.15, 0.30, 0.45, 0.60\}$. After this period, subjects are censored non-informatively. Specifically, we first generate data using a linear mixed-effects model for the longitudinal sub-model,

$$y_i(t_{ij}) = \mu^*_i(t_{ij}) + \epsilon_i(t_{ij}),$$

where $\mu^*_i(t_{ij})$ is the true unobserved biomarker value and $\epsilon_i(t_{ij}) \sim N(0, \sigma^2)$, with measurement error variance $\sigma^2 = 0.5$. We then simulate longitudinal data from the linear curve $\mu^*_i(t_{ij}) = \beta_0 + \beta_1 x_i + \beta_2 t_{ij} + w_{i0} + w_{i1} t_{ij}$, where the vector of random effects $w_i = (w_{i0}, w_{i1})'$ is simulated from a multivariate normal distribution $\mathrm{MVN}(0, A)$ with variance-covariance matrix

$$A = \begin{pmatrix} \sigma^2_{w_{i0}} & \rho\,\sigma_{w_{i0}}\sigma_{w_{i1}} \\ \rho\,\sigma_{w_{i0}}\sigma_{w_{i1}} & \sigma^2_{w_{i1}} \end{pmatrix} = \begin{pmatrix} 0.75^2 & -0.02 \\ -0.02 & 0.15^2 \end{pmatrix},$$

so that $\rho \approx -0.2$. We set the coefficients $\beta = (\beta_0, \beta_1, \beta_2)' = (3.00, 0.50, 5.00)'$.
For the survival sub-model, we have

$$h_i(t) = \lambda_0(t)\exp\left(v_i'\xi + \psi\,\mu^*_i(t)\right),$$

where $\lambda_0(t) = \alpha\exp(\gamma t)$ with $\alpha = 0.02$ and $\gamma = 1.2$, and $v_i$ is a vector of baseline covariates with associated coefficients $\xi = (\xi_0, \xi_1)'$, with $\xi_0 = -0.74$ and $\xi_1 = -0.015$. The values of the baseline covariates were simulated as $x_i \sim N(12, 4)$ and $v_i \sim \mathrm{Bin}(1, 0.5)$. The parameter $\psi$ was set at 0.2. Since our interest is in the time to event, we generate the vector of true event times $T^*_i$ by first simulating the survival probability $U_i \sim \mathrm{Uniform}(0, 1)$ for each subject and then solving $S_i(T^*_i) = U_i$, i.e. $\Lambda_i(T^*_i) = -\log U_i$, for $T^*_i$. Because $\mu^*_i(t)$ is linear in $t$, the hazard can be rewritten as a Gompertz hazard, $h_i(t) = \tilde\alpha_i\exp(\tilde\gamma_i t)$, with $\tilde\alpha_i = \alpha\exp\left(v_i'\xi + \psi(\beta_0 + \beta_1 x_i + w_{i0})\right)$ and $\tilde\gamma_i = \gamma + \psi(\beta_2 + w_{i1})$. Hence the cumulative hazard is

$$\Lambda_i(t) = \frac{\tilde\alpha_i}{\tilde\gamma_i}\left(\exp(\tilde\gamma_i t) - 1\right). \qquad (29)$$

Now, solving for $T^*_i$ from equation (29), we have

$$T^*_i = \frac{1}{\tilde\gamma_i}\log\!\left(1 - \frac{\tilde\gamma_i \log U_i}{\tilde\alpha_i}\right).$$

We draw the censoring time $C_i$ from a uniform distribution $\mathrm{Uniform}(0.05, 8)$, then compute $T_i = \min(T^*_i, C_i)$ and the event indicator $\delta_i = 1$ if $T^*_i \le C_i$ and 0 otherwise. The censoring proportion was recorded between 40% and 45%.
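The event-time generation above can be sketched as follows. Two details are our assumptions where the paper is not explicit: we take the survival covariates to enter as $\xi_0 v_i + \xi_1 x_i$, and we read $N(12, 4)$ as variance 4 (standard deviation 2).

```python
import numpy as np

rng = np.random.default_rng(2018)
n = 200
alpha, gamma, psi = 0.02, 1.2, 0.2
beta = np.array([3.0, 0.5, 5.0])
xi = np.array([-0.74, -0.015])
A = np.array([[0.75**2, -0.02], [-0.02, 0.15**2]])

x = rng.normal(12.0, 2.0, n)             # assuming N(12, 4) means variance 4
v = rng.binomial(1, 0.5, n)              # Bin(1, 0.5) baseline covariate
w = rng.multivariate_normal(np.zeros(2), A, n)

# With mu*_i(t) linear in t, h_i(t) = a_i * exp(g_i * t) is again Gompertz:
# (assumed covariate specification: xi0*v_i + xi1*x_i)
a_i = alpha * np.exp(xi[0]*v + xi[1]*x + psi*(beta[0] + beta[1]*x + w[:, 0]))
g_i = gamma + psi*(beta[2] + w[:, 1])

u = rng.uniform(size=n)
T_star = np.log(1.0 - g_i * np.log(u) / a_i) / g_i   # invert S(T*) = U
C = rng.uniform(0.05, 8.0, n)                         # censoring times
T = np.minimum(T_star, C)
delta = (T_star <= C).astype(int)                     # event indicator
```

Since $\log U_i < 0$ and $\tilde\gamma_i > 0$ for these parameter values, the argument of the logarithm exceeds 1 and every generated `T_star` is positive.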

Simulation Study Results
The aim of the simulation study was to investigate the performance of the joint model. The results are summarized in Tables 1 and 2. We considered several quantities to determine the behavior of the estimators $\hat\varphi$ by comparing them to the true $\varphi$, as follows. 1) The estimated bias, $\mathrm{Bias}(\hat\varphi) = \frac{1}{R}\sum_{r=1}^{R}\hat\varphi_r - \varphi$ over the $R$ replications, where a negative bias indicates underestimation and a positive bias indicates overestimation.
2) The root mean square error, $\mathrm{RMSE}(\hat\varphi) = \sqrt{\frac{1}{R}\sum_{r=1}^{R}(\hat\varphi_r - \varphi)^2}$: this measures the accuracy of the estimates; the lower the RMSE, the more accurate the estimates.
3) The standard error, $S.E.(\hat\varphi) = \sigma/\sqrt{n}$, defined as the standard deviation divided by the square root of the sample size; this implies that the larger the sample size, the smaller the standard error.
4) The 95% coverage probability (CP): the proportion of the 1000 simulated datasets for which the interval $\hat\varphi \pm 1.96 \times S.E.(\hat\varphi)$ includes the true value. The closer the outcome is to the nominal 0.95, the more accurate the estimates. The summary statistics of the estimated regression coefficients are given in Tables 1 and 2, for n = 200 and n = 500 subjects respectively. In each simulation, 1000 replications were performed. We can clearly see that the proposed methodology performed well in terms of parameter estimation: the biases are relatively small, and the coverage of the 95% credible intervals lies around 0.95. Moreover, the simulations with the larger sample size have smaller standard errors and hence smaller root mean square errors. Overall, better results were obtained with the joint model with correlated random effects; that is, the correlation between random intercept and slope positively influences the estimates.
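The four summary quantities above can be computed per parameter as in the following sketch; the numbers fed to it are synthetic draws, not the paper's simulation output.

```python
import numpy as np

def summarize(estimates, ses, true_value):
    """Monte Carlo summaries for one parameter over replicated fits."""
    est = np.asarray(estimates)
    se = np.asarray(ses)
    bias = est.mean() - true_value                      # >0 means overestimation
    rmse = np.sqrt(np.mean((est - true_value) ** 2))    # accuracy of estimates
    lo, hi = est - 1.96 * se, est + 1.96 * se
    cp = np.mean((lo <= true_value) & (true_value <= hi))  # 95% coverage
    return bias, rmse, cp

# toy check: 1000 unbiased "psi-hat" replicates around the true value 0.2
rng = np.random.default_rng(0)
est = rng.normal(0.2, 0.05, size=1000)
bias, rmse, cp = summarize(est, np.full(1000, 0.05), true_value=0.2)
```

With correctly specified standard errors, the coverage should land near the nominal 0.95, which is exactly the diagnostic used in Tables 1 and 2.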

Data
In this section, we apply the joint model to data from the FFCD 2000-2005 multi-center phase III clinical trial of patients with metastatic colorectal cancer, described below. Figure 2 shows the trace and density plots for the posterior marginal distributions of selected parameters. We can clearly see that the MCMC chains of all parameters have converged to their target posterior distributions.

Discussion
Joint modeling of longitudinal outcomes and time-to-event data has gained increasing popularity in the literature. However, when it comes to the choice of baseline function for the survival sub-part, many authors assume that the survival times follow an exponential or Weibull distribution. In this paper, we developed a joint model under a Bayesian perspective, assuming that the proportional hazards model for the survival times has a Gompertz baseline hazard function. We believe that generating survival times from a Gompertz distribution has advantages over the Weibull distribution, as its cumulative distribution function has a closed-form solution, which makes it easier to simulate survival data in the presence of time-varying covariates. We started by building separate models for each process and then linked them together through a common latent variable. Our model incorporates both time-invariant (fixed) and time-varying covariates that allow the hazard of the outcome to change over time. The interrelationship between markers was accounted for by subject-specific random effects.
Due to the heavy programming needed to fit the joint likelihood function directly, we proposed a Bayesian approach that estimates the parameters by simulating samples from the posterior distribution. Specifically, a Gibbs sampler was used for posterior inference, as it provides a convenient way to fit complex models. We conducted an extensive examination of model parameter estimation through simulation studies (simulation study 1, correlated random effects; simulation study 2, uncorrelated random effects). The results of both simulation studies are presented in terms of several quantities: the bias, the difference between the average estimate over all simulations and the true parameter value; the S.E., the standard error of the estimates, which measures the accuracy of predictions; the RMSE, the square root of the mean squared error; and the CP, the coverage probability. The results of the two simulation studies indicated adequate performance of the joint model, although they highlighted some weakness of the model when small sample sizes were used.
There is ample additional work needed in the joint modeling framework; this paper covers only a small area of this fast-growing field of research. This work can be extended further to accommodate multiple longitudinal outcomes and competing risks, as done by Musoro (2014). In the future, we plan to develop a joint model for longitudinal and time-to-event data assuming that the survival times follow a generalized Gompertz distribution with three parameters (Haile et al., 2016).
In short, we have introduced a more flexible joint model for longitudinal and time-to-event data, assuming that the survival times follow a Gompertz distribution. We further demonstrated that this model can easily be used in practice through a study, the FFCD 2000-2005 multi-center phase III clinical trial of patients diagnosed with metastatic colorectal cancer. In both the simulation and application studies, two cases (Case I with correlated random intercepts and slopes, and Case II with uncorrelated random intercepts and slopes) were considered. Our work contributes to this fascinating research area by providing a more flexible methodology for joint modeling.
a) Correlated random intercepts and slopes (Case I); b) Uncorrelated random intercepts and slopes (Case II)

The FFCD 2000-2005 multi-center phase III clinical trial enrolled patients diagnosed with metastatic colorectal cancer. The study was conducted between February 2002 and January 2007 in France by the Federation Francophone de Cancerologie Digestive (FFCD). The main aim of the study was to examine the efficacy of two treatments: a sequential arm (S) and a combination arm (C). We consider the datasets presented by Król et al. (2016, 2017), in which 150 patients were randomly selected from this clinical trial. The data contain each patient's disease progression, including tumor size, the times of new lesions (recurrent events), baseline covariates (age, WHO performance status, and previous resection: combination arm vs. sequential arm), and the time to death or the last observed time for right-censored patients. A total of 906 tumor size measurements were recorded at subject-specific follow-up times. During the study, 289 recurrences and 121 deaths were also recorded. We chose to model the longitudinal outcomes together with time to death without recurrence; our model therefore includes a total of 716 tumor size measurements from 41 deaths and 109 right-censored patients. The aim of this application is to examine the effects of the longitudinal dynamics and baseline covariates on subjects who died without experiencing any recurrence.

Figure 1. Individual profiles for longitudinal measurements (right) and the Kaplan-Meier estimates of the survival function among patients with no recurrence (left)

Table 1. Simulation study 1 results with correlated random effects from 1000 replications of 200 and 500 subjects.