Bayesian Joint Models for Longitudinal and Multi-state Survival Data

Joint models for longitudinal and time to event data are frequently used in many observational studies such as clinical trials with the aim of investigating how biomarkers which are recorded repeatedly in time are associated with time to an event of interest. In most cases, these joint models only consider a univariate time to event process. However, many clinical trials of patients with cancer, involve multiple recurrences of a single event together with a single terminal event experienced by patients over time. Therefore, this article proposes joint modelling approachs for longitudinal and multi-state data. The approach considers two sub-models that are linked by a common latent random variable. The first sub-model is linear mixed effect model that defines the longitudinal process and the second sub-model is a proportional intensity function for the multi-state process. Furthermore, on the proportional intensity model, two different formulations are used to define dependence structure between longitudinal and multi-state processes. In this article, a semi-Markov process that consider the time spent in the current state is defined for the transitions between states. Moreover, the time spent in each transient state is assumed to have Gompertz distribution. A Bayesian method using Markov Chain Monte Carlo (MCMC) is developed for parameter estimation and inferences. The deviance information criterion (DIC) is also derived for Bayesian model selection and comparison. Finally, our proposed joint modeling approach is evaluated through a simulation study and is applied to real datasets (colorectal and colorectal.Longi) which present a random selection of 150 patients from a multi-center randomized phase III clinical trial FFCD 2000-05 of patients diagnosed with metastatic colorectal cancer.


Introduction
In many biomedical studies such as HIV/AIDS studies, subjects' biomarkers are collected repeatedly over time together with the time to occurrence of a clinical event.The clinical event can be defined as the occurrence of the same type of event more than once or different types among treatment centres.For instance, in prostate cancer , a subject can be followed after cancer treatment and at each clinical visit, biomarker (prostate specific antigen) called longitudinal measurements are recorded together with the time to re-occurrence of cancer (Yu et al., 2008).Another example include the study of time to re-hospitalization by Dendale et el. (2012).In their study, they stated that in order to improve the clinical outcome (re-hospitalization event times) for chronic heart failure patients, patients' blood pressure, body weight and heart rate were measured repeatedly after their first discharge from the hospital along with the time to re-hospitalization.This process (re-hospitalization) is considered as multi-state survival data since a patient could be re-hospitalized more than once over time after the initial discharge.
In the past decade, there have been tremendous works on simultaneously analysis of longitudinal and time to event data.The ultimate idea of these studies, is to account for dependence between these two processes and provide valid and efficient inferences.Traditionally, these two processes are analysed separately, which in most cases leads to inefficient inferences.Classical examples are given by Rizopoulos. (2011) and Guo, & Carlin. (2004) on the HIV/AIDS progression in which CD4 counts were collected repeatedly over time together with the time to AIDS or death.Faucett and Thomas, (1996), Wuffsohn & Tsiatis. (1997), also gave comprehensive reviews of joint models for longitudinal and time to event data.In all these studies, the same principle was used, that is two sub-models were defined, one for longitudinal process and one for time to event process linked by a common latent structure.
Several extensions of joint models have been proposed in recent studies (Ibrahim et al., 2010, McCrink et al., 2013, Lawrence Gould et al., 2015;Król et al., 2016).Another attractive modelling approach in the joint modelling framework is the shared frailty models.These models attempts to account for unobserved heterogeneity which may occur due to the fact that some subjects are more prone to the event than others subjects (Hougaard, 2012).The frailty term is imposed on the hazard function to account for unmeasured covariates.In early development of frailty models, Vaupel et al. (1979) explained that different subjects are considered to be at risk of a clinical event even though some measurements attributes to the risk make no difference in the model.This makes the joint models more efficient and less biased.However, the frailty models have received some criticisms in current development of joint models.One of such criticisms is in a case of multi-state data (recurrent events and terminal event), frailty models assumes that the joint distributions of both events are completely independent.
Under the joint modelling framework, several authors have developed joint models for repeated measurements and multivariate time to events.One of such authors is Chi & Ibrahim (2006), who proposed joint models for multivariate longitudinal and multivariate survival data.A joint model of longitudinal and competing risks survival data was proposed by Huang et al. (2010).However, there have been insufficient studies on the joint models for longitudinal outcomes with multi-state data.The most recently study on such joint models was proposed by Ferrer et al. (2016).Their approach introduces a shared random effects variable for longitudinal measurements and times of transitions between states.To the best of our knowledge, there exist no joint model for longitudinal and multi-state data (recurrent events with death as terminal event) under the framework of Bayesian inference.Therefore, we address this problem by jointly modelling longitudinal and multi-state data under the framework of continuous time Markov process with transient states representing recurrent events and a single absorbing state for terminal event.As pointed out by de Castro et al. (2015), many issues remain unaddressed in the existing literature.One of the issues they pointed out is the dependence structure of the gap times between two states which they say is not always explicitly accounted for.Hence, this paper seek to develop a joint multi-state semi-Markov model that accommodates the time spent in each transient state.Our model comprises two sub-models linked by shared random effects.The first sub-model is linear mixed effect model for modelling evolution of longitudinal measurements where the interrelation between measurements and subject specific effects are accounted by random intercept and slope.The second sub-model is for transition intensities between states which includes covariates history of the process.
Computation for this proposed modelling approach can be awkward.Therefore, in this work, we adapt Bayesian Monte Carlo Markov chain (MCMC) technique that requires both likelihood function which defines the random process that generates the data and the prior probability distributions for the parameters.In particular, we carry out the computational analysis using an existing software package OpenBugs.The rest of this paper is organized as follows: In section 2, we presented the formulation of the joint multi-state semi-Markov model.In section 3, we presented estimation and inference; this section includes joint likelihood, Bayesian parameter estimation procedure which consists of prior and joint posteriors specification.Section 4 presents Monte Carlo simulation study to assess the performance of the proposed joint modeling approach.Applications of the proposed joint modeling approach to real data are given in section 5.Then, the last section presents the discussion.

Joint Multi-state Semi-Markov Model Framework
The joint multi-state semi-Markov model comprised two sub-models: 1) a longitudinal sub-model(linear mixed model) and multi-state semi-Markov sub-model.These sub-models are linked through any function of longitudinal sub-model parameters or shared random effects.

Longitudinal Sub-model
Suppose that we have a sample of n subjects under study, denoted by i(i = 1, 2, ..., n).Let y i ( t i j ) be n i × 1 column vector of random variables representing the observed longitudinal outcomes for subject i measured at time points t i j ( t i1 , t i2 , ..., t in i ) , where j = (1, 2, ..., n i ).Here n i represents the numbers of repeated measurements for subject i which varies among subjects.In practice, we may have missing observations for some subjects.This happens due to the fact that some subjects may decide to drop out of the study for reasons not related to the occurrence of event of interest.Therefore, in this study, we assume that these missing values in longitudinal measurements trajectory are missing independently of the unobserved measurements.
To analyse the longitudinal process, we define the distribution of y i j by a Linear Mixed Effects model, where is the true value of y i j and the outcome variable x i is the (n i × p) design matrix of fixed effects which includes possible time dependent covariates; β is a p × 1 corresponding column vector of the fixed effect coefficients; , where z i denotes (q × 1) design matrix for the random effects; w i ∼ MVN ( 0, A ) and ϵ i is a (n i × 1) column vector of the residuals which represents the part of y i j which is not accounted by the model: ) .It should be noted here that A represents the variance co-variance matrix in which correlations among the repeated measurements and within subject correlation measurement values are represented.On other hand residuals among subjects and random effects w i are independent of each other.The variance covariance matrix is defined as ) , where ρ denotes correlation between random intercept and slope.

Multi-state Semi-Markov Sub-model
A multi-state semi-Markov model is a stochastic process in which individuals occupy one of the set of discrete state at any time and their transition probabilities to the next possible states only depends on history process through the time spent in the current transient state (Foucher et al., 2007).
Figure 1.Multi-state model with three transient states (recurrent events) and one obsorbing state (terminal event) Let X i = {X i (t), t ≥ 0} be the right continuous random process for subject i with finite space E = {0, 1, 2, ..., M − 1, M}, where the integers 0, 1, 2, ..., M − 1 represents the cumulative number of recurrences that subject experienced over the study period.In addition, M represent the absorbing state(e.g., death).In general, X i (t) is defined as the state in which the process is at time t.This process includes the value of possible time dependent covariates up to time t.
We assume that all subjects enter the study at the same time through initial state X i ( t i,0 ) = X i (0) with zero recurrence.
Subjects' markers, time to recurrence and death time are recorded periodically.This allows the data to be stratified in such a way that, a subject move directly from state 0 to an absorbing state or alternatively move to state 1 immediately after experiencing the first recurrence.It follows that a subject is expected to stay in state 1 until it experience its second recurrence then move to state 2. If death was observed, then it move to an absorbing state and so on.This process is said to have a semi-Markov property.
as the probability of transition from state g to state h with constant ∑ h g p i,gh = 1.Note that if state g is a transient state then, p i,gh ≤ 0 and p gh = 0 for g = h.If state g is an absorbing state, the p i,gh = 0 for h g and p i,gh = 1 for h = g.The semi-Markov property assumption works in such a way that, time is rest to zero at each transition.Since our main interest is on the time elapsed between consecutive transitions, we can define the conditional distribution function (CDF) of that time as As usual in survival analysis, survival function s and the transition intensity θ i,gh associated to probability distribution function of sojourn time can be deduced from the probability density function (pdf) f i,gh .If we assume that state g is transient state, that is state 0 to state M − 1 in our case, we can define the pdf of the sojourn time in transient states as Survival function can now be realized from equation (4) as and the conditional transition intensities of the semi-Markov process corresponds to the probabilities of transition from state g to state h between d i,r and d i,r + ∆d i,r given that the process is in state g for a duration d i,r is defined by By using the total probability theorem, marginal survival function of sojourn times in state g is defined from transition probability p gh and survival function S g as ) .
The transition intensity function of the semi-Markov process is said to be related to the transition intensity of the sojourn time, the survival function of the sojourn times and the probabilities of the process.Hence, the corresponding transition intensity function for any subject moving from state g to state h given that the process stays in state g for duration d i,r is defined by where g h, g, h ∈ S and λ i,gg ).Hence, equation ( 7) corresponds to the instantaneous joint probability of moving to state h from state g after staying in state g for duration d i,r .

Distribution of Sojourn Times
We assume that the sojourn times (d i,r ) follow a Gompertz distribution (GD) (Gompertz,1824) with parameter α gh and γ gh such that, d i,r ∼ Gompertz ( α gh , γ gh ) .This distribution is considered to be one of the classical mathematical models that represent survival function based on laws of mortality.It plays an important role in modelling human mortality.El-Gohary et al. (2013) stated that, Gompertz distribution has been used as a growth model to fit the tumor growth due to its flexibility.It's applications are mainly in the area of actuarial and demography and biology (Ahuja & Nash, 1967).In absence of covariates, the Gompertz distribution has the following probability density function, The survival function of the GD is given by and its corresponding transition intensity function is defined by When γ gh > 0, the hazards increases and when γ gh < 0, the hazards decreases.While γ gh = 0, the transition intensity function is equivalent to the one from an exponential distribution with constant hazard rate α gh .

The cumulative function
) ) of GD is given by From equation 12, we have that the quantiles of the GD can be obtained by using the following expression.

Incorporation of Time-Varying Covariates on Sojourn Times Distribution
In order to take into account the influence of the random effects from longitudinal process and the covariates on the waiting time distribution, we used the assumption of the risks proportionality.These factors are included in the transition intensity function of the semi-Markov process.Let v i,gh = (v 1 i,gh , ..., v m i i,gh ) ′ denotes the m i vector of prognostic factor associated with p-vector of regression coefficient ξ gh = (ξ 1 gh , ..., ξ m i gh ) for transition g → h.Let χ i,gh (w i ; d i,r ) be the function that defines the dependent structure between the longitudinal and multi-state semi-Markov processes.Different formulations of random effects function on the transition intensity function can be used in order to propose how the intensity of experiencing a transition depends on the unobserved value of biomarker at d i,r .For example, ), the current value of biomarker w i , the shared random effects Hence, the transition intensity function is given by where λ gh,0 ) denotes the baseline intensity function as defined by equation ( 11) and ψ gh is a vector of parameters that quantifies the strength between longitudinal and survival process for each transition.Thus, we have Note that, from the transition intensity function of the waiting times, the regression coefficients can be interpreted in terms of relative risk.Also, the time-dependent covariates can be considered.However, the effects of time depended covariates can be burdensome.
Using the transition intensity function (Model I), we can define the cumulative transition intensity function for subject i with single movement from state g to state h as follows where the current value χ i,gh and the survival function is given by The probability density function of the waiting times is given by ≤ t to be the true direct observed durations for subject i through state 0,1, 2 before state 3 between time 0 and t.The cumulative transition intensity for subject i at first recurrence duration d i,0 is given by and the inverse cumulative function is given by Consequently, the survival function for subject i in state 0 between time 0 and d * i,0 is given by Now, suppose that subject i has experienced second recurrence at d * i,1 , then its cumulative transition intensity is expressed as and its inverse cumulative transition intensity is defined by where If subject i transit to an absorbing state, that is subject i dies at time d * i,2 , then its cumulative transition intensities can be expressed as follows: and its inverse cumulative transition intensities is where

Parameter Estimation and Inference
The multi-state process defined in section 2 is continuous and only observed between the left truncated time t i,0 and the right censoring C i , such that the observed process is expressed as During this observation time interval, the i th subject may transit m i − 1 times through all transient states at times t i,1 < t i,2 < ... < t i,m i −1 .At these time points, subject i occupies the state X i ) for all r ≥ 0. At the last observation time t i,m i , subject i may move again or censored.Let t * i,r+1 denote the true observed transition for subject i and ) , represents which event has occur at t i,r+1 (transition or censoring), 1 if transition and 0 otherwise.For convenient interpretation, we denote as the vector of observed transition indicators.

Likelihood Function
The likelihood function associated to semi-Markov process can be defined as Now, linking the longitudinal and multi-state sub-models, we first introduce some notation.Let ϕ = (ϕ Y , ϕ T , ϕ w ) be the vector of all possible parameters to be estimated, where ϕ Y denotes all parameters for the longitudinal outcome model , ϕ T includes all parameters for transition intensity and ϕ w represents all parameters from random effects density function.We denote f where f y (y i |ϕ Y ; w i ) is the probability density function of y i defined by is probability density function of sojourn times d i,r defined as and f w (w i |ϕ w ) is the probability density function of random effects w i given by ) , i = 1, ..., n, r = 0, ..., m i , g, h ∈ E } represents the observed data.Under the conditional independence assumption between the random effects and measurement errors of the longitudinal outcomes, the joint likelihood of the longitudinal and the transition intensity sub-models for all observed data is defined by Taking log of (32), gives

Bayesian Inference
Bayesian inference is an approach of statistical inference that uses the Bayes' rule to update the estimates of ϕ after observing some data.This approach is one of the crucial techniques used in modern statistics, especially in mathematical statistics.In Bayesian framework, the unknown quantity, ϕ is treated as a random variable.More precisely, we assume that we have some available initial guess about the distribution of ϕ.This distribution is referred to as prior distribution.
Within the Bayesian paradigm, this distribution can be incorporated into analysis.Hence, with the use of Bayes' theorem, we can combine the joint likelihood function (32) with prior distribution to give posterior distribution for the parameters.
The posterior summaries of the interest of this distribution of the parameters are obtained using simulation techniques.
In the case of the joint model, the posterior distribution can be very complex leading to difficulty in using analytical methods (Zhang et al., 2017).To overcome such difficulties, we proposed Gibbs sampler technique.This is a well known Markov Chain Monte Carlo (MCMC) method that generates samples from posterior distribution of parameters by repeated sampling from the full-conditional densities of each components.More details are given in Robert and Casella (2004).

Prior Distributions
Let p (ϕ) represent prior distribution of a set of unknown parameters ϕ.Choosing prior distributions can be challenging in a case where prior information about the current data is unknown.In this work, we chose prior that will have little to no effect on our final results.This type of prior distribution is called non-informative prior distribution.The ultimate idea of using a non-informative prior distribution is to avoid some external information creeps into analysis.That is, we want observed data to speak for themselves.The prior distributions were specified as follows: We assume independent normal prior for β, γ gh , ξ gh and ψ gh such that β ∼ N(µ β , τ β ), γ gh ∼ N(µ γ gh , τ γ gh ), ξ gh ∼ N(µ ξ gh , τ ξ gh ) and ψ gh ∼ N(µ ψ gh , τ ψ gh ), where µ coe f and τ coe f are the mean and variance of the hyper-parameters.An independent gamma prior distribution for the Gompertz shape parameter α gh was chosen such that α gh ∼ Γ(a gh , b gh ) where a gh > 0 and b gh > 0 are specified with low values such as 0.001 for each transition.Furthermore, we assume inverse gamma prior for residuals such that σ 2 ϵ ∼ IG(a 1 , b 1 ) where a 1 > 0 and b 1 > 0. For the covariance matrix A, we take an inverse wishart prior, A ∼ IW( A 0 , v 0 ) where A 0 denotes the q × q positive definite matrix with v 0 degrees of freedom.
Hence, the joint prior distribution for ϕ is thus defined by

Joint Posteriors
The inference about ϕ conditional on D obs can be done by means of the Bayes' theorem to construct a joint posterior distribution, where p (ϕ|D obs ) denotes the joint posterior probability distribution, which is proportional to joint likelihood contribution for each subject multiplied by the prior distribution.The samples are drawn from the posterior distribution using Gibbs sampler in order to obtain posterior estimates.Hence, the full joint posterior distribution is given the expression As stated earlier, at this point in time, the model has become too complex and analytical evaluation of posterior distribution is not possible.Therefore, Gibbs sampling can be used to obtain posterior estimates.In order to implement the Gibbs sampling technique, the full conditional distribution are needed (see Appendix)

Model Comparison and Selection
In this subsection, we consider selecting models via the Deviance Information Criterion (DIC) (spiegelhalter et al., 2002).
Consider the parameter vector ϕ and observed data vector D obs , then, the deviance Dev(ϕ) is defined as where L(ϕ|D obs ) is given in (32).
For our joint multi-state semi-Markov model, we let Dev(ϕ) to be The posterior expected deviance is given by and the deviance evaluated at the posterior means of ϕ denoted by φ is defined by The difference between equation ( 40) and ( 39) is an estimate of the effective number of parameters p D written as Then, the deviance information criterion, DIC is given by From equation ( 40) and ( 41), DIC can be written as Equivalently, According to Spiegelhalter et al. (2002), the process of computing Dev( φ) involves the computing of n integrals as shown in (28).Hence, this pose a major challenge on computing the overall likelihood of observed data in our joint model.One possible approach to approximate equation ( 44) is to use Adaptive Gaussian Quadrature (AGQ) proposed by Pinheiro and Bates, 1995.In this paper, we adapt Monte Carlo Markov Chain approach.
Let consider the s th MCMC sample, s = {1, 2, ...S }.Also consider the i th subject such that i = {1, ...n}.Now the deviance for the i th subject at the s th MCMC sample is defined as where Then, the deviance over all individuals is given by The posterior mean deviance across the MCMC samples and the deviance at the posterior means of the parameters can then be estimated as follow respectively Then, effective number of model parameters can be deduced from equation ( 48) and ( 49) as The DIC in equation ( 44) can be easily calculated from equation ( 49) and ( 50).
Base on the DIC, the model with the smallest DIC value is considered to be the model that would best predict a replicated dataset which has the same structure as the current observed datasets.However, according to Geedipally at el., 2014, the model is penalized by Dev(ϕ) which will decrease as the number of parameters in the model increases and p D , which compensates for this effect by favouring the model with a smaller number of parameters.Therefore, it is very important to note that the way the model is parametrized will have an impact on the outcomes of the deviance information criterion values (Mwanyekange at el., 2018).

Simulation Study
In this section, we perform simulation under Markov Chain Monte Carlo (MCMC) to assess the performance of the proposed methodology.The data were generated from a joint multi-state survival model with a univariate longitudinal variable and a multi-state process (transition times process).We illustrate the proposed methodology by considering the following Linear Mixed Effects model where ϵ i (t i j ) ∼ N(0, σ 2 ϵ ) = N(0, 0.540), and the vector of random effects w i = (w i0 , w i1 ) is assumed to follow multivariate normal distribution MVN(0, A), where A = ( σ 2 w i0 , ρσ w i0 σ w i1 , ρσ w i0 σ w i1 , σ 2 w i1 ) with σ w i0 = 0.750, σ w i1 = 0.150 and ρ = 0.400.
We generated the data of n = 200 sample sizes and for each simulation, 1000 replicated datasets were considered.Each subject is expected to have at-most n i = 8 number of repeated measurements taken at baseline and thereafter 7 visits, that is t i j = {0.00,..., 2.50}.After this period, subjects are said to be censored non-informatively For the multi-state semi-Markov sub-model, we have We set the vectors of true parameters of Model I and II as follows: The values of baseline covariates were simulated as follows: . The censoring times were drawn from an uniform distribution, C i = U(0.05,2.50).Then, we compute ≤ C i and 0 otherwise.Note that, the sojourn times from Model II are generated in a similar manner.
The summary statistics of the estimated regression coefficients are presented in Table 1 including the true parameter values, bias (Bias), estimates, standard error (S.E), root mean squared error (RMSE) and coverage probabilities (CP) of 95% credible intervals (Cr.I).We compared joint modeling approach that includes the whole biomarker trajectory in multi-state survival sub-model to the joint modeling approach in which only random effects are shared between two submodels.It is evident from Table 1 that estimators of the regression parameters of the longitudinal sub-model are relatively unbiased in both modeling approaches.Our focus is mainly on the multi-state semi-Markov process.From the same table, we clearly see that coverage probability dwells around 95% nominal level in both approaches.Overall, the joint modeling approach with shared random effects shows better performance than the modeling approach with current level of biomarker value.

Data
In this section, we consider the data from the Federation Francophone de Cancerologie Digestive (FFCD) 2000-2005 Multi-center phase III clinical trial of patients diagnosed with metastatic colorectal cancer.The study was conducted between February 2002 and January 2007 in France by FFCD.The main aim of the study was to examine the efficacy of two treatment effects: Sequential arm (S) and Combination arm (C).For the purpose of the present study, we consider datasets presented in Król et al. (2016) and Krol et al. (2017) in which 150 patients were randomly selected from the same clinical trial.The data contains individual follow-up of tumor size measure (sum of the longest diameters of target lesions) and time of new lesions(recurrent events).Moreover, some baseline covariates (age, WHO performance status and previous resection), treatment arm (Combination vs Sequential), time of death or the last observed time for right censored.A total of 906 tumor size measurements were recorded at subject specific follow-up.During the study period, 289 cases of recurrence and 121 death were also recorded.In this study, we chose to apply our joint multi-state semi-Markov model to this datasets including only patients up to their second recurrence.
Observation time (years) Tumor size Observation time (years) Tumor size  The multi-state semi-Markov process are presented through transitions between 4 states (see figure 1).At the first followup in state 0 (0 recurrence), a patient can either experience a transition to state 1 (1 st recurrence) or to an absorbing state 3 (death).After the first recurrence, a patient may experience a transition to state 2 (2 nd recurrence) or move to an absorbing state 3.After the second recurrence, we only considered the transition to an absorbing state 3, that is, patients with 3 rd recurrence are ignored in this study.In total, 90 patients experienced transition to state 1, 35 patients had experienced second recurrence after the first recurrence.41 patients die without experiencing any recurrence, 41 and 22 patients die after first and second recurrence respectively.
In our Gibbs algorithm, we ran three parallel MCMC chains with different random generated initial values for 205 000 iterations.In order to avoid the influence of pre-convergence in our final posterior inference, 55 000 iterations were discarded as burn-in.Hence, our posterior inferences are based on the last 150 000 iterations.For the prior distributions, we used standard non-informative prior distributions for the parameters.That is, β, γ gh , ξ gh and ψ gh are assumed to have normal distributions with mean zero and large variances (β ∼ N(0, 100), γ gh ∼ N(0, 100), ξ gh ∼ N(0, 100), ψ gh ∼ N(0, 100)).Similarly, we take standard normal prior for the association parameters ψ gh,int and ψ gh,slope in Model II.
For the shape parameter in the baseline transition intensity function α gh , an independent gamma prior.As stated in previous section, an inverse wishart prior distribution was taken ( A ∼ Inv − Wishart(A 0 , v 0 )), where A 0 is the prior distribution for covariance matrix with v 0 degree of freedom.Finally, inverse gamma was chosen for the error variance σ 2 ϵ = 1/τ, where τ ∼ Γ(0.01, 0.01).Next, we present results obtained from the samples posterior distribution.Table 2 presents the posterior estimates long with their standard errors (S.E) and 95% credible intervals (Cr.I) for longitudinal sub-model for both models.We clearly see that the joint modeling approach with the current value of biomarker and joint modeling approach with shared random effects are fairly close to each other in terms of parameter estimation.We observed that age of patients is only significantly associated with tumor size in both models for patients transiting from state 1 to state 3, since the 95% credible interval for β 1 does not include the value 0. However, this has negative relationship to tumor size, implying that, as patient's age increases, the tumor size of the patient will decrease.Moreover, the time dependent treatment effect seems to a significant factor tumor size in all transitions for both both models.On the other hand, the performance status (Who.PS) indicate significant effect on the tumor size for 1 → 2, 1 → 3 and 2 → 3 transitions provided that the 95% credible interval for parameter β 2 does not include the value 0. In all the transitions observed, random effects have 95% credible intervals excluding value 0, implying significant heterogeneity for tumor size trajectory.
For the transition intensity process, the covariate of interest were duration and previous resection.From Table 3, we notice that patients previous resection had a negative effect on the transition intensity.However, 95% credible intervals for this parameter ξ gh in both models include value 0 for 0 → 1, 0 → 3, 1 → 2, and 1 → 3, implying that the negative effect is not significant.Time spent in each transient state is expected to have association in respect to different length of times moving from one state to the another.However, it is worth mentioning that sojourn time is negatively associated with transition intensity from transient states to the next possible transient state and positively associated for transitions from transient states to an absorbing state.The association between the longitudinal process and transition intensity process for 0 → 1, 0 → 3, 1 → 2 and 2 → 3 showed positive strength.Nevertheless, studies by Król et al. (2016) and Król et al. (2017) gave some important details on the study.It can be asserted from their studies that, performance status (Who.PS) show a strong effect on the progression to both absorbing state (death) and transient state(new lesions).Therefore, we can confidently say that this study provides some additional insights to the study of longitudinal outcomes and survival times in joint modeling framework.
Finally, we compared the two joint modeling approaches using the Deviance Information Criterion (DIC).We observed the DIC values at each transition.From Table 4, it is evident that joint model with shared random effects appears to be a better model with lower DIC values.

Discussion
Joint models of longitudinal and multi-state data are less explored in literature.Hence, in this article, we have discussed the Bayesian approach for joint modeling of longitudinal and multi-state data.We have used linear mixed effect model for the longitudinal process and semi-proportional transition intensity function for the multi-state process.In addition, we assume that multi-state data exhibit a semi-Markov process whereby transition intensity from a transient state depends on the sojourn/holding time at which subject stayed at the current state.Furthermore, the sojourn time in each transient state is assumed to follow Gompertz distribution.We have used different formulations that illustrates how the two processes are linked together.In the first formulation (Model I), the multivariate function which represents the true value of the biomarker is included in the transition intensity function.While in the second formulation (Model II), the multivariate function represents the random effects that are included in the transition intensity function.
We conducted a simulation study in order to assess the performance of the proposed joint modeling approach.Furthermore, we applied the proposed joint modeling approaches to a sample of patients from colorectal phase III clinical trial FFCD 2000-05 dataset with the aim of identifying prognostic factors that are significantly associated with either longitudinal or transition intensity process.Generally, small differences between the two models were observed for the parameter estimates.We have found out that the proposed methods behave well in terms of parameter estimation as its posterior estimates converges to their target distributions.
One limitation of our study may be the number of states.We have only looked at four states because it will be much complex with many states.Hence, this joint modeling approach can be extended to more complex cases, in which more than one longitudinal processes can be modelled together with multi-state survival process.However such extensions comes with computation and implementation challenges.Therefore, substantial work is needed in terms of software development for this growing field of research.
In short we have introduced the model that jointly analyse longitudinal and multi-state data with an assumption that multistate data has a semi-Markov process.The simulation study in section 4 empirically demonstrated good performance in terms of parameter estimation.Moreover, we have demonstrated how this model can be applied to real life data.Even though this study cover only a small area in a field of joint modeling framework, we are confident to say that it had contributed to the existing literature.

Figure 2 .
Figure 2. The survival (left)and hazard (right) function of the Gompertz distribution d i,r ∼ GD (α gh ; γ gh ) with constant α gh and different values of γ gh

Figure 4 .
Figure 4. Kaplan-Meier curves for each transition Foucher et al. (2005)holding time for subject i in the state X i (t i,r ) before moving to state X i (t i,r+1 ), represented by d i,r = t i,r+1 − t i,r , where t i,r is the time of entry into state X i (t i,r ) for r = 0, ..., m i .Also let D i,r denote the corresponding random variable called sojourn time.Then, borrowing some ideas fromFoucher et al. (2005), we can express That is, individual intensity of moving between states X i ( t i,r+1 ) = g and X i ( t i,r+1) = h depends strongly on history process through the duration spent in the current state X i ( t i,r ) = g.

Table 1 .
Results from simulation study examining the performance of the proposed joint model 5.1.2Multi-stateSemi-MarkovSub-model The following proportional transition intensities model includes previous resection (prev.resection(0:No,1: Yes)) covariate, gap-time (duration) and the function of current values of tumor size-(Model I)λ i,gh (d i,r ) = λ 0,gh (d i,r ) exp { ξ 0,gh + ξ 1,gh Prev.res i,gh + ψ gh µ * i (d i,r) } is the Gompertz baseline function and d i,r is the sojourn time in each transient state.

Table 3 .
Posterior estimates of the Multi-state survival sub-model for the two modeling approaches