Application of Poisson Mixed Combined Models for Identifying Correlations of CD4 Count Progression in HIV Infected TB Patients During ART Treatment Period

CD4 count measures the number of CD4 cells in the blood and is used, mostly during ART treatment, to assess the risk of HIV progression in HIV infected patients. Repeated measurement of the CD4 count during the treatment period yields longitudinal data that exhibit both correlation and over dispersion. Most studies that model such data to identify factors associated with change in CD4 count have not considered these two effects. The main aim of this study was to account for both effects while identifying the risk factors for CD4 count progression, based on 239 HIV infected TB patients aged 18 years and above who were on ART treatment from 1st September 2009 to 1st July 2014 at Jimma University Specialized Hospital. Among the Poisson mixed and combined models considered, the Poisson normal gamma combined model, which handles the correlation and over dispersion of the CD4 count simultaneously, gave the most appropriate fit to the data based on comparison of the Akaike information criterion (AIC). The estimated model shows that linear time and its interaction with the functional status category of the patients have a positive effect, whereas quadratic time has a negative effect, on the progression of CD4 count. The model also shows that patients who were bedridden or ambulatory at baseline had lower average CD4 count measurements than patients with working functional status. Therefore, correlation and over dispersion should be taken into consideration when modeling CD4 count, since the CD4 count values are correlated due to repeated measurement and their variance is larger than their mean. Being at bedridden or ambulatory functional status at baseline, in comparison with working functional status, and the quadratic time effect were the factors associated with lower CD4 count measurements during the ART treatment period in the study area.

CD4 count measures the number of CD4 T lymphocytes (CD4 cells) in a sample of blood. It is the most important laboratory indicator of how well the immune system is working and the strongest predictor of HIV progression and survival. It is also one of the key factors in determining both the urgency of antiretroviral therapy (ART) and survival, according to findings from clinical trials and cohort studies (Mellors et al., 1997; Mellors et al., 2002). The CD4 count was considered to hold predictive value for no more than the subsequent 6-month period, with individual patients contributing multiple 6-month periods of follow up. Lower CD4 counts are associated with greater risk of disease progression. The risk of progression to AIDS increases substantially at CD4 counts below 350 cells/mm³, with the greatest increase in risk occurring as CD4 counts fall below 200 cells/mm³, the threshold for ART initiation in resource-limited settings (CASCADE, 2004).
Patients infected with the human immunodeficiency virus (HIV) may also have tuberculosis (TB), either as latent infection or as active TB disease. HIV infection speeds up the progression from latent to active TB as the patient's CD4 count falls, and TB bacteria in turn accelerate the progression of HIV infection (Mayer, 2010). In 2013, of the estimated 9 million people who developed TB, an estimated 1.1 million (13%) were HIV positive. In the same year there were 360,000 deaths from HIV associated TB, equivalent to 25% of all TB deaths and around 25% of the estimated 1.5 million deaths from HIV/AIDS (Global Tuberculosis Control, 2014). HIV increases the risk of developing TB through the loss of cell mediated immunity, along with a quantitative decline in circulating CD4 lymphocytes, and TB occurs sooner than other opportunistic infections. TB additionally contributes to the reduction in CD4 count in HIV/TB co-infected patients, and treatment leads to a greater improvement in count than in CD4 matched TB uninfected individuals (Wanchu et al., 2014).
In many medical and biomedical areas, count outcomes such as the CD4 count during ART treatment are commonly measured. When such data are collected repeatedly over time from the same subject, the result is a set of repeated measurements within each subject. Statistical modeling of such data poses several challenges, because the observations are correlated as a result of the repeated measurements and over dispersed as a result of the variance exceeding the mean (Breslow & Clayton, 1993; Wolfinger & O'Connell, 1993). Therefore, statistical models for such data should account for these two effects in order to relate covariates to change in the outcome variable over the measurement time.
CD4 count is among the count outcomes measured for HIV infected patients in order to monitor the progression of disease during the treatment period. Various studies have reported factors related to change in CD4 count for HIV infected TB patients during their treatment period. However, most of these studies did not consider the effects of correlation and over dispersion in the longitudinally measured CD4 count when reporting such factors. An appropriate model, fitted with an appropriate methodology, that identifies the factors related to change in CD4 count during ART treatment would provide support for possible therapeutic interventions and, consequently, a better quality of life and survival for patients. The main aim of this study was therefore to identify factors related to the progression of CD4 count in HIV infected TB patients using an appropriate Poisson mixed model that accounts for over dispersion, which arises when the variance exceeds the mean of the Poisson distribution, and for correlation, which arises from repeated measurement of the CD4 count within a subject.

Data Source
Data for the study were obtained from HIV infected TB patients treated at the Jimma University Specialized Hospital HIV outpatient clinic, South West Ethiopia. The study population consists of all HIV infected TB patients who were 18 years or older and who were on ART treatment at any time between 1st September 2009 and 1st July 2014. Of the 550 such patients at the hospital, the 239 who had at least one CD4 count measurement during the treatment period and complete records were included in the study. All epidemiological, laboratory and clinical information was collected retrospectively from the patients' ART follow up charts.

Variables of the Study
The outcome variable of this study was the longitudinally measured CD4 count of HIV infected TB patients during the ART treatment period. The CD4 count is the number of CD4 cells per mm³ of blood; it indicates the progression of HIV and was measured at approximately 6-month intervals during the treatment period. The 12 independent variables extracted from the patients' charts are listed with their categories in Table 1.

Ethical Consideration
Ethical clearance was obtained from the Department of Statistics of Jimma University. Personal information was kept confidential and was not disclosed to others during data collection from patient cards.

Exploratory Data Analysis
Exploratory analysis of longitudinal data seeks to discover patterns of systematic variation across groups of patients, as well as aspects of random variation that distinguish individual patients. This study used individual profile plots to explore within and between subject variability, and mean structure plots to explore the average progression of the CD4 count over the measurement time, as input for modeling.

Generalized Linear Mixed Model
Non-Gaussian repeated count data, such as the CD4 count measured longitudinally to follow the progression of patients during the treatment period, are most frequently modeled using the generalized linear mixed model (GLMM), which allows the inclusion of subject specific random effects in the model (Breslow & Clayton, 1993; Wolfinger & O'Connell, 1993; Engel & Keen, 1992).
For such non-Gaussian repeated measurement data, let Y_ij be the outcome variable for the i-th subject measured at the j-th time point, and let b_i be the random effect capturing subject specific variation, normally distributed with mean 0 and variance-covariance matrix D. Conditionally on b_i, the outcomes Y_ij are assumed to be independent, with a distribution belonging to the exponential family [equation (1)]. The mean of this distribution is related to the covariates through equation (2), where η(.) is the known link function, X_ij is a p-dimensional vector of known covariates for the fixed effect parameters β, Z_ij is a q-dimensional vector of known covariates for the random effects b_i, and ϕ is the scale (over dispersion) parameter. In this setting the likelihood contribution of the i-th subject is obtained by integrating the conditional density over the distribution of the random effects [equation (3)], and the overall likelihood, a function of β, ϕ and D, is the product of these contributions over subjects [equation (4)]. The likelihood in equation (4) does not have an analytical solution, and hence numerical approximations are needed; an extensive overview of the available approximations is given in Molenberghs and Verbeke (1993) and Skrondal and Rabe-Hesketh (2004).
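For reference, the expressions numbered (1) to (4) in this section can be written in the standard GLMM form below. This is a sketch following the usual presentation of the cited framework, not a transcription of the original displays; the symbols f, ψ(.), c(.), n_i and N are notation assumed here.

```latex
\begin{align}
f(y_{ij} \mid b_i, \beta, \phi)
  &= \exp\left\{ \phi^{-1}\left[ y_{ij}\lambda_{ij} - \psi(\lambda_{ij}) \right]
     + c(y_{ij}, \phi) \right\} \tag{1} \\
\eta\left[ \psi'(\lambda_{ij}) \right] = \eta(\mu_{ij})
  = \eta\left[ E(Y_{ij} \mid b_i) \right]
  &= X_{ij}^{T}\beta + Z_{ij}^{T} b_i \tag{2} \\
f_i(y_i \mid \beta, D, \phi)
  &= \int \prod_{j=1}^{n_i} f(y_{ij} \mid b_i, \beta, \phi)\,
     f(b_i \mid D)\, db_i \tag{3} \\
L(\beta, D, \phi)
  &= \prod_{i=1}^{N} f_i(y_i \mid \beta, D, \phi) \tag{4}
\end{align}
```

Here λ_ij denotes the natural parameter of the exponential family and μ_ij the conditional mean; the integral in (3) is the term that has no closed form for non-Gaussian outcomes.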
For the CD4 count considered in this study, let Y_ij, the CD4 count of the i-th patient at the j-th time point, have a Poisson distribution with parameter λ_ij conditionally on b_i. The conditional mean is then modeled on the log scale [equation (5)].
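Under the usual log link for Poisson outcomes, which is consistent with the interpretation of the coefficients as changes in the log of the expected CD4 count, the conditional mean model would take the following form. This is a sketch in the notation of the section; λ_ij here is the Poisson mean, as in the sentence above.

```latex
\begin{align}
Y_{ij} \mid b_i &\sim \text{Poisson}(\lambda_{ij}), \notag \\
\ln(\lambda_{ij}) &= X_{ij}^{T}\beta + Z_{ij}^{T} b_i \tag{5}
\end{align}
```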

Models Combining Over Dispersion with Random Effects
In most practical cases over dispersion and correlation occur together, and this led Molenberghs et al. (2010) to formulate a flexible and unified modeling framework termed the combined model. The combined model simultaneously captures over dispersion and correlation for a wide range of data, including longitudinal count data. In this framework the normally distributed subject-specific random effects capture the correlation due to repeated measurement, while the conjugate measurement-specific random effects on the natural parameter accommodate over dispersion.
According to Molenberghs et al. (2010), the combined model, which accommodates both the over dispersion and the normal random effects, has a conditional distribution of the same exponential family form and notation as equation (1) [equation (6)], except for the inclusion of the random effect θ_ij, which accommodates over dispersion. The conditional mean of this combined model [equation (7)] involves the random variable θ_ij, whose mean ϑ_ij and variance σ²_ij accommodate over dispersion in the model. As in the GLMM, b_i has a normal distribution and the linear predictor is η_ij = X_ij^T β + Z_ij^T b_i. Two different notations, η_ij and λ_ij, are needed: η_ij refers to the linear predictor, whereas λ_ij refers to the natural parameter, which also encompasses the random variable θ_ij. Based on the conditional mean parametrization of equation (7), which allows the random effect θ_ij to capture over dispersion, the likelihood contribution of the i-th subject is obtained by integrating over both b_i and θ_ij [equation (8)], and the overall likelihood is the product of these contributions over subjects [equation (9)].
Specific to this study, let Y_ij, the CD4 count of subject i = 1, 2, ..., N at time point j = 1, 2, ..., n_i, have a Poisson distribution in which a normal random effect accommodates the correlation due to repeated measurement of the CD4 count within a subject and a gamma random effect accommodates the over dispersion due to the variance of the CD4 count being larger than its mean. Accordingly, the Poisson model with normal and gamma random effects is specified by equation (10), with the conditional mean λ_ij = θ_ij κ_ij modeled by equation (11), where b_i ∼ N(0, D) and θ_ij ∼ Gamma(α, ν); X_ij and Z_ij are p-dimensional and q-dimensional vectors of known covariate values, and β and b_i are the unknown vectors of fixed and random effect coefficients, respectively.
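The expressions numbered (6) to (11) can be sketched as follows, in the form in which the combined model is usually presented. This is a reconstruction under assumptions rather than the original displays: g(.) denotes the inverse of the link function, and ϑ_i and Σ_i collect the means and the variance structure of the over dispersion effects θ_i = (θ_i1, ..., θ_in_i); these symbols are introduced here for illustration.

```latex
\begin{align}
f(y_{ij} \mid b_i, \theta_{ij}, \beta, \phi)
  &= \exp\left\{ \phi^{-1}\left[ y_{ij}\lambda_{ij} - \psi(\lambda_{ij}) \right]
     + c(y_{ij}, \phi) \right\} \tag{6} \\
E(Y_{ij} \mid b_i, \theta_{ij})
  &= \theta_{ij}\,\kappa_{ij}, \qquad
     \kappa_{ij} = g\left( X_{ij}^{T}\beta + Z_{ij}^{T} b_i \right) \tag{7} \\
f_i(y_i \mid \beta, D, \vartheta_i, \Sigma_i)
  &= \int\!\!\int \prod_{j=1}^{n_i}
     f(y_{ij} \mid b_i, \theta_{ij}, \beta, \phi)\,
     f(b_i \mid D)\, f(\theta_i \mid \vartheta_i, \Sigma_i)\,
     db_i\, d\theta_i \tag{8} \\
L(\beta, D, \vartheta, \Sigma)
  &= \prod_{i=1}^{N} f_i(y_i \mid \beta, D, \vartheta_i, \Sigma_i) \tag{9} \\
Y_{ij} \mid b_i, \theta_{ij}
  &\sim \text{Poisson}(\lambda_{ij}), \qquad
     \lambda_{ij} = \theta_{ij}\,\kappa_{ij} \tag{10} \\
\kappa_{ij}
  &= \exp\left( X_{ij}^{T}\beta + Z_{ij}^{T} b_i \right) \tag{11}
\end{align}
```

Setting θ_ij = 1 in (10) recovers the Poisson normal model, and dropping b_i from (11) recovers the Poisson gamma model.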
For the distribution of the outcome variable in equation (10), if the over dispersion random effect θ_ij is omitted, the resulting model is the Poisson normal model. Similarly, if the normal random effect is omitted, the resulting model is the Poisson gamma model. All model parameters were estimated with the SAS NLMIXED procedure, which fits nonlinear mixed models by maximizing an approximation to the likelihood integrated over the random effects, here using adaptive Gaussian quadrature.
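To illustrate how the two kinds of random effects act, the following minimal Python sketch simulates counts from a Poisson model with and without a normal random intercept and a gamma over dispersion effect. It is not the SAS NLMIXED fit used in the study. The parameter values, the function names and the constraint that the gamma effect has mean 1 (shape α, scale 1/α) are assumptions made for illustration.

```python
import math
import random
import statistics


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_counts(n_subjects, n_visits, beta0, sd_b, alpha, rng):
    """Simulate counts with a normal random intercept and optional gamma effects.

    sd_b = 0 removes the subject effect; alpha = None removes the gamma
    over dispersion effect (theta_ij is then fixed at 1).
    """
    data = []
    for _ in range(n_subjects):
        b_i = rng.gauss(0.0, sd_b) if sd_b > 0 else 0.0
        kappa = math.exp(beta0 + b_i)  # subject-level mean on the count scale
        visits = []
        for _ in range(n_visits):
            # Gamma(shape=alpha, scale=1/alpha) has mean 1 (assumed constraint).
            theta = rng.gammavariate(alpha, 1.0 / alpha) if alpha else 1.0
            visits.append(poisson_draw(rng, theta * kappa))
        data.append(visits)
    return data


def dispersion_ratio(data):
    """Variance-to-mean ratio of all counts pooled together."""
    pooled = [y for visits in data for y in visits]
    return statistics.variance(pooled) / statistics.fmean(pooled)


rng = random.Random(2014)
plain = simulate_counts(400, 5, beta0=2.0, sd_b=0.0, alpha=None, rng=rng)
combined = simulate_counts(400, 5, beta0=2.0, sd_b=0.5, alpha=2.0, rng=rng)
print("Poisson only:", round(dispersion_ratio(plain), 2))
print("Poisson normal gamma:", round(dispersion_ratio(combined), 2))
```

With these illustrative settings the pure Poisson counts have a variance-to-mean ratio close to 1, while the counts with both random effects have a ratio well above 1, which is the pattern the combined model is designed to capture.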

Model Comparisons
Before the model comparisons, automatic backward variable selection was employed to exclude non-significant variables from the generalized linear mixed model. This selection was done so that all candidate models were fitted with the same set of variables for the sake of comparison.
To select an appropriate model among the candidate models relating the predictor variables to change in CD4 count over time, the -2 log-likelihood and the AIC (Sakamoto et al., 1986) of the models were considered. The candidate model with the minimum -2 log-likelihood and AIC was considered the most appropriate fit to the data, where the AIC is given by AIC = -2 log-lik + 2p, with log-lik the maximized log-likelihood and p the number of estimated parameters.
The average CD4 count during the treatment period summarizes, for each category group, the overall CD4 level from the start of treatment to the end of the study or the 48-month visit. These averages ranged from 122.36 (standard deviation 121.10), observed in the bedridden functional status group, to 320.32 (standard deviation 188.81), observed in the protestant religion group. As shown in Table 3, which gives the average CD4 count and its standard deviation at each measurement time, the variance (the square of the standard deviation) was larger than the mean CD4 count at every measurement time. In theory, count data following a Poisson distribution have a variance equal to the mean, so these descriptive results indicate the existence of over dispersion.
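The over dispersion argument above can be checked with simple arithmetic: under a Poisson distribution the variance-to-mean ratio equals 1. The sketch below computes this ratio from the two group summaries quoted in the text; the function name `dispersion_index` is introduced here for illustration.

```python
def dispersion_index(mean, sd):
    """Variance-to-mean ratio; equals 1 under a Poisson distribution."""
    return sd ** 2 / mean


# (mean, standard deviation) of the CD4 count, as quoted in the text
groups = {
    "bedridden functional status": (122.36, 121.10),
    "protestant religion": (320.32, 188.81),
}

for name, (mean, sd) in groups.items():
    print(f"{name}: variance/mean = {dispersion_index(mean, sd):.1f}")
```

Both ratios are above 100, far from the value of 1 expected for Poisson data, which supports including an over dispersion effect in the model.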

The trend of the average CD4 count (Table 3) indicates that the average CD4 count increased from baseline (173.03) up to the 24th month (401.91) and decreased after the 24th month to the end of the follow up period. Similarly, the standard deviation, which describes the variation of the CD4 count, increased from baseline (149.91) to the 18th month (208.33) and decreased from the 30th month to the end of the follow up period.
Month   0       6       12      18      24      30      36      42      48
Mean    173.03  287.27  350.77  401.90  401.91  423.23  383.44  293.33  220.67
SD      149.07  178.76  191.11  208.33  204.67  211.26  198.83  122.76  35.23
SD = Standard Deviation

Exploratory Data Analysis
Before model building, an exploratory analysis of longitudinal data is essential to identify the variance structure, the mean structure, and the fixed and random effects to be included in the model. Accordingly, the CD4 count measurements of the patients were explored using individual profile plots and mean structure plots. The individual profile plot in Figure 1 shows the CD4 counts of a subset of the patients, chosen to keep the plot readable. The plot clearly depicts both within and between subject variability in the CD4 count measurements. This indicates that a random intercept and a random slope should be considered to capture the subject specific variability of the CD4 count.

Figure 1. Individual profile plot
The mean structure plot in Figure 2 shows the mean progression of the CD4 count during the treatment period and helps to identify appropriate fixed time effects to include in the model. In the plot, the red line shows the mean progression of the CD4 count and the black line shows the same progression after loess smoothing.
The unsmoothed mean progression depicts a quadratic change of the CD4 count over time with some linear effect, whereas the loess-smoothed progression depicts a linear change with some quadratic effect. Since the data are unbalanced longitudinal data, the loess-smoothed mean structure plot was preferred for determining the mean structure of the CD4 count evolution. Accordingly, since the smoothed plot shows linear and some quadratic change of the CD4 count over time, both linear and quadratic time effects were considered in the model.
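As a rough numerical companion to the mean structure plot, the sketch below fits a quadratic curve to the mean CD4 counts listed with Table 3 by ordinary least squares. It is an exploratory illustration on the raw mean scale, not the Poisson model of the study, and it assumes that the nine means correspond to visits at months 0, 6, ..., 48; the function name `fit_quadratic` is introduced here.

```python
def fit_quadratic(ts, ys):
    """Ordinary least-squares fit of y = a + b*t + c*t^2 via the normal equations."""
    rows = [[1.0, t, t * t] for t in ts]
    # Build the 3x3 system (X^T X) coef = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    aug = [xtx[i] + [xty[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]


# Assumed visit times (months) paired with the mean CD4 counts from Table 3
months = [0, 6, 12, 18, 24, 30, 36, 42, 48]
mean_cd4 = [173.03, 287.27, 350.77, 401.90, 401.91, 423.23, 383.44, 293.33, 220.67]

a, b, c = fit_quadratic(months, mean_cd4)
peak_month = -b / (2 * c)  # vertex of the fitted parabola
print(f"linear term {b:.2f}, quadratic term {c:.3f}, peak near month {peak_month:.1f}")
```

A positive linear term together with a negative quadratic term reproduces the rise and later decline seen in the plot, which is the reason both time effects enter the model.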

Models Comparison
To select an appropriate model, different models were fitted with different random effects, including different combined models, as shown in Table 4. This comparison was made to arrive at a model that adequately describes the progression of the CD4 count over time together with its associated factors. Among the candidate models, the Poisson mixed (PM) model has smaller parameter estimates than the PN combined model, but the standard errors of its estimates were somewhat larger than those of the PN combined model. The PM model has a smaller AIC than the PN combined model, showing that the PM model is preferable to the PN combined model in fitting the data, but neither of these models considers the over dispersion effect.
Similarly, the comparison of the Poisson combined model with gamma random effects (PG) and the model with gamma and normal random effects (PNG) shows that the PG model has larger fixed effect estimates and standard errors than the PNG combined model. The PG combined model considers only the over dispersion effect and excludes the correlation that arises from repeated measurement of the CD4 count. The AIC of the PG model was also larger than that of the PNG combined model, showing that the PG combined model fits the data less well than the PNG model.
The AIC and -2 log-likelihood comparison of these four models indicated that the PNG combined model has the minimum AIC among the candidate models, and it was considered the most appropriate fit to the data. This result indicates that the PNG combined model is preferable because it simultaneously takes into account the correlation due to repeated measurement of the CD4 count and the over dispersion due to the variance of the CD4 count being larger than its mean. Kassahun et al. (2014) and Molenberghs et al. (2010) also used this flexible and unified PNG modeling structure to simultaneously capture over dispersion and correlation for a wide range of longitudinal data, including count, binary and time-to-event responses.
Finally, to look for further improvement, the PNG combined model was refitted with different random effects for the correlation due to repeated measurement. Among these refitted candidates, the PNG model with a random intercept had the minimum AIC and was considered more appropriate than the PNG combined models refitted with a random time slope or with both a random intercept and a random time slope.
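The selection rule used in this comparison, AIC = -2 log-lik + 2p with the smallest value preferred, can be sketched as below. The log-likelihoods and parameter counts are hypothetical placeholders chosen only to make the example run; they are not the values reported in Table 4, and the model names are deliberately generic.

```python
def aic(log_lik, n_params):
    """Akaike information criterion: -2 * log-likelihood + 2 * number of parameters."""
    return -2.0 * log_lik + 2.0 * n_params


def select_by_aic(fits):
    """Return the name of the model with the smallest AIC and the full AIC table."""
    table = {name: aic(log_lik, p) for name, (log_lik, p) in fits.items()}
    return min(table, key=table.get), table


# Hypothetical (log-likelihood, number of parameters) pairs, for illustration only
fits = {
    "model_A": (-1500.0, 8),
    "model_B": (-1480.0, 9),
    "model_C": (-1479.5, 12),
}

best, table = select_by_aic(fits)
print(best, table)
```

In this toy example model_C has the highest log-likelihood but loses to model_B once the penalty of 2 per parameter is applied, which shows why the AIC and the -2 log-likelihood can rank models differently.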

Factors Associated with Progression of CD4 Count Measurement
As can be observed from Table 5, linear time has a positive effect whereas quadratic time has a negative effect on the progression of the CD4 count. The negative estimated coefficients for the functional status categories show that ambulatory and bedridden patients had lower CD4 count measurements than their counterparts with working functional status. Aboma T. and Teshome K. (2016), in their joint modeling of CD4 count and measurement in HIV/TB co-infected patients, also found that linear time has a positive effect, whereas quadratic time and being at bedridden functional status at baseline have negative effects, on the square root of the CD4 count.
The positive coefficient for the interaction of linear time with the functional status of the patients shows that both ambulatory and bedridden patients had larger CD4 count progression over time than patients with working functional status. The interaction of linear time with marital status also indicates that patients in the others marital status category had larger CD4 count progression than single patients.
The estimated coefficient for the linear time effect (0.05, p-value = 0.0001) shows that the log of the expected CD4 count of a patient increased by 0.05 for each additional month, whereas the coefficient for quadratic time (-0.001, p-value = 0.0001) shows that the log of the expected CD4 count decreased by 0.001 for each unit increase in squared time, holding other variables constant. Similarly, the estimated coefficients of the functional status categories for the ambulatory group (-0.33, p-value = 0.0006) and the bedridden group (-1.13, p-value = 0.0001) show that the log of the expected CD4 count in the ambulatory and bedridden groups was 0.33 and 1.13 lower, respectively, than in the working functional status group, holding other variables constant. Furthermore, the coefficient of the interaction between linear time and ambulatory functional status (0.01, p-value = 0.0014) shows that the log of the expected CD4 count of ambulatory patients increased by 0.01 more per month than that of working functional status patients during the treatment period, holding other variables constant.
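Because the linear time coefficient is positive and the quadratic one is negative, the fitted log of the expected CD4 count rises and then falls, with a turning point at t = -b1 / (2 b2). The sketch below works this out from the rounded coefficients quoted above. It ignores the other covariates and interactions, so the resulting times are indicative only; the function name `peak_time` is introduced here.

```python
def peak_time(beta_time, beta_time_sq):
    """Time at which beta_time*t + beta_time_sq*t**2 is maximal."""
    if beta_time_sq >= 0:
        raise ValueError("no interior maximum unless the quadratic coefficient is negative")
    return -beta_time / (2.0 * beta_time_sq)


# Rounded estimates quoted in the text
peak_working = peak_time(0.05, -0.001)            # reference (working) group
peak_ambulatory = peak_time(0.05 + 0.01, -0.001)  # adds the time x ambulatory interaction

print(f"working: about month {peak_working:.0f}")
print(f"ambulatory: about month {peak_ambulatory:.0f}")
```

For the reference group the turning point falls at roughly 25 months, which agrees with the descriptive finding that the average CD4 count rises until about the 24th month and declines afterwards.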
The estimated Poisson model coefficients can also be interpreted as expected incidence rate ratios by exponentiating the estimated coefficients. For a continuous variable, the incidence rate ratio is the factor by which the expected CD4 count is multiplied for a unit increase in that variable, holding the other variables in the model constant. For a categorical variable, the incidence rate ratio is the ratio of the expected CD4 count in the given category to that in the reference category.
Accordingly, the estimated coefficient for the linear time effect (0.05) shows that the expected CD4 count of a patient was multiplied by 1.05 (exp(0.05)), an increase of about 5%, for each additional month, whereas the estimated coefficient for the quadratic time effect (-0.001) shows that the expected CD4 count was multiplied by approximately 0.999 (exp(-0.0011)) for each unit increase in squared time, holding other variables constant. Similarly, the estimated coefficient for the bedridden functional status group (-1.13) shows that the expected CD4 count in the bedridden group was 0.32 (exp(-1.13)) times that of the working functional status group, that is, about 68% lower, holding other variables constant.
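The conversion from coefficients to incidence rate ratios is a direct exponentiation and can be reproduced as follows. The coefficients are the rounded values quoted in the text; the dictionary labels and function names are introduced here for illustration.

```python
import math

# Rounded coefficient estimates quoted in the text (log scale)
coefficients = {
    "time (per month)": 0.05,
    "time squared": -0.001,
    "ambulatory vs working": -0.33,
    "bedridden vs working": -1.13,
    "time x ambulatory": 0.01,
}


def incidence_rate_ratio(beta):
    """Multiplicative change in the expected count for a unit increase."""
    return math.exp(beta)


def percent_change(beta):
    """Same effect expressed as a percentage change in the expected count."""
    return 100.0 * (math.exp(beta) - 1.0)


for name, beta in coefficients.items():
    print(f"{name}: IRR = {incidence_rate_ratio(beta):.3f}, "
          f"change = {percent_change(beta):+.1f}%")
```

An incidence rate ratio below 1 corresponds to a lower expected CD4 count than in the reference situation, and a ratio above 1 to a higher one.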

Conclusions
The Poisson normal gamma (PNG) combined model, which simultaneously accounts for the correlation of the CD4 count due to repeated measurement and the over dispersion due to the variance of the CD4 count being larger than its mean, was considered the most appropriate fit to the data. Being at bedridden or ambulatory functional status at baseline, in comparison with working functional status, and the quadratic time effect were the risk factors that lower the CD4 count in HIV infected TB patients in the study area.

Table 1. List of Covariates Considered in the Study
Stage II indicates mild disease, Stage III indicates advanced disease and Stage IV indicates severe disease; hence disease severity increases from Stage I to Stage IV. Functional status of the patients is also a categorical covariate with three categories: working, ambulatory and bedridden. Working patients are those able to carry out day-to-day work, ambulatory patients are those able to work some of the time, and bedridden patients are unable to work due to the disease. Marital status also has three categories: married, single and others, which includes separated, widowed and divorced individuals.
According to Table 2, which presents the baseline characteristics of the patients together with the baseline and overall average CD4 count during the treatment period, the average age of the patients at baseline was 32.11 years (standard deviation 8.88 years) and the average weight was 48.45 kg (standard deviation 10.78 kg). The distribution by sex shows that 134 (56.07%) and 105 (43.93%) of the patients were male and female, respectively, and a larger average CD4 count (184.94) was observed in males than in females. By residence, 203 (84.94%) of the patients were from urban areas and 36 (15.06%) were from rural areas of Jimma. The educational level distribution shows that 53 (22.18%), 104 (43.51%), 67 (28.03%) and 15 (6.28%) of the patients were non-educated, primary, secondary and tertiary educated individuals, respectively. The lowest average baseline CD4 count (96.733, standard deviation 75.029) was observed in the tertiary educated group. The baseline functional status shows that 120 (50.21%), 97 (40.59%) and 22 (9.21%) of the patients were in the ambulatory, working and bedridden groups, respectively. Furthermore, the average baseline CD4 count by functional status was lower in the bedridden group (74.64, standard deviation 78.15) than in the ambulatory and working groups. The WHO clinical stage, which reflects the severity of the patient's disease, shows that 29 (12.14%), 115 (48.12%) and 95 (39.75%) of the patients were at clinical stages I and II, III and IV, respectively. A lower average CD4 count (155.58, standard deviation 118.55) was observed in clinical stage III patients than in patients at the other clinical stages at baseline.

Table 2. Baseline Demographic and Clinical Characteristics of the Patients

Table 3. Mean and Standard Deviation of CD4 Count During Measurement Time

Table 4. Comparison of Candidate Models
PM = Poisson mixed, PN = Poisson normal, PNG = Poisson Normal Gamma, PG = Poisson Gamma, SE = Standard Error, Est. = Estimate

Table 5. The Selected PNG Combined Model With Estimated IRR