On Generalized Gamma Distribution and Its Application to Survival Data

The generalized gamma distribution is a continuous probability distribution with three parameters that generalizes the two-parameter gamma distribution. Since many distributions commonly used for parametric models in survival analysis (such as the exponential, Weibull, and gamma distributions) are special cases of the generalized gamma, it is often used to determine which parametric model is appropriate for a given dataset. The generalized gamma distribution is also one of the distributions used in frailty modeling. In this study, it is shown that the generalized gamma distribution has three sub-families, and its application to the analysis of a survival dataset is explored using a parametric modeling approach.


Introduction
The early generalization of the gamma distribution can be traced back to Amoroso (1925), who discussed a generalized gamma distribution and applied it to fit income rates. Johnson et al. gave a four-parameter generalized gamma distribution which reduces to the generalized gamma distribution defined by Stacy (1962) when the location parameter is set to zero. The generalized gamma defined by Stacy (1962) is a three-parameter exponentiated gamma distribution. Mudholkar and Srivastava (1993) introduced the exponentiated method to derive a distribution. Agarwal and Al-Saleh (2001) applied the generalized gamma to study hazard rates. Balakrishnan and Peng (2006) applied this distribution to develop a generalized gamma frailty model. Nadarajah and Gupta (2007) proposed another type of generalized gamma distribution with application to fitting drought data. Cordeiro et al. (2012) derived another generalization of Stacy's generalized gamma distribution using the exponentiated method and applied it to lifetime and survival analysis.
The generalized gamma distribution presents a flexible family, with a wide variety of shapes and hazard functions, for modeling duration. Distributions used in duration analysis in economics include the exponential, log-normal, gamma, and Weibull. The generalized gamma family, which encompasses the exponential, gamma, and Weibull as subfamilies, and the log-normal as a limiting distribution, has been used in economics by Jaggia, Yamaguchi, and Allenby et al. Some authors have argued that the flexibility of the generalized gamma makes it suitable for duration analysis, while others have advocated the use of simpler models because of the estimation difficulties caused by the complexity of the generalized gamma parameter structure. Obviously, there would be no need to endure the costs associated with the application of a complex generalized gamma model if the data do not discriminate between the generalized gamma and members of its subfamilies, or if the fit of a simpler model to the data is as good as that of the complex generalized gamma. The estimation difficulties reported by Hager and Bain inhibited applications of the generalized gamma model.
Prentice resolved the convergence problem using a nonlinear transformation of the generalized gamma model. Maximum likelihood estimation of the parameters, and quasi-maximum likelihood estimators for its subfamily (the two-parameter gamma distribution), can be found in the literature. Hwang et al. introduced a new moment estimation of the parameters of the generalized gamma distribution using its characterization. In information theory, a maximum entropy (ME) derivation of the generalized gamma is found in Kapur, where it is referred to as the generalized Weibull distribution, and the entropy of the GG has appeared in the context of flexible families of distributions. Some concepts of this family in information theory have been introduced by Dadpay et al.

Log-Normal Distribution
A log-normal distribution is a continuous probability distribution of a random variable whose logarithm is normally distributed. If the random variable X is log-normally distributed, then Y = ln(X) has a normal distribution. Likewise, if Y has a normal distribution, then the exponential function of Y, X = exp(Y), has a log-normal distribution. A random variable which is log-normally distributed takes only positive real values. A log-normal process is the statistical realization of the multiplicative product of many independent random variables, each of which is positive. This is justified by considering the central limit theorem in the log domain. The log-normal distribution is the maximum entropy probability distribution for a random variate X for which the mean and variance of ln(X) are specified.
The probability density function of the log-normal distribution is given by
$$f(x; \mu, \sigma) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right), \qquad x > 0.$$
The cumulative distribution function is given as
$$F(x) = \Phi\left(\frac{\ln x - \mu}{\sigma}\right),$$
which may also be expressed as
$$F(x) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{\ln x - \mu}{\sigma\sqrt{2}}\right)\right],$$
where erf is the error function.
An approximating formula for the characteristic function φ(t) can be given as
$$\varphi(t) \approx \frac{\exp\left(-\dfrac{W^2(-it\sigma^2 e^{\mu}) + 2W(-it\sigma^2 e^{\mu})}{2\sigma^2}\right)}{\sqrt{1 + W(-it\sigma^2 e^{\mu})}},$$
where W is the Lambert W function.
For a log-normal random variable, the partial expectation is given by
$$g(k) = \int_{k}^{\infty} x f(x)\,dx = e^{\mu + \sigma^2/2}\,\Phi\left(\frac{\mu + \sigma^2 - \ln k}{\sigma}\right).$$
The quantile function of the log-normal distribution is given by
$$F^{-1}(p) = \exp\left(\mu + \sigma\,\Phi^{-1}(p)\right),$$
while the variance is given as
$$\left[e^{\sigma^2} - 1\right]e^{2\mu + \sigma^2}.$$
The skewness is given by the formula
$$\left(e^{\sigma^2} + 2\right)\sqrt{e^{\sigma^2} - 1},$$
and the excess kurtosis by
$$e^{4\sigma^2} + 2e^{3\sigma^2} + 3e^{2\sigma^2} - 6.$$
The support is x ∈ (0, +∞), the mean is exp(µ + σ²/2), the mode is exp(µ − σ²), and the entropy (in nats) is given as
$$\mu + \frac{1}{2}\ln\left(2\pi e \sigma^2\right).$$
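These closed-form moments can be checked numerically. A minimal sketch using SciPy follows; the parameter values are arbitrary, chosen only for illustration, and SciPy's `lognorm` is parameterized by `s = σ` and `scale = exp(µ)`:

```python
import numpy as np
from scipy import stats

# Arbitrary illustrative parameters for ln(X) ~ Normal(mu, sigma).
mu, sigma = 0.5, 0.75

# SciPy's lognorm uses s = sigma and scale = exp(mu).
dist = stats.lognorm(s=sigma, scale=np.exp(mu))

# Closed-form mean and variance quoted above.
mean = np.exp(mu + sigma**2 / 2)
var = (np.exp(sigma**2) - 1) * np.exp(2 * mu + sigma**2)
assert np.isclose(dist.mean(), mean)
assert np.isclose(dist.var(), var)

# If X is log-normal then ln(X) is Normal(mu, sigma):
# quantiles must agree on the log scale.
p = 0.9
assert np.isclose(np.log(dist.ppf(p)), stats.norm(mu, sigma).ppf(p))
```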

Weibull Distribution
The Weibull distribution is a continuous probability distribution. It is named after the Swedish mathematician Waloddi Weibull, who described it in detail in 1951, although it was first identified by Fréchet (1927) and first applied by Rosin and Rammler (1933) to describe a particle size distribution.
The Weibull distribution is one of the most widely used lifetime distributions in reliability engineering. It is a versatile distribution that can take on the characteristics of other types of distributions, based on the value of the shape parameter.
The probability density function of a Weibull random variable is
$$f(x; \nu, \xi) = \frac{\nu}{\xi}\left(\frac{x}{\xi}\right)^{\nu - 1} e^{-(x/\xi)^{\nu}}, \qquad x \geq 0,$$
where ν > 0 is the shape parameter and ξ > 0 is the scale parameter of the distribution.
The cumulative distribution function for the Weibull distribution is
$$F(x) = 1 - e^{-(x/\xi)^{\nu}}.$$
The failure rate h (or hazard function) is given by
$$h(x) = \frac{\nu}{\xi}\left(\frac{x}{\xi}\right)^{\nu - 1}.$$
The quantile (inverse cumulative distribution) function for the Weibull distribution is
$$Q(p) = \xi\left[-\ln(1 - p)\right]^{1/\nu}, \qquad 0 \leq p < 1.$$
The moment generating function of the logarithm of a Weibull-distributed random variable is given by
$$E\left[e^{t \ln X}\right] = \xi^{t}\,\Gamma\left(1 + \frac{t}{\nu}\right).$$
The mean and variance of a Weibull random variable can be expressed respectively as
$$\mu = \xi\,\Gamma\left(1 + \frac{1}{\nu}\right)$$
and
$$\sigma^2 = \xi^2\left[\Gamma\left(1 + \frac{2}{\nu}\right) - \Gamma^2\left(1 + \frac{1}{\nu}\right)\right].$$
The skewness is given by
$$\gamma_1 = \frac{\Gamma\left(1 + \frac{3}{\nu}\right)\xi^3 - 3\mu\sigma^2 - \mu^3}{\sigma^3},$$
where the mean is denoted by µ and the standard deviation by σ. The excess kurtosis is given by
$$\gamma_2 = \frac{-6\Gamma_1^4 + 12\Gamma_1^2\Gamma_2 - 3\Gamma_2^2 - 4\Gamma_1\Gamma_3 + \Gamma_4}{\left[\Gamma_2 - \Gamma_1^2\right]^2},$$
where Γ_i = Γ(1 + i/ν). The information entropy is given by
$$H = \gamma\left(1 - \frac{1}{\nu}\right) + \ln\frac{\xi}{\nu} + 1,$$
where γ is the Euler-Mascheroni constant.
The Kullback-Leibler divergence between two Weibull distributions is given by
$$D_{KL}(W_1 \,\|\, W_2) = \ln\frac{\nu_1}{\xi_1^{\nu_1}} - \ln\frac{\nu_2}{\xi_2^{\nu_2}} + (\nu_1 - \nu_2)\left[\ln\xi_1 - \frac{\gamma}{\nu_1}\right] + \left(\frac{\xi_1}{\xi_2}\right)^{\nu_2}\Gamma\left(\frac{\nu_2}{\nu_1} + 1\right) - 1.$$
The variances and covariances of the estimates of ν and ξ are obtained from the inverse of the local Fisher information matrix.
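The gamma-function expressions for the Weibull mean, variance, and quantile function can be verified numerically. This is a sketch with arbitrary illustrative values of the shape ν and scale ξ, using SciPy's `weibull_min` (shape argument `c`):

```python
import numpy as np
from scipy import stats
from scipy.special import gamma as gamma_fn

# Arbitrary shape (nu) and scale (xi) for illustration.
nu, xi = 1.8, 2.5
dist = stats.weibull_min(c=nu, scale=xi)

# Mean and variance from the gamma-function formulas above.
mean = xi * gamma_fn(1 + 1 / nu)
var = xi**2 * (gamma_fn(1 + 2 / nu) - gamma_fn(1 + 1 / nu) ** 2)
assert np.isclose(dist.mean(), mean)
assert np.isclose(dist.var(), var)

# Quantile function Q(p) = xi * (-ln(1 - p))**(1/nu).
p = 0.5
assert np.isclose(dist.ppf(p), xi * (-np.log(1 - p)) ** (1 / nu))
```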

Exponential Distribution
The exponential model, with only one unknown parameter, is the simplest of all life distribution models. The exponential distribution is one of the most widely used continuous distributions. It is often used to model the time elapsed between events; that is, the exponential distribution is often concerned with the amount of time until some specific event occurs. For example, the amount of time (beginning now) until an earthquake occurs has an exponential distribution. Other examples include the length, in minutes, of long-distance business telephone calls, and the amount of time, in months, a car battery lasts. An interesting property of the exponential distribution is that it can be viewed as a continuous analogue of the geometric distribution.
The probability density function of the exponential distribution is given by
$$f(t; \theta) = \theta e^{-\theta t}, \qquad t \geq 0.$$
The cumulative distribution function is thus
$$F(t) = 1 - e^{-\theta t}.$$
The reliability and failure rate of the exponential distribution are given respectively as
$$R(t) = e^{-\theta t} \qquad \text{and} \qquad h(t) = \theta,$$
while the mean and variance are given by mean = 1/θ and variance = 1/θ². Note that the failure rate reduces to the constant θ for any time; the exponential distribution is the only distribution to have a constant failure rate. Also, another name for the exponential mean is the Mean Time To Failure, or MTTF, and we have MTTF = 1/θ.
The Kullback-Leibler divergence between two exponential distributions with rates θ₁ and θ₂ is given by
$$D_{KL}(\theta_1 \,\|\, \theta_2) = \ln\frac{\theta_1}{\theta_2} + \frac{\theta_2}{\theta_1} - 1.$$
The most important property is that the exponential distribution is memoryless. Specifically, the memoryless property says that an exponentially distributed random variable T obeys the relation
$$P(T > s + t \mid T > s) = P(T > t), \qquad s, t \geq 0,$$
which follows directly since P(T > s + t)/P(T > s) = e^{-θ(s+t)}/e^{-θs} = e^{-θt}, hence the proof.
The memoryless property says that knowledge of what has occurred in the past has no effect on future probabilities.
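The memoryless property and the constant hazard rate can both be demonstrated numerically. A minimal sketch, with an arbitrary rate θ (SciPy's `expon` is parameterized by `scale = 1/θ`):

```python
import numpy as np
from scipy import stats

theta = 0.7                       # arbitrary rate parameter
T = stats.expon(scale=1 / theta)  # SciPy parameterizes by scale = 1/theta

s, t = 1.2, 0.8
# Memoryless property: P(T > s + t | T > s) = P(T > t).
lhs = T.sf(s + t) / T.sf(s)
rhs = T.sf(t)
assert np.isclose(lhs, rhs)

# The hazard is constant: h(x) = f(x)/R(x) = theta for every x.
for x in (0.1, 1.0, 5.0):
    assert np.isclose(T.pdf(x) / T.sf(x), theta)
```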

Gamma Distribution
The gamma distribution is another widely used distribution. Its importance is largely due to its relation to the exponential and normal distributions. The gamma distribution is a two-parameter family of continuous probability distributions. The exponential distribution, Erlang distribution, and chi-squared distribution are special cases of the gamma distribution.
The gamma distribution is used to model errors in multi-level Poisson regression models, because the combination of the Poisson distribution and a gamma distribution is a negative binomial distribution. The gamma distribution is widely used as a conjugate prior in Bayesian statistics. It is the conjugate prior for the precision (i.e. the inverse of the variance) of a normal distribution, and it is also the conjugate prior for the exponential distribution. The gamma distribution can be parameterized in terms of a shape parameter φ = κ and an inverse scale parameter β = 1/θ, called a rate parameter. A random variable X that is gamma-distributed with shape φ and rate β is denoted by
$$X \sim \Gamma(\varphi, \beta).$$
The corresponding probability density function in the shape-rate parametrization is
$$f(x; \varphi, \beta) = \frac{\beta^{\varphi}\,x^{\varphi - 1} e^{-\beta x}}{\Gamma(\varphi)}, \qquad x > 0,$$
where Γ(φ) is the gamma function.
Both parameterizations are common because either can be more convenient depending on the situation.
The cumulative distribution function is the regularized gamma function:
$$F(x; \varphi, \beta) = \frac{\gamma(\varphi, \beta x)}{\Gamma(\varphi)},$$
where γ(φ, βx) is the lower incomplete gamma function.
If φ is a positive integer (i.e., the distribution is an Erlang distribution), the cumulative distribution function has the following series expansion:
$$F(x; \varphi, \beta) = 1 - \sum_{i=0}^{\varphi - 1} \frac{(\beta x)^i}{i!}\,e^{-\beta x}.$$
The Kullback-Leibler divergence (KL-divergence) of Γ(φ_p, β_p) (the "true" distribution) from Γ(φ_q, β_q) (the "approximating" distribution) is given by
$$D_{KL} = (\varphi_p - \varphi_q)\,\psi(\varphi_p) - \ln\Gamma(\varphi_p) + \ln\Gamma(\varphi_q) + \varphi_q\left(\ln\beta_p - \ln\beta_q\right) + \varphi_p\,\frac{\beta_q - \beta_p}{\beta_p}.$$
Written using the k, θ parameterization, the KL-divergence of Γ(k_p, θ_p) from Γ(k_q, θ_q) is given by
$$D_{KL} = (k_p - k_q)\,\psi(k_p) - \ln\Gamma(k_p) + \ln\Gamma(k_q) + k_q\left(\ln\theta_q - \ln\theta_p\right) + k_p\,\frac{\theta_p - \theta_q}{\theta_q}.$$
The Laplace transform of the gamma probability density function is
$$F(s) = (1 + \theta s)^{-k} = \frac{\beta^{\varphi}}{(\beta + s)^{\varphi}}.$$
If X ∼ Γ(φ, θ) and Y ∼ Γ(β, θ) are independently distributed, then X/(X + Y) has a beta distribution with parameters φ and β, and X + Y ∼ Γ(φ + β, θ), independently of X/(X + Y).
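The beta and sum relations for independent gamma variables can be checked by simulation. A sketch with arbitrary shapes and a common scale, comparing empirical samples against the claimed distributions with a Kolmogorov-Smirnov statistic:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
phi, beta_, theta = 2.0, 3.0, 1.5   # arbitrary shapes and common scale

x = rng.gamma(shape=phi, scale=theta, size=200_000)
y = rng.gamma(shape=beta_, scale=theta, size=200_000)

# X/(X + Y) should follow a Beta(phi, beta_) distribution.
ratio = x / (x + y)
ks = stats.kstest(ratio, stats.beta(phi, beta_).cdf)
assert ks.statistic < 0.01

# And X + Y should follow Gamma(phi + beta_, theta).
ks2 = stats.kstest(x + y, stats.gamma(a=phi + beta_, scale=theta).cdf)
assert ks2.statistic < 0.01
```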

Generalized Gamma Distribution
The generalized gamma has three parameters, given in this case as κ > 0, ϕ > 0, and β > 0. For non-negative x, the probability density function of the generalized gamma is
$$f(x; \kappa, \phi, \beta) = \frac{\phi}{\beta\,\Gamma(\kappa)}\left(\frac{x}{\beta}\right)^{\kappa\phi - 1} e^{-(x/\beta)^{\phi}},$$
where Γ(·) denotes the gamma function.

The Cumulative Distribution Function
The cumulative distribution function (CDF) of a random variable X, denoted by F(x), is defined as
$$F(x) = P(X \leq x).$$
Using the identity for the probability of disjoint events, if X is a discrete random variable, then
$$F(x) = \sum_{i=1}^{n} P(X = x_i),$$
where x_n is the largest possible value of X that is less than or equal to x.
The cumulative distribution function for a random variable at x gives the probability that the random variable X is less than or equal to that number x. Note that in the formula for CDFs of discrete random variables, we always have n ≤ N, where N is the number of possible outcomes of X.
This function, CDF(x), simply tells us the probability of measuring any value up to and including x.
The cumulative distribution function ("c.d.f.") of a continuous random variable X is defined as
$$F(x) = \int_{-\infty}^{x} f(t)\,dt.$$

Properties of the CDF

The cumulative distribution function of the generalized gamma distribution is given by
$$F(x; \kappa, \phi, \beta) = \frac{\gamma\left(\kappa, (x/\beta)^{\phi}\right)}{\Gamma(\kappa)},$$
where γ(·,·) denotes the lower incomplete gamma function.
The GG is thus a three-parameter (κ, ϕ, β) family whose survival function is given as
$$S(x) = 1 - F(x) = \frac{\Gamma\left(\kappa, (x/\beta)^{\phi}\right)}{\Gamma(\kappa)},$$
where Γ(·,·) denotes the upper incomplete gamma function.
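The incomplete-gamma form of the CDF and survival function can be checked against SciPy's `gengamma`, whose pdf is c·x^(ca−1)·e^(−x^c)/Γ(a) at unit scale, so `a` plays the role of κ and `c` the role of ϕ. A sketch with arbitrary parameter values:

```python
import numpy as np
from scipy import stats
from scipy.special import gammainc

# SciPy's gengamma: pdf(x; a, c) = c * x**(c*a - 1) * exp(-x**c) / Gamma(a),
# i.e. a ~ kappa and c ~ phi at unit scale (arbitrary values below).
a, c = 2.0, 1.5
dist = stats.gengamma(a=a, c=c)

x = 1.3
# CDF is the regularized lower incomplete gamma function at x**c ...
assert np.isclose(dist.cdf(x), gammainc(a, x**c))
# ... and the survival function is its complement.
assert np.isclose(dist.sf(x), 1 - gammainc(a, x**c))
```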

Kullback-Leibler Divergence
The Kullback-Leibler divergence (also called relative entropy) is a measure of how one probability distribution diverges from a second, expected probability distribution. Applications include characterizing the relative (Shannon) entropy in information systems, randomness in continuous time-series, and information gain when comparing statistical models of inference. In contrast to variation of information, it is a distribution-wise asymmetric measure and thus does not qualify as a statistical metric.
In the simple case, a Kullback-Leibler divergence of 0 indicates that the two distributions are identical (almost everywhere), so we can expect the same behavior from both, while larger values indicate that the two distributions behave in an increasingly different manner.
If f_1 and f_2 are the probability density functions of two generalized gamma distributions, written in the parameterization
$$f_i(x) = \frac{p_i}{a_i^{d_i}\,\Gamma(d_i/p_i)}\,x^{d_i - 1} e^{-(x/a_i)^{p_i}},$$
then their Kullback-Leibler divergence is given by
$$D_{KL}(f_1 \,\|\, f_2) = \ln\frac{p_1\,a_2^{d_2}\,\Gamma(d_2/p_2)}{p_2\,a_1^{d_1}\,\Gamma(d_1/p_1)} + \left[\frac{\psi(d_1/p_1)}{p_1} + \ln a_1\right](d_1 - d_2) + \frac{\Gamma\left((d_1 + p_2)/p_1\right)}{\Gamma(d_1/p_1)}\left(\frac{a_1}{a_2}\right)^{p_2} - \frac{d_1}{p_1},$$
where ψ(·) is the digamma function.
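Whatever closed form is used, the divergence between two generalized gamma densities can always be evaluated by numerical quadrature of its defining integral. A sketch with two arbitrary parameter choices, checking the basic properties of non-negativity and vanishing self-divergence:

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Two generalized gamma densities with arbitrary parameters (unit scale).
f1 = stats.gengamma(a=2.0, c=1.5)
f2 = stats.gengamma(a=3.0, c=1.0)

# KL(f1 || f2) = integral of f1(x) * log(f1(x)/f2(x)) over the support.
kl, _ = quad(lambda x: f1.pdf(x) * (f1.logpdf(x) - f2.logpdf(x)), 0, np.inf)
assert kl > 0  # KL divergence is non-negative for distinct densities

# The divergence of a distribution from itself is zero.
kl_self, _ = quad(lambda x: f1.pdf(x) * (f1.logpdf(x) - f1.logpdf(x)), 0, np.inf)
assert np.isclose(kl_self, 0.0)
```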

Moments of Generalized Gamma Distribution
If X has a generalized gamma distribution as above, then its raw moments are
$$E[X^r] = \beta^r\,\frac{\Gamma\left(\kappa + r/\phi\right)}{\Gamma(\kappa)}.$$
The mean of a generalized gamma distribution is given by
$$E[X] = \beta\,\frac{\Gamma\left(\kappa + 1/\phi\right)}{\Gamma(\kappa)}.$$
The mode of a generalized gamma distribution is given by
$$\text{mode} = \beta\left(\kappa - \frac{1}{\phi}\right)^{1/\phi} \qquad \text{for } \kappa\phi > 1,$$
and zero otherwise. The variance of the generalized gamma distribution is given by
$$\text{Var}(X) = \beta^2\left[\frac{\Gamma\left(\kappa + 2/\phi\right)}{\Gamma(\kappa)} - \left(\frac{\Gamma\left(\kappa + 1/\phi\right)}{\Gamma(\kappa)}\right)^2\right],$$
while the entropy is given by
$$H = \ln\frac{\beta\,\Gamma(\kappa)}{\phi} + \kappa + \left(\frac{1}{\phi} - \kappa\right)\psi(\kappa).$$
The skewness follows from the first three raw moments as
$$\gamma_1 = \frac{E[X^3] - 3\mu\sigma^2 - \mu^3}{\sigma^3}.$$

Properties of Generalized Gamma Distribution
The generalized gamma family is flexible in that it includes several well-known models as subfamilies. The subfamilies of the generalized gamma considered thus far in the literature are the exponential (ϕ = β = 1), the gamma (β = 1), and the Weibull (ϕ = 1). The log-normal distribution is also obtained as a limiting distribution when ϕ → ∞. An important property of the generalized gamma family for information analysis is that the family is closed under power transformation.
That is, if X ∼ GG(α, τ, λ) and s > 0, then X^s ∼ GG(α, τ/s, λ^s). It also has the property that Z = ηX has a GG(α, τ, λ/η) distribution for η > 0.
We use the transformed probability density function of the generalized gamma distribution to show that the generalized gamma is flexible, with the exponential, Weibull, and gamma distributions as subfamilies.
That is, we let the pdf of the generalized gamma distribution GG(α, τ, λ) be given by
$$f(x; \alpha, \tau, \lambda) = \frac{\tau\,\lambda^{\alpha\tau}}{\Gamma(\alpha)}\,x^{\alpha\tau - 1} e^{-(\lambda x)^{\tau}}, \qquad x > 0;$$
then if α = τ = 1 the subfamily is the exponential distribution.

Proof I
From the probability density function of the generalized gamma distribution given by
$$f(x; \alpha, \tau, \lambda) = \frac{\tau\,\lambda^{\alpha\tau}}{\Gamma(\alpha)}\,x^{\alpha\tau - 1} e^{-(\lambda x)^{\tau}},$$
if we set α = τ = 1, then we have
$$f(x) = \lambda e^{-\lambda x},$$
which is the probability density function of an exponential distribution with parameter λ, hence the proof.

Proof II
Similarly, if we now let τ = 1, then from the probability density function of the generalized gamma distribution, simplifying gives
$$f(x; \alpha, 1, \lambda) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\,x^{\alpha - 1} e^{-\lambda x},$$
which is the probability density function of a gamma distribution with parameters α and λ.

Proof III
In the last proof, if we now let α = 1, then from the probability density function of the generalized gamma distribution we show that the generalized gamma reduces to the Weibull distribution as a subfamily.
That is, we have
$$f(x; 1, \tau, \lambda) = \tau\,\lambda^{\tau}\,x^{\tau - 1} e^{-(\lambda x)^{\tau}},$$
which is the probability density function of a Weibull distribution with shape parameter τ and scale parameter 1/λ, hence the proof.
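The three reductions above can be confirmed numerically by comparing densities. A sketch using SciPy, where GG(α, τ, λ) maps to `gengamma(a=α, c=τ, scale=1/λ)` (the parameter values below are arbitrary):

```python
import numpy as np
from scipy import stats

x = np.linspace(0.1, 5, 50)
lam = 0.8   # arbitrary rate parameter

# GG(alpha, tau, lambda) corresponds to gengamma(a=alpha, c=tau, scale=1/lam).

# alpha = tau = 1  ->  Exponential(lambda)
gg = stats.gengamma(a=1, c=1, scale=1 / lam)
assert np.allclose(gg.pdf(x), stats.expon(scale=1 / lam).pdf(x))

# tau = 1  ->  Gamma(alpha, lambda)
alpha = 2.3
gg = stats.gengamma(a=alpha, c=1, scale=1 / lam)
assert np.allclose(gg.pdf(x), stats.gamma(a=alpha, scale=1 / lam).pdf(x))

# alpha = 1  ->  Weibull with shape tau and scale 1/lambda
tau = 1.7
gg = stats.gengamma(a=1, c=tau, scale=1 / lam)
assert np.allclose(gg.pdf(x), stats.weibull_min(c=tau, scale=1 / lam).pdf(x))
```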

Parameter Estimation Using Maximum Likelihood Estimation
Maximum likelihood estimation (MLE) is a method of estimating the parameters of a distribution by maximizing a likelihood function, so that under the assumed statistical model the observed data are most probable. The point in the parameter space that maximizes the likelihood function is called the maximum likelihood estimate. The logic of maximum likelihood is both intuitive and flexible, and as such the method has become a dominant means of statistical inference.
If the likelihood function is differentiable, the derivative test for determining maxima can be applied. In some cases, the first-order conditions of the likelihood function can be solved explicitly; for instance, the ordinary least squares estimator maximizes the likelihood of the linear regression model. Under most circumstances, however, numerical methods will be necessary to find the maximum of the likelihood function.
From a statistical standpoint, the observations y = (y_1, y_2, . . ., y_n) are a random sample from an unknown population.
The goal is to make inferences about the population that is most likely to have generated the sample, specifically the probability distribution corresponding to the population. Associated with each probability distribution is a unique vector θ = [θ_1, θ_2, . . ., θ_k]ᵀ of parameters that index the probability distribution within a parametric family. As θ changes in value, different probability distributions are generated. The idea of maximum likelihood is to re-express the joint probability of the sample data f(y_1, y_2, . . ., y_n; θ) as a likelihood function L(θ; y) that treats θ as a variable. For independent and identically distributed random variables, the likelihood function is defined as
$$L(\theta; y) = \prod_{i=1}^{n} f(y_i; \theta),$$
evaluated at the observed data sample. The goal is then to find the values of the model parameters that maximize the likelihood function over the parameter space Θ. Intuitively, this selects the parameter values that make the observed data most probable. The problem is thus to find the supremum value of the likelihood function by choice of the parameter:
$$L(\hat{\theta}; y) = \sup_{\theta \in \Theta} L(\theta; y),$$
where the estimator θ̂ = θ̂(y) is a function of the sample. A sufficient but not necessary condition for its existence is for the likelihood function to be continuous over a parameter space Θ that is compact. For an open Θ the likelihood function may increase without ever reaching a supremum value.
In practice, it is often convenient to work with the natural logarithm of the likelihood function, called the log-likelihood: ℓ(θ; y) = ln L(θ; y). Since the logarithm is a monotonic function, the maximum of ℓ(θ; y) occurs at the same value of θ as does the maximum of L. If ℓ(θ; y) is differentiable in θ, the necessary conditions for the occurrence of a maximum (or a minimum) are known as the likelihood equations. For some models, these equations can be explicitly solved for θ̂, but in general no closed-form solution to the maximization problem is known or available, and an MLE can only be found via numerical optimization. Another problem is that in finite samples, there may exist multiple roots for the likelihood equations.
Whether the identified root θ̂ of the likelihood equations is indeed a (local) maximum depends on whether the matrix of second-order partial and cross-partial derivatives, known as the Hessian matrix, is negative semi-definite at θ̂, which indicates local concavity. Conveniently, most common probability distributions, in particular the exponential family, are logarithmically concave.
Note that the probability density function using the shape-scale parameterization of the gamma distribution is given by
$$f(x; k, \theta) = \frac{x^{k-1} e^{-x/\theta}}{\theta^{k}\,\Gamma(k)}.$$
Here Γ(k) is the gamma function evaluated at k. The likelihood function for N iid observations (x_1, ..., x_N) is
$$L(k, \theta) = \prod_{i=1}^{N} f(x_i; k, \theta).$$
Finding the maximum with respect to θ by taking the derivative and setting it equal to zero yields the maximum likelihood estimator of the θ parameter:
$$\hat{\theta} = \frac{1}{kN}\sum_{i=1}^{N} x_i.$$
Substituting this into the log-likelihood function and finding the maximum with respect to k by taking the derivative and setting it equal to zero yields
$$\ln k - \psi(k) = \ln\left(\frac{1}{N}\sum_{i=1}^{N} x_i\right) - \frac{1}{N}\sum_{i=1}^{N}\ln x_i.$$
An initial value of k can be found either using the method of moments or using the approximation
$$k \approx \frac{3 - s + \sqrt{(s - 3)^2 + 24s}}{12s}, \qquad s = \ln\left(\frac{1}{N}\sum_{i=1}^{N} x_i\right) - \frac{1}{N}\sum_{i=1}^{N}\ln x_i.$$
Now suppose the random variable X is exponentially distributed with rate parameter θ, and x_1, . . ., x_n are n independent samples from X, with sample mean x̄.
The likelihood function for θ, given an independent and identically distributed sample x = (x_1, ..., x_n) drawn from the variable, is
$$L(\theta) = \prod_{i=1}^{n} \theta e^{-\theta x_i} = \theta^{n} e^{-\theta n \bar{x}},$$
where x̄ is the sample mean. The derivative of the logarithm of the likelihood function is
$$\frac{d}{d\theta}\ln L(\theta) = \frac{n}{\theta} - n\bar{x}.$$
Consequently, the maximum likelihood estimate for the rate parameter is
$$\hat{\theta} = \frac{1}{\bar{x}}.$$
Although this is not an unbiased estimator of θ, x̄ is an unbiased MLE estimator of 1/θ = β, where β is the scale parameter of the alternative parameterization and is the distribution mean.
The bias of θ̂_mle is equal to
$$b \equiv E\left[\hat{\theta}_{mle}\right] - \theta = \frac{\theta}{n - 1},$$
which yields the bias-corrected maximum likelihood estimator
$$\hat{\theta}^{*} = \frac{n - 1}{n}\,\hat{\theta}_{mle}.$$
The Fisher information, denoted I(θ), for an estimator of the rate parameter θ is given as
$$I(\theta) = E\left[\left(\frac{\partial}{\partial\theta}\ln f(x; \theta)\right)^{2}\right].$$
Substituting in the distribution and solving gives
$$I(\theta) = \frac{1}{\theta^{2}}.$$
This determines the amount of information each independent sample of an exponential distribution carries about the unknown rate parameter θ.
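The exponential MLE and its bias correction are easy to exercise on simulated data. A minimal sketch with an arbitrary true rate (at this sample size the bias correction is tiny, so both estimators land close to the truth):

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 2.0                  # arbitrary true rate
n = 100_000
x = rng.exponential(scale=1 / theta, size=n)

# MLE of the rate is the reciprocal of the sample mean.
theta_mle = 1 / x.mean()
# Bias-corrected estimator: (n - 1)/n times the MLE.
theta_star = (n - 1) / n * theta_mle

assert abs(theta_mle - theta) < 0.05
assert abs(theta_star - theta) < 0.05
```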

Joint Moments of i.i.d. Exponential Order Statistics
Let X_1, . . ., X_n be n independent and identically distributed exponential random variables with rate parameter θ. Let X_(1), . . ., X_(n) denote the corresponding order statistics. For i < j, the joint moment E[X_(i) X_(j)] of the order statistics X_(i) and X_(j) can be obtained by invoking the law of total expectation and the memoryless property. The first step applies the law of total expectation, conditioning on X_(i) = x. The second step exploits the fact that once we condition on X_(i) = x, it must follow that X_(j) ≥ x. The third step relies on the memoryless property to evaluate the conditional expectation of X_(j) given X_(i) = x.
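The memoryless argument is easiest to see for n = 2 (chosen here for simplicity): X_(1) ~ Exp(2θ), and the gap X_(2) − X_(1) is an independent Exp(θ), which gives E[X_(1) X_(2)] = 1/θ². A simulation sketch with an arbitrary rate:

```python
import numpy as np

rng = np.random.default_rng(2)
theta = 1.5                 # arbitrary rate parameter
n_sim = 500_000

x = rng.exponential(scale=1 / theta, size=(n_sim, 2))
x.sort(axis=1)              # order statistics: X_(1) <= X_(2)

# For n = 2 the memoryless property gives X_(1) ~ Exp(2*theta) and
# X_(2) - X_(1) ~ Exp(theta) independent of X_(1), hence
# E[X_(1) X_(2)] = E[X_(1)^2] + E[X_(1)] * E[X_(2) - X_(1)] = 1/theta^2.
joint = (x[:, 0] * x[:, 1]).mean()
assert abs(joint - 1 / theta**2) < 0.01
```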

Sum of Two Independent Exponential Random Variables
The sum of two independent variables corresponds to the convolution of their probability distributions. If X_1 and X_2 are independent exponential random variables with respective rate parameters θ_1 and θ_2 (θ_1 ≠ θ_2), then the probability density of Z = X_1 + X_2 is given by
$$f_Z(z) = \frac{\theta_1\theta_2}{\theta_2 - \theta_1}\left(e^{-\theta_1 z} - e^{-\theta_2 z}\right), \qquad z \geq 0.$$
The mean and variance of Z = X_1 + X_2 are found to be
$$E[Z] = \frac{1}{\theta_1} + \frac{1}{\theta_2} \qquad \text{and} \qquad \text{Var}(Z) = \frac{1}{\theta_1^2} + \frac{1}{\theta_2^2}.$$
The cumulative distribution function of the sum of two independent exponential random variables then follows as
$$F_Z(z) = 1 - \frac{\theta_2 e^{-\theta_1 z} - \theta_1 e^{-\theta_2 z}}{\theta_2 - \theta_1}.$$

For the Weibull distribution, the maximum likelihood estimator for the ξ parameter given ν is
$$\hat{\xi}^{\nu} = \frac{1}{N}\sum_{i=1}^{N} x_i^{\nu}.$$
The maximum likelihood estimator for ν is the solution for ν of the following equation:
$$\hat{\nu}^{-1} = \frac{\sum_{i=1}^{N} x_i^{\nu}\ln x_i}{\sum_{i=1}^{N} x_i^{\nu}} - \frac{1}{N}\sum_{i=1}^{N}\ln x_i.$$
This equation defines ν only implicitly, so one must generally solve for ν by numerical means. When x_1 > x_2 > · · · > x_N are the N largest observed samples from a dataset of more than N samples, then the maximum likelihood estimator for the ξ parameter given ν is
$$\hat{\xi}^{\nu} = \frac{1}{N}\sum_{i=1}^{N}\left(x_i^{\nu} - x_N^{\nu}\right).$$
Also given that condition, the maximum likelihood estimator for ν satisfies
$$\hat{\nu}^{-1} = \frac{\sum_{i=1}^{N}\left(x_i^{\nu}\ln x_i - x_N^{\nu}\ln x_N\right)}{\sum_{i=1}^{N}\left(x_i^{\nu} - x_N^{\nu}\right)} - \frac{1}{N}\sum_{i=1}^{N}\ln x_i.$$
Again, this being an implicit function, one must generally solve for ν by numerical means.
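The closed-form density, mean, and variance of the sum of two independent exponentials can be checked by simulation. A sketch with arbitrary distinct rates, comparing the formula against a histogram estimate of the density at one point:

```python
import numpy as np

rng = np.random.default_rng(3)
t1, t2 = 1.0, 2.5            # arbitrary distinct rate parameters
n = 400_000
z = rng.exponential(1 / t1, n) + rng.exponential(1 / t2, n)

# Mean and variance of the sum: 1/t1 + 1/t2 and 1/t1^2 + 1/t2^2.
assert abs(z.mean() - (1 / t1 + 1 / t2)) < 0.01
assert abs(z.var() - (1 / t1**2 + 1 / t2**2)) < 0.02

# Closed-form density f_Z(z) = t1*t2/(t2 - t1) * (exp(-t1*z) - exp(-t2*z)),
# checked against a histogram estimate in a narrow bin around z = 1.
pt = 1.0
f = t1 * t2 / (t2 - t1) * (np.exp(-t1 * pt) - np.exp(-t2 * pt))
hist = np.mean((z > pt - 0.05) & (z < pt + 0.05)) / 0.1
assert abs(hist - f) < 0.05
```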

Statistical Results and Data Analysis
An application of the distributions discussed above was carried out on the Ovarian Cancer Survival Data, from a randomized trial comparing two treatments for ovarian cancer. The specified distributions were then compared parametrically, and the best choice was based on the minimum value of the AIC results.
There are different methods for selecting the most appropriate model in statistical analysis. The most commonly used methods include information- and likelihood-based criteria. To compare the different sub-families of a distribution used in this study, the information-based criteria are applied. The most commonly used model selection criteria are the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). AIC is given by the expression
$$AIC = 2k - 2\ln L,$$
where L is the maximized likelihood value and k is the number of parameters in the model.
BIC is given by the expression
$$BIC = k\ln N - 2\ln L,$$
where N is the total sample size.
The model with the smallest AIC value is considered a better fit.
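The comparison procedure can be sketched in code. The ovarian cancer data are not reproduced here, so this illustration fits the three candidate families to simulated Weibull survival times and ranks them by AIC (all parameter values are arbitrary stand-ins):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
# Simulated survival times from a Weibull model (illustrative stand-in
# for the ovarian cancer data, which is not reproduced here).
data = 3.0 * rng.weibull(2.0, size=500)

def aic(dist, params, k):
    """AIC = 2k - 2 ln L at the fitted parameters."""
    return 2 * k - 2 * dist.logpdf(data, *params).sum()

# Fit each candidate with the location fixed at zero, as in lifetime models.
p_exp = stats.expon.fit(data, floc=0)
p_wei = stats.weibull_min.fit(data, floc=0)
p_gg = stats.gengamma.fit(data, floc=0)

aics = {
    "exponential": aic(stats.expon, p_exp, 1),
    "weibull": aic(stats.weibull_min, p_wei, 2),
    "gengamma": aic(stats.gengamma, p_gg, 3),
}

# The Weibull (or the GG, which nests it) should beat the exponential here.
assert aics["weibull"] < aics["exponential"]
```

The model with the smallest AIC in the resulting dictionary would be selected; with real data the same loop applies unchanged.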

Discussion and Conclusion
Because of its constant failure rate property, the exponential distribution is an excellent model for the long flat "intrinsic failure" portion of the bathtub curve. Since most components and systems spend most of their lifetimes in this portion of the bathtub curve, this justifies the frequent use of the exponential distribution (when early failures or wear-out are not a concern). Just as it is often useful to approximate a curve by piecewise straight line segments, we can approximate any failure rate curve by week-by-week or month-by-month constant rates that are the average of the actual changing rate during the respective time durations. In that way we can approximate any model by piecewise exponential distribution segments patched together. Some natural phenomena have a constant failure rate (or occurrence rate) property; for example, the arrival rate of cosmic ray alpha particles or Geiger counter ticks. The exponential model works well for interarrival times (while the Poisson distribution describes the total number of events in a given period). When these events trigger failures, the exponential life distribution model will naturally apply. A probability plot of 200 normalized exponential data points was generated, so that a perfect exponential fit is a diagonal line with slope 1.

Figure 4 .
Figure 4. Generalized-gamma-graph with AFT and PH

Table 1 .
Table showing results when the Generalized Gamma is used as the distribution

Table 2 .
Table showing results when Exponential is used as the distribution

Table 3 .
Table showing results when Weibull is used as the distribution