The Topp-Leone Generated Weibull Distribution: Regression Model, Characterizations and Applications

Gokarna R. Aryal (1), Edwin M. Ortega (2), G. G. Hamedani (3) & Haitham M. Yousof (4)
(1) Department of Mathematics, Statistics and CS, Purdue University Northwest, USA
(2) Departamento de Ciências Exatas, Universidade de São Paulo, Piracicaba, SP, Brazil
(3) Department of Mathematics, Statistics and Computer Science, Marquette University, USA
(4) Department of Statistics, Mathematics and Insurance, Benha University, Egypt

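In this paper we introduce a new generalization of the Weibull distribution, built on the Topp-Leone distribution and named the Topp-Leone Generated Weibull (TLGW) distribution.

Consider the Topp-Leone generated family of distributions proposed by Rezaei, Sadr, Alizadeh & Nadarajah (2016), whose probability density function (pdf) and cumulative distribution function (cdf) are given, respectively, by

$$f(x) = 2\alpha\theta\, g(x)\, G(x)^{\theta\alpha-1}\left[1-G(x)^{\theta}\right]\left[2-G(x)^{\theta}\right]^{\alpha-1} \tag{1}$$

and

$$F(x) = \left\{G(x)^{\theta}\left[2-G(x)^{\theta}\right]\right\}^{\alpha}, \tag{2}$$

where g and G are the pdf and cdf of a baseline distribution and α, θ > 0. In this paper we apply this generalization to the Weibull (W) distribution, whose pdf and cdf are given, respectively, by

$$g(x) = \beta\eta^{\beta} x^{\beta-1} e^{-(\eta x)^{\beta}} \tag{3}$$

and

$$G(x) = 1 - e^{-(\eta x)^{\beta}}, \tag{4}$$

for x > 0 and β, η > 0. By inserting (3) and (4) into (1), we can write the pdf of the TLGW distribution as

$$f(x) = 2\alpha\theta\beta\eta^{\beta} x^{\beta-1} e^{-(\eta x)^{\beta}} \left[1-e^{-(\eta x)^{\beta}}\right]^{\theta\alpha-1} \left\{1-\left[1-e^{-(\eta x)^{\beta}}\right]^{\theta}\right\} \left\{2-\left[1-e^{-(\eta x)^{\beta}}\right]^{\theta}\right\}^{\alpha-1}. \tag{5}$$

The corresponding cdf of the TLGW distribution is given by

$$F(x) = \left[1-e^{-(\eta x)^{\beta}}\right]^{\theta\alpha} \left\{2-\left[1-e^{-(\eta x)^{\beta}}\right]^{\theta}\right\}^{\alpha}. \tag{6}$$

Figure 1 illustrates the graphical behavior of the pdf of the TLGW distribution for selected parameter values. As we shall see in the sequel, this is a rather flexible family compared to the Weibull distribution.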

Figure 1. Probability density function of TLGW distribution
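In order to derive mathematical and statistical properties of the TLGW distribution, series expansions of its pdf and cdf are useful. Applying the generalized binomial expansion, the cdf in (6) can be expressed as

$$F(x) = \sum_{k=0}^{\infty} w_k\, \Pi_{(\alpha+k)\theta}(x), \qquad w_k = (-1)^k\, 2^{\alpha-k} \binom{\alpha}{k}, \tag{7}$$

where $\Pi_{\gamma}(x) = \left[1-e^{-(\eta x)^{\beta}}\right]^{\gamma}$ is the cdf of the exponentiated-Weibull (exp-W) distribution with power parameter γ. This means the TLGW distribution can be expressed as a linear mixture of exp-W distributions. Similarly, the pdf (5) can be expressed as

$$f(x) = \sum_{k=0}^{\infty} w_k\, \pi_{(\alpha+k)\theta}(x), \tag{8}$$

where $\pi_{\gamma}(x) = d\Pi_{\gamma}(x)/dx$ is the exp-W density with power parameter γ. Equation (8) reveals that the density of X can be expressed as a linear mixture of exp-W densities. So, several mathematical properties of the new family can be obtained from those of the exp-W distribution.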
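The mixture representation can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the TLGW cdf has the form F(x) = {G(x)^θ [2 − G(x)^θ]}^α with Weibull baseline G(x) = 1 − exp(−(ηx)^β), and it assumes mixture weights (−1)^k 2^(α−k) C(α, k) obtained from the generalized binomial expansion. All function names are hypothetical.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import binom


def weibull_cdf(x, beta, eta):
    """Assumed baseline cdf G(x) = 1 - exp(-(eta*x)**beta)."""
    return -np.expm1(-(eta * x) ** beta)


def weibull_pdf(x, beta, eta):
    """Assumed baseline pdf g(x) matching weibull_cdf."""
    return beta * eta ** beta * x ** (beta - 1.0) * np.exp(-(eta * x) ** beta)


def tlgw_cdf(x, alpha, theta, beta, eta):
    """Assumed TLGW cdf: {G^theta * (2 - G^theta)}^alpha."""
    g_theta = weibull_cdf(x, beta, eta) ** theta
    return (g_theta * (2.0 - g_theta)) ** alpha


def tlgw_pdf(x, alpha, theta, beta, eta):
    """Derivative of tlgw_cdf with respect to x."""
    G = weibull_cdf(x, beta, eta)
    g = weibull_pdf(x, beta, eta)
    return (2.0 * alpha * theta * g * G ** (theta * alpha - 1.0)
            * (1.0 - G ** theta) * (2.0 - G ** theta) ** (alpha - 1.0))


def tlgw_cdf_series(x, alpha, theta, beta, eta, terms=60):
    """Truncated exp-W mixture: sum_k w_k * G(x)**((alpha + k) * theta)."""
    k = np.arange(terms)
    weights = (-1.0) ** k * 2.0 ** (alpha - k) * binom(alpha, k)
    G = weibull_cdf(x, beta, eta)
    return float(np.sum(weights * G ** (theta * (alpha + k))))
```

Because successive terms shrink at least as fast as (1/2)^k, a few dozen terms are typically enough for the truncated series to agree with the closed form to high precision.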
The paper is organized as follows. In Section 2, we obtain some mathematical properties of the proposed distribution, including moments, cumulants, the generating function, residual and reversed residual life functions, a stress-strength model and order statistics. In Section 3, we provide a useful characterization of the new distribution. In Section 4, the model parameters are estimated by maximum likelihood and a simulation study is performed. In Section 5, we present a regression model based on the TLGW distribution with censored data. In Section 6, the usefulness of the new distribution is illustrated by means of four real data sets, where we show empirically that it outperforms some well-known lifetime distributions. Finally, Section 7 offers some concluding remarks.

Mathematical Properties
In this section we provide some mathematical properties of the TLGW distribution, including the moments, incomplete moments, mean deviations and order statistics.

Moments, Cumulants and Generating Function
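The rth ordinary moment of X, say µ′_r = E(X^r), follows from the mixture representation (8) as a weighted sum of the rth moments of exp-W random variables; this is equation (9). Henceforth, Y_{(α+k)θ} denotes a random variable having the exp-W distribution with power parameter (α + k)θ. Setting r = 1 in (9), we have the mean of X. The rth central moment of X, say M_r, follows as

$$M_r = E(X-\mu'_1)^r = \sum_{h=0}^{r} (-1)^h \binom{r}{h} (\mu'_1)^h\, \mu'_{r-h}.$$

The skewness and kurtosis measures can also be calculated from the ordinary moments using well-known relationships. The cumulants (κ_n) of X follow recursively from

$$\kappa_n = \mu'_n - \sum_{r=1}^{n-1} \binom{n-1}{r-1} \kappa_r\, \mu'_{n-r}, \qquad \kappa_1 = \mu'_1.$$

The generating function of X can be derived using equation (9), for r > −β.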
The effect of the parameters α and θ on mean, variance, skewness and kurtosis, for given values of β = 2 and η = 2, are displayed in Figures 2 and 3.
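The summary measures discussed above can also be obtained by direct numerical integration rather than from the series. The sketch below is illustrative only: it assumes the TLGW pdf is the derivative of F(x) = {G(x)^θ [2 − G(x)^θ]}^α with Weibull baseline G(x) = 1 − exp(−(ηx)^β), and the function names are hypothetical.

```python
import numpy as np
from scipy.integrate import quad


def tlgw_pdf(x, alpha, theta, beta, eta):
    """Assumed TLGW pdf with Weibull baseline G(x) = 1 - exp(-(eta*x)**beta)."""
    t = (eta * x) ** beta
    G = -np.expm1(-t)
    g = beta * eta ** beta * x ** (beta - 1.0) * np.exp(-t)
    return (2.0 * alpha * theta * g * G ** (theta * alpha - 1.0)
            * (1.0 - G ** theta) * (2.0 - G ** theta) ** (alpha - 1.0))


def raw_moment(r, alpha, theta, beta, eta):
    """r-th ordinary moment E(X^r) by numerical quadrature."""
    value, _ = quad(lambda x: x ** r * tlgw_pdf(x, alpha, theta, beta, eta),
                    0.0, np.inf)
    return value


def summary_measures(alpha, theta, beta, eta):
    """Mean, variance, skewness and kurtosis from the first four raw moments."""
    m1, m2, m3, m4 = (raw_moment(r, alpha, theta, beta, eta) for r in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    mu3 = m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3                       # third central moment
    mu4 = m4 - 4.0 * m1 * m3 + 6.0 * m1 ** 2 * m2 - 3.0 * m1 ** 4  # fourth central moment
    return m1, var, mu3 / var ** 1.5, mu4 / var ** 2
```

As a sanity check under these assumptions, α = θ = β = η = 1 reduces the density to 2e^(−2x), an exponential distribution with rate 2, whose mean, variance, skewness and kurtosis are 0.5, 0.25, 2 and 9.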

Incomplete Moments and Mean Deviations
The main applications of the first incomplete moment refer to the mean deviations and the Bonferroni and Lorenz curves. These curves are very useful in economics, reliability, demography, insurance and medicine. The sth incomplete moment of X, say φ_s(t) = ∫_0^t x^s f(x) dx, can be expressed for s > −β from (8) as a weighted sum of exp-W incomplete moments; this is equation (10). The mean deviation about the mean involves a cdf value, which is easily calculated from (4), and φ_1(t), the first incomplete moment given by (10) with s = 1. A general equation for φ_1(t) can be derived from (10).

Residual and Reversed Residual Life Functions
The nth moment of the residual life is m_n(t) = E[(X − t)^n | X > t], n = 1, 2, .... For n = 1 this is the mean residual life (MRL), which represents the expected additional life length for a unit that is alive at age t. The MRL of X can be obtained by setting n = 1 in the expression for m_n(t). The nth moment of the reversed residual life is M_n(t) = E[(t − X)^n | X ≤ t], for t > 0 and n = 1, 2, ...; for the TLGW distribution, the resulting expression involves coefficients in powers t^{n−r}. The mean inactivity time (MIT) or mean waiting time (MWT), also called the mean reversed residual life function, is given by M_1(t) = E[(t − X) | X ≤ t], and it represents the waiting time elapsed since the failure of an item, on condition that this failure occurred in (0, t). The MIT of the TLGW distribution can be obtained easily by setting n = 1 in the expression for M_n(t).

A Stress-Strength Model
The stress-strength model is the most widely used approach for reliability estimation. This model is used in many applications in physics and engineering, such as strength failure and system collapse. In stress-strength modeling, R = Pr(X_2 < X_1) is a measure of the reliability of a system that is subjected to random stress X_2 and has strength X_1. The system fails if and only if the applied stress is greater than its strength, and the component functions satisfactorily whenever X_1 > X_2. R can be considered a measure of system performance, and it arises naturally in electrical and electronic systems. Another interpretation is that the reliability R of the system is the probability that the system is strong enough to overcome the stress imposed on it. Let X_1 and X_2 be two independent random variables having TLGW(α_1, θ_1, η, β) and TLGW(α_2, θ_2, η, β) distributions, respectively. Then we can write

$$R = \int_0^{\infty} f_1(x; \alpha_1, \theta_1, \eta, \beta)\, F_2(x; \alpha_2, \theta_2, \eta, \beta)\, dx,$$

and expanding both factors with the mixture representations shows that R can be simply expressed as a double series in coefficients Ω_{k,j}.
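The reliability R can also be evaluated by one-dimensional quadrature instead of the double series. The sketch below is an illustration under stated assumptions, not the authors' procedure: it assumes the TLGW cdf is {G^θ (2 − G^θ)}^α with a Weibull baseline G shared by both variables, so that the substitution u = G(x) removes η and β from the integral. The function names are hypothetical.

```python
from scipy.integrate import quad


def tl_cdf_unit(u, alpha, theta):
    """Assumed cdf of U = G(X) on (0, 1): {u^theta * (2 - u^theta)}^alpha."""
    return (u ** theta * (2.0 - u ** theta)) ** alpha


def tl_pdf_unit(u, alpha, theta):
    """Derivative of tl_cdf_unit with respect to u."""
    return (2.0 * alpha * theta * u ** (theta * alpha - 1.0)
            * (1.0 - u ** theta) * (2.0 - u ** theta) ** (alpha - 1.0))


def stress_strength(alpha1, theta1, alpha2, theta2):
    """R = Pr(X2 < X1) = integral of f1 * F2, computed on the u = G(x) scale."""
    value, _ = quad(
        lambda u: tl_pdf_unit(u, alpha1, theta1) * tl_cdf_unit(u, alpha2, theta2),
        0.0, 1.0)
    return value
```

Two checks follow from the assumed form: identical stress and strength distributions give R = 1/2, and when θ_1 = θ_2 the integral reduces to α_1/(α_1 + α_2).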

Order Statistics
Let X_1, ..., X_n be a random sample from the TLGW distribution and let X_(1), ..., X_(n) be the corresponding order statistics. The pdf of the ith order statistic, say X_{i:n}, can be written as

$$f_{i:n}(x) = \frac{f(x)}{B(i, n-i+1)} \sum_{j=0}^{n-i} (-1)^j \binom{n-i}{j} F(x)^{j+i-1}, \tag{11}$$

where B(·, ·) is the beta function. Substituting (5) and (6) in (11), the pdf of X_{i:n} can be expressed as a series whose coefficients f_{j+i−1,k} can be obtained recursively. Hence, the density function of the TLGW order statistics is a mixture of exp-W densities. Based on this representation, we note that the properties of X_{i:n} follow from those of Y_{r+k}. For example, the moments of X_{i:n} can be expressed, for q > −β, in terms of exp-W moments; this is equation (12).

The L-moments are analogous to the ordinary moments but can be estimated by linear combinations of order statistics. They exist whenever the mean of the distribution exists, even though some higher moments may not exist, and they are relatively robust to the effects of outliers. Based upon the moments in equation (12), we can derive explicit expressions for the L-moments of X as infinite weighted linear combinations of the means of suitable TLGW order statistics. They are linear functions of expected order statistics defined by

$$\lambda_r = \frac{1}{r} \sum_{d=0}^{r-1} (-1)^d \binom{r-1}{d} E\left(X_{r-d:r}\right), \qquad r \geq 1.$$

Characterizations
In this section we present certain characterizations of the TLGW distribution. These characterizations are based on the ratio of two truncated moments. Due to the nature of this distribution, we believe that these may be the only possible characterizations of the TLGW distribution. Our first characterization result employs a theorem due to Glänzel (1987); see Theorem 1 in Appendix A. Note that the result also holds when the interval H is not closed. Moreover, it can also be applied when the cdf F does not have a closed form. As shown in Glänzel (1990), this characterization is stable in the sense of weak convergence.
Proposition 3.1. Let X : Ω → (0, ∞) be a continuous random variable and let q_1(x) = for x > 0. The random variable X belongs to the family (5) if and only if the function ξ defined in Theorem 1 has the form

Proof. Let X be a random variable with pdf (5), then and and finally

Conversely, if ξ is given as above, then and hence

Now, in view of Theorem 1, X has density (5).

Corollary 3.1. Let X : Ω → (0, ∞) be a continuous random variable and let q_1(x) be as in Proposition 3.1. Then X has pdf (5) if and only if there exist functions q_2 and ξ defined in Theorem 1 satisfying the differential equation

Corollary 3.2. The general solution of the differential equation in Corollary 3.1 is where D is a constant.

Note that a set of functions satisfying the differential equation in Corollary 3.1 is given in Proposition 3.1 with D = 0. However, it should also be noted that there are other triplets (q_1, q_2, ξ) satisfying the conditions of Theorem 1.

Parameter Estimation
Subsection 4.1 provides procedures for maximum likelihood estimation of the TLGW distribution.Subsection 4.2 assesses the performance of the maximum likelihood estimators (MLEs) in terms of biases and mean squared errors by means of a simulation study.

Maximum Likelihood Estimation
Several methods for parameter estimation have been proposed in the literature, but the maximum likelihood method is the most commonly employed. The maximum likelihood estimators enjoy desirable properties and can be used for constructing confidence intervals and regions, and also in test statistics. The normal approximation for these estimators, in large samples, can be easily handled either analytically or numerically. So, we consider the estimation of the unknown parameters of this family by maximum likelihood from complete samples only.

Let x_1, ..., x_n be a random sample from the TLGW distribution and let τ = (α, θ, β, η)^T be the 4 × 1 parameter vector. To determine the MLE of τ, we use the log-likelihood function of the TLGW distribution, ℓ(τ) = Σ_{i=1}^{n} ln f(x_i; τ), with f given by (5), written in terms of s_i = 1 − e^{−(ηx_i)^β}. Let z_i = ∂s_i/∂β = (ηx_i)^β e^{−(ηx_i)^β} ln(ηx_i) and q_i = ∂s_i/∂η = βη^{β−1} x_i^β e^{−(ηx_i)^β}. The components of the score vector, ∂ℓ(τ)/∂α, ∂ℓ(τ)/∂θ, ∂ℓ(τ)/∂β and ∂ℓ(τ)/∂η, are expressed in terms of s_i, z_i and q_i. Setting ∂ℓ(τ)/∂α = 0, ∂ℓ(τ)/∂θ = 0, ∂ℓ(τ)/∂β = 0 and ∂ℓ(τ)/∂η = 0 and solving this nonlinear system simultaneously yields the MLE τ̂ = (α̂, θ̂, β̂, η̂)^T. To solve these equations, it is usually more convenient to use nonlinear optimization methods, such as a quasi-Newton algorithm, to maximize ℓ numerically.

For interval estimation of the parameters, we obtain the 4 × 4 observed information matrix J(τ) = {−∂²ℓ/∂r ∂s} (for r, s = α, θ, β, η), whose elements can be computed numerically. Under standard regularity conditions, when n → ∞, the distribution of τ̂ − τ can be approximated by a multivariate normal N_4(0, J(τ̂)^{−1}) distribution, which is used to construct approximate confidence intervals for the parameters. Here, J(τ̂) is the total observed information matrix evaluated at τ̂.
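The numerical maximization described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the TLGW pdf is 2αθ g G^(θα−1) (1 − G^θ)(2 − G^θ)^(α−1) with Weibull baseline G(x) = 1 − exp(−(ηx)^β), optimizes over log-parameters to keep them positive, and uses SciPy's L-BFGS-B (a quasi-Newton method) with box bounds chosen arbitrarily for this sketch. All function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def tlgw_logpdf(x, alpha, theta, beta, eta):
    """Log of the assumed TLGW pdf, evaluated in a numerically stable way."""
    t = (eta * x) ** beta
    log_g = np.log(beta) + beta * np.log(eta) + (beta - 1.0) * np.log(x) - t
    log_G = np.log1p(-np.exp(-t))              # log G(x)
    one_minus_Gth = -np.expm1(theta * log_G)   # 1 - G(x)**theta
    return (np.log(2.0 * alpha * theta) + log_g
            + (theta * alpha - 1.0) * log_G
            + np.log(one_minus_Gth)
            + (alpha - 1.0) * np.log1p(one_minus_Gth))  # log(2 - G**theta)


def tlgw_rvs(n, alpha, theta, beta, eta, rng):
    """Inverse-cdf sampling from the assumed TLGW cdf (used for test data)."""
    u = rng.uniform(size=n)
    w = 1.0 - np.sqrt(1.0 - u ** (1.0 / alpha))  # w = G(x)**theta
    G = w ** (1.0 / theta)
    return (-np.log1p(-G)) ** (1.0 / beta) / eta


def fit_tlgw(x, start=(1.0, 1.0, 1.0, 1.0)):
    """Return (estimates, maximized log-likelihood) for (alpha, theta, beta, eta)."""
    x = np.asarray(x, dtype=float)

    def nll(log_params):
        with np.errstate(all="ignore"):
            value = -np.sum(tlgw_logpdf(x, *np.exp(log_params)))
        # Large finite penalty where the log-likelihood is not finite.
        return value if np.isfinite(value) else 1e10

    result = minimize(nll, np.log(start), method="L-BFGS-B",
                      bounds=[(-4.0, 4.0)] * 4)
    return np.exp(result.x), -result.fun
```

With four parameters the likelihood surface can be flat in some directions, so in practice several starting values are worth trying; the observed information matrix could then be approximated by a numerical Hessian at the estimate.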

Simulation Study
In this section, we present some simulations for different sample sizes to assess the accuracy of the MLEs. Simulating random variables from well-defined probability distributions has been discussed in the computational statistics literature, e.g., the inverse transformation method and the acceptance-rejection sampling technique. An ideal technique for simulating from the TLGW distribution is the inversion method: we can simulate a random variable X by inverting the cdf (6) at U, where U is a uniform random number in (0, 1). For selected combinations of θ, α, β and η, we generate samples of sizes n = 50, 100, 200, 300, 500 and 1000 from the TLGW distribution. We repeat the simulations N = 1,000 times and evaluate the mean estimates and the root mean squared errors (RMSEs). The empirical results, obtained using the statistical computing software R, are given in Tables 1 and 2. It can be observed that the RMSE decreases as the sample size increases. Therefore, the maximum likelihood method works very well for estimating the parameters of the TLGW distribution.
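The inversion step can be sketched as follows. The paper reports using R; the Python below is only an illustration under the assumption that the TLGW cdf is F(x) = {G(x)^θ [2 − G(x)^θ]}^α with G(x) = 1 − exp(−(ηx)^β). Solving F(x) = u gives G(x)^θ = 1 − sqrt(1 − u^(1/α)), which is then inverted through the Weibull cdf. The function names are hypothetical.

```python
import numpy as np


def tlgw_cdf(x, alpha, theta, beta, eta):
    """Assumed TLGW cdf with Weibull baseline."""
    g_theta = (-np.expm1(-(eta * x) ** beta)) ** theta
    return (g_theta * (2.0 - g_theta)) ** alpha


def tlgw_quantile(u, alpha, theta, beta, eta):
    """Closed-form inverse of tlgw_cdf for u in (0, 1)."""
    w = 1.0 - np.sqrt(1.0 - u ** (1.0 / alpha))  # root of w*(2 - w) = u**(1/alpha) in [0, 1]
    G = w ** (1.0 / theta)                       # baseline cdf value
    return (-np.log1p(-G)) ** (1.0 / beta) / eta # invert G(x) = 1 - exp(-(eta*x)**beta)


def tlgw_rvs(n, alpha, theta, beta, eta, rng=None):
    """Draw n variates by applying the quantile function to uniform numbers."""
    rng = np.random.default_rng() if rng is None else rng
    return tlgw_quantile(rng.uniform(size=n), alpha, theta, beta, eta)
```

A Monte Carlo study along the lines described would call `tlgw_rvs` for each sample size, refit the model on every replicate, and summarize the estimates by their means and RMSEs.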
We refer to equation (13) as the (new) LTLGW distribution, say Y ∼ LTLGW(α, θ, σ, µ), where µ is a location parameter, σ is a dispersion parameter and α and θ are shape parameters. Thus, if X ∼ TLGW(α, θ, β, η), then Y = log(X) ∼ LTLGW(α, θ, σ, µ). The plots of (13) in Figure 5 for selected parameter values show the great flexibility of the density function in terms of the parameters α and θ.
The survival function corresponding to (13) is given by equation (14). We define the standardized random variable Z = (Y − µ)/σ, whose density function is given by equation (15).

In many practical applications, the lifetimes are affected by explanatory variables such as cholesterol level, blood pressure, weight and many others. Parametric regression models to estimate univariate survival functions for censored data are widely used. A parametric model that provides a good fit to lifetime data tends to yield more precise estimates of the quantities of interest. Based on the LTLGW density function, we propose a linear location-scale regression model for censored data linking the response variable y_i and the explanatory vector v_i^T = (v_{i1}, ..., v_{ip}) as follows:

$$y_i = \mathbf{v}_i^{T} \boldsymbol{\gamma} + \sigma z_i, \qquad i = 1, \ldots, n, \tag{16}$$

where the random error z_i has density function (15), and γ = (γ_1, ..., γ_p)^T, σ > 0, α > 0 and θ > 0 are unknown parameters.

The parameter µ_i = v_i^T γ is the location of y_i. The location parameter vector µ = (µ_1, ..., µ_n)^T is given by a linear model µ = Vγ, where V = (v_1, ..., v_n)^T is a known model matrix. The LTLGW regression model (16) opens new possibilities for fitting many different types of censored data. It is an extension of an accelerated failure time model using the TLGW distribution for censored data.
Consider a sample (y_1, v_1), ..., (y_n, v_n) of n independent observations, where each random response is defined by y_i = min{log(x_i), log(c_i)}. We assume non-informative censoring such that the observed lifetimes and censoring times are independent. Let F and C be the sets of individuals for which y_i is the log-lifetime or log-censoring, respectively. Conventional likelihood estimation techniques can be applied here. The log-likelihood function for the vector of parameters τ = (α, θ, σ, γ^T)^T from model (16) has the form

$$\ell(\boldsymbol{\tau}) = \sum_{i \in F} \log f(y_i \mid \mathbf{v}_i) + \sum_{i \in C} \log S(y_i \mid \mathbf{v}_i),$$

where f(y_i | v_i) is the density (13) and S(y_i | v_i) is the survival function (14) of Y_i. The total log-likelihood function for τ reduces to equation (17), written in terms of z_i = (y_i − v_i^T γ)/σ and r, the number of uncensored observations (failures). The score functions for the parameters α, θ, σ and γ are given by U_α(τ), U_θ(τ), U_σ(τ) and U_{γ_j}(τ) for j = 1, ..., p.
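The censored log-likelihood can be sketched in code as follows. This is an illustration under explicit assumptions, not the authors' implementation: it assumes the standardized error Z has baseline cdf G(z) = 1 − exp(−exp(z)), which is the log-Weibull (extreme value) baseline, combined with the Topp-Leone generator {G^θ (2 − G^θ)}^α. The argument `delta` (1 for an observed failure, 0 for a censored response) and the function name are hypothetical.

```python
import numpy as np


def ltlgw_loglik(params, y, V, delta):
    """Censored log-likelihood for y_i = v_i' gamma + sigma * z_i.

    params = (alpha, theta, sigma, gamma_1, ..., gamma_p)
    y      = log of observed time (log-lifetime or log-censoring time)
    V      = n x p model matrix
    delta  = 1 if y_i is a log-lifetime, 0 if it is a log-censoring time
    """
    alpha, theta, sigma = params[:3]
    gamma = np.asarray(params[3:], dtype=float)
    z = (np.asarray(y, dtype=float) - V @ gamma) / sigma

    log_G = np.log(-np.expm1(-np.exp(z)))   # log of assumed baseline cdf
    G_theta = np.exp(theta * log_G)         # G(z)**theta

    # Log-density of y: (1/sigma) times the assumed density of z.
    log_f = (np.log(2.0 * alpha * theta) - np.log(sigma) + z - np.exp(z)
             + (theta * alpha - 1.0) * log_G
             + np.log1p(-G_theta)
             + (alpha - 1.0) * np.log(2.0 - G_theta))

    # Log-survival: log(1 - F(z)) with F = {G^theta (2 - G^theta)}^alpha.
    log_S = np.log1p(-(G_theta * (2.0 - G_theta)) ** alpha)

    return float(np.sum(np.where(np.asarray(delta) == 1, log_f, log_S)))
```

The negative of this function could be passed to a general-purpose optimizer to obtain estimates. Under the stated assumptions, α = θ = 1 gives survival exp(−2e^z), which provides a simple check of the implementation.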

Application 2: Regression Model With Censored Data
For an application of the LTLGW regression model, we consider the data from a two-arm clinical trial discussed earlier by Efron (1988). Efron (1988) and Mudholkar et al. (1996) observed that the empirical hazard functions for both samples start near zero, suggesting an initial high-risk period, a decline for a while and then stabilization after about one year. Specifically, Efron's data from a head and neck cancer clinical trial consist of the survival times of 51 patients in arm A (17.6% censored), who were given radiation therapy, and 45 patients in arm B (31.1% censored), who were given radiation plus chemotherapy. Mudholkar et al. (1996) analyzed these data separately using the exponentiated Weibull distribution. Here, we use the LTLGW regression model.
Let x_i be the survival time (in days) for the ith observation and let v_{i1} indicate the arm of the clinical trial (0 = arm A, 1 = arm B); for other details, see Mudholkar et al. (1996). We propose a model in which the random variable y_i = log(x_i) follows the LTLGW distribution (13) for i = 1, 2, ..., 96.
An alternative approach for modeling these data is provided by the log-Weibull (LW) distribution. There are various extensions of this lifetime distribution; see, for example, the log-beta Weibull (LBW) distribution (Ortega et al., 2011).
The MLEs of the model parameters are computed using the procedure NLMixed in SAS. As initial values for γ and σ in the iterative algorithm for maximizing the log-likelihood function (17), we adopt the fitted values obtained by fitting the LW regression model. The MLEs of the parameters and the AIC, CAIC and BIC statistics for some models are listed in Table 6.2. The results in Table 6.2 indicate that the LTLGW model has the lowest AIC, CAIC and BIC values among the fitted models, so the LTLGW model provides an appropriate fit for these data.
Further, the fitted LTLGW regression model suggests that x_1 is significant at the 6% level and that there is a significant difference between the two arms of the clinical trial. We plot in Figure 6.2 the empirical survival function and the estimated survival functions for the LTLGW, LBW and LW models. These plots suggest that the LTLGW model provides a suitable fit.

Figure 2. Plots of mean and variance of TLGW distribution

Figure 5. Fitted pdf and QQ plots of TLGW and ALFW distributions for the Kiama Blowhole data

Figure 6. Fitted pdf and QQ plots of TLGW and ALFW distributions for the repair time data

Another interesting function is the mean residual life (MRL) function or the life expectation at age t defined by m 1

Table 1. Empirical means and the RMSEs of TLGW distribution

Table 2. Empirical means and the RMSEs of TLGW distribution

Table 4. The ℓ, AIC, A*, W* and K-S statistics for the Kiama Blowhole data

This data set includes an active repair time (in hours) for an airborne communication transceiver reported by Balakrishnan,

Table 6. The ℓ, AIC, A*, W* and K-S statistics for the repair time data

Table 7. MLEs of the parameters from the LTLGW regression model fitted to the Efron data, the corresponding SEs (given in parentheses), p-values in [.] and the basic statistics.