The Odd Power Lindley Generator of Probability Distributions: Properties, Characterizations and Regression Modeling

In this study, a new flexible family of distributions is proposed, and its statistical properties are studied together with some useful characterizations. The maximum likelihood method is used to estimate the unknown model parameters, and the performance of the estimators is assessed by means of two simulation studies. A new regression model is proposed based on a special member of the proposed family, called the log odd power Lindley Weibull distribution. Residual analysis is conducted to evaluate the model assumptions. Four applications to real data sets are given to demonstrate the usefulness of the proposed model.


Introduction
In recent years, there has been great interest in introducing more flexible distributions by extending classical distributions through the incorporation of additional shape parameters into the baseline model. Many generalized families of distributions have been proposed and studied over the last two decades for modeling data in many applied areas, and several classes of distributions have been constructed by extending common families of continuous distributions. These generalized distributions give more flexibility by adding one (or more) shape parameters to the baseline model. This line of work was pioneered by Gupta et al. (1998), who proposed the exponentiated-G (Exp-G) class, which consists of raising the cumulative distribution function (cdf) to a positive power parameter. Many other classes can be cited, such as the T-X family by Alzaatreh et al. (2013), the Lomax-G by Cordeiro et al. (2014), the Burr X generator by Yousof et al. (2016), the generalized two-sided class of distributions by Korkmaz and Genç (2017) and the Burr XII generator by Cordeiro et al. (2018), among others.
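Recently, Ghitany et al. (2013) introduced the power Lindley (PL) distribution with cdf and probability density function (pdf)

$$F_{PL}(t;\alpha,\beta) = 1-\left(1+\frac{\beta t^{\alpha}}{\beta+1}\right)e^{-\beta t^{\alpha}}, \qquad t>0, \qquad (1)$$

$$f_{PL}(t;\alpha,\beta) = \frac{\alpha\beta^{2}}{\beta+1}\,(1+t^{\alpha})\,t^{\alpha-1}e^{-\beta t^{\alpha}}, \qquad t>0, \qquad (2)$$

respectively, where α > 0 is a shape parameter and β > 0 is a scale parameter. Using the T-X idea, we define the new family by taking $W(G(x;\psi)) = G(x;\psi)/[1-G(x;\psi)]$ and $r(t) = f_{PL}(t;\alpha,\beta)$, where G(x;ψ) is the baseline cdf depending on the parameter vector ψ. The cdf of the new family is given by

$$F(x;\alpha,\beta,\psi) = \int_{0}^{\frac{G(x;\psi)}{1-G(x;\psi)}} f_{PL}(t;\alpha,\beta)\,dt = 1-\left[1+\frac{\beta}{\beta+1}\left(\frac{G(x;\psi)}{1-G(x;\psi)}\right)^{\alpha}\right]\exp\left\{-\beta\left(\frac{G(x;\psi)}{1-G(x;\psi)}\right)^{\alpha}\right\}. \qquad (3)$$

The pdf corresponding to (3) is given by

$$f(x;\alpha,\beta,\psi) = \frac{\alpha\beta^{2}}{\beta+1}\,\frac{g(x;\psi)\,G(x;\psi)^{\alpha-1}}{[1-G(x;\psi)]^{\alpha+1}}\left[1+\left(\frac{G(x;\psi)}{1-G(x;\psi)}\right)^{\alpha}\right]\exp\left\{-\beta\left(\frac{G(x;\psi)}{1-G(x;\psi)}\right)^{\alpha}\right\}, \qquad (4)$$

where g(x;ψ) = dG(x;ψ)/dx is the baseline pdf, and the hazard rate function (hrf) is

$$h(x;\alpha,\beta,\psi) = \frac{\alpha\beta^{2}\,g(x;\psi)\,G(x;\psi)^{\alpha-1}\left[1+\left(\frac{G(x;\psi)}{1-G(x;\psi)}\right)^{\alpha}\right]}{[1-G(x;\psi)]^{\alpha+1}\left[\beta+1+\beta\left(\frac{G(x;\psi)}{1-G(x;\psi)}\right)^{\alpha}\right]}. \qquad (5)$$

We call this new family the odd power Lindley-G family and denote it by OPL-G(α, β, ψ). For α = 1, the OPL-G family reduces to the odd Lindley-G (OL-G) family introduced by Silva et al. (2017).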
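As a concrete check of (3) and (4), here is a minimal sketch that evaluates the OPL-G cdf, pdf and hrf for a normal baseline (the OPL-N case defined later). It uses only the Python standard library; the function names are ours, and the closed forms are transcribed from (3) and (4) with G = Φ((x − µ)/σ) and g = φ((x − µ)/σ)/σ. The hrf is computed as the pdf divided by the survival function.

```python
import math

SQRT2 = math.sqrt(2.0)


def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    return 0.5 * math.erfc(-z / SQRT2)


def std_normal_sf(z):
    # 1 - Phi(z) via erfc, avoiding cancellation in the upper tail
    return 0.5 * math.erfc(z / SQRT2)


def opl_n_sf(x, alpha, beta, mu, sigma):
    """Survival function 1 - F(x), i.e. one minus cdf (3) with a normal baseline."""
    z = (x - mu) / sigma
    w = (std_normal_cdf(z) / std_normal_sf(z)) ** alpha  # [G/(1-G)]^alpha
    return (1.0 + beta * w / (beta + 1.0)) * math.exp(-beta * w)


def opl_n_cdf(x, alpha, beta, mu, sigma):
    return 1.0 - opl_n_sf(x, alpha, beta, mu, sigma)


def opl_n_pdf(x, alpha, beta, mu, sigma):
    """pdf (4) with g = phi(z)/sigma and G = Phi(z); intended for moderate |z|."""
    z = (x - mu) / sigma
    G, S = std_normal_cdf(z), std_normal_sf(z)
    w = (G / S) ** alpha
    const = alpha * beta ** 2 / ((beta + 1.0) * sigma)
    return (const * std_normal_pdf(z) * G ** (alpha - 1.0) / S ** (alpha + 1.0)
            * (1.0 + w) * math.exp(-beta * w))


def opl_n_hrf(x, alpha, beta, mu, sigma):
    return opl_n_pdf(x, alpha, beta, mu, sigma) / opl_n_sf(x, alpha, beta, mu, sigma)
```

At the baseline median (x = µ) the odds equal 1, so (3) gives F(µ) = 1 − [1 + β/(β+1)]e^{−β}, which is a quick hand check of the implementation.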
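Let T be a PL random variable with cdf (1) and pdf (2). The OPL-G random variable having cdf (3) can be obtained as $X = Q_G\left(T/(1+T)\right)$, where $Q_G(u) = G^{-1}(u;\psi)$ is the inverse of the baseline cdf (the baseline quantile function). Indeed, $P(X \le x) = P\left(T/(1+T) \le G(x;\psi)\right) = P\left(T \le G(x;\psi)/[1-G(x;\psi)]\right) = F_{PL}\left(G(x;\psi)/[1-G(x;\psi)];\alpha,\beta\right)$, which is (3).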
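We can also motivate the OPL-G family through a mixture representation. Let F_1(x; α, β, ψ) be the cdf of the odd Weibull-G (OW-G) family (Bourguignon et al., 2014) and F_2(x; α, β, ψ) be the cdf of a generalized gamma-G (GG-G) family. These cdfs are given by

$$F_1(x;\alpha,\beta,\psi) = 1-\exp\left\{-\beta\left(\frac{G(x;\psi)}{1-G(x;\psi)}\right)^{\alpha}\right\} \quad\text{and}\quad F_2(x;\alpha,\beta,\psi) = 1-\left[1+\beta\left(\frac{G(x;\psi)}{1-G(x;\psi)}\right)^{\alpha}\right]\exp\left\{-\beta\left(\frac{G(x;\psi)}{1-G(x;\psi)}\right)^{\alpha}\right\},$$

respectively. Then, the OPL-G cdf can be expressed as

$$F(x;\alpha,\beta,\psi) = p\,F_1(x;\alpha,\beta,\psi) + (1-p)\,F_2(x;\alpha,\beta,\psi), \qquad (6)$$

where p = β/(β + 1) is the mixing proportion. Hence, the OPL-G family is a two-component mixture family.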
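The mixture identity can be checked directly. Writing s(x) = G(x;ψ)/[1 − G(x;ψ)] (a shorthand introduced only here) and p = β/(β + 1), a short calculation shows that the two-component mixture reproduces (3), and that the same split already holds at the level of the PL pdf (2):

```latex
\begin{aligned}
p\,F_1(x) + (1-p)\,F_2(x)
  &= 1 - e^{-\beta s(x)^{\alpha}}\Big[\,p + (1-p)\big\{1 + \beta s(x)^{\alpha}\big\}\Big] \\
  &= 1 - \Big[1 + \tfrac{\beta}{\beta+1}\, s(x)^{\alpha}\Big] e^{-\beta s(x)^{\alpha}},
  \qquad \text{since } (1-p)\beta = \tfrac{\beta}{\beta+1}, \\[4pt]
f_{PL}(t) &= p\,\underbrace{\alpha\beta\, t^{\alpha-1} e^{-\beta t^{\alpha}}}_{\text{Weibull}}
  + (1-p)\,\underbrace{\alpha\beta^{2}\, t^{2\alpha-1} e^{-\beta t^{\alpha}}}_{\text{generalized gamma}}
  = \frac{\alpha\beta^{2}}{\beta+1}\,(1 + t^{\alpha})\, t^{\alpha-1} e^{-\beta t^{\alpha}}.
\end{aligned}
```

The second component is a proper density because, with y = t^α, it integrates to β²∫_0^∞ y e^{−βy} dy = 1.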
The rest of the paper is organized as follows. Useful expansions for the pdf and cdf of the OPL-G family are presented in Section 2. Some of its special cases are considered in Section 3. In Section 4, we derive some of its mathematical properties. Section 5 deals with some characterizations of the new family. Section 6 describes the maximum likelihood estimation method. Two Monte Carlo simulation studies are performed in Section 7. A new log-location-scale regression model, as well as residual analysis, is presented in Section 8. Section 9 is devoted to applications to real data sets to illustrate empirically the importance of the new model. Finally, some conclusions and future work are given in Section 10.

Useful Expansions for Density of OPL-G Family
Expanding the exponential factor $\exp\{-\beta[G(x;\psi)/(1-G(x;\psi))]^{\alpha}\}$ in (4) in a power series and then applying the generalized binomial expansion to the resulting negative powers of $1-G(x;\psi)$, the OPL-G pdf can be expressed as the linear combination (7) of the densities $\pi_\delta(x) = \delta\, g(x;\psi)\, G(x;\psi)^{\delta-1}$, where $\pi_\delta$ is the pdf of the Exp-G distribution with power parameter δ. The corresponding OPL-G cdf can be written as (8), where $\Pi_\delta(x) = G(x;\psi)^{\delta}$ is the cdf of the Exp-G distribution with power parameter δ. Equations (7) and (8) reveal that the OPL-G pdf (cdf) is a linear combination of Exp-G densities (cdfs). Thereby, some properties of the proposed family, such as moments and the generating function, can be determined from those of Exp-G distributions. The properties of Exp-G distributions have been studied by many authors in recent years; see Shirke and Kakade (2006) for the exponentiated log-normal and Nadarajah and Gupta (2007) for the exponentiated gamma distributions, among others.
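One way to arrive at such a representation is sketched below, writing G = G(x;ψ) and g = g(x;ψ). The weights a_{k,j} and b_{k,j} are our own notation and may be arranged differently from the constants in (7) and (8); interchanging the sums with integration is assumed to be justified, as is usual for this kind of expansion.

```latex
% Step 1: power series of the exponential factor in (4)
\exp\Big\{-\beta\Big(\tfrac{G}{1-G}\Big)^{\alpha}\Big\}
  = \sum_{k=0}^{\infty}\frac{(-\beta)^{k}}{k!}\,\frac{G^{\alpha k}}{(1-G)^{\alpha k}}

% Step 2: collect powers of G and (1-G) in (4)
f(x) = \frac{\alpha\beta^{2}}{\beta+1}\, g \sum_{k=0}^{\infty}\frac{(-\beta)^{k}}{k!}
  \left[\frac{G^{\alpha(k+1)-1}}{(1-G)^{\alpha(k+1)+1}}
      + \frac{G^{\alpha(k+2)-1}}{(1-G)^{\alpha(k+2)+1}}\right]

% Step 3: generalized binomial series, valid for 0 < G < 1 and c > 0
(1-G)^{-c} = \sum_{j=0}^{\infty}\frac{\Gamma(c+j)}{\Gamma(c)\, j!}\, G^{j}

% Step 4: use g\,G^{\delta-1} = \pi_\delta(x)/\delta and integrate term by term
f(x) = \sum_{k,j\ge 0}\Big[a_{k,j}\,\pi_{\alpha(k+1)+j}(x) + b_{k,j}\,\pi_{\alpha(k+2)+j}(x)\Big],
\qquad
F(x) = \sum_{k,j\ge 0}\Big[a_{k,j}\,\Pi_{\alpha(k+1)+j}(x) + b_{k,j}\,\Pi_{\alpha(k+2)+j}(x)\Big],

a_{k,j} = \frac{\alpha\beta^{2}}{\beta+1}\,\frac{(-\beta)^{k}}{k!}\,
  \frac{\Gamma(\alpha(k+1)+j+1)}{\Gamma(\alpha(k+1)+1)\, j!\,[\alpha(k+1)+j]},
\qquad
b_{k,j} = \frac{\alpha\beta^{2}}{\beta+1}\,\frac{(-\beta)^{k}}{k!}\,
  \frac{\Gamma(\alpha(k+2)+j+1)}{\Gamma(\alpha(k+2)+1)\, j!\,[\alpha(k+2)+j]}.
```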

Some Members of the OPL-G Family
The OPL-G family can produce highly flexible models from any baseline model. Here, we present three special cases of this family, which extend some widely known distributions. We give the pdf and cdf of each new distribution; the corresponding hrfs can be obtained from Equation (5).

The OPL-Normal (OPL-N) Distribution
The normal distribution, which has a symmetric unimodal pdf and an increasing hrf, is well known in statistics and other areas. The OPL-N distribution is defined from (3) and (4) by taking G(x) and g(x) to be the cdf and pdf of the normal distribution. With z = (x − µ)/σ, its pdf and cdf are given by

$$f(x;\Theta) = \frac{\alpha\beta^{2}}{(\beta+1)\sigma}\,\frac{\phi(z)\,\Phi(z)^{\alpha-1}}{[1-\Phi(z)]^{\alpha+1}}\left[1+\left(\frac{\Phi(z)}{1-\Phi(z)}\right)^{\alpha}\right]\exp\left\{-\beta\left(\frac{\Phi(z)}{1-\Phi(z)}\right)^{\alpha}\right\}$$

and

$$F(x;\Theta) = 1-\left[1+\frac{\beta}{\beta+1}\left(\frac{\Phi(z)}{1-\Phi(z)}\right)^{\alpha}\right]\exp\left\{-\beta\left(\frac{\Phi(z)}{1-\Phi(z)}\right)^{\alpha}\right\},$$

respectively, where Θ = (α, β, µ, σ)^T is the parameter vector, µ ∈ ℝ, α, β, σ > 0, and ϕ(·) and Φ(·) are the pdf and cdf of the standard normal distribution, respectively. We denote it by OPL-N(Θ). For α = 1, the OL-normal (OLN) distribution is obtained. Plots of the OPL-N density and hazard functions for selected parameter values are displayed in Figure 1. From this figure, we see that the OPL-N density can be bimodal, first unimodal and then increasing, left skewed or right skewed. Also, the plots indicate that the hrf of the OPL-N distribution can be increasing or bathtub shaped.

The OPL-Weibull (OPL-W) Distribution
Consider the Weibull distribution with shape parameter γ > 0 and scale parameter θ > 0. The pdf and cdf of a random variable X with the OPL-W distribution, say X ∼ OPL-W(Θ), are obtained by inserting the Weibull pdf and cdf as g and G in (4) and (3), respectively, where Θ = (α, β, θ, γ)^T is the parameter vector and α, β, θ, γ > 0. For α = 1, the OL-Weibull (OLW) distribution (Silva et al., 2017) is obtained. Plots of the OPL-W density and hazard functions for selected parameter values are displayed in Figure 2. From this figure, we can say that the OPL-W distribution has very flexible pdf shapes, such as unimodal, decreasing, U-shaped, and first U-shaped and then decreasing. Also, its hrf can be increasing, decreasing or bathtub shaped.

The OPL-Gamma (OPL-Ga) Distribution
Consider the gamma distribution with shape parameter γ > 0 and scale parameter θ > 0, whose pdf and cdf (for x > 0) are $g(x;\theta,\gamma) = x^{\gamma-1}e^{-x/\theta}/[\theta^{\gamma}\Gamma(\gamma)]$ and $G(x;\theta,\gamma) = 1-\Gamma(\gamma, x/\theta)/\Gamma(\gamma)$, where $\Gamma(\gamma, z) = \int_z^{\infty} t^{\gamma-1}e^{-t}\,dt$ is the upper incomplete gamma function and Γ(·) is the complete gamma function. The pdf and cdf of OPL-Ga are obtained by inserting g and G into (4) and (3), respectively, where Θ = (α, β, θ, γ)^T is the parameter vector and α, β, θ, γ > 0. We denote it by OPL-Ga(Θ). For α = 1, the OL-gamma (OLGa) distribution is obtained. Plots of the OPL-Ga density and hazard functions for selected parameter values are displayed in Figure 3. From this figure, we observe decreasing, unimodal, U-shaped, and first U-shaped and then decreasing densities. Also, the plots show that the OPL-Ga distribution can have decreasing, increasing and bathtub-shaped hrfs.

Moments, Incomplete Moments and Generating Function
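From (7), the rth ordinary moment of X, $\mu'_r = E(X^r)$, is obtained as the linear combination (9) of the Exp-G moments $E(Y_\delta^r)$, with the same weights as in (7). Henceforth, $Y_\delta$ denotes a random variable having the Exp-G distribution with power parameter δ. For δ > 0, we have

$$E(Y_\delta^r) = \delta\int_0^1 Q_G(u)^r\,u^{\delta-1}\,du,$$

which can be computed numerically in terms of the baseline quantile function $Q_G(u) = G^{-1}(u;\psi)$. Setting r = 1 in (9), we obtain the mean of X. The last integral can be computed numerically for most parent distributions. The skewness and kurtosis measures can be calculated from the ordinary moments using well-known relationships. The nth central moment of X, say $M_n$, is

$$M_n = E(X-\mu'_1)^n = \sum_{h=0}^{n}\binom{n}{h}(-\mu'_1)^{n-h}\mu'_h.$$

The rth incomplete moment of X, say $I_r(t) = \int_{-\infty}^{t} x^r f(x)\,dx$, can be expressed from (7) as the same linear combination (10) of the Exp-G incomplete moments $\delta\int_0^{G(t;\psi)} Q_G(u)^r\,u^{\delta-1}\,du$. The mean deviations about the mean, $D_1 = E|X-\mu'_1|$, and about the median M, $D_2 = E|X-M|$, are given by

$$D_1 = 2\mu'_1F(\mu'_1) - 2I_1(\mu'_1), \qquad D_2 = \mu'_1 - 2I_1(M),$$

where $F(\mu'_1)$ is easily calculated from (3) and $I_1(t)$ is the first incomplete moment given by (10) with r = 1. Finally, the moment generating function (mgf) of X, $M_X(t) = E(e^{tX})$, can be derived from (7) as the same linear combination of the functions $M_\delta(t) = \delta\int_0^1 \exp\{t\,Q_G(u)\}\,u^{\delta-1}\,du$, where $M_\delta(t)$ is the mgf of $Y_\delta$. Hence, $M_X(t)$ can be determined from the Exp-G generating function.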
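The Exp-G building blocks E(Y_δ^r), and their incomplete versions over [0, G(t)], can be evaluated numerically from the baseline quantile function, as stated above. A minimal sketch with a midpoint rule follows; the helper names are ours, and the standard exponential baseline is used only because its Exp-G moments are known in closed form (E(Y_1) = 1, E(Y_1²) = 2, and E(Y_2) = 1.5, the mean of the maximum of two unit exponentials), which makes the output easy to check. Combining these pieces into µ'_r or I_r(t) also requires the weights of the linear representation (7), which are not reproduced here.

```python
import math


def _exp_g_integral(r, delta, baseline_qf, upper, n):
    """Midpoint rule for delta * int_0^upper Q_G(u)^r u^(delta - 1) du.

    For delta < 1 the factor u^(delta - 1) is singular at 0; the midpoint rule
    never evaluates u = 0, but convergence is slower in that case.
    """
    h = upper / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        total += baseline_qf(u) ** r * u ** (delta - 1.0)
    return delta * total * h


def exp_g_moment(r, delta, baseline_qf, n=100_000):
    """Ordinary moment E(Y_delta^r) of the Exp-G variable Y_delta."""
    return _exp_g_integral(r, delta, baseline_qf, 1.0, n)


def exp_g_incomplete_moment(r, t, delta, baseline_qf, baseline_cdf, n=100_000):
    """int_{-inf}^t y^r pi_delta(y) dy, after substituting u = G(y)."""
    return _exp_g_integral(r, delta, baseline_qf, baseline_cdf(t), n)


# Check case (not from the paper): standard exponential baseline.
def exp_qf(u):
    return -math.log1p(-u)


def exp_cdf(x):
    return -math.expm1(-x)


print(exp_g_moment(1, 2.0, exp_qf))  # close to 1.5
```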

Quantile Function (qf) and Random Number Generation
The OPL-G family can easily be simulated by inverting (3): if U ∼ U(0, 1), then

$$X_U = Q_G\left(\frac{T_U}{1+T_U}\right), \qquad T_U = \left[-1-\frac{1}{\beta}-\frac{1}{\beta}W_{-1}\left(-(1+\beta)(1-U)\,e^{-(1+\beta)}\right)\right]^{1/\alpha},$$

where $Q_G(u) = G^{-1}(u;\psi)$ is the baseline qf, $T_U$ is the PL quantile at U, and $W_{-1}(\cdot)$ denotes the negative branch of the Lambert W function. Hence, $X_U$ can be used as a random number generator for the OPL-G distribution. Random numbers from the OPL-G distribution can also be obtained from the mixture representation (6); this procedure is given in the following algorithm.
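A minimal sketch of this inversion sampler in pure Python follows. The lower Lambert W branch is computed here by bisection so that the example has no third-party dependencies; with SciPy one would normally call scipy.special.lambertw(z, k=-1) and take the real part. The normal baseline (OPL-N) and the function names are illustrative choices, and U must lie strictly inside (0, 1).

```python
import math
import random
from statistics import NormalDist


def lambert_w_minus1(z, iters=100):
    """Real lower branch W_{-1}(z) for -1/e < z < 0, by bisection.

    w*exp(w) is strictly decreasing on (-inf, -1], taking values in (-1/e, 0),
    and 2*log(-z) - 1 is a valid lower bound for the root on that interval.
    """
    if not -1.0 / math.e < z < 0.0:
        raise ValueError("need -1/e < z < 0")
    lo, hi = 2.0 * math.log(-z) - 1.0, -1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid * math.exp(mid) > z:  # f(mid) still above z: root lies further right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def pl_quantile(u, alpha, beta):
    """Inverse of the PL cdf 1 - (1 + beta t^alpha/(beta+1)) exp(-beta t^alpha)."""
    z = -(1.0 + beta) * (1.0 - u) * math.exp(-(1.0 + beta))
    t_pow = -1.0 - 1.0 / beta - lambert_w_minus1(z) / beta
    return max(t_pow, 0.0) ** (1.0 / alpha)  # clamp rounding noise near u = 0


def opl_g_quantile(u, alpha, beta, baseline_qf):
    """X_U = Q_G(T_U / (1 + T_U)) with T_U the PL quantile at u."""
    t = pl_quantile(u, alpha, beta)
    return baseline_qf(t / (1.0 + t))


def opl_g_cdf(x, alpha, beta, baseline_cdf):
    """OPL-G cdf (3) for a generic baseline cdf G."""
    g = baseline_cdf(x)
    w = (g / (1.0 - g)) ** alpha
    return 1.0 - (1.0 + beta * w / (beta + 1.0)) * math.exp(-beta * w)


# Example: OPL-N draws with a standard normal baseline (mu = 0, sigma = 1).
std = NormalDist(0.0, 1.0)
rng = random.Random(1)
sample = [opl_g_quantile(rng.random(), 2.0, 1.5, std.inv_cdf) for _ in range(5)]
```

The round trip F(Q(u)) ≈ u is a cheap way to confirm that the branch and the signs in the quantile formula are right.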
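Algorithm (mixture form). Step 1: Generate U ∼ U(0, 1). Step 2: If U ≤ p = β/(β + 1), generate X from the OW-G distribution with cdf F_1; otherwise, generate X from the GG-G distribution with cdf F_2. Alternatively, random variates from the OPL-G distribution can be generated with existing software by the transformation method. For example, we first generate a random variate Y from the PL distribution by using the rplindley function in the LindleyR package in R, and then set X = Q_G(Y/(1 + Y)).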
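The mixture route can be sketched in the same way. The PL variable itself is a mixture: with probability p = β/(β + 1), T^α is exponential with rate β (the Weibull component that leads to OW-G), and otherwise T^α is gamma with shape 2 and rate β (the component that leads to GG-G); mapping T through Q_G(T/(1 + T)) then gives an OPL-G draw. This is a sketch built on those facts, not a transcription of rplindley, and the OPL-N baseline is again only an example.

```python
import math
import random
from statistics import NormalDist


def pl_mixture_draw(alpha, beta, rng):
    """One PL variate from the Weibull / generalized-gamma mixture behind (2).

    With probability p = beta/(beta+1), T^alpha ~ Exponential(rate beta) (Weibull
    component); otherwise T^alpha ~ Gamma(shape 2, rate beta).
    """
    p = beta / (beta + 1.0)
    if rng.random() < p:
        e = rng.expovariate(1.0)        # Exp(1)
    else:
        e = rng.gammavariate(2.0, 1.0)  # Gamma(shape 2, scale 1)
    return (e / beta) ** (1.0 / alpha)


def opl_g_mixture_draw(alpha, beta, baseline_qf, rng):
    """X = Q_G(T/(1+T)); X follows OW-G w.p. p and GG-G otherwise."""
    t = pl_mixture_draw(alpha, beta, rng)
    return baseline_qf(t / (1.0 + t))


def opl_g_cdf(x, alpha, beta, baseline_cdf):
    """OPL-G cdf (3), used here only to check the sampler."""
    g = baseline_cdf(x)
    w = (g / (1.0 - g)) ** alpha
    return 1.0 - (1.0 + beta * w / (beta + 1.0)) * math.exp(-beta * w)


# Example: OPL-N with a standard normal baseline (mu = 0, sigma = 1).
rng = random.Random(2024)
std = NormalDist()
draws = [opl_g_mixture_draw(2.0, 1.5, std.inv_cdf, rng) for _ in range(20000)]
```

Comparing the empirical cdf of the draws with (3), for instance through the Kolmogorov-Smirnov distance, is a simple sanity check of the mixture weights.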

Characterizations
This section deals with various characterizations of the OPL-G distribution. These characterizations are presented in two directions: (i) based on a simple relationship between two truncated moments and (ii) in terms of the hazard function. It should be pointed out that, due to the nature of the OPL-G distribution, our characterizations may be the only possible ones for this distribution.
We present our characterizations (i) and (ii) in two subsections.

Characterizations Based on Truncated Moments
We employ a theorem due to Glänzel (1987); see Theorem 1 in Appendix A. The result also holds when the interval H is not closed, since the condition of Theorem 1 is on the interior of H. We note that this kind of characterization based on a truncated moment is stable in the sense of weak convergence (see Glänzel, 1990).
Proposition 5.1. Let X: Ω → ℝ be a continuous random variable and let q_1(x) = … The random variable X belongs to the family (4) if and only if the function η defined in Theorem 1 has the form … Proof. Let X be a random variable with pdf (4); then … Conversely, if η is given as above, then … and hence … Now, according to Theorem 1, X has density (4).
Corollary 5.1. Let X: Ω → ℝ be a continuous random variable and let q_1 be as in Proposition 5.1. Then X has pdf (4) if and only if there exist functions q_2 and η defined in Theorem 1 satisfying the differential equation … The general solution of the differential equation in Corollary 5.1 is … , where D is a constant. Note that a set of functions satisfying the above differential equation is given in Proposition 5.1 with D = 0. Clearly, there exist other triplets of functions (q_1, q_2, η) satisfying the conditions of Theorem 1.
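As a worked instance of Theorem 1 for (4), the sketch below verifies one admissible triplet. The choice of q_1 and q_2 is ours for illustration and need not coincide with the functions used in Proposition 5.1; ξ(x) is shorthand for [G(x;ψ)/(1 − G(x;ψ))]^α, and we assume, as for any proper baseline, that ξ(x) → ∞ at the upper end of the support.

```latex
% Shorthand: \xi(x) = [G(x;\psi)/(1-G(x;\psi))]^{\alpha}, so that
% \xi'(x) = \alpha\, g(x;\psi)\, G(x;\psi)^{\alpha-1}/[1-G(x;\psi)]^{\alpha+1} and (4) reads
f(x) = \frac{\beta^{2}}{\beta+1}\,\xi'(x)\,\big[1+\xi(x)\big]\,e^{-\beta\xi(x)}.

% Choose q_1(x) = [1+\xi(x)]^{-1} and q_2(x) = q_1(x)\,e^{-\beta\xi(x)}. Then
[1-F(x)]\,E[q_1(X)\mid X\ge x]
  = \frac{\beta^{2}}{\beta+1}\int_x^{\infty}\xi'(u)\,e^{-\beta\xi(u)}\,du
  = \frac{\beta}{\beta+1}\,e^{-\beta\xi(x)},
\qquad
[1-F(x)]\,E[q_2(X)\mid X\ge x] = \frac{\beta}{2(\beta+1)}\,e^{-2\beta\xi(x)},

% so the ratio of truncated moments and the non-degeneracy condition are
\eta(x) = \tfrac12\,e^{-\beta\xi(x)}, \qquad
\eta(x)q_1(x) - q_2(x) = -\tfrac12\,q_1(x)\,e^{-\beta\xi(x)} \ne 0,

% and the function s of Theorem 1 satisfies
s'(x) = \frac{\eta'(x)\,q_1(x)}{\eta(x)q_1(x) - q_2(x)} = \beta\,\xi'(x),
\qquad s(x) = \beta\,\xi(x).

% Theorem 1 then returns a density proportional to
\left|\frac{\eta'(x)}{\eta(x)q_1(x)-q_2(x)}\right| e^{-s(x)}
  = \beta\,\xi'(x)\,\big[1+\xi(x)\big]\,e^{-\beta\xi(x)} \;\propto\; f(x).
```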

Characterization Based on Hazard Function
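It is known that the hazard function, h_F, of a twice differentiable distribution function F with density f satisfies the first-order differential equation

$$\frac{f'(x)}{f(x)} = \frac{h_F'(x)}{h_F(x)} - h_F(x),$$

which follows by differentiating $\log h_F(x) = \log f(x) - \log[1-F(x)]$. For many univariate continuous distributions, this is the only characterization available in terms of the hazard function.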
The following proposition establishes a non-trivial characterization of the OPL-G distribution, for α = 1, in terms of the hazard function, which is not of the above trivial form.
Proposition 5.2. Let X: Ω → ℝ be a continuous random variable. For α = 1, the pdf of X is (4) if and only if its hazard function h_F(x) satisfies the differential equation … Proof. If X has pdf (4), then clearly the above differential equation holds. Conversely, if the differential equation holds, then … , which is the hazard function of the OPL-G distribution for α = 1.
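For orientation, the α = 1 hazard can be written out explicitly. The computation below is a sketch from (3) and (4) with α = 1; the first-order equation at the end is one equation that this hazard satisfies, offered as an illustration, and is not necessarily the exact form used in Proposition 5.2.

```latex
% With alpha = 1, write G = G(x;\psi) and g = g(x;\psi). Using 1 + G/(1-G) = 1/(1-G):
f(x) = \frac{\beta^{2}}{\beta+1}\,\frac{g}{(1-G)^{3}}\,e^{-\beta G/(1-G)},
\qquad
1-F(x) = \frac{\beta+1-G}{(\beta+1)(1-G)}\,e^{-\beta G/(1-G)},

% hence
h_F(x) = \frac{f(x)}{1-F(x)} = \frac{\beta^{2}\,g}{(1-G)^{2}\,(\beta+1-G)}.

% Multiplying by (\beta+1-G) and differentiating gives a first-order linear ODE in h_F:
\big(\beta+1-G(x)\big)\,h_F'(x) - g(x)\,h_F(x)
  = \beta^{2}\,\frac{d}{dx}\!\left[\frac{g(x)}{\big(1-G(x)\big)^{2}}\right].
```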
Remark 5.1.It is easy to see that

Estimation and Inference
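Here, we consider estimation of the unknown parameters of the OPL-G distribution by the maximum likelihood method. Let x_1, ..., x_n be a random sample from the OPL-G distribution with a (q + 2) × 1 parameter vector Ψ = (α, β, ψ^T)^T, where ψ is a q × 1 baseline parameter vector. Writing $s_i = G(x_i;\psi)/[1-G(x_i;\psi)]$, the log-likelihood function for Ψ is given by

$$\ell(\Psi) = n\log\alpha + 2n\log\beta - n\log(\beta+1) + \sum_{i=1}^{n}\log g(x_i;\psi) + (\alpha-1)\sum_{i=1}^{n}\log G(x_i;\psi) - (\alpha+1)\sum_{i=1}^{n}\log[1-G(x_i;\psi)] + \sum_{i=1}^{n}\log(1+s_i^{\alpha}) - \beta\sum_{i=1}^{n}s_i^{\alpha}.$$

The components of the score vector, U(Ψ) = ∂ℓ(Ψ)/∂Ψ, are

$$U_\alpha = \frac{n}{\alpha} + \sum_{i=1}^{n}\log s_i + \sum_{i=1}^{n}\frac{s_i^{\alpha}\log s_i}{1+s_i^{\alpha}} - \beta\sum_{i=1}^{n}s_i^{\alpha}\log s_i, \qquad U_\beta = \frac{2n}{\beta} - \frac{n}{\beta+1} - \sum_{i=1}^{n}s_i^{\alpha},$$

and (for r = 1, ..., q)

$$U_{\psi_r} = \sum_{i=1}^{n}\frac{\dot g_r(x_i;\psi)}{g(x_i;\psi)} + (\alpha-1)\sum_{i=1}^{n}\frac{\dot G_r(x_i;\psi)}{G(x_i;\psi)} + (\alpha+1)\sum_{i=1}^{n}\frac{\dot G_r(x_i;\psi)}{1-G(x_i;\psi)} + \alpha\sum_{i=1}^{n}\frac{s_i^{\alpha-1}\dot s_{i,r}}{1+s_i^{\alpha}} - \alpha\beta\sum_{i=1}^{n}s_i^{\alpha-1}\dot s_{i,r},$$

where $\dot g_r = \partial g/\partial\psi_r$, $\dot G_r = \partial G/\partial\psi_r$ and $\dot s_{i,r} = \dot G_r(x_i;\psi)/[1-G(x_i;\psi)]^2$. Setting the nonlinear system of equations U_α = U_β = U_{ψ_r} = 0 (for r = 1, ..., q) and solving them simultaneously yields the MLEs $\hat\Psi = (\hat\alpha, \hat\beta, \hat\psi^T)^T$. These equations cannot be solved analytically, so it is more convenient to use nonlinear optimization methods, such as the quasi-Newton algorithm, to maximize ℓ(Ψ) numerically. For interval estimation of the parameters, we can evaluate numerically the elements of the (q + 2) × (q + 2) observed information matrix $J(\Psi) = \{-\partial^2\ell/\partial\Psi_r\partial\Psi_s\}$. Under standard regularity conditions, as n → ∞, the distribution of $\hat\Psi - \Psi$ can be approximated by a multivariate normal $N_{q+2}(0, J(\hat\Psi)^{-1})$ distribution, which can be used to construct approximate confidence intervals for the parameters. Here, $J(\hat\Psi)$ is the total observed information matrix evaluated at $\hat\Psi$. The bootstrap re-sampling method can be used to correct the biases of the MLEs of the model parameters. Good interval estimates may also be obtained using the bootstrap percentile method.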
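One consequence of U_β is worth making explicit: for fixed α and ψ, multiplying U_β = 0 by β(β + 1) gives Sβ² + (S − n)β − 2n = 0 with S = Σ s_i^α, whose roots have product −2n/S < 0, so exactly one root is positive and β can be profiled out in closed form before a numerical search over the remaining parameters. The sketch below evaluates ℓ(Ψ) from baseline pdf and cdf values and uses this profile; the normal baseline with fixed (µ, σ), the coarse α grid and the data values are all illustrative assumptions, and a real fit would also optimize ψ (for example with a quasi-Newton routine).

```python
import math
from statistics import NormalDist


def opl_loglik(alpha, beta, g_vals, G_vals):
    """l(Psi) for an OPL-G sample, from baseline pdf values g(x_i) and cdf values G(x_i)."""
    n = len(g_vals)
    ll = n * (math.log(alpha) + 2.0 * math.log(beta) - math.log(beta + 1.0))
    for g, G in zip(g_vals, G_vals):
        sa = (G / (1.0 - G)) ** alpha  # s_i^alpha
        ll += (math.log(g) + (alpha - 1.0) * math.log(G)
               - (alpha + 1.0) * math.log1p(-G)
               + math.log1p(sa) - beta * sa)
    return ll


def profile_beta(alpha, G_vals):
    """Unique positive root of U_beta = 2n/beta - n/(beta+1) - sum s_i^alpha = 0."""
    n = len(G_vals)
    S = sum((G / (1.0 - G)) ** alpha for G in G_vals)
    # S*beta^2 + (S - n)*beta - 2n = 0; take the positive root
    return (-(S - n) + math.sqrt((S - n) ** 2 + 8.0 * n * S)) / (2.0 * S)


# Normal baseline with (mu, sigma) held fixed. The data values are arbitrary
# numbers for demonstration, not a data set from the paper.
data = [-0.8, -0.3, 0.1, 0.4, 0.9, 1.2, 1.5, 2.3]
baseline = NormalDist(0.0, 1.0)
g_vals = [baseline.pdf(x) for x in data]
G_vals = [baseline.cdf(x) for x in data]

# Profile log-likelihood l(alpha, beta_hat(alpha)) maximised over a coarse alpha grid.
best_ll, best_alpha = max(
    (opl_loglik(a, profile_beta(a, G_vals), g_vals, G_vals), a)
    for a in (0.1 * k for k in range(1, 51))
)
best_beta = profile_beta(best_alpha, G_vals)
```

Because ∂²ℓ/∂β² = −2n/β² + n/(β + 1)² < 0 for every β > 0, the profiled β is the global maximizer of ℓ in β for the given α and ψ.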
We can compute the maximum values of the unrestricted and restricted log-likelihoods to obtain likelihood ratio (LR) statistics for testing sub-models of the OPL-G distribution. Hypothesis tests of the type H_0: ω = ω_0 versus H_1: ω ≠ ω_0, where ω is a vector formed with some components of Ψ and ω_0 is a specified vector, can be performed using LR statistics. For example, the test of H_0: α = 1 versus H_1: H_0 is not true is equivalent to comparing the OPL-G and OL-G distributions, and the LR statistic is given by $w = 2\{\ell(\hat\alpha, \hat\beta, \hat\psi) - \ell(1, \tilde\beta, \tilde\psi)\}$, where $\hat\alpha$, $\hat\beta$ and $\hat\psi$ are the MLEs under H_1, and $\tilde\beta$ and $\tilde\psi$ are the estimates under H_0. Under H_0, w is asymptotically distributed as $\chi^2_1$.
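For a single restriction such as H_0: α = 1, the asymptotic χ²_1 p-value needs no special library, because P(χ²_1 > w) = P(Z² > w) = erfc(√(w/2)) for a standard normal Z. A minimal sketch, taking the two maximized log-likelihoods as inputs (the function name is ours):

```python
import math


def lr_test_chi2_1(loglik_full, loglik_restricted):
    """LR statistic w = 2(l_full - l_restricted) and its asymptotic chi^2_1 p-value."""
    w = 2.0 * (loglik_full - loglik_restricted)
    w = max(w, 0.0)  # guard against tiny negative values from imperfect optimisation
    return w, math.erfc(math.sqrt(w / 2.0))


# Example: a log-likelihood gain of 1.92 gives w = 3.84, p close to 0.05.
print(lr_test_chi2_1(-100.0, -101.92))
```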
Let (y_1, v_1), ..., (y_n, v_n) be a sample of n independent observations, where each random response is defined by y_i = min{log(x_i), log(c_i)} and v_i is the vector of explanatory variables associated with the ith response. Let F and C be the sets of individuals for which y_i is the log-lifetime or the log-censoring time, respectively. The log-likelihood function for the vector of parameters $\tau = (\alpha, \beta, \sigma, \boldsymbol{\beta}^{\top})^{\top}$ from model (14) has the form $l(\tau) = \sum_{i\in F}\log f(y_i) + \sum_{i\in C}\log S(y_i)$, where f(y_i) is the density (11) and S(y_i) is the survival function (12) of Y_i. Then, the total log-likelihood function for τ is given by (15), where $z_i = (y_i - v_i^{\top}\boldsymbol{\beta})/\sigma$, r is the number of uncensored observations and c is the number of censored observations. The MLE $\hat\tau$ of the vector of unknown parameters can be evaluated by maximizing the log-likelihood (15); in practice, the optim function of R is used to minimize the negative of (15). The asymptotic distribution of $(\hat\tau - \tau)$ is multivariate normal $N_{p+2}(0, K(\tau)^{-1})$, where K(τ) is the expected information matrix. The asymptotic covariance matrix $K(\tau)^{-1}$ of $\hat\tau$ can be approximated by the inverse of the (p + 2) × (p + 2) observed information matrix $-\ddot{L}(\hat\tau)$, whose elements can be evaluated numerically. The approximate multivariate normal distribution $N_{p+2}(0, -\ddot{L}(\hat\tau)^{-1})$ for $\hat\tau$ can be used to construct approximate confidence intervals for the parameters in τ.
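Because the definitions (11), (12) and (14) are not reproduced here, the sketch below assumes the standard log-location-scale construction: a Weibull baseline G(x) = 1 − exp{−(x/θ)^γ} with µ = log θ and σ = 1/γ, so that Z = (log X − µ)/σ has baseline cdf 1 − exp(−e^z), and µ_i = v_i^T b in the regression. Under that assumption it evaluates log f(y_i) and log S(y_i) in a numerically stable way and adds them over the observed and censored sets. It illustrates the structure of the censored log-likelihood, not the paper's exact parameterization, and the function names are hypothetical.

```python
import math


def loplw_logpdf_logsf(y, mu, alpha, beta, sigma):
    """Return (log f(y), log S(y)) for a log-OPL-Weibull response.

    ASSUMED parameterisation (not taken from the paper): Weibull baseline
    G(x) = 1 - exp(-(x/theta)^gamma) with mu = log(theta), sigma = 1/gamma.
    """
    z = (y - mu) / sigma
    if z > 700.0:                       # far right tail: f and S are ~0
        return -math.inf, -math.inf
    ez = math.exp(z)
    # log G = log(1 - exp(-e^z)); fall back to z if e^z underflows to 0
    log_G = math.log(-math.expm1(-ez)) if ez > 0.0 else z
    log_s = log_G + ez                  # log[G/(1-G)], since 1 - G = exp(-e^z)
    if alpha * log_s > 700.0:           # s^alpha would overflow; f and S are ~0
        return -math.inf, -math.inf
    sa = math.exp(alpha * log_s)        # s^alpha
    log_pdf = (math.log(alpha) + 2.0 * math.log(beta) - math.log(beta + 1.0)
               - math.log(sigma) + z + alpha * ez + (alpha - 1.0) * log_G
               + math.log1p(sa) - beta * sa)
    log_sf = math.log1p(beta * sa / (beta + 1.0)) - beta * sa
    return log_pdf, log_sf


def censored_loglik(coef, alpha, beta, sigma, V, y, observed):
    """Sum of log f(y_i) over observed lifetimes plus log S(y_i) over censored ones.

    Each row of V is a covariate vector v_i (include a leading 1 for an
    intercept); observed[i] is 1 for an observed lifetime and 0 if censored.
    """
    total = 0.0
    for v_i, y_i, d_i in zip(V, y, observed):
        mu_i = sum(b * v for b, v in zip(coef, v_i))
        log_pdf, log_sf = loplw_logpdf_logsf(y_i, mu_i, alpha, beta, sigma)
        total += log_pdf if d_i == 1 else log_sf
    return total
```

In an actual fit, the negative of censored_loglik would be handed to a general-purpose optimizer, as the text does with optim in R.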

Simulation Studies
In this section, we perform two simulation studies using the OPL-W and OPL-N distributions to illustrate the performance of the MLEs of the parameters of these distributions. Random numbers are generated with the rplindley function of the LindleyR package in R, following the transformation method given in the quantile function subsection. The MLEs were obtained using the optim function of R with the conjugate gradient (CG) method.
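In the first simulation study, we generate N = 1000 samples of sizes n = 20, 30, ..., 1000 from the OPL-W distribution with true parameter values α = 0.5, β = 10, θ = 1 and γ = 2, and present the results graphically. In this simulation study, we calculate the empirical mean, standard deviation (sd), bias and mean square error (MSE) of the MLEs. For ε = α, β, θ, γ, the bias and MSE are calculated by

$$\mathrm{Bias}_{\varepsilon} = \frac{1}{N}\sum_{i=1}^{N}(\hat\varepsilon_i - \varepsilon), \qquad \mathrm{MSE}_{\varepsilon} = \frac{1}{N}\sum_{i=1}^{N}(\hat\varepsilon_i - \varepsilon)^2,$$

where $\hat\varepsilon_i$ is the estimate of ε obtained from the ith sample. We give the results of this simulation study in Figure 5.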
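The summary measures above can be illustrated with a deliberately simplified Monte Carlo experiment. As an assumption made only to keep the sketch short, α and the baseline parameters are treated as known, so only β is estimated; its MLE then has a closed form, the positive root of Sb² + (S − n)b − 2n = 0 with S = Σ s_i^α, obtained from U_β = 0. Because s_i = T_i when X_i = Q_G(T_i/(1 + T_i)), the values s_i^α = T_i^α can be drawn directly from the exponential/gamma mixture behind the PL distribution, so neither the baseline nor α enters. This is not the four-parameter OPL-W study of Figure 5.

```python
import math
import random


def beta_mle_given_rest(S, n):
    """Positive root of S*b^2 + (S - n)*b - 2n = 0, i.e. U_beta = 0 with S = sum s_i^alpha."""
    return (-(S - n) + math.sqrt((S - n) ** 2 + 8.0 * n * S)) / (2.0 * S)


def simulate_beta(beta_true, n, n_rep, rng):
    """Empirical mean, sd, bias and MSE of beta-hat over n_rep samples of size n."""
    p = beta_true / (beta_true + 1.0)
    estimates = []
    for _ in range(n_rep):
        S = 0.0
        for _ in range(n):
            # s_i^alpha = T_i^alpha: Exp(rate beta) w.p. p, else Gamma(2, rate beta)
            e = rng.expovariate(1.0) if rng.random() < p else rng.gammavariate(2.0, 1.0)
            S += e / beta_true
        estimates.append(beta_mle_given_rest(S, n))
    mean = sum(estimates) / n_rep
    sd = math.sqrt(sum((b - mean) ** 2 for b in estimates) / (n_rep - 1))
    bias = sum(b - beta_true for b in estimates) / n_rep
    mse = sum((b - beta_true) ** 2 for b in estimates) / n_rep
    return mean, sd, bias, mse


rng = random.Random(123)
for n in (20, 100, 500):
    print(n, simulate_beta(10.0, n, 500, rng))
```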
In the second simulation study, we generate 1,000 samples of sizes 20, 50 and 100 from selected OPL-N distributions. For this simulation study, we obtain the empirical means and standard deviations of the parameter estimates. The results of this simulation study are reported in Table 1.
From Figure 5 and Table 1, we observe that as the sample size increases, the empirical means approach the true parameter values for both distributions, and the standard deviations, biases and MSEs decrease in all cases. These results are as expected for consistent estimators.

Real Data Applications
In this section, we present three applications to real data sets to illustrate the usefulness of the OPL-N, OPL-W and OPL-Ga distributions. We compare these models with the odd Burr normal (OBN) (Alizadeh et al., 2017), power normal (PN) (Gupta and Gupta, 2008), OLN, normal (N), Lomax-Weibull (LxW) (Cordeiro et al., 2014), OLW, Weibull (W), exponentiated generalized gamma (EGGa) (Cordeiro et al., 2011), OLGa, gamma (Ga) and PL models. The pdfs of the OBN, PN, LxW and EGGa models are given by … To determine the best model, we also computed the estimated log-likelihood values $\hat{l}$, the Akaike information criterion (AIC), the corrected Akaike information criterion (CAIC), the Bayesian information criterion (BIC), the Hannan-Quinn information criterion (HQIC), and the Kolmogorov-Smirnov (K-S), Cramér-von Mises (W*) and Anderson-Darling (A*) goodness-of-fit statistics for all models. The statistics W* and A* are described in detail in Chen and Balakrishnan (1995).
In general, the best model is the one with the smallest values of the AIC, CAIC, BIC, HQIC, K-S, W* and A* statistics and the largest values of $\hat{l}$ and of the p-values. All computations are performed with the maxLik routine in R. The details are as follows.
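For reference, the information criteria listed above are commonly defined as below, with k the number of estimated parameters, n the sample size and l̂ the maximized log-likelihood; here CAIC is the small-sample corrected AIC, matching the wording above. The K-S helper computes the usual supremum distance between the empirical cdf and a fitted continuous cdf. These are standard textbook definitions; the exact conventions of the software used in the paper are not stated and may differ slightly.

```python
import math


def information_criteria(loglik, k, n):
    """AIC, CAIC (corrected AIC), BIC and HQIC from a maximised log-likelihood."""
    aic = -2.0 * loglik + 2.0 * k
    return {
        "AIC": aic,
        "CAIC": aic + 2.0 * k * (k + 1) / (n - k - 1),
        "BIC": -2.0 * loglik + k * math.log(n),
        "HQIC": -2.0 * loglik + 2.0 * k * math.log(math.log(n)),
    }


def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov distance sup_x |F_n(x) - F(x)| for a fitted continuous cdf."""
    xs = sorted(data)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs))


# Example: a four-parameter model with l-hat = -100 fitted to n = 30 observations.
print(information_criteria(-100.0, 4, 30))
```

Smaller values of every criterion indicate a better trade-off between fit and the number of parameters, which is how the tables are read.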
The second data set, studied by Meeker and Escobar (1998, p. 383), gives the times of failure and running times for a sample of devices from a field-tracking study of a larger system. The data are: 275, 13, 147, 23, 181, 30, 65, 10, 300, 173, 106, 300, 300, 212, 300, 300, 300, 2, 261, 293, 88, 247, 28, 143, 300, 23, 300, 80, 245, 266. These data were also analyzed by Cordeiro et al. (2010) and Alexander et al. (2012).
The results are reported in Tables 2 and 3. These tables show that the members of the OPL-G family have the smallest values of the AIC, CAIC, BIC, HQIC, K-S, W* and A* statistics and the largest values of $\hat{l}$ and of all p-values among the fitted models. Hence, the OPL-N, OPL-W and OPL-Ga models could be chosen as the best models for the three data sets under these criteria.
The histograms of all data sets and the plots of the fitted pdfs and cdfs for all models are shown in Figures 6-8. From these figures, we see that the OPL-N model captures the bimodal shape of the first data set. For the other data sets, the OPL-W and OPL-Ga models fit the histograms more adequately than the other models.
The results of the LR tests for the three data sets are shown in Table 4. The additional parameter α of each OPL-G model is significant, since the null hypotheses of all three LR tests are rejected in favour of the OPL-N, OPL-W and OPL-Ga distributions. Hence, these models provide a better representation of the data sets than the OLN, OLW and OLGa models, based on the LR test at the 5% significance level.

The regression model fitted to the voltage data set is given by … , respectively, where the random variable y_i follows the LOPLW distribution given in (11).

Conclusions

In this paper, we proposed a new flexible class of distributions and provided a comprehensive treatment of its mathematical properties as well as some useful characterizations. The maximum likelihood method was used to estimate the model parameters, and we assessed the performance of the maximum likelihood estimators by means of two simulation studies. We also introduced a new regression model based on a special member of the new family, called the log odd power Lindley Weibull distribution. We showed that the new log-location-scale regression model for lifetime data can be very useful in analysing real data and provides more realistic fits than other regression models. The index plot of the modified deviance residuals and the Q-Q plot of the modified deviance residuals are presented to illustrate that our new model is more appropriate for the Stanford heart transplant data set than competing models such as the log-Weibull and log-Topp-Leone odd log-logistic-Weibull models. We hope that the results given in this paper will be useful for practitioners and researchers.
Appendix A

Theorem 1. Let (Ω, F, P) be a given probability space and let H = [a, b] be an interval for some a < b (a = −∞, b = ∞ might as well be allowed). Let X: Ω → H be a continuous random variable with the distribution function F and let q_1 and q_2 be two real functions defined on H such that

$$E[q_2(X)\mid X\ge x] = E[q_1(X)\mid X\ge x]\,\eta(x), \qquad x\in H,$$

is defined with some real function η. Assume that q_1, q_2 ∈ C^1(H), η ∈ C^2(H) and F is a twice continuously differentiable and strictly monotone function on the set H. Finally, assume that the equation ηq_1 = q_2 has no real solution in the interior of H. Then F is uniquely determined by the functions q_1, q_2 and η; in particular,

$$F(x) = \int_a^x C\left|\frac{\eta'(u)}{\eta(u)q_1(u)-q_2(u)}\right|\exp(-s(u))\,du,$$

where the function s is a solution of the differential equation $s' = \eta' q_1/(\eta q_1 - q_2)$ and C is the normalization constant such that ∫_H dF = 1.

We note that this kind of characterization based on the ratio of truncated moments is stable in the sense of weak convergence (see Glänzel, 1990). In particular, assume that there is a sequence {X_n} of random variables with distribution functions {F_n} such that the functions q_{1n}, q_{2n} and η_n (n ∈ ℕ) satisfy the conditions of Theorem 1, and let q_{1n} → q_1, q_{2n} → q_2 for some continuously differentiable real functions q_1 and q_2. Let, finally, X be a random variable with distribution F. Under the condition that q_{1n}(X) and q_{2n}(X) are uniformly integrable and the family {F_n} is relatively compact, the sequence X_n converges to X in distribution if and only if η_n converges to η, where

$$\eta(x) = \frac{E[q_2(X)\mid X\ge x]}{E[q_1(X)\mid X\ge x]}.$$

This stability theorem ensures that the convergence of distribution functions is reflected by the corresponding convergence of the functions q_1, q_2 and η, respectively. It guarantees, for instance, the 'convergence' of the characterization of the Wald distribution to that of the Lévy-Smirnov distribution if α → ∞.
A further consequence of the stability property of Theorem 1 is the application of this theorem to special tasks in statistical practice, such as the estimation of the parameters of discrete distributions. For such purposes, the functions q_1, q_2 and, especially, η should be as simple as possible. Since the function triplet is not uniquely determined, it is often possible to choose η as a linear function. Therefore, it is worth analyzing some special cases, which helps to find new characterizations that reflect the relationship between individual continuous univariate distributions and are appropriate in other areas of statistics.

Figure 1.The pdf and hrf of the OPL-N distribution for selected parameter values

Figure 2. The pdf and hrf of the OPL-W distribution for selected parameter values

Figure 3. The pdf and hrf of the OPL-Ga distribution for selected parameter values
Figure 4 displays density plots of the LOPLW distribution for some parameter values. As seen from Figure 4, the LOPLW distribution can be very flexible for modeling left-skewed data.

Figure 4. Plots of the LOPLW density for selected parameter values

Figure 5. Simulation results of the special OPL-W distribution

Figure 6. The fitted pdfs and cdfs for the first data set
Figure 9. (a) Index plot of the modified deviance residuals and (b) Q-Q plot of the modified deviance residuals

Table 1. Empirical means and standard deviations (in parentheses) for the special OPL-N distributions

Table 2. MLEs, standard errors of the estimates (in parentheses) and estimated log-likelihood values

Table 3. Information criteria results, A* and W* statistics ([·] and {·} denote their p-values)

Brito et al. (2017) introduced the log-Topp-Leone odd log-logistic-Weibull (Log-TLOLL-W) regression model and used the Stanford heart transplant data set to illustrate its usefulness. Here, we use the same data set to demonstrate the flexibility of the LOPLW regression model compared with the Log-TLOLL-W regression model. This data set is available in the p3state.msm package of R. The sample size is n = 103, and the percentage of censored observations is 27%. The aim of this study is to relate the survival times (t) of patients to the following explanatory variables: x_1, year of acceptance to the program; x_2, age of the patient (in years); x_3, previous surgery status (1 = yes, 0 = no); x_4, transplant indicator (1 = yes, 0 = no); and c_i, censoring indicator (0 = censored, 1 = lifetime observed).