Generalized Exponentiated Weibull Linear Model in the Presence of Covariates

Tiago V. F. Santana¹, Edwin M. M. Ortega², Gauss M. Cordeiro³ & Adriano K. Suzuki⁴
¹ Department of Statistics, Londrina State University, Londrina, Brazil
² Department of Mathematical Sciences, University of São Paulo, Piracicaba, Brazil
³ Department of Statistics, Federal University of Pernambuco, Pernambuco, Brazil
⁴ Department of Applied Mathematics and Statistics, University of São Paulo, São Carlos, Brazil


Introduction
The Weibull distribution has been widely used in biomedical and industrial studies and to analyze lifetime data in several areas. According to Collet (2003), it is just as important for the parametric analysis of survival data as the normal distribution is in linear models. However, the Weibull distribution does not produce good results in situations with a non-monotone hazard rate function (hrf), such as the unimodal and U-shaped (or bathtub) functions, which are common in studies of reliability, biology and engineering. To address this problem, Mudholkar & Srivastava (1993) proposed the exponentiated Weibull (EW) model, whose hrf can be either monotone or non-monotone. Sub-models of the EW distribution include the Weibull, exponential, Rayleigh, Burr type X and exponentiated-exponential (EE) distributions, among others.
The EW distribution has been widely applied in many fields of knowledge, both practical and theoretical. Researchers in computer science, clinical drug trials, hydrology, silviculture, statistics, actuarial science, reliability, ecology and mechanical engineering have employed this distribution successfully. As examples, we can mention the works of Mudholkar & Hutson (1996), who modeled flood data on the Floyd River in the American state of Iowa, Ahmad, Islam & Salam (2006) on accelerated lifetime tests, Surles & D'Ambrosio (2004) on applications to carbon fiber composites, Wang & Rennolls (2005), who modeled the diameter of trees in Chinese pine plantations, Zhang, Xie & Tang (2005) in a lifetime study of firmware systems, and Barghout (2009), who proposed a new order-based reliability prediction model for failure detection, among many others. More recently, Nadarajah, Cordeiro & Ortega (2013) presented an extensive review of the EW distribution, with details on its development, several recent results and main applications. Prudente & Cordeiro (2010) proposed a class of regression models following the same concepts as generalized linear models (GLMs), but assuming the Weibull distribution for the response variable. The generalized Weibull linear model (GWLM), as called by the authors, allows a new use of the Weibull distribution in a structure very similar to that of GLMs. The proposed model was defined as a synthesis of three important GLM structures: a random component for the response variable, following a suitably parameterized Weibull distribution; a systematic component specified by a linear function of the covariates, called the linear predictor, to model a relevant part of the mean of the distribution; and a link function to connect the systematic component to the mean response.

The paper is organized as follows. In Section 2, we define the GEWLM and examine some of its properties. In Section 3, we discuss the new GEWLM regression model. In Section 4, we obtain the MLEs and the estimates based on a Bayesian method, and provide some results from simulation studies for the GEWLM regression model. The sensitivity analysis based on local influence and Bayesian case influence diagnostics is developed in Section 5. In Section 6, we define the quantile residuals for the new model and describe the method to construct a simulated envelope to assess its goodness of fit to a dataset. In Section 7, we fit the new model to a real dataset in the insurance area to demonstrate the performance of the GEWLM in contrast to the GWLM. Finally, in Section 8, we provide some conclusions and suggestions for future research.

The Generalized Exponentiated Weibull Linear Model
The GWLMs are very effective, as stated by the authors, for modeling part of the mean response in terms of a regression equation for positive continuous data and for defining residuals. Many other diagnostic measures can closely follow the theory of GLMs.
Let Y_1, ..., Y_n be n independent random variables, where each Y_i has the Weibull distribution with shape parameter ϕ and scale parameter α_i, which varies across observations, with cumulative distribution function (cdf) and probability density function (pdf) given by

F(y_i) = 1 − exp{−(y_i/α_i)^ϕ}   (1)

and

f(y_i) = (ϕ/α_i) (y_i/α_i)^{ϕ−1} exp{−(y_i/α_i)^ϕ},  y_i > 0,   (2)

respectively. The mean and variance are

E(Y_i) = α_i Γ_1(ϕ)  and  Var(Y_i) = α_i² [Γ_2(ϕ) − Γ_1(ϕ)²],

where Γ_j(ϕ) = Γ(1 + j/ϕ) for j ≥ 1, Γ(p) = ∫_0^∞ t^{p−1} e^{−t} dt is the gamma function and Γ′(p) = ∂Γ(p)/∂p. By re-parameterizing equation (1) according to the relation α_i = λ_i exp{−Γ′(2)/ϕ}, proposed by Cox & Reid (1987), we obtain orthogonality between the parameters ϕ and λ_i, i.e., E(∂²ℓ(θ)/∂ϕ∂λ_i) = E(∂²ℓ(θ)/∂λ_i∂ϕ) = 0. The advantage of having orthogonal parameters is the asymptotic independence of the MLEs of ϕ and λ_i and the simplification of the algebraic developments, since the expected information matrix is block diagonal. Then, the re-parameterized expressions for the cdf, pdf, expectation and variance of Y_i are given by

F(y_i) = 1 − exp{−(y_i/λ_i)^ϕ e^{Γ′(2)}},   (3)

f(y_i) = ϕ e^{Γ′(2)} λ_i^{−ϕ} y_i^{ϕ−1} exp{−(y_i/λ_i)^ϕ e^{Γ′(2)}}   (4)

and

E(Y_i) = λ_i e^{−Γ′(2)/ϕ} Γ_1(ϕ),  Var(Y_i) = λ_i² e^{−2Γ′(2)/ϕ} [Γ_2(ϕ) − Γ_1(ϕ)²].

The GWLM is defined by the pdf (4) for the random component, and by a function g applied to λ_i such that g(λ_i) = η_i = x_iᵀβ is the systematic component, where x_i is the vector of covariates associated with the ith observation and β is an unknown vector of parameters.
By raising equation (3) to the power δ > 0, we obtain the EW distribution with scale parameter λ_i and two shape parameters ϕ and δ. Henceforth, the re-parameterized EW distribution is denoted by Y ∼ EW(λ_i, ϕ, δ). The cdf and pdf of Y are given by

F(y_i) = [1 − exp{−(y_i/λ_i)^ϕ e^{Γ′(2)}}]^δ   (5)

and

f(y_i) = δ ϕ e^{Γ′(2)} λ_i^{−ϕ} y_i^{ϕ−1} exp{−(y_i/λ_i)^ϕ e^{Γ′(2)}} [1 − exp{−(y_i/λ_i)^ϕ e^{Γ′(2)}}]^{δ−1},   (6)

respectively. The model (6) is referred to as the EW distribution. If δ = 1 and ϕ = 1, we obtain the exponential distribution, whereas if ϕ = 1, we have the EE distribution. If Y_i follows the EW distribution, then the random variable Z_i = (Y_i/λ_i)^ϕ e^{Γ′(2)} has the EE distribution with unit scale parameter and shape parameter δ. The following theorem presents this result in a more general form.
Proof: Let t be a point in the support of the EW model. Then, P(Z_i ≤ t) = P(Y_i ≤ λ_i (t e^{−Γ′(2)})^{1/ϕ}) = [1 − exp(−t)]^δ, which is the cdf of the EE distribution with unit scale parameter and shape parameter δ.

The rth ordinary moment of the EW distribution (without restrictions on the parameters) is given by (Choudhury, 2005)

E(Y_i^r) = δ λ_i^r e^{−rΓ′(2)/ϕ} Γ(1 + r/ϕ) Σ_{j=0}^{∞} (−1)^j C(δ − 1, j) (j + 1)^{−(1+r/ϕ)},

where C(δ − 1, j) = Γ(δ)/[j! Γ(δ − j)] is the generalized binomial coefficient. From the first two moments, the expectation and variance of Y_i are given by E(Y_i) = λ_i c_1(ϕ, δ) and Var(Y_i) = λ_i² c_2(ϕ, δ), where c_1(ϕ, δ) and c_2(ϕ, δ) do not depend on λ_i. The parameter λ_i is a type of location parameter that also affects the variance function of the EW distribution in the new parametrization; thus, the variance is a quadratic function of the mean. Plots of the density function of Y for selected values of δ and λ_i are displayed in Figure 2.
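As a concrete companion to the re-parameterized EW cdf, the Python sketch below (our illustration, not part of the paper; the helper names are ours) evaluates the cdf and its inverse, using Γ′(2) = 1 − γ, where γ is the Euler-Mascheroni constant:

```python
import math

EULER_GAMMA = 0.5772156649015329
GAMMA_PRIME_2 = 1.0 - EULER_GAMMA  # Γ'(2) = Γ(2)ψ(2) = 1 − γ

def ew_cdf(y, lam, phi, delta):
    """Cdf of the re-parameterized EW(λ, ϕ, δ): [1 − exp(−(y/λ)^ϕ e^{Γ'(2)})]^δ."""
    z = (y / lam) ** phi * math.exp(GAMMA_PRIME_2)
    return (1.0 - math.exp(-z)) ** delta

def ew_quantile(u, lam, phi, delta):
    """Inverse cdf, solving F(y) = u; setting δ = 1 recovers the Weibull case."""
    z = -math.log(1.0 - u ** (1.0 / delta))
    return lam * (z * math.exp(-GAMMA_PRIME_2)) ** (1.0 / phi)
```

The quantile function is exactly the inversion used to simulate responses in the Monte Carlo studies later in the paper.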

The New Regression Model
In practice there are many situations where the response variable is influenced by one or more explanatory variables or covariates, which can be related to treatments, intrinsic traits of the sample units, exogenous variables, interactions or time-dependent variables, among others. The random component of the GEWLM is defined by the pdf (6) for the response variable y_i. We accommodate different covariate structures for model (6) using a known function g, twice differentiable and bijective, called the link function, which links the parameter λ_i to the linear predictor η_i by g(λ_i) = η_i = x_iᵀβ, where x_iᵀ = (x_{i1}, ..., x_{ip}) is a set of p covariates associated with the ith response variable y_i for i = 1, ..., n, and βᵀ = (β_1, ..., β_p) is the vector of regression parameters to be estimated. The parameter λ_i can be specified by the inverse of the link function, i.e., λ_i = g^{−1}(η_i). Part of the mean response for the ith variable is thus specified by λ_i, which also influences the model's variance, because the variance function of the EW distribution is a quadratic function of the mean: Var(Y_i) = λ_i² c_2(ϕ, δ). The behavior of c_1(ϕ, δ) and c_2(ϕ, δ) is shown in Figure 3 for δ = 0.2, 1, 2 and 9. Figure 3 reveals that as ϕ approaches zero, the expectation and variance of Y_i increase, while as ϕ increases, both moments decrease. In particular, if δ = 1, when ϕ → ∞, c_1(ϕ, 1) → 1 and c_2(ϕ, 1) → 0.
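Because E(Y_i) = λ_i c_1(ϕ, δ) and Var(Y_i) = λ_i² c_2(ϕ, δ), rescaling λ_i must rescale the mean linearly and the variance quadratically. A small numerical check of this scale-family property (ours, not from the paper), approximating the moments on a fixed quantile grid:

```python
import math

GAMMA_PRIME_2 = 1.0 - 0.5772156649015329  # Γ'(2) = 1 − γ

def ew_quantile(u, lam, phi, delta):
    """Inverse cdf of the re-parameterized EW(λ, ϕ, δ) distribution."""
    z = -math.log(1.0 - u ** (1.0 / delta))
    return lam * (z * math.exp(-GAMMA_PRIME_2)) ** (1.0 / phi)

def approx_moments(lam, phi, delta, m=4000):
    """Approximate E(Y) and Var(Y) by averaging over a midpoint quantile grid."""
    ys = [ew_quantile((i + 0.5) / m, lam, phi, delta) for i in range(m)]
    mean = sum(ys) / m
    var = sum((y - mean) ** 2 for y in ys) / m
    return mean, var

m1, v1 = approx_moments(1.0, 2.0, 1.5)
m3, v3 = approx_moments(3.0, 2.0, 1.5)
# m3 ≈ 3·m1 and v3 ≈ 9·v1: the ratio Var/E² is free of λ
```

The ratio Var(Y)/E(Y)² therefore depends only on (ϕ, δ), which is the quadratic mean-variance relation discussed above.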
If Y_1, ..., Y_n is a sequence of n independent random variables such that Y_i ∼ EW(λ_i, ϕ, δ) and g(λ_i) = η_i = x_iᵀβ is the link function, then the equation that describes the regression model is given by

y_i = g^{−1}(x_iᵀβ) ε_i,   (7)

where ε_i ∼ EW(1, ϕ, δ). The mean and variance are given, respectively, by E(y_i) = λ_i c_1(ϕ, δ) and Var(y_i) = λ_i² c_2(ϕ, δ). The systematic component η_i = x_iᵀβ enters the regression model as a linear structure in the explanatory variables, and the connection between the systematic and random components is established by the inverse link function λ_i = g^{−1}(η_i). The choice of the link function depends strongly on the experiment and on the relation between the response variable and the covariates. Some possible choices are, for example, the identity g(λ_i) = λ_i, the reciprocal g(λ_i) = 1/λ_i and the logarithmic g(λ_i) = log(λ_i) functions. In this way, the regression is composed of a linear structure in the parameters and a nonlinear function with multiplicative error. Clearly, the GWLM (Prudente & Cordeiro, 2010) is a special case of (7) when δ = 1.

Maximum Likelihood Estimation
We consider a random sample (y_1, x_1), ..., (y_n, x_n) of n independent observations, where x_iᵀ = (x_{i1}, ..., x_{ip}), for i = 1, ..., n, is the vector of covariates associated with the ith individual and y_i is the response variable following the EW(λ_i, ϕ, δ) distribution with λ_i = g^{−1}(x_iᵀβ). The log-likelihood function for the model parameters θᵀ = (βᵀ, ϕ, δ) is given by

ℓ(θ) = Σ_{i=1}^{n} [log(δϕ) + Γ′(2) + (ϕ − 1) log(y_i) − ϕ log(λ_i) − z_i + (δ − 1) log{1 − exp(−z_i)}],   (8)

where z_i = (y_i/λ_i)^ϕ e^{Γ′(2)}.
The score vectors for β, ϕ and δ can be expressed in terms of the derivatives λ′_i = ∂λ_i/∂η_i of the inverse link function, the n × 1 vector of ones 1, and the vectors s = (s(z_1), ..., s(z_n))ᵀ, τ = (τ_1, ..., τ_n)ᵀ and κ = (κ_1, ..., κ_n)ᵀ. The MLEs of the regression coefficients and remaining parameters are the solutions of the nonlinear equations U_β = 0, U_ϕ = 0 and U_δ = 0. We use iterative methods to determine the roots. The NLMixed procedure in SAS has been used for maximizing the log-likelihood function ℓ(θ). Initial values for β and ϕ are taken from the fit of the GWLM with δ = 1.
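To make the estimation procedure concrete, the following sketch (our illustration, assuming NumPy/SciPy rather than the SAS NLMixed procedure used by the authors; all names are ours) simulates data from the model with a logarithmic link and maximizes the log-likelihood numerically over (β, log ϕ, log δ):

```python
import numpy as np
from scipy.optimize import minimize

GAMMA_PRIME_2 = 1.0 - 0.5772156649015329  # Γ'(2) = 1 − γ

def ew_negloglik(theta, y, X):
    """Negative EW log-likelihood; ϕ and δ enter on the log scale for stability."""
    p = X.shape[1]
    beta = theta[:p]
    phi, delta = np.exp(theta[p]), np.exp(theta[p + 1])
    lam = np.exp(X @ beta)                      # logarithmic link
    z = (y / lam) ** phi * np.exp(GAMMA_PRIME_2)
    z = np.clip(z, 1e-10, 1e10)                 # guard against under/overflow
    ll = (np.log(delta * phi) + GAMMA_PRIME_2 + (phi - 1) * np.log(y)
          - phi * np.log(lam) - z + (delta - 1) * np.log1p(-np.exp(-z)))
    return -np.sum(ll)

# simulate from the model with the inverse-cdf method
rng = np.random.default_rng(7)
n = 2000
x = np.exp(rng.normal(size=n))                  # right-skewed covariate
X = np.column_stack([np.ones(n), x])
beta0, beta1, phi, delta = 1.0, 0.5, 2.0, 1.5
lam = np.exp(beta0 + beta1 * x)
u = rng.uniform(size=n)
y = lam * (-np.log(1 - u ** (1 / delta)) * np.exp(-GAMMA_PRIME_2)) ** (1 / phi)

fit = minimize(ew_negloglik, np.zeros(4), args=(y, X), method="Nelder-Mead",
               options={"maxiter": 20000, "maxfev": 20000})
beta_hat = fit.x[:2]
phi_hat, delta_hat = np.exp(fit.x[2]), np.exp(fit.x[3])
```

With n = 2000, the slope estimate should land close to the true β_1 = 0.5; as in the paper, a GWLM fit (δ fixed at 1) would provide good starting values in practice.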
For interval estimation and hypothesis tests on the model parameters, we require the (p + 2) × (p + 2) observed, L̈(θ), and expected, I(θ), information matrices given in the Appendix. Under general regularity conditions, we can construct approximate confidence intervals for the individual parameters based on the multivariate normal N_{p+2}(0, I(θ̂)^{−1}) distribution, where θ̂ is the MLE of θ.
We can compute the maximum values of the unrestricted and restricted log-likelihoods to construct likelihood ratio (LR) statistics for testing some sub-models of the GEWLM. For example, the test of H_0: δ = 1 versus H_1: H_0 is not true is equivalent to comparing the GEWLM and GWLM, and the LR statistic reduces to w = 2{ℓ(β̂, ϕ̂, δ̂) − ℓ(β̃, ϕ̃, 1)}, where β̂, ϕ̂ and δ̂ are the MLEs under H_1 and β̃ and ϕ̃ are the estimates under H_0. The statistic w has approximately a chi-square distribution with degrees of freedom given by the difference between the numbers of parameters of the two models.
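For instance, with the log-likelihood values reported for the application in Section 7 (−5365.0 for the GEWLM and −5413.0 for the GWLM), the test works out as follows; the one-line chi-square tail formula uses the fact that a χ²_1 variable is a squared standard normal:

```python
import math

ll_gewlm, ll_gwlm = -5365.0, -5413.0      # unrestricted (H1) and restricted (H0: δ = 1) maxima
w = 2.0 * (ll_gewlm - ll_gwlm)            # LR statistic, here with 1 degree of freedom
p_value = math.erfc(math.sqrt(w / 2.0))   # P(χ²₁ > w) = 2(1 − Φ(√w)) = erfc(√(w/2))
# w = 96.0 and the p-value is far below 0.0001, favoring the GEWLM
```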

Simulation Study: Maximum Likelihood
Next, we perform a Monte Carlo simulation study to evaluate the MLEs of the GEWLM by considering the systematic component η_i = β_0 + β_1 x_i. The values of the response variable are simulated from the inverse of the cdf of the EW distribution given by (5), such that y_i = λ_i [−log(1 − u_i^{1/δ}) e^{−Γ′(2)}]^{1/ϕ}, where u_i is a uniform random number on (0, 1). Further, the values of the covariate x are generated by x_i = exp(t_i), where t_i ∼ N(0, 1), so that x has strong rightward asymmetry.
Based on the values reported in Table 4.1, we note that, in all cases, as the sample size increases, the MSEs and biases decrease and the estimates approach the true values. Further, the biases and MSEs are much smaller for the estimates of β_0 and β_1 using the logarithmic link function, and the convergence to the true parameter values is faster.

Bayesian Estimation
As an alternative analysis, we use a Bayesian method, which allows for the incorporation of previous knowledge about the parameters through informative prior density functions.
By combining the likelihood function (8) and the prior distribution in (9), the joint posterior distribution for θ is obtained as π(θ|D) ∝ L(θ; D) ∏_{i=1}^{p} π(β_i) π(δ) π(ϕ). This joint posterior density is analytically intractable, and we base our inference on Markov chain Monte Carlo (MCMC) simulation methods. In particular, the Gibbs sampler algorithm (Gamerman & Lopes, 2006) has proven a powerful alternative. However, no closed form is available for any of the full conditional distributions required to implement the Gibbs sampler, so we resort to the Metropolis-Hastings algorithm. We begin by changing variables to ξ = (log(δ), log(ϕ), β), so that the parameter space is transformed into R^{p+2} (necessary for working with Gaussian proposal densities). Taking the Jacobian of this transformation into account yields our joint posterior density (or target density) π(ξ|D). To implement the Metropolis-Hastings algorithm, we proceed as follows: (1) start with any point ξ⁽⁰⁾ and stage indicator j = 0; (2) generate a point ξ′ according to the transition kernel N_{p+2}(ξ⁽ʲ⁾, Σ), where Σ is the covariance matrix of ξ, which is the same at every stage; (3) update ξ⁽ʲ⁾ to ξ⁽ʲ⁺¹⁾ = ξ′ with probability p_j = min{1, π(ξ′|D)/π(ξ⁽ʲ⁾|D)}, or keep ξ⁽ʲ⁾ with probability 1 − p_j; (4) repeat steps (2) and (3), increasing the stage indicator, until the process reaches a stationary distribution.
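Steps (1)-(4) can be sketched as a generic random-walk Metropolis-Hastings routine. This is our Python illustration (the paper's computations are done in R), run on a toy bivariate normal target standing in for the transformed GEWLM posterior:

```python
import math, random

def metropolis_hastings(log_post, xi0, step, n_iter, seed=0):
    """Random-walk MH with a Gaussian transition kernel centered at the state."""
    rng = random.Random(seed)
    xi = list(xi0)
    chain = []
    for _ in range(n_iter):
        prop = [v + step * rng.gauss(0.0, 1.0) for v in xi]        # step (2)
        if math.log(rng.random()) < log_post(prop) - log_post(xi): # step (3)
            xi = prop
        chain.append(list(xi))
    return chain

# toy target: independent standard normals (stand-in for π(ξ|D))
log_post = lambda v: -0.5 * sum(t * t for t in v)
chain = metropolis_hastings(log_post, [5.0, -5.0], 1.0, 20000)
kept = chain[5000:]                                # discard burn-in
mean_first = sum(v[0] for v in kept) / len(kept)
```

A diagonal proposal with a single step size is used here for simplicity; the Σ in the text would typically be tuned from a pilot run.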
All computations are performed in R software (R Development Core Team [RDCT], 2011).

Model Comparison Criteria
A variety of methodologies can be applied to compare several competing models for a given dataset and to select the one that best fits the data. One of the most used approaches is derived from the conditional predictive ordinate (CPO) statistic. For a detailed discussion of the CPO statistic and its applications to model selection, see (Gelfand, Dey & Chang, 1992).
Let D denote the full data and D^{(−i)} the data with the ith observation deleted. We denote the posterior density of θ given D^{(−i)} by π(θ|D^{(−i)}), i = 1, ..., n. For the ith observation, CPO_i can be written as

CPO_i = ∫ f(y_i|θ) π(θ|D^{(−i)}) dθ.

The CPO_i can be interpreted as the height of the marginal density of the time to an event at y_i; therefore, a high CPO_i implies a better fit of the model. No closed form of CPO_i is available for the proposed model. However, a Monte Carlo estimate of CPO_i can be obtained by using a single MCMC sample from the posterior distribution π(θ|D). Let θ⁽¹⁾, ..., θ⁽Q⁾ be a sample of size Q from π(θ|D) after the burn-in. A Monte Carlo approximation of CPO_i (Ibrahim, Chen & Sinha, 2001) is given by

ĈPO_i = { (1/Q) Σ_{q=1}^{Q} 1/f(y_i|θ⁽q⁾) }^{−1}.

For model comparison we use the log pseudo marginal likelihood (LPML), defined by LPML = Σ_{i=1}^{n} log(ĈPO_i). The higher the LPML value, the better the fit of the model.
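The Monte Carlo approximation of CPO_i is a harmonic mean of likelihood values across posterior draws. A minimal helper (ours, not from the paper) that turns a Q × n matrix of log-likelihood evaluations into the LPML:

```python
import math

def lpml(loglik):
    """loglik[q][i] = log f(y_i | θ^(q)) for MCMC draw q.
    Returns Σ_i log CPO_i with CPO_i ≈ [(1/Q) Σ_q 1/f(y_i | θ^(q))]^(-1)."""
    Q, n = len(loglik), len(loglik[0])
    total = 0.0
    for i in range(n):
        inv_mean = sum(math.exp(-loglik[q][i]) for q in range(Q)) / Q
        total += -math.log(inv_mean)  # log CPO_i = −log(mean of 1/f)
    return total
```

A convenient sanity check: if every draw gives the same likelihood f(y_i|θ) = c for all i, then CPO_i = c and LPML = n·log(c).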
Other criteria, such as the deviance information criterion (DIC) proposed by (Spiegelhalter, Best & van der Linde, 2002), the expected Akaike information criterion (EAIC) (Brooks, 2002) and the expected Bayesian (or Schwarz) information criterion (EBIC) (Carlin & Louis, 2001), can also be used. They are based on the posterior mean of the deviance, which can be approximated by d̄ = (1/Q) Σ_{q=1}^{Q} d(θ⁽q⁾). The DIC can be estimated from the MCMC output by D̂IC = d̄ + ρ̂_D = 2d̄ − d̂, where ρ_D is the effective number of parameters, defined as E{d(θ)} − d{E(θ)}, and d̂ = d{E(θ|D)} is the deviance evaluated at the posterior mean. Similarly, the EAIC and EBIC criteria can be estimated by ÊAIC = d̄ + 2#(θ) and ÊBIC = d̄ + #(θ) log(n), where #(θ) is the number of model parameters.
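These three estimates reduce to simple arithmetic on the sampled deviances; a small helper (our naming) under the definitions above:

```python
import math

def dic_eaic_ebic(deviances, dev_at_post_mean, n_params, n_obs):
    """DIC = 2·d̄ − d̂, EAIC = d̄ + 2#(θ), EBIC = d̄ + #(θ)·log(n)."""
    dbar = sum(deviances) / len(deviances)   # posterior mean of the deviance
    dic = 2.0 * dbar - dev_at_post_mean
    eaic = dbar + 2.0 * n_params
    ebic = dbar + n_params * math.log(n_obs)
    return dic, eaic, ebic
```

For example, with sampled deviances averaging d̄ = 11, d̂ = 9, four parameters and n = 100, this gives DIC = 13 and EAIC = 19.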

Simulation Study: Bayesian Analysis
A simulation study was conducted to evaluate the parameter estimates for the proposed model through the Bayesian analysis. We consider the systematic component η_i = β_0 + β_1 x_i, i = 1, ..., n, and the logarithmic (g(λ_i) = log(λ_i)) link function.
The values of the response variable are simulated from the inverse of the cdf of the EW distribution, such that y_i = λ_i [−log(1 − u_i^{1/δ}) e^{−Γ′(2)}]^{1/ϕ}, where u_i is a uniform random number on (0, 1). Further, the values of the covariate x are generated by x_i = exp(t_i), where t_i ∼ N(0, 1), so that x has strong rightward asymmetry.
Samples of size n = 150 and 300 were generated with the logarithmic link function and true values β_0 = 2, β_1 = 1.0, ϕ = 0.5 and δ = 1.5, with 500 Monte Carlo data sets generated for each sample size. The values σ²_{β_0} = σ²_{β_1} = σ²_δ = σ²_ϕ = 100 were considered for the prior distributions given in (9). Note that we assume weak, but still informative, priors; because the prior is informative, the posterior is always proper. Throughout this work, we use a burn-in of 40,000 samples and keep every tenth sample from the subsequent 200,000 MCMC posterior samples to reduce the autocorrelations and yield better convergence, thus obtaining an effective sample of size 20,000 on which the posterior inference is based. We monitor the convergence of the Metropolis-Hastings algorithm using the method proposed by (Geweke, 1992), as well as trace plots.
Further, we calculate the mean, bias, mean squared error (MSE) and 95% coverage probability (CP). The figures in Table 4.1 indicate that the bias and MSE decrease as the sample size increases. It is also observed that, as the sample size increases, the CPs become closer to the nominal value.

Sensitivity Analysis
There are basically two approaches to detecting observations that seriously influence the results of a statistical analysis. One is the local influence approach; the second is Bayesian case influence diagnostics.

Local Influence
Influence diagnostics are an important step in the analysis of data, since they provide an indication of bad model fit or of influential observations. Since regression models are sensitive to the underlying model assumptions, performing a sensitivity analysis is generally strongly advisable. Cook (1986) used this idea to motivate the assessment of influence analysis.
He suggested that more confidence can be put in a model that is relatively stable under small modifications. Another approach suggested by Cook (1986) is to weight observations instead of removing them. Previous works on local influence curvatures in regression models for censored data are due to (Escobar & Meeker, 1992), (Ortega, Bolfarine & Paula, 2003), (Ortega, Cancho & Paula, 2009), (Ortega, Cordeiro & Hashimoto, 2011), (Silva, Ortega, Garibay & Barreto, 2008) and (Hashimoto, Ortega, Cancho & Cordeiro, 2013). The calculation of local influences can be carried out for model (7). If the likelihood displacement LD(ω) = 2{ℓ(θ̂) − ℓ(θ̂_ω)} is used, where θ̂_ω denotes the MLE under the perturbed model, then the normal curvature for θ in the direction d, ∥d∥ = 1, is given by C_d(θ) = 2|dᵀ∆ᵀ[L̈(θ)]^{−1}∆d|, where ∆ is a (p + 2) × n matrix that depends on the perturbation scheme, with elements ∆_{vi} = ∂²ℓ(θ|ω)/∂θ_v∂ω_i, i = 1, 2, ..., n and v = 1, 2, ..., p + 2, evaluated at θ̂ and ω_0, where ω_0 is the no-perturbation vector. For the GEWLM, the elements of L̈(θ) can be obtained from the authors upon request. We can also calculate the normal curvatures C_d(θ) to obtain various index plots, including, for instance, the index plot of d_max, the eigenvector corresponding to C_{d_max}, the largest eigenvalue of the matrix B = −∆ᵀ[L̈(θ)]^{−1}∆, and the index plots of C_{d_i}(θ), which are together denoted as the total local influence; see, for example, (Lesaffre & Verbeke, 1998), where d_i denotes an n × 1 vector of zeros with one at the ith position. Thus, the curvature in the direction d_i takes the form C_i = 2|∆_iᵀ[L̈(θ)]^{−1}∆_i|. Next, we calculate the matrix ∆ for three perturbation schemes. We consider model (7) and its log-likelihood function given by (8), and denote the vector of weights by ω = (ω_1, ..., ω_n)ᵀ.

• Case-weight Perturbation
In this case, the perturbed log-likelihood function reduces to ℓ(θ|ω) = Σ_{i=1}^{n} ω_i ℓ_i(θ), where ℓ_i(θ) denotes the contribution of the ith observation to (8), and the matrix ∆ follows by differentiation.

• Response Variable Perturbation

Now, we consider that each y_i is perturbed as y_{iω} = y_i + ω_i S_y, where S_y is a scale factor that may be estimated by the standard deviation of the observed response y and ω_i ∈ ℝ. The perturbed log-likelihood function can be expressed as before with z_{iω} = (y_{iω}/λ̂_i)^{ϕ̂} e^{Γ′(2)} and ω_0 = (0, ..., 0)ᵀ, from which the matrix ∆ follows.

• Explanatory Variable Perturbation
Consider now an additive perturbation on a particular continuous explanatory variable, say x_j, by setting x_{ijω} = x_{ij} + ω_i S_{x_j}, where S_{x_j} is a scale factor and ω_i ∈ ℝ. The perturbed log-likelihood function has the same form with z_{iω} = (y_i/λ̂_{iω})^{ϕ̂} e^{Γ′(2)} and λ̂_{iω} = g^{−1}(x_iᵀβ̂ + S_{x_j}β̂_j ω_i), where ω_0 = (0, ..., 0)ᵀ, from which the matrix ∆ follows. Note that the quantities λ′_{iω}, s(z_{iω}), l_{iω}, n_{iω}, c_{iω} and f_{iω} are defined as in Section 3, but applied to z_{iω} and λ_{iω}.
In order to determine whether the ith observation is possibly influential, (Poon & Poon, 1999) proposed classifying it as possibly influential if its conformal normal curvature M(0)_i is greater than M̄(0) + c* SM(0), where M̄(0) = 1/q, SM(0) is the sample standard error of {M(0)_k, k = 1, ..., q} and c* is a constant selected according to the application at hand. Cook (1986) suggested that more confidence should be put in a model that is relatively stable under small modifications. The best known perturbation schemes are based on case deletion (Cook & Weisberg, 1982), in which the effects of completely removing cases from the analysis are studied. This reasoning forms the basis of our Bayesian global influence methodology, making it possible to determine which subjects might influence the analysis. In this work, we use Bayesian case-deletion influence diagnostic measures for the joint posterior distribution based on the ψ-divergence ((Peng & Dey, 1995) and (Weiss, 1996)).

Bayesian Case Influence Diagnostics
Let D_ψ(P, P^{(−i)}) denote the ψ-divergence between P and P^{(−i)}, where P denotes the posterior distribution of θ for the full data and P^{(−i)} denotes the posterior distribution of θ without the ith case. Specifically,

D_ψ(P, P^{(−i)}) = ∫ ψ(π(θ|D^{(−i)})/π(θ|D)) π(θ|D) dθ,

where ψ is a convex function with ψ(1) = 0. Several choices of ψ are given by (Dey & Birmiwal, 1994). For example, ψ(z) = −log(z) defines the Kullback-Leibler (K-L) divergence, ψ(z) = 0.5|z − 1| defines the L_1 norm (or variational distance) and ψ(z) = (z − 1) log(z) gives the J-distance (the symmetric version of the K-L divergence).
Note that D_ψ(P, P^{(−i)}) can be interpreted as the ψ-divergence measuring the effect of deleting the ith case from the full data on the joint posterior distribution of θ. As pointed out by (Peng & Dey, 1995) and (Weiss, 1996), it may be difficult for a practitioner to judge the cutoff point of the divergence measure so as to determine whether a small subset of observations is influential or not. Therefore, we use the calibration proposed by (Peng & Dey, 1995) and (Weiss, 1996) based on a biased coin with success probability p. The ψ-divergence between the biased and unbiased coins is

D_ψ(f_0, f_1) = Σ_{x=0}^{1} ψ(f_0(x)/f_1(x)) f_1(x),

where f_0(x) = pˣ(1 − p)^{1−x} and f_1(x) = 0.5, x = 0, 1. If D_ψ(f_0, f_1) = d_ψ(p), it can easily be checked that d_ψ satisfies

d_ψ(p) = ½ [ψ(2p) + ψ(2(1 − p))].

It is not difficult to see, for the divergence measures considered, that d_ψ increases as p moves away from 0.5. In addition, d_ψ(p) is symmetric about p = 0.5 and achieves its minimum there, at which point d_ψ(0.5) = 0 and f_0 = f_1. Therefore, if we consider p > 0.90 (or p ≤ 0.10) as a strong bias in a coin, then d_{K-L}(0.90) = 0.51, d_J(0.90) = 0.88 and d_{L_1}(0.90) = 0.4. Thus, using the J-distance, an observation for which d_J > 0.88 can be considered influential. Similarly, using the Kullback-Leibler divergence and the L_1 norm, an observation is considered influential when d_{K-L} > 0.51 or d_{L_1} > 0.4, respectively.
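The quoted cutoffs follow directly from the two-point calibration formula d_ψ(p) = ½[ψ(2p) + ψ(2(1 − p))]; a quick check in Python (function names are ours):

```python
import math

def d_psi(p, psi):
    """ψ-divergence between a Bernoulli(p) coin and a fair coin on {0, 1}."""
    return 0.5 * (psi(2.0 * p) + psi(2.0 * (1.0 - p)))

psi_kl = lambda z: -math.log(z)             # Kullback-Leibler divergence
psi_j = lambda z: (z - 1.0) * math.log(z)   # J-distance
psi_l1 = lambda z: 0.5 * abs(z - 1.0)       # L1 norm
# d_psi(0.90, psi_kl) ≈ 0.51, d_psi(0.90, psi_j) ≈ 0.88, d_psi(0.90, psi_l1) = 0.40
```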

Influence of Outlying Observations
One of our main goals is to show the need for robust models to deal with the presence of outliers in the data. We consider simulated datasets with one, two and three perturbed cases to examine the performance of the proposed diagnostic measures. A sample of size 300 is generated from the GEWLM considering the logarithmic link function and true values β_0 = 1, β_1 = 0.5, ϕ = 2 and δ = 1.5. In the simulated data, y_i ranged from 0.4535 to 744.8722, with median 4.1141, mean 13.4078 and standard deviation 56.6212. We selected cases 25, 150 and 225 for perturbation. To create influential observations in the dataset, we perturbed the response variable of one to three of the selected cases. Convergence of the chains was monitored by the method of (Geweke, 1992).
Table 5.3 shows the posterior inferences about the parameters; except for the parameter β_1, they are sensitive to the perturbation of the selected case(s).
Table 5.3 also shows the Monte Carlo estimates of the DIC, EAIC, EBIC and LPML criteria for each perturbed version of the original dataset. We can observe that, as expected, the original simulated data (Setup A) has the best fit. For each simulated dataset, we now consider the sample from the posterior distributions of the parameters of the EW model to calculate the three ψ-divergence measures (d_{K-L}, d_J, d_{L_1}) described in Section 5.2. The results in Table 5.3 show that, before perturbation (Setup A), the selected cases are not influential according to all ψ-divergence measures. However, after perturbation (Setups B-H), the measures increase, which indicates that the perturbed cases are influential. Thus, we clearly see that all ψ-divergence measures perform well in identifying the influential case(s).

Residual Analysis
When fitting a model to a dataset, the validity of the fit must be analyzed by a specific statistic with the purpose of measuring the goodness of fit. Once the model is chosen and fitted, the analysis of the residuals is an efficient way to check the model's adequacy. The residuals also serve other purposes, such as detecting the presence of aberrant points (outliers), identifying the relevance of an additional factor omitted from the model and verifying whether there are indications of serious departure from the distribution assumed for the random error. Further, since the residuals are used to identify discrepancies between the fitted model and the dataset, it is convenient to define residuals that take into account the contribution of each observation to the goodness-of-fit measure used.
In summary, the residuals allow measuring the model's fit for each observation and enable studying whether the differences between the observed and fitted values are due to chance or to a systematic behavior that can be modeled.The methods proposed by (Dunn & Smyth, 1996) can be used to obtain quantile residuals.
The quantile residual for the GEWLM is given by

t_i = Φ^{−1}( [1 − exp{−(y_i/λ̂_i)^{ϕ̂} e^{Γ′(2)}}]^{δ̂} ),

where Φ(·) is the standard normal cumulative distribution function. (Atkinson, 1985) suggested the construction of envelopes to enable better interpretation of the normal probability plot of the residuals. These envelopes are simulated confidence bands that contain the residuals, such that if the model is well fitted, the majority of points will lie within the bands, randomly distributed. The construction of the confidence bands follows these steps:
• Fit the proposed model and calculate the residuals t_i;
• Simulate k samples of the response variable using the fitted model;
• Fit the model to each sample and calculate the residuals t_{ij}, j = 1, 2, ..., k and i = 1, 2, ..., n;
• Arrange each group of n residuals in rising order to obtain t_{(i)j}, for j = 1, 2, ..., k and i = 1, 2, ..., n;
• For each i, obtain the mean, minimum and maximum of the t_{(i)j};
• Plot these means, minima and maxima together with the values of t_i against the expected percentiles of the standard normal distribution.
The minima and maxima of the simulated residuals form the envelope. If the model under study is correct, the observed values should fall inside the bands and be randomly distributed.
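The envelope steps can be sketched generically. In this illustration (ours, not from the paper), the refitting step is abstracted into a callback that returns one replicate set of residuals, and a correct fit is mimicked by standard normal draws:

```python
import random

def simulated_envelope(n, simulate_residuals, k=19, seed=0):
    """Simulate k replicate residual sets, sort each one, and take the
    per-order-statistic minimum, mean and maximum as the envelope bands."""
    rng = random.Random(seed)
    sims = [sorted(simulate_residuals(rng)) for _ in range(k)]
    lower = [min(s[i] for s in sims) for i in range(n)]
    middle = [sum(s[i] for s in sims) / k for i in range(n)]
    upper = [max(s[i] for s in sims) for i in range(n)]
    return lower, middle, upper

# stand-in for "fit the model to each sample and calculate the residuals":
# under a well-fitted model, quantile residuals are approximately N(0, 1)
toy = lambda rng: [rng.gauss(0.0, 1.0) for _ in range(50)]
lower, middle, upper = simulated_envelope(50, toy)
```

In a real application, the callback would simulate responses from the fitted GEWLM, refit the model and return the new quantile residuals, exactly as in the steps listed above.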

Application: Insurance Payouts for Personal Accidents
The dataset comes from a study of the size of the indemnities paid for personal accidents by Australian insurers in the period from July 1989 to June 1999, published by (Jong & Heller, 2008). The aim is to relate the variables legal representation (1 if the insured is represented by a lawyer and 0 otherwise) and claim settlement time, representing the percentage of cases whose settlement time was faster than the average for all claims of the same type, to the payouts, denominated in Australian dollars.
The indemnity amounts paid by insurers are typically concentrated in a low range, with many fewer high payouts, so the distribution of the data has rightward asymmetry. There are various distributions with rightward asymmetry and positive support that could be used to model these data. Therefore, analysis of the hrf is an important tool here, because distributions with totally different failure-rate behavior often have similar densities.
For this analysis, we consider the last 18 months of the study (January 1998 to June 1999) and insureds with legal representation, giving a total of 542 observations, with settlement time as the covariate. Initially, we conduct an exploratory analysis of the data, whose results are presented in Table 7. The histogram of this dataset is displayed in Figure 7a. The results in Table 7 and Figure 7 show rightward asymmetry of the data, indicating that adjustment by an asymmetric distribution is most appropriate.
In many applications there is qualitative information about the hazard shape, which can help with selecting a particular GEWLM. In this context, a device called the total time on test (TTT) plot (Aarset, 1987) is useful. The TTT plot is obtained by plotting G(r/n) = [Σ_{i=1}^{r} T_{i:n} + (n − r)T_{r:n}] / Σ_{i=1}^{n} T_{i:n} against r/n, where r = 1, ..., n and T_{i:n}, for i = 1, ..., n, are the order statistics of the sample. It is a straight diagonal line for constant hazards, leading to an exponential model. It is convex for decreasing hazards and concave for increasing hazards, leading to a single-Weibull model. It is first convex and then concave if the hazard is bathtub-shaped, and first concave and then convex if the hazard is unimodal, leading in both cases to a GEWLM. For multimodal hazards, the TTT plot contains several concave and convex regions.
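The scaled TTT transform behind the plot is easy to compute; a short sketch (function name is ours) returning the (r/n, G(r/n)) pairs to be plotted:

```python
def ttt_points(sample):
    """Scaled TTT transform: G(r/n) = [Σ_{i≤r} T(i) + (n − r) T(r)] / Σ_i T(i),
    where T(1) ≤ ... ≤ T(n) are the order statistics."""
    t = sorted(sample)
    n = len(t)
    total = sum(t)
    points = []
    running = 0.0
    for r in range(1, n + 1):
        running += t[r - 1]
        points.append((r / n, (running + (n - r) * t[r - 1]) / total))
    return points
```

Plotting these pairs and comparing them against the diagonal reproduces the diagnostic described above: concave-then-convex indicates a unimodal hrf.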
The TTT-plot for this dataset is displayed in Figure 7b, where it can be seen that the curve is initially concave and then becomes convex around the reference diagonal line, indicating that the general shape of the hrf is unimodal, so the GEWLM is appropriate for the data.
It is interesting to note that if the TTT-plot curve had initially been convex and then turned concave, the hrf would have been U-shaped, so the Weibull distribution would not be suitable to fit the data, since it does not model data with bathtub hrf, unlike the EW distribution.

Maximum Likelihood Estimation
The regression model, considering the GEWLM, is given by y_i = g^{−1}(x_iᵀβ) ϵ_i, where y_i is the amount paid for the ith observation, x_i is the settlement time associated with the ith observation, ϵ_i is the random error having the EW(1, ϕ, δ) distribution and g^{−1}(·) is the inverse link function.
Figure 8 provides dispersion plots of the settlement time covariate: (a) versus the response variable (amount of the indemnity); and (b) versus the natural logarithm of the response variable. It can be seen that the logarithm of the indemnity amount grows linearly with settlement time and that the variability is stabilized. Therefore, we adopt the logarithmic link function log(λ_i) = η_i to fit the GEWLM, with linear predictor η_i = β_0 + β_1 x_i, where x_i denotes settlement time. Table 7 gives the Akaike information criterion (AIC), consistent Akaike information criterion (CAIC) and Bayesian information criterion (BIC), together with the estimates of the parameters, their standard errors (SEs) and descriptive levels (p-values), for the GEWLM and GWLM regression models. Note that the GEWLM again gives lower values of the AIC, CAIC and BIC statistics than the GWLM. The regression parameters are significant for both models, so they should remain in the study. Next, we calculate the likelihood ratio (LR) statistic to test the hypotheses H_0: δ = 1 versus H_1: δ ≠ 1, thus comparing the GEWLM and GWLM, namely w = 2[−5365.0 − (−5413.0)] = 96 (p-value < 0.0001). According to this test, H_0 should be rejected, and therefore the GEWLM provides the better fit to the data. Further, we perform an analysis of local influence for the current data using the GEWLM. The plots in Figure 7 indicate the possible influential observations in the GEWLM regression. The plots of local influence (M(0) against the index of the observations) consider three perturbation schemes: likelihood (Figure 7a), response variable (Figure 7b) and the covariate settlement time (Figure 7c). Observations 28, 369 and 539 appear in all three perturbation schemes as possible influential points, so we remove them from the dataset, individually and jointly, and re-estimate the model parameters to study the influence of these observations on the estimates. Observation 28 is the lowest indemnity payout for personal accident damages by the Australian insurers within the sample studied (y_28 = 109, with settlement time x_28 = 1.1), while observations 369 and 539 are the highest (y_369 = 76,255.76, x_369 = 14.8, and y_539 = 116,586.72, x_539 = 31.4).
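The LR test above can be reproduced with a short sketch. The log-likelihood values are those reported in the text, and the chi-square(1) tail probability uses the identity P(χ²₁ > w) = erfc(√(w/2)), which avoids any dependence beyond the standard library.

```python
import math

# Maximized log-likelihoods reported in the text for the two fitted models.
ll_gewlm = -5365.0  # full model (GEWLM)
ll_gwlm = -5413.0   # reduced model (GWLM, corresponding to delta = 1)

# Likelihood-ratio statistic for H0: delta = 1 versus H1: delta != 1.
w = 2.0 * (ll_gewlm - ll_gwlm)

# Tail probability of a chi-square variable with 1 degree of freedom:
# P(X > w) = erfc(sqrt(w / 2)).
p_value = math.erfc(math.sqrt(w / 2.0))

print(w)        # 96.0
print(p_value)  # far below 0.0001, so H0 is rejected
```

Since the p-value is far below any usual significance level, the test favors the GEWLM, in agreement with the information criteria.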

Bayesian Analysis
We perform a sensitivity analysis for the variance component parameters under various choices of the prior hyperparameters, considering four priors (Priors 1 to 4) in the study. The posterior summaries of the parameters do not present remarkable differences across the priors and do not impair the results in Table 7.
Overall, comparing the GEWLM and GWLM by means of the EAIC, EBIC and LPML criteria, the GEWLM stands out as the best model.
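As a minimal sketch of how these Bayesian comparison criteria can be computed from MCMC output, the snippet below uses a tiny illustrative matrix of per-observation likelihood values f(y_i | θ^(m)) (the data, number of draws and the parameter count p are placeholders, not values from the paper). It applies the usual definitions EAIC = D̄ + 2p, EBIC = D̄ + p·log n, and LPML = Σᵢ log CPOᵢ, where CPOᵢ is the harmonic mean of f(y_i | θ^(m)) over the posterior draws.

```python
import math

# Hypothetical per-observation likelihoods f(y_i | theta^(m)) for M = 3
# posterior draws (rows) and n = 2 observations (columns).
lik = [[0.20, 0.10],
       [0.25, 0.08],
       [0.15, 0.12]]
M = len(lik)
n = len(lik[0])
p = 4  # illustrative parameter count (e.g. beta0, beta1, phi, delta)

# Posterior mean deviance: average over draws of -2 * log-likelihood.
Dbar = sum(-2.0 * sum(math.log(row[i]) for i in range(n)) for row in lik) / M

EAIC = Dbar + 2 * p
EBIC = Dbar + p * math.log(n)

# CPO_i is the harmonic mean of f(y_i | theta^(m)) over the draws;
# LPML is the sum of the log CPO_i values (larger LPML is better).
cpo = [M / sum(1.0 / lik[m][i] for m in range(M)) for i in range(n)]
LPML = sum(math.log(c) for c in cpo)
```

In practice `lik` would hold the density of each observation evaluated at each retained Gibbs draw; smaller EAIC/EBIC and larger LPML indicate the preferred model.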
We compute the ψ-divergence measures in (12), described in Section 5.2. Table 11 gives the results for the four observations with the highest values. Cases 28 and 539 are possible influential observations in the posterior distribution. Observation 28 is the lowest indemnity payout for personal accident damages by the Australian insurers within the sample studied (y_28 = 109, with settlement time x_28 = 1.1), and observation 539 is the highest (y_539 = 116,586.72, x_539 = 31.4).
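One common case-deletion estimator in this family is the Kullback-Leibler divergence between the full posterior and the posterior with observation i removed, which can be estimated from the same MCMC likelihood draws as K-L_i ≈ −log(CPOᵢ) + E[log f(y_i | θ)]. The sketch below illustrates this for a single observation with placeholder likelihood values (not taken from the paper); by Jensen's inequality the estimate is nonnegative, and large values flag influential cases.

```python
import math

# Hypothetical f(y_i | theta^(m)) values for one observation across M = 4
# posterior draws.
lik_i = [0.20, 0.25, 0.15, 0.30]
M = len(lik_i)

# CPO_i: harmonic mean of the likelihood of observation i over the draws.
cpo_i = M / sum(1.0 / f for f in lik_i)

# Monte Carlo estimate of E[log f(y_i | theta)] under the posterior.
mean_log_f = sum(math.log(f) for f in lik_i) / M

# Estimated K-L divergence between the full and case-deleted posteriors;
# large values indicate an influential observation.
kl_i = -math.log(cpo_i) + mean_log_f
```

Repeating this over all observations produces the index plots of the divergence measures discussed in the text.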
From now on, we assume Prior 2. The approximate marginal posterior densities of the model parameters and the corresponding trace plots, based on the 10,000 generated MCMC observations, are displayed in the figures, along with the index plots of the three ψ-divergence measures.
Figure 7 shows the index plots of the ψ-divergence measures for the personal accident insurance payout data when observations 28 and 539 are dropped. For all divergence measures, after removing these two observations, no case stands out as influential.

Table 10. Real data. Bayesian criteria

Prior   Model   EAIC        EBIC        DIC
1       GEWLM   10,742.440  10,759.620  10,370.800
1       GWLM    10,835.162  10,848.048  10,418.818
2       GEWLM   10,742.486  10,759.667  10,370.903
2       GWLM    10,835.130  10,848.016  10,418.637
3       GEWLM   10,742.531  10,759.712  10,371.059
3       GWLM    10,835.110  10,847.990  10,418.720
4       GEWLM   10,742.539  10,759.721  10,370.985
4       GWLM    10,835.137  10,848.023  10,418.749

Table 11. ψ-divergence measures for the real data fitting the GEWLM

Table 7 gives the relative changes (RCs) of each parameter estimate after removal of observations 28, 539, and (28 and 539) together, as well as the 95% credible intervals for the parameters. The RC (in percentage) of each estimated parameter is defined by RC_θj = |(θ̂_j − θ̂_j(I))/θ̂_j| × 100%, where θ̂_j(I) denotes the posterior mean of θ_j, for j = 1, . . ., 4, after the set I of observations has been removed. A small RC is observed for the parameter β_0, and the significance of the regression parameters β_0 and β_1 does not change, indicating the robustness of the model. Furthermore, according to all criteria, a better fit is achieved when observation 28 is dropped than when observation 539 is removed.
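The RC computation can be sketched in a few lines; the parameter values below are illustrative placeholders, not the estimates from the paper's tables.

```python
# Relative change (RC, in percent) of each posterior mean after removing a
# set I of observations: RC = |(theta_hat - theta_hat_I) / theta_hat| * 100.
# The numbers below are hypothetical, for illustration only.
full_fit = {"beta0": 8.50, "beta1": 0.12, "phi": 1.80, "delta": 6.50}
dropped_fit = {"beta0": 8.45, "beta1": 0.13, "phi": 1.75, "delta": 6.10}

rc = {name: abs((full_fit[name] - dropped_fit[name]) / full_fit[name]) * 100.0
      for name in full_fit}
```

A small RC for a parameter, together with unchanged significance, is what supports the robustness claim in the text.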
Thus, Tables 7 and 9 report the parameter estimates, standard errors and significance of the parameters for the maximum likelihood and Bayesian methods, respectively. Examining these tables, we conclude that the estimates obtained by the two methods are very similar. The GEWLM is more appropriate than the GWLM for fitting this dataset.
The diagnostic analyses also yield similar results under both approaches.

Figure 1. Some densities of the EW distribution

Figure 2. Plots of the functions c_1(ϕ, δ) and c_2(ϕ, δ)

Outliers are introduced at observations 25, 150 and 225, where S_y is the standard deviation of the y_i's. Here, we consider eight setups in the study. Setup A: original dataset, without outliers; Setup B: data with an outlier at observation 25; Setup C: an outlier at observation 150; Setup D: an outlier at observation 225; Setup E: outliers at observations 25 and 150; Setup F: outliers at observations 25 and 225; Setup G: outliers at observations 150 and 225; and Setup H: outliers at observations 25, 150 and 225. The MCMC computations are carried out as in the last subsection and, to monitor the convergence of the Gibbs samples, we use Geweke's convergence diagnostic.
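The eight contaminated setups can be generated systematically as every subset of the three perturbed positions. The sketch below assumes an additive contamination of the form y_i + k·S_y (the response vector and the multiplier k are placeholders; only the positions 25, 150 and 225 come from the text).

```python
from itertools import combinations
import statistics

# Illustrative response vector; in the study the contaminated positions are
# observations 25, 150 and 225 of the simulated data.
y = [float(i % 17) + 1.0 for i in range(300)]
positions = [24, 149, 224]  # 0-based indices of observations 25, 150, 225
k = 5.0                     # assumed contamination size, in units of S_y
s_y = statistics.stdev(y)

def contaminate(y, idx):
    """Return a copy of y with an outlier y_i + k * S_y at each index in idx."""
    out = list(y)
    for i in idx:
        out[i] = out[i] + k * s_y
    return out

# Setups A-H: the empty set, each single position, each pair, and all three.
setups = {}
labels = iter("ABCDEFGH")
for r in range(4):
    for idx in combinations(positions, r):
        setups[next(labels)] = contaminate(y, idx)
```

Enumerating subsets in this order reproduces the labeling in the text: A is the clean data, B-D perturb one observation, E-G perturb two, and H perturbs all three.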

Four figures show the index plots of the three ψ-divergence measures for datasets A, B, F and H, respectively. All measures identified the influential case(s), yielding larger ψ-divergence values for these cases than for the others.

Figure 8. Dispersion plots of the covariate settlement time (x_i): (a) versus the response variable indemnity amount (y_i); (b) versus log(y_i).

Table 2. Mean, bias, MSE and 95% CP for the parameter estimates from fitting the GEWLM in each dataset

Table 3. Mean and standard deviation (SD) for the parameter estimates from fitting the GEWLM in each dataset

Table 5. ψ-divergence measures for the simulated data fitting the EW model

Table 6. Descriptive statistics of the dataset of personal accident insurance payouts

Table 7. MLEs of the parameters for the GWLM and GEWLM regression models

Since the estimate of δ differs from 1 and its interval estimate (4.1627, 10.3994) does not contain 1, we have indications favoring the GEWLM. Also, we present in Table 7 the values of the DIC.

Table 9. Real data. Posterior summaries of the parameters for the GEWLM and GWLM

Table 10. Real data. Bayesian criteria