A Bayesian Mixture Model Accounting for Zeros and Negatives in the Loss Triangle

In insurance loss reserving, a large portion of zeros are expected at the later development periods of an incremental loss triangle. Negative losses occur frequently in the incremental loss triangle due to actuarial practices such as subrogation and salvation. The nature of the distributions assumed by most stochastic models, such as the lognormal and over-dispersed Poisson distributions, brings restrictions on the zeros and negatives appearing in the loss triangle. In this paper, we propose a Bayesian mixture model for stochastic reserving under the situation where there are both zeros and negatives in the incremental loss triangle. A multinomial regression model will be applied to model the sign of the loss data, while the lognormal distribution is assumed for the loss magnitudes of negatives and positives. Bayesian generalized linear models will be fitted for both the mixture and magnitude models. The model will be implemented using the Markov chain Monte Carlo (MCMC) techniques in BUGS. Our model provides a realistic tool for stochastic reserving in the cases of zeros and negatives.


Introduction
Determining an appropriate amount of loss reserve is very important for the financial stability of an insurance company. Although the traditional methods such as the chain ladder and Bornhuetter-Ferguson methods are simple to implement, they do not thoroughly address the stochastic nature of the data. Recent researchers focus more on the stochastic loss reserving methods, in which the variability and tail values of the distribution of the reserve are studied.
For stochastic reserving (England and Verrall, 2002), specific distributions such as the lognormal (Kremer, 1982), over-dispersed Poisson (Renshaw and Verrall, 1998;England, Verrall and Wuthrich, 2012), negative binomial (Verrall, 2000) and gamma (de Alba and Nieto-Barajas, 2008) are assumed for the loss reserving data. For these models, classical generalized linear model (Nelder and Wedderburn, 1972) structures can be fitted to the mean or other parameters of the reserve distribution. The application of the generalized linear structures gives rise to the stochastic models reproducing the chain ladder reserves. Comparisons of the chain ladder model and the stochastic models reproducing the chain ladder reserves can be found in papers such as Kremer (1982), Verrall (2000) and Mack and Venter (2000).
With its capability of incorporating external information (Verrall and England, 2005), Bayesian inference is used frequently in stochastic reserving. In papers such as Scollnik (2002) and de Alba (2002Alba ( , 2006, external information is incorporated into the stochastic reserving model by specifying prior distributions for the parameters. Bayesian models for the chain ladder and Bornhuetter-Ferguson methods were introduced in Scollnik (2004), Verrall (2004) and England et al. (2012) respectively. Antonio and Plat (2013) proposed a Bayesian stochastic reserving model under the individual claim level. Most of the above models can be implemented using the Markov Chain Monte Carlo (MCMC) simulation method in the Bayesian software package BUGS (Spiegelhalter et al., 2004). Reviews of the MCMC method, BUGS, and their application in actuarial science can be found in Scollnik (2001). See Makov et al. (1996) for a more general discussion of Bayesian methods in actuarial science.
In real data, a large portion of zeros are expected at the later development periods of an incremental loss triangle. Negative losses occur frequently in the incremental loss triangle due to actuarial practices such as subrogation, salvation, cancellation of a claim, initial over-estimation of the case reserve, consequences of judicial decisions, and errors. A large number of zeros and negatives occur in the loss triangle, which may make some of the models inappropriate or even undefined.
To cope with the problems caused by zero and negative values in stochastic loss reserving, some improved models have been proposed in the recent literature. An improved Bayesian lognormal model was introduced by de Alba (2002,2006) to extend the lognormal model (Kremer, 1982) to situations where there are negative values in the loss triangle. Kunkler (2004Kunkler ( , 2006 proposed a Bayesian binomial model to handle the situation where there are either zeros or negatives in the loss triangle. None of these models, however, handle the more realistic situation when there are notable numbers of both zeros and negatives in the loss triangle. In this paper, we will propose a Bayesian mixture model for handling the situation when there are both zeros and negatives in the loss reserving data. A multinomial regression model will be applied to model the sign of the loss data, while the lognormal distribution is assumed for the loss magnitudes of negatives and positives. Bayesian generalized linear models will be constructed for both the mixture and magnitude models. A chain ladder type model structure derived from the structure in Zehnwirth (1994) is used for the magnitude of the positive losses. The model will be implemented using the Markov chain Monte Carlo (MCMC) techniques in BUGS. We will perform a case study using the adjusted loss triangle from the 'Historical Loss Development Study' (1991) by the Reinsurance Association of America.
The paper is organized as follows. Section 2 presents the mixture model in the context of Bayesian inference. In Section 3, the model will be implemented in BUGS with a well-studied loss triangle data. Section 4 concludes the paper.

A Bayesian Mixture Model
The earlier models such as de Alba (2002Alba ( , 2006 and Kunkler (2004Kunkler ( , 2006 are all aimed at solving the problem of loss reserving when either zeros or negatives appear in the loss triangle, but not both together. No model has been proposed for extending the stochastic reserving models to situations where there are both zeros and negatives existing in the loss triangle. A Bayesian mixture model will be proposed in this section to extend the conventional stochastic loss reserving model to a more general situation where there are a considerable number of both zero and negative values in the loss triangle.

Distribution of Mixture Data
Here, we will use notation consistent with that in Kunkler (2004Kunkler ( , 2006. Let y i j denote the incremental losses in the loss triangle. Based on the sign of the data, the incremental loss triangle can be split into three subsets containing values of negatives, zeros, and positives, respectively. The three subsets are defined as S (−) = {y i j : y i j < 0}, S (0) = {y i j : y i j = 0} and S (+) = {y i j : y i j > 0}.
Denoting n a as the number of accident periods included in the data, a mixture data triangle z = {z i j : i = 1, 2, . . . , n a ; j = 1, 2, . . . , n a − i + 1} can be defined for modelling the sign of the incremental loss triangle where The sum of the mixture data for each development year stands for the number of y i j observations that are negative, zero, or positive from our definition. This can be written as Denote P(y i j < 0) = λ 1 j , P(y i j = 0) = λ 2 j , then P(y i j > 0) = 1 − λ 1 j − λ 2 j as the mixture probabilities. Assuming conditional independence for losses from different accident years, the sum of the mixture data in each development year also follows a multinomial distribution with its probability distribution function given by

Generalized Linear Models
In the actual loss triangle, there tends to be more zeros and negatives in the later stage of development years. So we can assume that the proportion of zeros and negatives depends only on the development year as in Kunkler (2004Kunkler ( , 2006. A Bayesian GLM for the multinomial probabilities on the development year j can be applied to model this structure. Two commonly used link functions for the multinomial distribution are the logistic and probit links (Dobson, 2002, Chapter 8, Page 135-148). Without loss of generality, only the logistic link is used for our analysis (i.e., a cumulative logistic regression model is assumed).
With a different factor δ ld introduced for each development year in the cumulative logistic model, the piecewise linear relationship (Kunkler, 2004(Kunkler, , 2006 gives a flexible structure. This model structure can be written as With the above structure, some of the parameters can be set to zero or assigned equal values based on the size of the dataset available. As loss triangles are usually summarized by accident and development years, the datasets used for loss reserving are usually very small. For relatively small datasets, a simple linear regression on j − 1 can be used as a special case of the above model. That is, (1)

Modelling Magnitude Data
To ease the analysis for this section, simplified notations for the distributions of the magnitudes of the positive and negative data can be introduced as where θ 1 = (β 1 , σ 2 1 ) and θ 2 = (β 2 , σ 2 2 ). As discussed in Section 1, many distributions such as the lognormal (Kremer, 1982;de Alba, 2002de Alba, , 2006Kunkler, 2004Kunkler, , 2006, over-dispersed Poisson (Renshaw and Verrall, 1998;England et al., 2012), negative binomial (Verrall, 2000) and gamma (de Alba and Nieto-Barajas, 2008) can be assumed for the loss magnitude data. For demonstration purposes, lognormal sampling distributions are assumed for the loss magnitude data y − and y + in our analysis. That is, where I 1 and I 2 are identity matrices of n y + × n y + , , β 1 is a k β 1 × 1 parameter vector, β 2 is a k β 2 × 1 parameter vector, X 1 is a design matrix of dimension n y − × k β 1 , and X 2 is a design matrix of dimension n y + × k β 2 .  Zehnwirth (1994) put forward a flexible regression structure which can be applied to the loss magnitude data of y − and y + . The model can be written as The α li parameters are for modelling the effect of accident year, while the γ ld and η lt parameters are chosen to catch the effects of development year and calendar year respectively. Note that the observed data in the loss triangle provide no information concerning the η lt parameters for future calendar years. Hence it is impossible to predict for future losses without making adjustments to the model and/or including prior information.
Setting zeros for all the η lt parameters gives a structure comparable to that of the chain ladder model. The model under this structure reduces to where the transformed e γ ld parameters are analogous to the development ratios in the chain ladder model. Kunkler (2004Kunkler ( , 2006 also introduced a simplified version of this model for relatively small datasets, in which There is an obvious limitation for this structure in that the trend may not be linear in either the development year or calendar year for real data. When negatives appear in the loss triangle and the size of the data set is relatively small, some smoothing structures (see e.g., Zehnwirth, 1985) can be introduced in order to avoid possible over parameterization. One of the choices may be the Hoerl curve (Zehnwirth, 1985) given by which provides a development pattern similar to those of the claim triangles.
A special case of the model in Equation (4) is when b i = b and r i = r for all i, assuming a common runoff pattern for all accident years. In this case, the model can be written as

Overall Model Structure
In the framework of Bayesian inference, the outstanding liabilities will be estimated based on the posterior predictive distributions of the magnitude and mixture data for future incremental losses. In the analysis of this subsection, the future incremental data triangle and future mixture data triangle will be denoted as where n d denotes the number of development periods in the data.
By averaging over z i j , the marginal sampling distribution of y i j gives us a form with which we can focus on the claim amount of our interest. Similar to Kunkler (2004Kunkler ( , 2006, we obtain In the above formula, we can verify that p(y i j |z i j = 0) = I (y i j =0) .
To predict future losses, the joint posterior predictive distribution ofỹ i j andz i j needs to be used with its formula given by Treating the future mixture trianglez as nuisance parameters, the marginal posterior predictive distribution of the future incremental triangleỹ can be obtained as

Posterior Analysis for Mixture Parameters
Under the GLM structure given in Subsection 2.1, the posterior distribution can be obtained for the cumulative logistic regression parameters ∆. That is where π(∆) is the prior distribution for ∆, and p(z j |∆) is the multinomial sampling mixture distribution for z j given by p(z j |∆) ∝ λ 1 (∆) z 1 j λ 2 (∆) z 2 j (1 − λ 1 (∆) − λ 2 (∆)) n a j −z 1 j −z 2 j .
With the logistic link function, the λ l parameters can be expressed in terms of ∆ as This part of the model can be implemented using MCMC algorithms such as the Metropolis-Hastings algorithm.
With the above analytical forms of the posterior distributions and posterior predictive distributions for all the unobserved variables (e.g., the regression coefficients and variance parameters), we will be able to implement our multinomial mixture model, using MCMC simulation methods such as the Metropolis-Hasting's algorithm. An alternative approach is to use a Bayesian software package such as BUGS that implements these algorithms after we specify the forms of the prior and sampling distributions. Due to its simple implementation, we will use BUGS to implement our model in the next section.

Model Implementation
The use of the specialized Bayesian software BUGS makes it relatively easy to implement our multinomial mixture model. Bayesian models including GLMs can be implemented in BUGS by specifying the sampling distributions, prior distributions and the regression functions. Hence, the posterior analysis given in Section 2 for our multinomial mixture model will not be needed for our model fitting in BUGS.
In this section, our multinomial mixture model will be fitted to the loss triangle adjusted from the original data at 'Historical Loss Development Study ' (1991). Particularly for the positive magnitude where we have plenty of data, a GLM structure different from that in Kunkler (2004Kunkler ( , 2006 will be constructed. The model is based on the three parameter lognormal model, with the interpretation of parameters more comparable to those from the chain ladder method. A calendar year trend parameter is introduced into this chain ladder type of structure. The variances of the positive and negative magnitudes will be modelled using the same method as in the ols g() function in MatLab (LeSage, 1999, page 176). The parameters for the mixture and magnitude models as well as the reserves will be estimated and compared to those from Kunkler (2006).

The Data
Since claim data from insurance companies are mostly confidential, it is generally hard to obtain a real dataset for a specific line of business that may contain zeros and negatives in the loss triangle. The open source datasets, on the other hand, are usually aggregate data from multiple insurance companies or multiple lines of business, that will generally not contain zeros or negatives due to the aggregation. For illustrative purposes, the original loss triangle from the 'Historical Loss Development Study' (1991) by the Reinsurance Association of America listed in Table  1 is adjusted so that it contains both values of zeros and negatives. The negative losses in our adjusted loss triangle are the same as those used by Kunkler (2006).

Modelling Mixture Data
In Section 2, we proposed using the multinomial distribution to model the mixture data, i.e. z i j ∼ multinomial(λ j , 1) .
From the mixture data we can see that there tends to be more zeros and negatives during the later development years. Since we have the same negatives as those in Kunkler (2006), for the negative probability λ 1 j we will use the same logit model structure given in Kunkler (2006). The model for the negative probability is given by Since the zero values are only observed after the 6th development period, we assume the probability for zeros λ 2 j differs only after the 6th development period. So we may use the following model structure for zeros: Similar to Kunkler (2006), we assume the same diffuse priors for the δ parameters. That is,

Modelling Magnitude Data
We assume the magnitudes of both positives and negatives follow lognormal distributions with different means and variances. That is, Here, we use notations slightly different to those in the previous section in order to allow for a more flexible variance structure, as well revealing more detailed information on the expected loss amount for a specific combination of accident and development years.

Positive Magnitude
For the magnitude data of positives, we will model the mean of the lognormal distribution with the chain ladder type structure in Equation (3) with the same calendar trend factor ι from Kunkler (2006). The model structure is given by Diffuse priors of N(0, 1000) are assumed for all of the α, γ and ι parameters in the above model. Since the loss triangle contains zeros, we are not able to use a common lognormal distribution for the loss triangle. The magnitudes of the positives and negatives have to be modelled using two lognormal distributions. Two vectors are used to store the values of i and j for the magnitude data.

Negative Magnitude
For the magnitude data of the negatives, a simplified model structure needs to be used to for relatively small datasets. Since we assume the same negative losses as in Kunkler (2006), the model in Kunkler (2006) can be used to model the mean of negative magnitude. The model structure for the negative magnitude is given by www.ccsenet.org/ijsp International Journal of Statistics and Probability Vol. 4, No. 2;2015 Similar to the magnitude model for the positives, diffuse priors of N(0, 1000) are assumed for all of the α, γ and ι parameters.

Modelling Variance
For the variance parameters σ 2 i j in Equations (6) and (7), we will use a model specification similar to that in the ols g() function in LeSage (1999, page 176). In our simulation, the prior density specification of σ 2 i j is defined in this way: where we used the fixed value of r = 100 so as to be consistent with Kunkler (2006). For the prior distribution of sige, we used a diffuse uniform prior U(0, L), with L = 100.
We estimate the parameters ω + and ω − in Equations (7) and (8) based on the same analysis given in Kunkler (2006, pages 553-554). The estimates of ω + and ω − are obtained by running the model using the variance construction in Equation (11). The estimates are given by ω + = 1/var(r + ) and ω − = 1/var(r − ), where r + and r − are the residual vectors for the positive and negative log magnitudes. That is, where log(y + ) and log(y − ) are the predicted values of log(y + ) and log(y − ) in the upper loss triangle for the existing data. The values of log(y + ) and log(y − ) are calculated from Equations (9) and (10) using the posterior estimates of the α, γ and ι parameters.
For the variance calculation, we use the n normalized sample variance. That is, for the data x = (x 1 , x 2 , . . . , x n ) we let wherex is the sample mean given byx = 1 n ∑ n i=1 x i . Using the data, we obtained the estimated values of ω + and ω − asω + = 6.3880 andω − = 5.1547. These values are used directly for our model implementation in BUGS.

Convergence of MCMC Simulation
Three chains with dispersed initial values are used for our simulation. With the uniform distribution U(0, 100) assumed for the parameter sige, the simulation converges very quickly, i.e. before iteration 1,000. This can be observed from the history plots of the 3 chains for each parameter or quantity of interest. The three chains mix very quickly.
The refined potential scale reduction factorsR c (Brooks and Gelman, 1998) are also monitored in order to diagnose convergence. In the case of all the parameters and quantities of interest, the corresponding refined potential scale reduction factorR c converged to approximately 1 within about 1,000 iterations. The values of the refined potential scale reduction factorsR c are less than 1.01 for all the quantities after 2,000 iterations.
We will use the simulated values from iterations 2,001 to 32,000 from all three chains for our posterior analysis in the following. A total of 30,000 posterior samples will be used for our posterior analysis of parameters and reserves.

Mixture Model
For the multinomial mixture model in Subsection 3.2.1, the parameters for the logit models in Equations (5) and (6) are estimated by posterior simulation. The estimates of the parameters, their standard deviations and percentiles are listed in Table 2. www.ccsenet.org/ijsp International Journal of Statistics and Probability Vol. 4, No. 2;2015  From Table 2 we observe that the estimates of the δ 1i parameters from the multinomial model have very close absolute values with those of the δ i parameters from the binomial model in Kunkler (2006) except for their signs. The closeness of the absolute values are due to the fact that we have the same negative values as those in Kunkler (2006). The difference in the sign is from the factor that the logit function we defined for the negatives is based on the ratio of negative probability over the positive, while the one in Kunkler (2006) is defined based on the opposite ratio.
The posterior mean for each future development year of each accident year in the lower part of the loss triangle is listed in Table 3 and 4 respectively for negatives and zeros. From the posterior mean of the negative probability we observe that the probability stays the same for the first 5 development years, and increases with development years from then on. Similarly, we observe that in the first 6 development years, the probability of zero is very small, and the probability increases significantly with every development year thereafter.  From Table 3 and 4, We observe that the probabilities of negatives and zeros increase with development years, which is consistent to what we observe from the data in Table 1. It is also consistent to the real practice that subrogations and salvations occur later in the development due to things such as time required for processing and investigation, and there will be no or little payment after many claims have been closed.

Positive Magnitude
For the lognormal positive magnitude model specified in Equations (9) and (10), we obtained the posterior predictive estimates of the α + i , γ + d and ι parameters. The estimates of the parameters, their standard deviations and percentiles are listed in Table 5. We observe from Table 5 that the both accident year effects (i.e., α + parameters) decreases with the accident year, and the development year effects (i.e., γ + parameters) also decreases as with the payoff of more claims there expect to be lower payments in the later years. The estimates and MC error are comparable to those from Kunkler (2004Kunkler ( , 2006 as we are using very similar data and model structures, despite the difference in dealing with zeros and negatives all at once.

Negative Magnitude
Similar to the previous subsection, the posterior predictive means for the α − and γ − i parameters in Equation (6) and (10) can be estimated using the simulated samples from BUGS. The estimates of the parameters, their standard deviations and percentiles are listed in Table 6. We observe that the estimates for the parameters are very close to those from Kunkler (2006), as we use the same model and same negative data for our simulation.  Kunkler (2006), as we have the same negatives values in the loss triangle and we are using the same model structure. Since the negative values we added are arbitrarily chosen, the estimates would not possess a meaningful interpretation for real practice. We also estimated the common parameters of the positive and negative magnitudes models, i.e. the calendar trend factor ι and the τ parameters for the inverse variances (i.e., precisions). The posterior mean, standard deviation and percentiles of the calendar parameters are listed in Table 7. Note that the parameter for the cumulative logistic regression model does not have a meaningful interpretation. The posterior means of the precision parameters are listed in Table 8.  The posterior mean reserve for each accident year and development year in the lower part of the loss triangle is estimated. The triangle reserve estimates are listed in Table 9. The estimated reserve amounts from our model are lower than those from Kunkler (2004Kunkler ( , 2006, as we have added both zeros and negatives in our loss triangle. The posterior means, standard deviations and percentiles for the total reserve (i.e. over all accident years) and reserves by accident year are listed in Table 10.