Monte Carlo Methods for Insurance Risk Computation

In this paper we consider the problem of computing tail probabilities of the distribution of a random sum of positive random variables. We assume that the individual variables follow a reproducible natural exponential family (NEF) distribution, and that the random number of terms has a NEF counting distribution with a cubic variance function. This specific modelling is supported by data on the aggregated claim distribution of an insurance company. Large tail probabilities are important as they reflect the risk of large losses; however, analytic or numerical expressions are not available. We propose several simulation algorithms which are based on an asymptotic analysis of the distribution of the counting variable and on the reproducibility property of the claim distribution. The aggregated sum is simulated efficiently by importance sampling using an exponential change of measure. We conclude with numerical experiments with these algorithms.


Introduction
Let Y_1, Y_2, . . . be i.i.d. positive random variables representing the individual claims at an insurance company, and let N ∈ N_0 designate the total number of claims occurring during a certain time period. The total amount of these claims is called the aggregated claim variable, denoted by S_N = Y_1 + · · · + Y_N. A major issue for insurance companies is the uncertainty of the occurrence of a large aggregated claim, because, if this happens, the company faces large losses that may ultimately lead to ruin. Thus, an important quantity to compute is the insurance risk P(S_N > x) for large levels x. Because of its importance for insurance companies, many actuarial studies deal with this problem; see the monograph of Kaas et al. (2008). However, there are many other practical situations in which the object of interest is a random sum of i.i.d. random variables (Bahnemann, 2015). For instance, S_N might represent the total loss of a financial institution due to defaults of N obligors with credit sizes Y_1, Y_2, . . ..
For the actual computation of the risk, one needs to fit a model for the counting distribution of N and the claim size distribution of Y. Nowadays the Poisson and the gamma distributions, respectively, are often used for this purpose (Bowers et al., 1997). Other proposals include the negative binomial for the counting number and the inverse Gaussian for the claim size.
However, due to large uncertainties, many realistic data show large overdispersion. In fact, our study is motivated by available data of a car insurance company for which the traditional distributions clearly do not fit properly. The (empirical) variance of the counting-number data shows a power law with respect to the (empirical) mean, with a power close to three. This observation was the reason that we decided to consider counting distributions with tails that go beyond (are heavier than) the Poisson and negative binomial. A natural modelling technique for introducing families of distributions is the concept of natural exponential families (Smyth, 2005, 2008; Letac and Mora, 1990; Smyth and Jorgensen, 2002). In our case we are interested in natural exponential families with cubic variance functions (Letac and Mora, 1990). Concerning the counting variable N, we shall investigate
• the Abel distribution;
• the strict arcsine distribution;
• the Takacs distribution.
These are new distributions for insurance modelling, and have to our knowledge not been considered before in computation and simulation studies. As said above, our objective is to execute numerical computations of the insurance risk factor, for which we consider using Monte Carlo simulations, the main reason being that there are no analytic expressions available. Thus, a main part of our paper deals with developing the simulation algorithms for generating samples from these distributions.
Also concerning the claim size distributions, we propose modelling by natural exponential families. Specifically, we consider
• the gamma distribution;
• positive stable distributions;
• the inverse Gaussian distribution.
In this way, our aggregate models become Tweedie models in the sense that both the distributions of the counting number and the distributions of the claim size belong to natural exponential families (Smyth, 2005, 2008; Smyth and Jorgensen, 2002). Hence, we shall investigate whether the statistical procedures for estimating the parameters in these models can be applied to our data, or whether we need to develop other procedures. Commonly one models the mean and dispersion in terms of risk factors, for instance by regression models or by generalized linear models (Smyth and Jorgensen, 2002). However, we propose to directly compute the risk as a tail probability of the aggregated claim distribution by executing Monte Carlo simulations.
The simulation algorithm exploits two efficiency improvements with respect to standard Monte Carlo. Firstly, the claim size distributions have the reproducibility property (Bar-Lev and Enis, 1986), which says that convolutions can be regarded as transformations of univariate distributions. Thus, for example, a single sample of the inverse Gaussian distribution suffices for generating a sum of i.i.d. inverse Gaussians. Secondly, we apply importance sampling by implementing a change of measure based on the well-known exponential tilting of the probability distributions (Asmussen and Glynn, 2007, Chapter VI). The optimal tilting factor is determined by a saddlepoint equation, and results in a logarithmically efficient estimator.
The paper is organized as follows. Section 2 summarizes the concepts of Tweedie NEF distributions and reproducibility. The main contribution of the paper is contained in Section 3, where we analyse the three counting distributions, leading to the construction of the simulation algorithms for generating samples. Section 4 summarizes a few aspects of the claim distributions. The aggregated claim risks are computed in Section 5 by Monte Carlo simulation using the algorithms that we have developed. We show how these risks for large levels can be computed efficiently by an appropriate change of measure for importance sampling. Finally, Section 6 gives details of the data that motivated this work.
Tweedie NEF Distributions and Reproducibility

A NEF F = {F_θ : θ ∈ Θ} is generated by a measure ν via F_θ(dx) = e^{θx − κ(θ)} ν(dx), where κ(θ) = log ∫ e^{θx} ν(dx) is the cumulant function with effective domain Θ. The generating measure ν is also called the kernel of the NEF. We may associate a random variable X_θ with the NEF distribution F_θ. Then E[X_θ] = κ'(θ) and Var(X_θ) = κ''(θ). Note that κ' is invertible; thus we obtain the NEF parameter θ from the mean m by θ(m) = (κ')^{-1}(m). This means that if we let κ(m) = κ(θ(m)), we can represent the NEF equivalently in terms of the mean, where M is the mean domain of F and is given by M = κ'(int Θ). Finally, if also the variance V(m) of a NEF distribution is given as a function of the mean m, the pair (V, M) uniquely determines a NEF within the class of NEF's. Simple algebra (using dθ/dm = 1/V(m)) shows that θ and κ(θ) can be represented in terms of m by

θ(m) = ∫ dm / V(m) + A,    κ(m) = ∫ m dm / V(m) + B,

where A and B are constants. Note that these functions are not unique at this stage. The constants A and B need to be chosen such that the corresponding F_θ is a probability distribution.
We call the NEF a Tweedie NEF when V(m) = a m^r, i.e. a power law (Bar-Lev and Enis, 1986; Jorgensen, 1987; Tweedie, 1984). Furthermore, the following reproducibility concept will be a key element in our analysis. It has been developed in (Bar-Lev and Enis, 1986).
Definition 1 (Reproducibility). Let X_1, X_2, . . . be i.i.d. ∼ F_θ ∈ F, and denote S_n = X_1 + · · · + X_n. The NEF F is said to be reproducible if there exist a sequence of real numbers (c_n)_{n≥1} and a sequence of mappings {g_n : Θ → Θ, n ≥ 1} such that for all n ∈ N and for all θ ∈ Θ,

c_n S_n ∼ F_{g_n(θ)} ∈ F,

where ∼ denotes equality in distribution.
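As a concrete illustration of why reproducibility pays off in simulation (a sketch of ours, not the paper's code): the gamma family is closed under convolution, so a sum of n i.i.d. gamma variables can be drawn in a single call instead of as an n-fold sum. The parameter names below are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_gamma_sum(n, shape, scale):
    """Sample S_n = X_1 + ... + X_n for X_k i.i.d. Gamma(shape, scale)
    with a single draw, using closure under convolution:
    S_n ~ Gamma(n * shape, scale)."""
    return rng.gamma(n * shape, scale)

# Sanity check: one reproducible draw matches an explicit sum of n draws
# in distribution; compare the sample means of both approaches.
n, shape, scale = 50, 2.0, 3.0
direct = [sample_gamma_sum(n, shape, scale) for _ in range(20_000)]
explicit = [rng.gamma(shape, scale, size=n).sum() for _ in range(20_000)]
print(np.mean(direct), np.mean(explicit))  # both near n*shape*scale = 300
```

The same one-draw trick applies to the inverse Gaussian and positive stable claim distributions used later, which is exactly the property exploited in the aggregated-claim simulations.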

Counting Distributions
In this section we analyse discrete counting NEF's that are given by a cubic VF (variance function), see Letac and Mora (1990). As said in the Introduction, we are motivated by data in a case study having a variance showing indeed such a power law. Our distributions will be used for computing the insurance risk by simulation, and, thus, the issue is how to generate samples from these distributions. Our analysis will lead to the construction of sampling algorithms that are based on the acceptance-rejection method. As dominating proposal distribution we can use the same distribution that is used to sample from the Zipf distribution (Devroye, 1986). We analyse the Abel, the arcsine, and the Takacs NEF's, consecutively. For each NEF we introduce the VF, develop relevant asymptotics, and then propose our simulation procedure.

Abel NEF
The VF is cubic in the mean m. By (3) we deduce the NEF-parameter function θ_A(m) and the cumulant function κ_B(m) as functions of the mean m. The kernel is given in (Letac and Mora, 1990).

Proposition 1. A = −1 and B = p.
Substituting g^{-1} = log and the expression for κ(m)/p in display (8), we obtain the NEF Abel counting probability mass function (pmf) of the associated counting variable N_θ. For convenience we omit the NEF parameter θ in our notation when there is no confusion.

Analysis
First we consider the asymptotics of the modified kernel ν_0(n), using the Stirling approximation, where ∼ means that the ratio converges to 1 (as n → ∞). The right-hand side shows correspondence with a Zipf distribution z(n) (Devroye, 1986), where the normalizing constant involves the Riemann zeta function ζ(·). Sampling from this Zipf distribution is done by an accept-reject algorithm using the dominating pmf b(n) of the random variable ⌊U^{-2}⌋, U ∼ Uniform(0, 1); the multiplication factor c for z(n) ≤ c b(n) is given in (Devroye, 1986). We show that we can use b(n) also as dominating proposal pmf for our NEF pmf f(n). However, because the domains of b(n) and f(n) differ (N versus N_0), we need a minor tweak. Denote the conditional pmf by f(n | n ≥ 1) = P(N = n | N ≥ 1).
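The dominating pmf can be made explicit: with U ∼ Uniform(0, 1), the variable ⌊U^{-2}⌋ takes the value n with probability b(n) = n^{-1/2} − (n+1)^{-1/2} ≈ (1/2) n^{-3/2}, matching the n^{-3/2} decay of the Zipf-like kernel. A minimal sketch (function names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_b(size):
    """Draw from the dominating pmf b(n): take U ~ Uniform(0,1) and
    return floor(U**-2), which lands on n >= 1 with probability
    b(n) = n**-0.5 - (n+1)**-0.5 ~ (1/2) n**-1.5."""
    # Tiny lower bound on U avoids an (astronomically unlikely) U = 0.
    u = rng.uniform(1e-12, 1.0, size=size)
    return np.floor(u ** -2)

def b_pmf(n):
    return n ** -0.5 - (n + 1.0) ** -0.5

samples = sample_b(200_000)
# Empirical frequency of n = 1 versus the closed form b(1) = 1 - 1/sqrt(2).
print(np.mean(samples == 1), b_pmf(1))
```

Inversion of the uniform thus gives the proposal in O(1) time per draw, which is what makes the accept-reject schemes below cheap.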
Lemma 1. There is a constant C (not dependent on n) such that f(n | n ≥ 1) ≤ C b(n) for all n ≥ 1.

The proof for n ≥ 1 uses the lower bound n! ≥ √(2πn) n^n e^{−n}, together with e^{−κ(m)} = e^{p²/(p+m)}.
Remark. Note that when m ≫ p, the constant C is of order p, which is reflected in the acceptance probability 1/C of the accept-reject sampling algorithm. However, for large dispersion parameters p the larger constant C deteriorates this algorithm. In that case one might improve the bounding of the kernel and the probabilities. Our case study gave m ≫ p, so we decided to implement the bounding as given above. That gave acceptance probability 0.25.
Summarizing, the Monte Carlo algorithm for simulating from the Abel distribution (9) is an accept-reject loop: handle the atom at N = 0 separately, then repeatedly draw a proposal N from the dominating pmf b(n) together with an independent uniform U, and return N once U < P, where P is the acceptance probability f(N | N ≥ 1)/(C b(N)).
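Since the Abel pmf itself is not reproduced here, the accept-reject mechanics can be illustrated on the Zipf-type target z(n) ∝ n^{-3/2} with the same proposal b(n); the code and the constant below are our sketch, not the paper's algorithm. The ratio z(n)/b(n) is maximal at n = 1, giving the bound constant c = 1/(1 − 2^{-1/2}).

```python
import numpy as np

rng = np.random.default_rng(2)

def b_pmf(n):
    # Dominating pmf of floor(U**-2): b(n) = n**-1/2 - (n+1)**-1/2.
    return n ** -0.5 - (n + 1.0) ** -0.5

def accept_reject(target, c, max_iter=10_000):
    """Generic accept-reject with proposal floor(U**-2) on n = 1, 2, ...
    `target` may be unnormalized; `c` must satisfy target(n) <= c*b(n)."""
    for _ in range(max_iter):
        n = int(np.floor(rng.uniform(1e-12, 1.0) ** -2))
        if rng.uniform() * c * b_pmf(n) <= target(n):
            return n
    raise RuntimeError("no acceptance; is c a valid bound?")

# Demo on the (unnormalized) Zipf target z(n) ~ n**-3/2 from the text.
c = 1.0 / (1.0 - 2 ** -0.5)          # ratio z(n)/b(n) is maximal at n = 1
zipf = lambda n: n ** -1.5
samples = [accept_reject(zipf, c) for _ in range(5_000)]
print(np.mean(np.array(samples) == 1))  # close to 1/zeta(3/2), about 0.383
```

Sampling from the Abel pmf follows the same pattern with `target` replaced by the (conditional) Abel pmf and `c` by the constant C of Lemma 1.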

Arcsine NEF
The VF is cubic in the mean m. By (3) we deduce the NEF-parameter function θ_A(m) and the log-moment generating function κ_B(m) as functions of the mean m. The kernel is given in (Letac and Mora, 1990).

Proposition 2. A = 0 and B = 0.

Analysis
In the Appendix we show that there is a constant K such that for n = 1, 2, . . .
Thus, for these even terms we recognize again the Zipf distribution. This will be helpful for finding a dominating proposal distribution.
Remark. As with our algorithm for sampling from the Abel distribution, the constant C becomes larger for larger p, deteriorating the accept-reject sampling method. In our implementation we included one more term in the bounding procedure described in the Appendix. This gave an acceptance ratio of 0.34.

Proof. It follows immediately.

The associated Monte Carlo algorithm for generating Takacs samples is similar to the Abel algorithm. The acceptance ratio in our case study is 0.23.

Concluding Remarks
These three counting distributions have tails that are much fatter than those of the more often used Poisson and negative binomial distributions. As an example, we consider in Section 6 a case study where the data show a mean m ≈ 70 and a variance V ≈ 52000, which indicates a power function V(m) ≈ m^r with r ≈ 2.5. This was one of the reasons to consider our specific counting distributions. An important feature of these distributions is their large tails. Figure 1 shows the probability mass functions for 1000 ≤ n ≤ 1200. The Poisson probabilities in this region are virtually zero. The Abel and Takacs distributions behave equivalently in the tails, while the arcsine shows slightly lighter tails.
Claim Size Distributions

• Gamma: given by its kernel; by inversion we observe that we actually deal with a gamma distribution with shape parameter p and scale parameter 1 − θ, and thus generating samples can be easily done (Devroye, 1986).
This is not the same as reproducibility, but the NEF shows the same property that the convolution can be represented by a single distribution.

• Inverse Gaussian: by inversion we recognize the more traditional form of the inverse Gaussian pdf, for which a simulation algorithm has been developed (Michael et al., 1976; Shuster, 1968).
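A sketch of the Michael, Schucany and Haas (1976) transformation method in the common (mu, lambda) parametrization; the mapping to the paper's NEF parameters is not reproduced here, and the parameter names are ours.

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_inverse_gaussian(mu, lam, size):
    """Michael, Schucany & Haas (1976) method for IG(mu, lam):
    one chi-square variate plus one uniform per sample."""
    nu = rng.standard_normal(size) ** 2          # chi-square with 1 d.o.f.
    # Smaller root of the quadratic defined by the transformation.
    x = mu + mu**2 * nu / (2 * lam) \
        - (mu / (2 * lam)) * np.sqrt(4 * mu * lam * nu + mu**2 * nu**2)
    u = rng.uniform(size=size)
    # Choose between the two roots with probability mu/(mu + x).
    return np.where(u <= mu / (mu + x), x, mu**2 / x)

s = sample_inverse_gaussian(2.0, 5.0, 100_000)
print(s.mean(), s.var())  # mean mu = 2.0, variance mu**3/lam = 1.6
```

Combined with reproducibility, a single such draw (with suitably scaled parameters) replaces the convolution of n inverse Gaussian claims.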
• Positive stable: note that with this modelling the pdf f(y), y > 0, is parameterized only by the index α, and it is not given in explicit form. However, it generates a NEF with a power VF (Bar-Lev and Enis, 1986; Jorgensen, 1987; Tweedie, 1984), V(m) = a m^p. Also we obtain the θ and κ functions of the mean m and index α. Thus, given mean m and variance V(m) we compute the parameters θ and α for the NEF distribution with pdf f(y; θ, α) = f(y) e^{θy − κ(θ)}, y > 0.

Computing Insurance Risk
The goal of our study is to compute efficiently the tail probability P(S_N > x) for large thresholds x, where S_N = ∑_{j=1}^N X_j is the random sum. We consider Monte Carlo simulation while applying two ideas: (i) reproducibility, and (ii) importance sampling. Reproducibility ensures that, given that N = n has been generated or observed, we generate S_n as a single random variable instead of as a sum (convolution).
The standard Monte Carlo algorithm is straightforward. Let M be the sample size; then the Monte Carlo estimator is (1/M) ∑_{i=1}^M 1{S^{(i)}_{N^{(i)}} > x}, where in the i-th replication the counting number N^{(i)} is generated from the counting distribution of interest (Abel, arcsine, or Takacs), according to the algorithms of Section 3. Given N^{(i)} = n, the aggregated claim size S^{(i)}_n is generated from the claim distribution of interest (gamma, inverse Gaussian, or positive α-stable) using the reproducibility property of Section 2. From the observations 1{S^{(i)}_{N^{(i)}} > x}, i = 1, . . . , M, we compute the usual estimator and standard error (or confidence interval) statistics.
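The structure of the standard algorithm can be sketched as follows. As stand-ins for the fitted counting and claim distributions we use a Poisson counting number and gamma claims (an assumption for illustration only, not the fitted Abel/arcsine/Takacs models); reproducibility is used to draw each aggregated claim in a single call.

```python
import numpy as np

rng = np.random.default_rng(4)

def mc_tail_estimate(x, M, lam=10.0, shape=2.0, scale=1.5):
    """Standard Monte Carlo estimate of P(S_N > x).  Stand-in model:
    N ~ Poisson(lam), claims ~ Gamma(shape, scale).  Reproducibility
    lets us draw S_n ~ Gamma(n*shape, scale) in one call per replication
    instead of summing n individual claims."""
    n = rng.poisson(lam, size=M)
    s = np.where(n > 0, rng.gamma(np.maximum(n, 1) * shape, scale), 0.0)
    hits = s > x
    est = hits.mean()
    stderr = hits.std(ddof=1) / np.sqrt(M)
    return est, stderr

est, se = mc_tail_estimate(x=60.0, M=200_000)
print(est, se)  # E[S_N] = lam*shape*scale = 30, so x = 60 is in the tail
```

The `np.where`/`np.maximum` guard handles the atom N = 0, for which the aggregated claim is 0 by convention.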
However, if the threshold x is large, we have difficulties observing the event {S_N > x} when we apply the standard Monte Carlo algorithm. As an illustration, let N be Abel and Y be inverse Gaussian, where the parameters are fitted to the data of our case study of Section 6. The mean aggregate claim size is E[S_N] ≈ 330. Because our distributions have large tails, we consider large levels x. As sample size we choose M so large that the standard error is about 10% of the estimate. We see in Table 1 that the required sample sizes grow exponentially with the level x, which means that very small probabilities are practically impossible to compute.

    x        M          estimate    std. error
    5000     9000       1.08e-02    1.09e-03
    10000    37000      2.59e-03    2.64e-04
    15000    150000     6.47e-04    6.56e-05
    20000    410000     2.37e-04    2.40e-05
    25000    1020000    9.51e-05    9.66e-06

Importance Sampling Algorithm
The idea of importance sampling is to change the underlying probability measure of the stochastic system in such a way that more samples are generated from the target event. An unbiased estimator is obtained by multiplying the observations by the likelihood ratio. Denote the random variables that are generated under importance sampling by N and S, respectively. Suppose that N = n and S = s are simulated; then the associated likelihood ratio is the ratio of the original to the tilted probability densities at (n, s), and the importance sampling estimator averages the indicators weighted by this ratio.

We have implemented the following importance sampling algorithm. Let the parameters of the counting distribution be (θ_N, p_N, m_N) (see Section 3), and of the claim distribution (θ_Y, p_Y, m_Y) (see Section 4). These parameters are fitted to the data, but note that the NEF-parameter θ follows from the mean-parameter m, and vice versa, so one of these suffices. For the change of measure we propose changing the NEF-parameter (and consequently the mean-parameter), but not the dispersion parameter p. In fact, we apply an exponential change of measure using a common tilting parameter, say θ*, for both the counting and the claim-size distribution. This parameter is obtained as follows. Let κ(θ) = log E[exp(θ S_N)] be the cumulant generating function of the aggregated sum. Then θ* solves the saddlepoint equation κ'(θ) = x. The interpretation is that under the change of measure the most likely samples of S_N are generated around our target level x. It is well known in rare-event theory that such a change of measure yields a logarithmically efficient (or asymptotically optimal) estimator in the case of a fixed number of light-tailed claims, i.e. for P(S_n > x); see Asmussen and Glynn (2007, Chapter VI, Section 2) or Bucklew (2004, Chapter V, Section 2). However, by a conditioning argument one can show that the same holds true for a random sum. This means that the required sample sizes grow polynomially in the level x, which we can clearly see in Table 2.
Our algorithm contains a minor tweak in that after N has been generated, say N = n, we check whether E[S n ] > x.
In that case, we generate S n from the original claim distribution, and otherwise we apply the change of measure also for the claims.
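The whole scheme (saddlepoint tilt, tilted sampling, likelihood-ratio weighting) can be sketched on a compound Poisson-gamma stand-in, where the exponential change of measure is fully explicit: tilting by θ turns N ∼ Poisson(λ) into Poisson(λ M_Y(θ)) and a Gamma(a, rate b) claim into Gamma(a, rate b − θ), with likelihood ratio exp(−θs + κ(θ)) where κ(θ) = λ(M_Y(θ) − 1). This is standard compound-Poisson tilting for illustration, not the paper's Abel/arcsine/Takacs implementation; all names and parameter values are ours.

```python
import numpy as np

rng = np.random.default_rng(5)
lam, a, b = 10.0, 2.0, 1.0 / 1.5   # Poisson rate, gamma shape, gamma *rate*

def M(theta):       # claim-size mgf (requires theta < b)
    return (b / (b - theta)) ** a

def kappa(theta):   # cumulant generating function of the compound sum
    return lam * (M(theta) - 1.0)

def kappa_prime(theta):
    return lam * M(theta) * a / (b - theta)

def saddlepoint(x, lo=0.0, hi=b - 1e-9, iters=200):
    # Bisection for kappa'(theta) = x; kappa' is increasing on (0, b).
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if kappa_prime(mid) < x else (lo, mid)
    return 0.5 * (lo + hi)

def is_tail_estimate(x, n_rep):
    theta = saddlepoint(x)
    # Tilted system: N ~ Poisson(lam*M(theta)), Y ~ Gamma(a, rate b - theta);
    # unbiasedness is restored by the weight exp(-theta*s + kappa(theta)).
    n = rng.poisson(lam * M(theta), size=n_rep)
    s = np.where(n > 0, rng.gamma(np.maximum(n, 1) * a, 1.0 / (b - theta)), 0.0)
    w = np.exp(-theta * s + kappa(theta)) * (s > x)
    return w.mean(), w.std(ddof=1) / np.sqrt(n_rep)

est, se = is_tail_estimate(x=150.0, n_rep=50_000)
print(est, se)   # P(S_N > 150), deep in the tail of a mean-30 sum
```

Under the saddlepoint tilt the mean of the simulated sum equals x, so roughly half of all replications hit the target event, while every weight is bounded by exp(−θ*x + κ(θ*)), the Chernoff bound at x.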

Case Study
Data are available on claims at a car insurance company in Sweden in a specific year (Hallin and Ingenbleek, 1983; Smyth, 2011). The data consist of 2182 categories of 7 variables specifying per category: kilometres, zone, bonus, make, insured, claims, payment. Let I be the set of categories, with |I| = 2182. For any i ∈ I, we model the random variables
• N_i: the number of claims in category i;
• Y_i: the claim size of a claimer in category i;
• S_i: the total amount of claims in category i.
The data give the numbers n_i of claimers and the total claim amounts s_i; they do not give the individual claim sizes.
For some subcategories J ⊂ I we propose that the N_j, j ∈ J, are i.i.d. as N, and that the Y_j, j ∈ J, are i.i.d. as Y. Also we propose that N and Y are independent. The available data are n_j, j ∈ J, observations from N, and s_j = ∑_{k=1}^{n_j} y_{jk}, observations from S = ∑_{k=1}^{N} Y_k given N.
Let θ_N be the vector of parameters of the probability distribution of the counting variable N, and θ_Y that of the claim size distribution of Y. Due to the reproducibility property of Y, the distribution of the sum S | (N = n) has the same parameter vector θ_Y (and the given number n). For estimating these parameters we considered the two-moment fit method, because all our distributions are derived from the mean and variance.
That is, let m_N and v_N be the sample average and variance of the counting data (n_j)_{j∈J}. Then we fit a distribution for N such that E[N] = m_N and Var(N) = v_N; the parameters of the counting distributions of Section 3 then follow. Similarly, let m_Y and v_Y be the sample average and variance of the claim data (y_{jk}), j ∈ J, k = 1, . . . , n_j. Then we fit a distribution for Y such that E[Y] = m_Y and Var(Y) = v_Y. Note that the individual claim data (y_{jk}) are not observed, but their sample average can be computed as m_Y = ∑_{j∈J} s_j / ∑_{j∈J} n_j. For the sample variance of the individual claims we use the well-known identity for the variance of the aggregated sum S = ∑_{k=1}^N Y_k:

Var(S) = E[N] Var(Y) + Var(N) (E[Y])².

With these parameters we have fitted the counting distribution and the claim distribution. Then we ran simulations of aggregated claim sizes in these models and executed the chi-square test for goodness of fit (hypothesising that the samples came from the same distribution). As an example, below we show the histograms of the data S_N and the simulated S_N for the Poisson-gamma, Abel-inverse Gaussian, and arcsine-stable combinations. We may conclude that the arcsine counting variable with positive stable claim size gives the best fit. The computations of the risk probabilities in this model are easily implemented by the algorithms that we exposed in Section 3 for the arcsine samples, in Section 4 for the positive stable samples, and in Section 5 for the Monte Carlo and importance sampling simulations. Table 4 shows results for both standard Monte Carlo and importance sampling simulations.
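The two-moment fit can be sketched as follows; this is a hypothetical implementation (names ours) of the moment matching described above, checked on synthetic Poisson-gamma data where the true claim moments are known.

```python
import numpy as np

rng = np.random.default_rng(6)

def two_moment_fit(n_obs, s_obs):
    """Two-moment fit from category data: counts n_j and aggregate
    amounts s_j (individual claim sizes y_jk are not observed).
    Uses E[S] = E[N]E[Y] and the aggregate-variance identity
    Var(S) = E[N]Var(Y) + Var(N)E[Y]^2 to back out (m_Y, v_Y)."""
    m_N, v_N = n_obs.mean(), n_obs.var(ddof=1)
    m_Y = s_obs.sum() / n_obs.sum()
    v_Y = (s_obs.var(ddof=1) - v_N * m_Y**2) / m_N
    return (m_N, v_N), (m_Y, v_Y)

# Synthetic check: Poisson(20) counts, Gamma(shape 2, scale 3) claims,
# aggregated per category via reproducibility.
n_obs = rng.poisson(20.0, size=20_000)
s_obs = np.array([rng.gamma(n * 2.0, 3.0) if n > 0 else 0.0 for n in n_obs])
(m_N, v_N), (m_Y, v_Y) = two_moment_fit(n_obs, s_obs)
print(m_Y, v_Y)  # true claim moments: E[Y] = 6, Var(Y) = 18
```

The recovered (m_N, v_N) and (m_Y, v_Y) are then matched to the mean and variance functions of the chosen counting and claim NEF's.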
Again we see the exponential versus polynomial increase of the required sample sizes. For levels x ≤ 25000 the estimates fall in their corresponding confidence intervals (in most cases), while for large levels, x > 25000, we have no Monte Carlo results.

Conclusion

Our case study showed that both the random sum and the random claim size have variances as large as cubic powers of their means. For fitting distributions with cubic variance functions to the insurance data we used the NEF modelling. In this way we considered three discrete counting variables for fitting the random sum, and three positive continuous distributions for fitting the claim size, all coming from NEF's. We gave a thorough analysis of the nontrivial discrete counting variables for the purpose of developing sampling algorithms. These sampling algorithms are all accept-reject based, where the dominating proposal distribution is a Zipf distribution. Our claim size distributions are commonly known and sampling algorithms can be found in the literature.
Being able to sample from the aggregate claim distribution, we execute Monte Carlo simulations for computing tail probabilities, specifically for large losses. The efficiency of these simulations was improved by two techniques. The first is that the claim size distributions satisfy the reproducibility property, implying that convolutions come from the same family as the individual distribution. The second improvement is the application of importance sampling. Our numerical experiments show that the exponential complexity of standard Monte Carlo is reduced to polynomial complexity.
The downside of our method is that the accept-reject algorithms of the counting distributions have acceptance ratio of 25%-35%. Therefore we shall investigate in future work the application of other sampling algorithms, notably MCMC and multilevel Monte Carlo methods.