Reinsurance Pricing of Large Motor Insurance Claims in Nigeria: An Extreme Value Analysis

Reinsurance is of utmost importance to insurers because it enables insurance companies to cover risks that they would not, under normal circumstances, be able to cover on their own. An insurer needs to be able to evaluate his solvency probability and adjust his retention level accordingly, because the retention level plays a vital role in determining the premium he pays to the reinsurer. To illustrate how Extreme Value Theory can be applied, this study models the probabilistic behaviour of the frequency and severity of large motor claims from the Nigerian insurance sector (2013-2016) using the Negative Binomial-Generalized Pareto distribution (NB-GPD). The annual loss distribution is simulated using the Monte Carlo method and is used to predict the expected annual total claims and to estimate the capital requirement for a year. Pricing of Excess-of-loss (XL) reinsurance is also examined to aid insurers in optimizing their risk management decisions with regard to the choice of their risk transfer position.


Introduction
In the face of extreme risks, an insurer can suddenly fall into ruin. This is because of the low-frequency, high-severity nature of such occurrences. To manage these types of risks, an insurer needs to be well equipped beforehand, with the knowledge and tools that enable him to gauge accurately the magnitude of losses that may be experienced and to put appropriate measures in place in case such an event takes place. Extreme Value Theory (EVT) provides a solid theoretical foundation upon which we can build statistical models that describe extreme events.
As highlighted by Embrechts (1999), modelling claim-size distributions in non-life insurance; stress testing of catastrophe insurance portfolios; pricing of multi-line, multi-year, rare event (high layer) reinsurance products and pricing of catastrophe triggered bonds are some of the gains that can be made when EVT is applied.
Extreme modelling-based inference is a very valuable source of information in the insurer's decision-aiding manual. As stated by Pérez-Fructuoso and García Pérez (2010), the application of EVT to modelling large claims helps the insurer to measure the risks correctly and select the optimal capital requirements. Furthermore, he is able to price the policy appropriately and choose the most favourable reinsurance layer. This important point is re-echoed by Paldynski (2015), who argued that for the company to remain solvent in cases of extremely large insurance claims, a combination of capital buffers and reinsurance must be present. This helps to absorb the unexpected losses. Even more important is the fact that such capital shortfalls have a significant impact on the company's existence, hence the need to assess the probability of such occurrences. Paldynski (2015) also gives some reasons why reinsurance is needed. It can be used to hedge against, say, catastrophic weather in a given geographical region, or as a means of sharing risks, especially for policies that incur large losses. Reinsurance is also a way of reducing the variance of underwriting results and can enable an insurance company to increase its geographical presence in a regulated industry. Chasseray et al. (2017) add that it helps the insurer to better monitor the risks faced.
Embrechts (1999) gives an introduction to EVT and showcases its use with the aid of examples involving financial and loss data. McNeil (1997) used EVT to model the tails of loss severity distributions, while Rootzén and Tajvidi (1997) employed the EVT technique in a study of wind storm losses. Corradin and Verbrigghe (2001) estimated the loss frequency and loss severity of large fire claims. They also examined the impact of quota and XL reinsurance on the distribution of the total loss. An EV analysis of the Spanish motor liability insurance market data was carried out by Pérez-Fructuoso and García Pérez (2010). They show how the decision-making process relating to the pricing of reinsurance is optimized when considering a risk transfer position. Brazauskas and Kleefeld (2016) modelled the severity and the tail risk associated with Norwegian fire claims, while fire claims in Taiwan were fitted using the Generalized Pareto distribution (GPD) (Lee, 2012).
With reference to insurance companies or sectors within Africa, Wainaina and Watitu (2014) modelled Kenya's fire industrial insurance class of business using EVT. They estimated the VaR at different confidence levels. Karobia (2015) modelled extreme claims for a selected insurance company using the GPD. In their EVT-based assessment of Nigeria's motor industrial class of business, Adesina et al. (2016) aimed mainly to compare different methods for estimating VaR. Their results reveal that the extreme VaR is the most suitable. This study contributes to the literature specifically on reinsurance pricing, for which, to the best of the author's knowledge, no study has so far been done using African data. A simplified view of the concepts is given. The frequency and severity distributions used by Corradin and Verbrigghe (2001) are implemented in this paper, and the XL rating under the NB-GP model is computed for different quoted layers. Quota reinsurance is not estimated. In contrast to the simulated dataset used by those authors, a real dataset is used in this paper to estimate the XL reinsurance. The Poisson-GP model was used by Pérez-Fructuoso and García Pérez (2010), who compared the parametric and non-parametric estimation of the reinsurance risk premium. In this study, only the parametric estimation procedure is shown.
The rest of this paper is structured as follows: the theoretical framework is covered in section 2 and XL reinsurance in section 3. The data are analyzed and the results obtained are discussed in section 4. Finally, conclusions are drawn in section 5.

Theoretical Background
The foundations of EVT were developed by Fisher and Tippett (1928). Coles (2001) gives a very detailed presentation of the theoretical background of EVT. Classical EVT techniques consist of the Block maxima (BM) and the GPD approaches. This study makes use of the GPD method.

The Generalized Pareto Distribution and the Threshold Approach
This method makes use of more of the most extreme observations, which is one of its main advantages over the BM approach. The threshold model is one of the methods for choosing what data to incorporate into the analysis for greater precision. This technique is due to Pickands (1975). The Pickands-Balkema-de Haan theorem (here denoted Theorem 1) is stated below (see Embrechts, McNeil & Frey (2005), Theorem 7.20). It provides the theoretical justification that allows us to apply the GPD class of distributions when dealing with peaks-over-threshold modeling.
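Theorem 1: Let $X_1, X_2, \ldots$ be a sequence of independent random variables with common distribution function $F$ and let
$$M_n = \max\{X_1, \ldots, X_n\}.$$
Let us denote an arbitrary term in the $X_i$ sequence by $X$ and suppose that $X$ satisfies the generalized extreme value conditions, so that for large $n$
$$\mathbb{P}(M_n \le z) \approx G(z),$$
where convergence is in distribution. $G$ is a non-degenerate distribution function given as
$$G(z) = \exp\left\{-\left[1 + \xi\left(\frac{z - \mu}{\sigma}\right)\right]_+^{-1/\xi}\right\}$$
for some $\mu$, $\sigma > 0$ and $\xi$, where $z_+ = \max\{z, 0\}$. Then for large enough $u$ the distribution function of $X$ conditional on $X > u$ can be approximated as
$$\mathbb{P}(X \le x \mid X > u) \approx H(x - u), \quad x > u,$$
where $H$ is the GPD with the modified scale parameter $\sigma_u = \sigma + \xi(u - \mu)$ corresponding to the excess over the threshold $u$. For $X > u$ we can set $Y = X - u$ and thus rewrite the equation as
$$\mathbb{P}(Y \le y \mid X > u) \approx H(y), \quad y > 0,$$
where
$$H(y) = 1 - \left(1 + \frac{\xi y}{\sigma_u}\right)_+^{-1/\xi}.$$
In general we have the GPD as
$$H_{\xi,\sigma}(y) = \begin{cases} 1 - \left(1 + \dfrac{\xi y}{\sigma}\right)^{-1/\xi}, & \xi \neq 0, \\ 1 - \exp\left(-\dfrac{y}{\sigma}\right), & \xi = 0. \end{cases}$$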

Modeling Frequency and Severity: Negative Binomial (NB)-GPD Model
The GPD model alone accounts for the magnitudes of the extreme events, but to obtain a more exact inference it is important to take into account the times at which the events occur. Thus the large claims are modeled using a frequency-severity approach. The NB-GPD model, which is one of the ways to achieve this, combines the peaks-over-threshold (POT) method with the NB distribution. This model is closely connected to the claim process in the Cramér-Lundberg model, a very popular model in early ruin theory, the study of how insurance companies are exposed to insolvency risks.
The GPD is used to model the severity of the losses, while the NB distribution is fitted to the number of losses above the selected threshold to model the frequency. The NB is used because, for the sample provided, the variance is greater than the mean, which implies that overdispersion is evident. Thus, using the Poisson model in this case may lead to an underestimation of the standard errors. A mixture of the Poisson and gamma distributions gives the NB distribution. In essence, we assume that the Poisson parameter λ is gamma distributed, which makes the NB very useful, especially when incorporating parameter uncertainty into the Poisson parameterization.
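The Poisson-gamma mixture property can be checked numerically. The sketch below is only an illustration: the values of r and p are hypothetical, the NB is written in the form with mean rq/p (q = 1 − p), and the gamma mixing distribution is assumed to have shape r and scale q/p. The mixture integral is approximated with a simple midpoint rule and compared with the NB probability mass function.

```python
import math


def nb_pmf(n, r, p):
    """NB pmf P(N = n) = Gamma(n + r) / (n! Gamma(r)) * p**r * q**n, with q = 1 - p."""
    q = 1.0 - p
    log_pmf = (math.lgamma(n + r) - math.lgamma(n + 1) - math.lgamma(r)
               + r * math.log(p) + n * math.log(q))
    return math.exp(log_pmf)


def poisson_gamma_pmf(n, r, p, upper=200.0, steps=20000):
    """Integrate Poisson(n | lam) against Gamma(shape=r, scale=q/p) by a midpoint rule."""
    scale = (1.0 - p) / p
    h = upper / steps
    total = 0.0
    for k in range(steps):
        lam = (k + 0.5) * h
        log_poisson = -lam + n * math.log(lam) - math.lgamma(n + 1)
        log_gamma = ((r - 1.0) * math.log(lam) - lam / scale
                     - math.lgamma(r) - r * math.log(scale))
        total += math.exp(log_poisson + log_gamma) * h
    return total


# Hypothetical parameters, chosen only for illustration.
r, p = 3.0, 0.4
mean = r * (1.0 - p) / p           # r*q/p
variance = r * (1.0 - p) / p ** 2  # r*q/p**2, larger than the mean because p < 1
```

Comparing `nb_pmf(n, r, p)` with `poisson_gamma_pmf(n, r, p)` for a few values of n shows the two agree up to integration error, and the variance exceeding the mean is the overdispersion that motivates the NB choice.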
The frequency-severity model falls under the framework of the loss distribution approach (LDA), where the claim severity is assumed to be independent of the claim frequency and the two are modeled separately. We therefore have the aggregate loss $S$ as
$$S = \sum_{i=1}^{N} X_i,$$
where $X_i$ is the $i$th claim (which is assumed to be strictly positive) and $N$ is the number of claims. This is the collective risk model. The assumptions are as follows: the $X_i$'s are independent and identically distributed (iid) and they are independent of $N$, which is a discrete random variable representing the number of losses.
Quantities such as the distribution function, the expected value and the variance of the aggregate loss, S, are usually of interest to the insurer. Specifically, in the case of reinsurance, the losses above a retention level R are considered. The distribution of losses above R is briefly discussed below.

Distribution of Excess Losses
Drawing from Antal's (2009) study, one can determine the distribution for both the number of excess losses above R and the excess loss burden.
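Proposition 1: Let $N \sim NB(r, p)$ and let the number of excess losses be given by
$$N_R = \sum_{i=1}^{N} \mathbb{1}\{X_i > R\},$$
where $\pi = \mathbb{P}(X > R)$ is the probability that a loss exceeds $R$. Then $N_R \sim NB(r, p_R)$ with $p_R = p/(p + q\pi)$ and $q = 1 - p$. Moreover, the excess loss burden satisfies
$$\sum_{i=1}^{N} (X_i - R)^+ \overset{d}{=} \sum_{i=1}^{N_R} (Y_i - R),$$
where the $Y_i$ are iid random variables with distribution $\mathbb{P}[\,\cdot \mid X > R]$ that are independent of $N_R$.
A convenient way to prove this is to use the moment generating function (mgf) technique, and the two important properties that will be made use of are stated below.
Property 1: Let $X$ and $Y$ be independent random variables. Let $Z$ be equal to $X$ with probability $p$ and equal to $Y$ with probability $1 - p$. Then the mgf of $Z$ is
$$M_Z(t) = p\,M_X(t) + (1 - p)\,M_Y(t).$$
Property 2: Let $X_1, X_2, \ldots$ be a sequence of iid random variables. Let $N$ be another independent random variable that takes nonnegative integer values, and let $S = \sum_{i=1}^{N} X_i$. Then the mgf of $S$ is
$$M_S(t) = M_N(\log M_X(t)).$$
Proof. Let the probability measure of model $(N, X)$ be $\mathbb{P}$ and that of model $(Y, N_R)$ be $\tilde{\mathbb{P}}$. With $E(N) = rq/p$, the mgf of $N$ can easily be shown to be
$$M_N(t) = \left(\frac{p}{1 - q e^{t}}\right)^{r}, \quad q e^{t} < 1.$$
Write $I = \mathbb{1}\{X > R\}$. This means $I$ takes on the value 1 with probability $\pi$ and 0 with probability $1 - \pi$. Therefore, by Property 1, the mgf of $I$ is
$$M_I(t) = \pi e^{t} + 1 - \pi,$$
and by Property 2
$$M_{N_R}(t) = M_N(\log M_I(t)) = \left(\frac{p}{1 - q(\pi e^{t} + 1 - \pi)}\right)^{r} = \left(\frac{p_R}{1 - (1 - p_R) e^{t}}\right)^{r},$$
which is the mgf of $NB(r, p_R)$. For the loss burden, take $t \le 0$ so that all the mgfs involved are finite. Under $\mathbb{P}$, each term $(X_i - R)^+$ equals $Y - R$ with probability $\pi$ and 0 with probability $1 - \pi$, so by Properties 1 and 2 the mgf of the left-hand side is $M_N(\log(\pi M_{Y-R}(t) + 1 - \pi))$. Under $\tilde{\mathbb{P}}$, Property 2 gives $M_{N_R}(\log M_{Y-R}(t)) = M_N(\log(\pi M_{Y-R}(t) + 1 - \pi))$, which is the same expression.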

Tail Risk Management Measures
A key issue in risk management is the definition of risk as well as the definition of relevant risk measures. The most commonly used risk measures are value at risk (VaR) and expected shortfall (ES). VaR describes the amount of extreme loss which is exceeded only with a certain small probability within a given time frame.
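To obtain VaR using the GPD, we consider a threshold $u$. The distribution of the excesses above $u$ is
$$F_u(y) = \mathbb{P}(X - u \le y \mid X > u) = \frac{F(y + u) - F(u)}{1 - F(u)}, \quad 0 \le y < x_F - u,$$
for some underlying distribution $F$ describing the series $X_i$, where $x_F$ is the right end point of $F$. From Theorem 1, we know that $F_u(y)$ can be well approximated using the GPD. That is, $F_u(y) \approx H_{\xi,\sigma}(y)$ as $u \to \infty$. Thus the underlying distribution becomes
$$F(x) = (1 - F(u))\,H_{\xi,\sigma}(x - u) + F(u)$$
for $x > u$, and the empirical estimate for $F(u)$ is $(n - N_u)/n$, where $n$ is the total number of observations and $N_u$ is the number of observations above the threshold $u$. Therefore, the tail estimate can be written as
$$\hat{F}(x) = 1 - \frac{N_u}{n}\left(1 + \hat{\xi}\,\frac{x - u}{\hat{\sigma}}\right)^{-1/\hat{\xi}}.$$
By inverting the tail estimate formula we obtain the VaR at probability level $q$:
$$\mathrm{VaR}_q = u + \frac{\hat{\sigma}}{\hat{\xi}}\left[\left(\frac{n}{N_u}(1 - q)\right)^{-\hat{\xi}} - 1\right].$$
The ES, which can be described as the expected size of the loss at a given probability level, conditional on the loss actually exceeding the VaR, is
$$\mathrm{ES}_q = \frac{\mathrm{VaR}_q}{1 - \hat{\xi}} + \frac{\hat{\sigma} - \hat{\xi} u}{1 - \hat{\xi}}.$$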
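The VaR and ES calculation described above can be sketched in a few lines. This is a minimal illustration under the standard GPD tail-estimate formulas, not the code used in the study: the threshold and the counts n and N_u are the figures quoted in the data section, while `xi` and `sigma` are hypothetical placeholders rather than the fitted values of Table 3.

```python
def gpd_tail_prob(x, u, xi, sigma, n, n_u):
    """Tail estimate P(X > x) for x > u under the fitted GPD (xi != 0)."""
    return (n_u / n) * (1.0 + xi * (x - u) / sigma) ** (-1.0 / xi)


def gpd_var(q, u, xi, sigma, n, n_u):
    """VaR at level q, obtained by inverting the tail estimate (xi != 0)."""
    return u + (sigma / xi) * (((n / n_u) * (1.0 - q)) ** (-xi) - 1.0)


def gpd_es(q, u, xi, sigma, n, n_u):
    """Expected shortfall at level q; finite only when xi < 1."""
    var_q = gpd_var(q, u, xi, sigma, n, n_u)
    return var_q / (1.0 - xi) + (sigma - xi * u) / (1.0 - xi)


# u, n and n_u follow the figures quoted in the text; xi and sigma are
# hypothetical placeholders, not the Table 3 estimates.
u, n, n_u = 3.8e6, 1324, 1128
xi, sigma = 0.3, 1.6e6

var_99 = gpd_var(0.99, u, xi, sigma, n, n_u)
es_99 = gpd_es(0.99, u, xi, sigma, n, n_u)
```

A quick consistency check is that `gpd_tail_prob(var_99, ...)` returns 0.01, since the VaR is defined as the point where the estimated tail probability equals 1 − q.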

XL Reinsurance
Let $X$ represent the loss incurred and suppose we have an insurance policy with retention level $R$. Then the payment made by the reinsurer will be $g(X)$, where
$$g(X) = (X - R)^+ = \begin{cases} 0, & X \le R, \\ X - R, & X > R. \end{cases}$$
Hence the reinsurance premium (RP) will be the expected value of the claims, $E[g(X)]$.
When we factor the frequency of claims into the computation, the expected total (aggregate) loss according to the classical risk theory hypothesis becomes
$$E(S) = E(N)\,E(g(X)).$$
Therefore, under the NB-GPD model, $E(N) = rq/p$ and $E(g(X)) = E(H_{\xi,\sigma,u}(x)) = E(H)$, where $rq/p$ denotes the average number of claims and $E(H_{\xi,\sigma,u}(x))$ is the expected value of the GPD with parameters $\xi$, $\sigma$ and retention $u$. We assume that the occurrence times and the loss amounts fulfil the conditions of a compound NB process, that is, the collective risk model where $N$ follows a NB distribution given by
$$\mathbb{P}(N = n) = \binom{n + r - 1}{n} p^{r} q^{n}, \quad n = 0, 1, 2, \ldots, \quad q = 1 - p.$$
Parametrically, we will estimate the reinsurance risk premium as
$$RP = E(N)\,E(H) = \frac{rq}{p} \int x \, dH^{(u)}(x),$$
with $dH^{(u)}(x) = h^{(u)}(x)\,dx$, where $h^{(u)}$ is the density function of the GPD. To be more specific, the reinsurance layer may not coincide with the threshold $u$ of the optimized GPD, hence we have to compute $\mathbb{P}(X > R \mid X > u)$.
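We recall that the expected value of the random amount $(X - R)^+$, which concerns the reinsurer, can also be written as
$$E[(X - R)^+] = \int_R^{\infty} (x - R)\, dF(x) = \int_R^{\infty} \bar{F}(x)\, dx,$$
and since the mean excess function (MEF), $e(R)$, is
$$e(R) = E[X - R \mid X > R] = \frac{\int_R^{\infty} \bar{F}(x)\, dx}{\bar{F}(R)},$$
it follows that, in terms of the MEF,
$$RP = e(R)\,\bar{F}(R) \quad (3.9)$$
where $\bar{F}(R) = 1 - F(R)$ is the survival function. A simpler expression can be obtained for the integral of the reinsurance premium for an unlimited cover, which is
$$RP = \int_R^{\infty} \bar{F}(x)\, dx.$$
Furthermore, the asymptotic behaviour of the tail provides us with a formula to estimate the tail probability as $x \to \infty$, given that certain conditions hold:
$$\hat{\bar{F}}(x) = \frac{k}{n}\left(1 + \hat{\xi}\,\frac{x - u}{\hat{\sigma}}\right)^{-1/\hat{\xi}}$$
when the tail is fitted with the GPD, where $k$ is the suitable choice made for the number of upper order statistics. From first principles, the formula of the MEF of the GPD is
$$e(R) = \frac{\hat{\sigma} + \hat{\xi}(R - u)}{1 - \hat{\xi}}, \quad \hat{\xi} < 1.$$
We can then estimate the net RP for an unlimited XL layer above $R$ as
$$RP = \frac{k}{n}\left(1 + \hat{\xi}\,\frac{R - u}{\hat{\sigma}}\right)^{-1/\hat{\xi}} \frac{\hat{\sigma} + \hat{\xi}(R - u)}{1 - \hat{\xi}}.$$
Here, the exceedance probability has been estimated empirically by $k/n$. However, if we have a bounded layer $(R, R^*)$ with $u < R < R^*$ in excess of our threshold $u$, then
$$RP = \int_R^{R^*} \bar{F}(x)\, dx$$
and the explicit formula to compute the integral will be
$$RP = \frac{k}{n}\,\frac{\hat{\sigma}}{1 - \hat{\xi}}\left[\left(1 + \hat{\xi}\,\frac{R - u}{\hat{\sigma}}\right)^{1 - 1/\hat{\xi}} - \left(1 + \hat{\xi}\,\frac{R^* - u}{\hat{\sigma}}\right)^{1 - 1/\hat{\xi}}\right].$$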
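The net premium of an XL layer is the integral of the estimated survival function over the layer, and under the GPD tail this integral has a closed form. The sketch below illustrates that calculation and checks it against a numerical midpoint integration. It is a sketch under stated assumptions: `xi` and `sigma` are hypothetical placeholders (with 0 < xi < 1), the exceedance probability is taken as k/n = 1128/1324 from the counts quoted in the text, and the results are not expected to reproduce Table 5.

```python
import math


def survival(x, u, xi, sigma, p_u):
    """Estimated survival function P(X > x) for x > u, with p_u = k / n."""
    return p_u * (1.0 + xi * (x - u) / sigma) ** (-1.0 / xi)


def layer_premium(r_low, r_high, u, xi, sigma, p_u):
    """Closed-form integral of the survival function over (r_low, r_high].

    Requires 0 < xi < 1; r_high may be math.inf for an unlimited cover.
    """
    def term(x):
        return (1.0 + xi * (x - u) / sigma) ** (1.0 - 1.0 / xi)

    return p_u * sigma / (1.0 - xi) * (term(r_low) - term(r_high))


def layer_premium_numeric(r_low, r_high, u, xi, sigma, p_u, steps=20000):
    """Midpoint-rule integration of the survival function over a bounded layer."""
    h = (r_high - r_low) / steps
    return sum(survival(r_low + (k + 0.5) * h, u, xi, sigma, p_u) * h
               for k in range(steps))


# Threshold and counts as quoted in the text; xi and sigma are hypothetical.
u, p_u = 3.8e6, 1128 / 1324
xi, sigma = 0.3, 1.6e6

layers = [(10e6, 12e6), (12e6, 15e6), (15e6, 20e6), (20e6, math.inf)]
premiums = [layer_premium(lo, hi, u, xi, sigma, p_u) for lo, hi in layers]
```

Because the layers are adjacent, their premiums add up to the premium of an unlimited cover above the lowest retention, which gives a simple sanity check on any implementation.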

Data Description and Analysis
The data used in this study are the large claims (3.5 million naira and above) of Nigeria's motor industrial class of business.
It covers the years 2013 to 2016. The four years of data give a total of 1,324 claims. Analyzing these motor claims will provide the insurance industry with useful insights given that, according to Ademunigbohun and Oreshile (2014), motor insurance (specifically, third party motor insurance) is the major driver of the non-life insurance market in Nigeria. The data were obtained from the National Insurance Commission (NAICOM) at www.naicom.gov.ng. They can be found in the Nigeria Insurance digest for the specified years.
The histogram of the claims (on the log scale) is strongly skewed towards the right (Figure 1). This suggests that smaller motor claims occur frequently while larger losses take place occasionally. From the plot we can observe that most of the losses fall between 3.5 and 6 million naira. Inflation is not taken into account in this analysis.

Figure 1. Histogram of claims on the log scale
In Tables 1 and 2, the summary statistics of the claims are shown. They reveal that the largest number of claims was paid in 2015. The general trend is encouraging: it shows that the industry is growing, given that such large amounts of claims are being paid yearly.

Table 2 values (naira):
Year                  2013          2014          2015          2016
3 largest claims      ,245,000      21,765,289    19,000,000    21,802,500
                      20,124,610    23,302,148    19,885,000    23,900,000
                      23,479,720    24,752,550    51,810,400    28,215,000
Sample size           244           286           419           376

Fitting the NB-GPD Model
The shape of the histogram (Figure 1) and the increasing MEF (Figure 2, right) indicate that the data are heavy tailed. Thus a GPD is fitted to the sample to model the severity of claims. First we determine the threshold u. A useful tool for this is the mean excess plot (MEP), as seen in Figure 2. In the case of the GPD, the MEF of a random variable X is a linear function of u (Embrechts et al., 2005). In Figure 2 (plot on the right) the plot is fairly linear, with a positive slope that is confirmed by the positive shape parameter ξ in Table 3, up to approximately the 8 million mark (indicated by the continuous vertical line), and then begins to waver. The wavering is due to the scarcity of data on very large losses. The lowest point from which the plot becomes linear, using the eyeball approach, is chosen as the threshold. This is set at 3.8 million naira, as indicated by the dotted vertical line on the MEP and the dotted horizontal line on the scatterplot of the claims. This leaves 1128 losses as the number of excesses (nexc) to which the GPD is fitted (Table 3). The parameters are estimated using the maximum likelihood method, with the respective standard errors given in the braces.
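The mean excess plot used for threshold selection is a plot of the empirical mean excess against candidate thresholds. The sketch below shows one way to compute those points. The claim amounts are hypothetical values in millions of naira, included only so that the example runs; they are not the study's data.

```python
def mean_excess(claims, u):
    """Empirical mean excess e_n(u): average of (x - u) over the claims x > u."""
    excesses = [x - u for x in claims if x > u]
    if not excesses:
        raise ValueError("no claims exceed the threshold")
    return sum(excesses) / len(excesses)


def mean_excess_curve(claims, thresholds):
    """Points (u, e_n(u)) that would be drawn on a mean excess plot."""
    return [(u, mean_excess(claims, u)) for u in thresholds]


# Hypothetical claim amounts (millions of naira), for illustration only.
claims = [3.5, 3.6, 3.9, 4.2, 4.8, 5.5, 6.1, 7.4, 9.0, 12.5, 18.0, 25.0]
curve = mean_excess_curve(claims, [3.5, 4.0, 5.0, 6.0, 8.0])
```

An upward, roughly linear trend in these points is the pattern one looks for when judging that a GPD with positive shape parameter is plausible above a candidate threshold.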
The diagnostic plots (Figure 3) show four different plots. The quantile (probability) plot has most of the points falling on the 45° diagonal line. The return level plot (a plot of the return level against the return period) has a central line representing the return level for the fitted model and two outer lines representing the pointwise 95% confidence limits, and most of the points, which are the empirical return levels from the dataset, fall within the confidence limits. All these indicate that the GPD is a good fit. The density plot is simply the fitted probability density function (GPD) superimposed on a histogram generated from the data. The p-values for the statistical goodness-of-fit tests, the Cramer-von Mises (CvM) and Anderson-Darling (AD) tests, are 0.31 and 0.21 respectively (significance level α = 0.05). This further confirms that the GPD is a good fit. To model the frequency of the claims using the NB, the number of losses per year above the given threshold is taken. The estimated parameters are displayed in Table 3. We can easily obtain the probability of success, p, for the NB distribution from the size and mean parameters with the simple formula $p = \text{size}/(\text{size} + \text{mu}) = 24.25/(24.25 + 282) = 0.079$.
Figure 3. Diagnostic plots to assess the GPD's goodness of fit for the historical motor claims (see brief explanation of these plots in Section 4.1, paragraph 3)
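As a worked check of the overdispersion argument, the mean and variance implied by the quoted frequency parameters can be computed directly. This assumes that size and mu follow the mean-dispersion parameterization of the NB used by R's negative binomial functions, which is an assumption here since the text does not state the convention explicitly.

$$E(N) = \text{mu} = 282, \qquad \mathrm{Var}(N) = \text{mu} + \frac{\text{mu}^2}{\text{size}} = 282 + \frac{282^2}{24.25} \approx 3561.3,$$
$$\frac{\mathrm{Var}(N)}{E(N)} \approx 12.6, \qquad \frac{rq}{p} = \text{size} \cdot \frac{\text{mu}}{\text{size}} = 282 \quad \text{with } r = \text{size}, \; \frac{q}{p} = \frac{\text{mu}}{\text{size}}.$$

Under that assumption the variance is roughly twelve times the mean, which is consistent with the overdispersion cited as the reason for preferring the NB to the Poisson.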

Risk Measures
After fitting the GPD to the data, the VaR and ES values are estimated together with their respective confidence intervals (CI). The estimates indicate that 99% of claims that occur result in a loss below 18,990,729 naira, and in the case where a claim results in a loss larger than this amount, the expected loss will be 28,180,720 naira (Table 4).
VaR and ES estimates for the historical claims with confidence intervals.

Simulating the Annual Loss Distribution
The annual aggregate loss distribution is simulated using the Monte Carlo technique. This makes use of the parameter values that were obtained from the frequency and severity analysis above. The implemented algorithm follows the annual loss distribution R algorithm of Piacenza (2012), the difference being that the NB is used here to obtain the frequency distribution. A sample of n = 100,000 random numbers is generated from the NB distribution with mean mu = 282.
Similarly, the severity distribution is generated by drawing the same number, n, of values from the GPD using the estimated parameters in Table 3. Then, for each random number extracted from the frequency distribution, say $n_i = 2$, two random numbers (in this case, losses) are extracted from the severity distribution and the sum of the extracted losses is obtained.
That is, $S = \sum_{i=1}^{N} X_i$ where $N \sim NB(r, p)$. This process is repeated n times to obtain the annual loss distribution. To visualize what the distribution looks like, a histogram is shown in Figure 4b. Simulated individual losses are also shown (Figure 4a). From the annual loss distribution, the expected overall annual loss (EL) and its VaR can be computed. The EL is taken as the 0.5 quantile (the median) of the annual loss, which gives 1,715,027,838 naira. It is depicted by the dotted line in Figure 4b. This means that on average the motor insurance industry should expect a total of about 1.7 billion naira worth of claims in a year. The 99% VaR for the annual loss is 2,012,204,962 naira. Thus the allocated or required capital is obtained by deducting the EL from the VaR, resulting in 297,177,124 naira.
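The simulation steps described above can be sketched as follows. This is an illustrative Python sketch, not the R algorithm of Piacenza (2012): the NB draw is implemented as a gamma-mixed Poisson, the GPD draw uses inverse-transform sampling, the number of simulated years is kept small so that the example runs quickly, and `xi` and `sigma` are hypothetical placeholders for the Table 3 estimates. It is therefore not expected to reproduce the figures reported in the text.

```python
import math
import random


def sample_poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method, applied to chunks of lam.

    A Poisson(lam) variable is a sum of independent Poisson(chunk) variables,
    so chunking keeps exp(-chunk) away from floating-point underflow.
    """
    total = 0
    remaining = lam
    while remaining > 0.0:
        chunk = min(remaining, 30.0)
        limit = math.exp(-chunk)
        product = rng.random()
        while product > limit:
            product *= rng.random()
            total += 1
        remaining -= chunk
    return total


def sample_nb(size, mu, rng):
    """NB draw as a gamma-mixed Poisson with mean mu and dispersion size."""
    lam = rng.gammavariate(size, mu / size)  # arguments are shape and scale
    return sample_poisson(lam, rng)


def sample_gpd_loss(u, xi, sigma, rng):
    """Loss above threshold u by inverse-transform sampling of the GPD (xi != 0)."""
    v = 1.0 - rng.random()  # uniform on (0, 1]
    return u + (sigma / xi) * (v ** (-xi) - 1.0)


def simulate_annual_losses(n_sims, size, mu, u, xi, sigma, seed=0):
    """Simulate n_sims annual aggregate losses S = X_1 + ... + X_N."""
    rng = random.Random(seed)
    losses = []
    for _ in range(n_sims):
        n_claims = sample_nb(size, mu, rng)
        losses.append(sum(sample_gpd_loss(u, xi, sigma, rng)
                          for _ in range(n_claims)))
    return losses


def quantile(values, q):
    """Simple empirical quantile based on the sorted sample."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[index]


# size, mu and u are the values quoted in the text; xi and sigma are
# hypothetical placeholders for the Table 3 estimates.
annual = simulate_annual_losses(2000, size=24.25, mu=282.0,
                                u=3.8e6, xi=0.3, sigma=1.6e6)
expected_loss = quantile(annual, 0.5)   # EL taken as the median, as in the text
var_99 = quantile(annual, 0.99)
capital = var_99 - expected_loss        # required capital = VaR - EL
```

Fixing the seed makes the run repeatable; increasing `n_sims` towards the 100,000 used in the study mainly stabilizes the 99% quantile, which is the noisiest of the three outputs.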

Computing the XL Reinsurance Risk Premium
After selecting a priority level R, an insurer may not want to reinsure all the risk that exceeds that level; rather, he may choose a layer of reinsurance corresponding to the interval (R, R*]. Hence, to replicate this sort of scenario, 3 different layers of limited cover are assumed. One XL with an unlimited cover (20mill, ∞) is also considered. The 3 levels of limited-cover reinsurance are 2mill xs 10mill (10,12]mill, 3mill xs 12mill (12,15]mill and 5mill xs 15mill (15,20]mill. Here mill represents million.
In Table 5, the survival probabilities of exceeding the lower limit in each layer and their respective reinsurance net premium results are displayed. Assuming that only large claims above 3.5 million naira are in the portfolio, there is a probability of 0.0237 (about 2.4%) that a claim will exceed 15 million naira given that it has exceeded 3.8 million naira (Table 5, third row). In other words, approximately 976 out of the next 1000 claims (1 − 0.0237 = 0.9763) that exceed the 3.8 million naira threshold will remain under 15 million naira. This implies that roughly 24 of these claims will exceed 15 million naira, and that it will cost the motor class of business, on average, about 1.9 million naira in net premium to have reinsurance cover in excess of 15 million naira up to a limit of 20 million naira.
The trend indicates that the higher the layer of reinsurance, the lower the survival probability and the lower the net premium the insurer will pay. This implies less profit for the insurer if he picks the lowest layer; although the probability that losses exceed his retention limit will be higher, his probability of ruin will be smaller.

Conclusion
This paper showcases how powerful EVT is for modelling the tail of very large losses. Extreme losses were analyzed using a frequency and severity based approach because the survival of an insurance company critically depends on both. It was shown that the GPD served as a good fit to the tails of the motor claims, as indicated by the AD and CvM tests. The expected annual total claims for the motor insurance industry were estimated based on the simulated annual loss distribution.
The importance of having good estimates of the tails of the claims was made evident in the pricing of the XL reinsurance from the perspective of the insurer. Solvency probabilities were calculated for the considered layers of reinsurance coverage to help the insurer decide whether to assume or cede certain risks and to give an idea of the risk premium he will pay in the case of ceding. These analyses bring to the fore a major issue constantly faced by an insurer when deciding on his retention level, namely balancing the trade-off between profit and security. This can serve as a scope for further study.
Another interesting direction for future research would be to take inflation into account and study its effect on the pricing process. The pricing of different reinsurance structures and their various combinations based on the simulated loss distribution is also a research area one can examine.

Figure 4. Simulated individual losses (a) and annual loss distribution (b) using the parameters obtained from the historical claims

Table 1. Summary statistics

Table 2. The 3 largest claims and sample size in each year

Table 3. The estimated parameters based on the fitted NB-GPD

Table 4. VaR and ES at the 0.99 quantile

Table 5. XL reinsurance layer and premium