Minimal Generalized Extreme Value Distribution and Its Application in Modeling of Minimum Temperature

Distribution of maximum or minimum values (extreme values) of a data set is especially used in natural phenomena including flow discharge, temperature, wind speeds, precipitation and it is also used in many other applied sciences such as reliability studies and analysis of environmental extreme events. So if we can explain the extremes behavior via statistical formulas, we can estimate their behavior in the future. This article is devoted to study extreme values of minimum temperature in Tabriz using minimal generalized extreme value distribution, which all minima of a data set are modeled using it. In this article, we apply four methods to estimate distribution parameters including maximum likelihood estimation, probability weighted moments, elemental percentile and quantile least squares then compare estimates by average scaled absolute error criterion. We also obtain quantiles estimates and confidence intervals and finally perform goodness of fit tests.


Introduction
first introduced the family of extreme values distribution functions and Jenkinson (1955) first studied it and he noted the distribution when he was looking for a model for annual or daily maximum of natural phenomena such as maximum water level (or river height), amount of precipitation and air pollution. In many applied problems, extremal studies, that is, values in the tail of a distribution of some phenomena such as sea waves, temperature, earthquake, flood, compensations of insurance, concentration of atmospheric pollutants, material strength and etc. are interested. According to climate changes, both minimum and maximum values have enormous significance in some branches of sciences such as hydrology due to economic and social consequences and extent of damages related to climate events in recent years. For example, the minimum and maximum precipitation lead to drought and flood respectively (Galambos & Macri, 2002;Ferro & Seger, 2003). Consequently, in face of similar problems that are mentioned above have accelerated the modeling of extreme values in recent years. Thus, accurate prediction and modeling of the important climate factors such as precipitation and temperature are main aim of researches. At first, some basic definitions are illustrated: Definition Nondegenerate limit distribution of minimum X 1:n of a sample of size n drawn from a population with cdf F(x) are where c n and d n are a sequence of real numbers and L n (x) = Pr[X 1: (Fisher & Tippett, 1928;Tiago de Oliveria, 1958;Galambos, 1987). When this is possible, a given distribution, F(x), is said to belong to the minimal domain of attraction of L(x), if Eq. 1 holds for at least one pair of sequences {c n } and {d n > 0}.
The important result is that only one parametric family is possible as a limit for minima (Smith, 1990;Gomez & de Haan, 1999).
where κ is the shape parameter of the associated limit distribution. This implies that (a) if κ > 0, F(x) belongs to the Weibull minimal domain of attraction; (b) if κ = 0, F(x) belongs to the Gumbel minimal domain of attraction; (c) if κ < 0, F(x) belongs to the Frechet minimal domain of attraction (Galambos, 1987;Castillo, 1988;Castillo et al., 1989).
The remainder of the article is structured as follows: In Sections 2 and 3 the minimal generalized extreme value distribution (GEVD m ) and the maximal generalized extreme value distribution (GEVD M ) and their particular cases are presented respectively. In Section 4 the GEVD M parameters are estimated by four different methods. In Section 5 goodness of fit tests are described. In Section 6 extreme values of minimum temperature in Tabriz are analyzed. Finally, in Section 7 dicussions and conclusions are presented.

Minimal Generalized Extreme Value Distribution
It is usually used a general shape to show the extreme value distributions that is called the generalized extreme value distribution which is provided by Jenkinson (1955). In this case, an additional parameter is set in the distribution as shape parameter. Thus, the pdf of the minimal generalized extreme value distribution, GEVD m (λ, δ, κ), that is only limit distribution for minima in i.i.d. samples is given by where λ, δ and κ are location, scale and shape parameters respectively and the support is The reversed Gumbel, Weibull and reversed Frechet distributions are particular cases of the GEVD m . The shape parameter κ describes the behavior of tail of distribution and three types of the extreme value distributions (Weibull, Frechet and Gumbel) are derived based on the shape parameter κ that it may be positive, negative and zero. The corresponding p-quantile is Theorem 2 If the random variable X ∼ GEV D m (λ, δ, κ), then Y = −X ∼ GEV D M (−λ, δ, κ), and if the random variable X ∼ GEV D M (λ, δ, κ), then Y = −X ∼ GEV D m (−λ, δ, κ). By Theorem 2 parameter estimates for the GEVD m can be obtained using the estimation methods for the GEVD M to be introduced in Section 3. Thus, one can use the following algorithm (Castillo, et al., 2005): 1. Change the signs of the data: x i → -x i , i = 1, 2, . . . , n. 2. Estimate the parameters of the GEVD M , sayλ,δ andκ. 3. The GEVD m parameter estimates are then

Maximal Generalized Extreme Value Distribution
The pdf of the GEVD M (λ, δ, κ), that is only limit distribution for maxima in i.i.d. samples is given by where λ, δ and κ are location, scale and shape parameters respectively and the support is The Gumbel, reversed Weibull and Frechet distributions are particular cases of the GEVD M (Jenkinson, 1955). The corresponding p-quantile is

Parameters Estimation Methods For Maximal Generalized Extreme Value Distribution
In this section, we apply four different estimation methods including maximum likelihood estimation (MLE), probability weighted moments (PWM), elemental percentile (EP) and quantile least squares (QLS) to estimate the parameters and quantiles of the GEVD M .

The Maximum Likelihood Method
The loglikelihood function becomes where . In MLE method, the likelihood equations have no closed form solution and solving these equations are possible only with numerical methods such as Newton-Rophson method. One can use standard optimization packages in R software to find the estimates of the parameters that maximize the loglikelihoods in Eqs. 1 and 2. In order to avoid problems, one should do the following: 1. Use Eq. 2, if we believe that κ is closed to 0. 2. Use Eq. 1, if we believe that κ is far enough from 0 and check to make sure that conditions . . , n hold; otherwise, the loglikelihood will be infinite. Alternative to the use of standard optimization packages, one can use the following iterative method (Prescott & Walden, 1980;Castillo, et al., 2005). Accordingly, the (1 − α)100% confidence intervals for the parameters and quantiles are Remark 1 The above numerical solutions used to obtain the MLE require an initial estimate θ 0 = {λ 0 , δ 0 , κ 0 }. As possible initial values for the parameter estimates, one can use Gumbel estimates for λ 0 , δ 0 and κ 0 . A simple version of the EP or the QLS can also be used as an initial starting point. Also, the ML estimates have classical properties for the case when κ > − 1 2 but do not have these properties for the case when κ ≤ − 1 2 . Thus, it's suggested to use another method such as PWM (Castillo, et al., 2005).

The Probability Weighted Moments
The PWM method is a variation of the method of moments to estimate the parameters and quantiles of the GEVD M as proposed by Greenwood et al. (1979). It is convenient to use the β s moments of the GEVD M for the β 0. For κ ≤ −1, β 0 = E(X), but the rest of the β s moments do not exist. For κ > −1, we equalize the β s moments and the corresponding sample moments b s = m(1, s, 0) = 1 n ∑ n i=1 x i:n p s i:n , we obtain The PWM estimators are the solution of Eqs. 2-4 for the λ, δ and κ. Note that Eq. 4 is a function of only one parameter κ, so it can be solved by an appropriate numerical method to obtain the PWM estimateκ. This estimate can then be substitude in Eq. 3 and estimate of δ emerges as .
Finally,κ andδ are substituted in Eq. 2, and an estimate of λ is obtained as Remark 2  1999) or because the moments do not exist in certain regions of the parameter space. The MLE requires numerical solutions. For some samples, the likelihood may not have a local maximum. Another, perhaps more serious problem with the method of moments (MOM) and PWM is that they can produce estimates of θ not in the parameter space Θ (Chan & Balakrishnan, 1995). Here we use the elemental percentile method (EP), for estimating the parameters and quantiles of the GEVD M , as proposed by Castillo and Hadi (1995a). The method gives welldefined estimators for all values of θ ∈ Θ. Simulation studies by Castillo andHadi (1995b,c 1997) indicate that no method is uniformly the best for all θ ∈ Θ, but this method performs well compared to all other methods.

Elemental Percentile
This method obtains the estimates in two steps: First, a set of initial estimates based on selected subsets of the data are computed, then the obtained estimates are combined to produce final more-efficient and robust estimates of the parameters. The two steps are described below.

First Stage: Initial Estimates
The initial point estimates of the parameters are obtained as follows: For the case κ 0 , the GEVD M has three parameters, so each elemental set contains three distinct order statistics. Let I = {{i, j, r} : i < j < r ∈ {1, 2, , n}} be the indices of three distinct order statistics. Equating sample and theoretical GEVD M quantiles in, we obtain where p i:n = (i−0.35) n . Subtracting the third from the second of the above equations, and also the third from the first, then taking the ratio, we obtain where C i = − log(p i:n ) and A ir = C i C r . This is an equation in one unknown κ. Solving Eq. 6 for κ using the bisection method (see Algorithm in Castillo et al. 2005), we obtain κ i jr , which indicates that this estimate is a function of the three observations x i:n , x j:n and x r:n . Substituting κ i jr in two of the equations in Eqs. 5 and solving for δ and λ, we obtain The estimates κ i jr , δ i jr and λ i jr must satisfy that x n:n ≤ λ + δ κ , when κ > 0 and x 1:n ≥ λ + δ κ , when κ < 0 . In this way, we force parameter estimates to be consistent with the observed data.

Second Stage: Final Estimates
The above initial estimates are based on only three distinct order statistics. More statistically efficient and robust estimates are obtained using other order statistics as follows. Select a prespecified number, N, of elemental subsets each of size 3, either at random or using all possible subsets. For each of these elemental subsets, an elemental estimate of the parameters θ = {κ, δ, λ} is computed. Denote these elemental estimates by θ 1 , θ 2 , . . . , θ N . The elemental estimates that are inconsistent with the data are discarded. These elemental estimates can then be combined, using some suitable (preferably robust) functions, to obtain an overall final estimate of θ.

The Quantile Least Squares Method
The QLS method for the GEVD M estimates the parameters by minimizing the sum of squares of the differences between the theoretical and the observed quantiles. Thus, the distribution parameters can be estimated by solving the following equations using standard optimization packages in R software.

Goodness of Fit Test
There are different methods to study goodness of fit such as Kolmogorov-Smirnov test. In this article, we only performe the Kolmogorov-Smirnov test but it can be used some other methods for goodness of fit tests such as probability paper plots, Wald test and curvature test (Castillo et al., 2005).

Modeling of Minimum Temperature In Tabriz
Forecasting in climatology is one of the important issues that has attracted the hydrologists and statisticians. In this research, we analyze the observations of yearly minimum temperature in Tabriz  By comparison with the MLE in Table 1 one can see that, the QLS method's estimates are close to the corresponding MLE. Finally, MLE method is better from other estimation methods. By comparision the P-P and Q-Q plots in Fig 1-4, as would be expected, the graphs that exhibit the most linear trend are those for the QLS method because the method minimizes the difference between the theoretical and observed quantiles.  . P-P and Q-Q plots obtained from fitting the GEVD m to temperature data set using the QLS method As we showed in Section 4, to construct confidence intervals for the parameters, we also need the standard errors of the estimates. When we obtain them, 95 % confidence intervals for λ, δ and κ can be obtained as follows: λ ∈ (−13.6512, −13.6505), δ ∈ (3.4999, 3.5007), κ ∈ (1.4897, 1.4904), which indicates that κ is to be positive, hence supporting the hypothesis that the distribution of the minimum temperature data follows a Weibul domain of attraction at 5% significance level. The corresponding 95 % maximum likelihood (ML) confidence intervals (CI) for the 0.9, 0.95 and 0.99 quantiles of the GEVD m at α = 0.01, 0.05 significance levels are also given in Table 2.  Table 2. The 95% confidence intervals for the 0.9, 0.95 and 0.99 quantiles of the GEVDm at =0.01, 0.05 significance levels px p CI α = 0.01 0.9 -7.86 (-8.39,-7.32) 0.95 -3.95 (-4.47,-3.44) 0.99 6.86 (6.36,7.37) α = 0.05 0.9 -7.86 (-8.27,-7.45) 0.95 -3.95 (-4.34,-3.56) 0.99 6.86 (6.48,7.25) Based on Kolmogorov-Smirnov test, we obtain K − S = 0.1118 with P − value = 0.503 that the result confirms the fitted model.

Results
After analysis of the goodness of fit test, we conclude that the minimal Weibul distribution is an appropriate model for the minimum temperature data set of Tabriz. Also, obtained confidence inerval for the shape parameter κ confirmed the assumption that the minimum temperature data set follows the minimal Weibul domain of attraction. By comparision of the MLE, PWM, EP and the QLS estimates based on ASAE criterion show the MLE method is better than others. According to extremes temperature during the years 1951-2005, we can calculate the return period of minimum temperature because the temperature model of Tabriz (minimal Weibul) which is a special case of GEVD m , has been determined.

Appendix: Proof of Theorem 2
Let Y = −X, then we have