Probabilistic Modeling of Monthly Temperature Historical Series in Mossoró, Northeastern Brazil

We fitted seven probability distributions to the monthly average temperature data of Mossoró, northeastern Brazil: Normal, Log-Normal, Beta, Gamma, Log-Pearson (Type III), Gumbel, and Weibull. To assess the goodness of fit of the empirical distributions to the theoretical distributions, we applied the Kolmogorov-Smirnov, Chi-square, Cramer-von Mises, Anderson-Darling, and Kuiper tests and the Logarithm of Maximum Likelihood, at the 10% probability level. The temperature series covers 1970 to 2007. The Normal distribution provided the best fit to the historical series of average monthly temperature. Although the Kolmogorov-Smirnov test showed a very high acceptance rate, which generated some uncertainty regarding the test criteria, it is the most recommended test for studies with approximately symmetric data and small series.


Introduction
Studies of the behavior of rainfall, air temperature, relative air humidity, evaporation, wind direction and speed, global solar radiation, and the occurrence of dew, fog, hail, frost, and snow are essential tools for decision-making related to agricultural and human activities in building and tourism (Mota, 1981). Among these variables, temperature is one of the leading factors for climatic characterization because it exerts several effects on agricultural practice (Araújo et al., 2010a).
The fitting of probability distributions to climatic variables over time aims to understand the meteorological phenomena, clarify their occurrence patterns, and make probabilistic forecasts. Good models provide reasonable predictability of the climatic behavior of a region, becoming critical tools for planning and managing diverse human activities.
A theoretical distribution is an abstract mathematical formula or characteristic shape. Some of these formulae emerge naturally as a consequence of certain kinds of data-generating processes, and when applicable, these are particularly plausible candidates for concisely representing variations in a set of data. Even when there is no substantial natural basis behind the choice of a particular theoretical distribution, one may find empirically that such distributions suit a set of data very well (Pedrosa & Gama, 2004).
The simple construction of a frequency histogram to visualize a sample of data is insufficient for choosing which function, among the several probability density distributions, best represents the data. Some criteria and goodness-of-fit tests help to verify whether a known function suits the distribution of a dataset. There are several probability distributions for discrete and continuous random variables. Among those used for discrete data are the Bernoulli, Binomial, Negative Binomial, Hypergeometric, Geometric, and Poisson distributions. For continuous variables, we can use the Uniform or Rectangular, Normal, Log-Normal, Gamma, Weibull, Gumbel, Exponential, Beta, Chi-square, Student's t, and Snedecor's F distributions, among others.
Goodness-of-fit tests, such as Kolmogorov-Smirnov, Chi-square, Cramer-von Mises, Anderson-Darling, Kuiper, Lilliefors, Shapiro-Wilk, and the Logarithm of Maximum Likelihood (Campos, 1983; Assis et al., 1996; Morettin & Bussab, 2004; Cooke, 1993), verify whether the empirical values may come from a population with a particular theoretical distribution. In these tests, the null hypothesis states that the distribution is the one specified, with its parameters estimated from the sample data (Assis et al., 1996; Catalunha et al., 2002). The Logarithm of Maximum Likelihood shows a good quality of fit if its value is negative and the lowest possible (Cooke, 1993).
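To make the null hypothesis above concrete, the following is a minimal sketch of how the Kolmogorov-Smirnov and Kuiper statistics can be computed against a Normal distribution whose parameters are estimated from the sample. It is not the procedure used in this study. The function names, the sample values, and the use of the asymptotic critical value are illustrative assumptions.

```python
import math
from statistics import NormalDist, mean, stdev


def edf_gaps(sample, cdf):
    """Return (D_plus, D_minus): the largest gaps of the empirical CDF
    above and below the theoretical CDF."""
    xs = sorted(sample)
    n = len(xs)
    d_plus = max(i / n - cdf(x) for i, x in enumerate(xs, start=1))
    d_minus = max(cdf(x) - (i - 1) / n for i, x in enumerate(xs, start=1))
    return d_plus, d_minus


def ks_critical_value(n, alpha=0.10):
    """Asymptotic KS critical value, sqrt(-ln(alpha / 2) / 2) / sqrt(n).
    This is a large-sample approximation (assumption)."""
    return math.sqrt(-0.5 * math.log(alpha / 2.0)) / math.sqrt(n)


# Hypothetical monthly means (°C) for one calendar month; NOT the paper's data.
temps = [27.1, 27.8, 28.0, 27.4, 28.3, 27.6, 27.9, 28.1, 27.3, 27.7, 28.4, 27.5]

# Null hypothesis: the data come from a Normal with these estimated parameters.
fitted = NormalDist(mean(temps), stdev(temps))

d_plus, d_minus = edf_gaps(temps, fitted.cdf)
ks_d = max(d_plus, d_minus)   # Kolmogorov-Smirnov statistic
kuiper_v = d_plus + d_minus   # Kuiper statistic
d_crit = ks_critical_value(len(temps), alpha=0.10)

print(f"KS D = {ks_d:.3f}, Kuiper V = {kuiper_v:.3f}, critical D = {d_crit:.3f}")
print("Normal not rejected at 10%" if ks_d < d_crit else "Normal rejected at 10%")
```

When the parameters are estimated from the same sample, the standard Kolmogorov-Smirnov critical values are known to be conservative, so the test rejects less often than its nominal level suggests. The Lilliefors variant of the test addresses this case.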
The use of appropriate probabilistic models provides mathematical precision, allowing consistent studies of historical data series. Arruda et al. (1981), assessing a 50-year data series of absolute minimum temperatures for June and July in the region of Campinas, defined and tested the extreme-value and Normal distribution models. They concluded that both models suited the dataset. These probabilistic models were also evaluated by Silva et al. (1986) for Lavras, using a 69-year historical series of daily minimum temperature data for the months from April to September. The extreme-value distribution showed the best fit to the observed data. Camargo et al. (1990), using the extreme-value model for annual absolute minimum temperatures in several locations in the States of São Paulo and Mato Grosso do Sul, identified areas of frost risk.
Through suitable probabilistic or stochastic models based on historical series, the risk levels of absolute minimum temperatures and frosts can be estimated for different periods of the year. Some studies use empirical classification, employing the relative frequency of occurrence of minimum temperatures, to calculate the unconditional probabilities. According to Soares and Dias (1986), this method has the problem of sample size, which may be insufficient to obtain stable probability values. Conrad and Pollak (1950) suggest series of at least 30 years to achieve reliable results.
In a study of the occurrence of low temperatures in Campinas (SP) for the period from 1890 to 1975, Camargo (1977) considered temperatures below 2.5 °C under a meteorological shelter as typical of frost. Ortolani et al. (1981) showed the number of occurrences of minimum temperatures lower than 2 °C, under shelter, from 1962 to 1980, for eight localities of São Paulo. The use of 2 °C as a threshold was based on the average difference between the air temperature in the meteorological shelter and the grass temperature on frost nights, which is 5.6 °C (Fagnani and Pinto, 1981). At an air temperature of 2 °C, the leaf has a temperature of -3.6 °C, approaching the values found by Camargo and Salati (1967), Pinto et al. (1977), and Pinto et al. (1978) as the limit for the appearance of damage in coffee trees. Using relative frequency with a series of twenty years of data, Soares and Dias (1986) defined, for the city of São Paulo, the probability of occurrence of daily minimum temperatures lower than 10 and 15 °C.
Maximum likelihood is the primary method for estimating the parameters of probabilistic models. Its estimators have the four desirable properties of a good estimator: unbiasedness, consistency, efficiency, and sufficiency, which must satisfy the condition α > 0 (by definition) (Thom, 1958; Thom, 1966; Murteira et al., 2001; Catalunha et al., 2002; Bussab & Morettin, 2017; Casella & Berger, 2018).
This work aimed to verify the goodness of fit of average monthly temperature series in the municipality of Mossoró, northeastern Brazil, to the following probability distributions: Normal, Log-Normal, Beta, Gamma, Log-Pearson (Type III), Gumbel, and Weibull. For this, we used the Kolmogorov-Smirnov, Chi-square, Cramer-von Mises, Anderson-Darling, Kuiper, and Logarithm of Maximum Likelihood criteria, given that the adjusted modeling is most appropriate for infinite populations of the historical series, thus providing relevant information for agriculture in terms of predicting climatic risk, estimating climatic event values, and applying agricultural zoning throughout the municipality of Mossoró, RN, Brazil.

Material and Methods
We used average monthly temperature data (°C) from a 38-year historical series (1970 to 2007) recorded at the UFERSA (Federal Rural University of Semi-Arid) Meteorological Station, in Mossoró (5°11′ S, 37°20′ W; 18 m of altitude). The region has a mean annual temperature of 27.5 °C and a relative humidity of 68.9% (Carmo Filho et al., 1991). According to Köppen's climatic classification, the climate of Mossoró is of the type BSwh', that is, hot and dry.
According to Morettin and Bussab (2004), and Assis et al. (1996), the Normal (or Gaussian) distribution serves as a model for many real-life issues. It also appears in many theoretical investigations since many statistical techniques, such as analysis of variance and regression, assume or require data normality.
In the Log-Normal distribution, the logarithms of the random variables are normally distributed. In this work, we used the three-parameter distribution.
The Gamma probability function has two parameters, the shape (α) and the rate (β). Thom (1958), quoted by Miller and Weaver (1968), stated that for β values greater than or equal to 100, the gamma distribution approaches the normal distribution. The rate parameter (β) indicates the degree of dispersion between the data of a series studied.
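As an illustration of how the two Gamma parameters can be obtained from a sample, the sketch below uses the closed-form approximation to the maximum likelihood solution commonly attributed to Thom (1958). It is not necessarily the estimator used in this study. The code assumes a shape/scale parameterization in which the mean equals shape × scale (the rate is then 1/scale), which may differ from the convention adopted in the text. The sample values are hypothetical.

```python
import math
from statistics import mean


def fit_gamma_thom(sample):
    """Approximate maximum-likelihood Gamma fit (Thom's approximation).

    Returns (shape, scale); the corresponding rate is 1 / scale.
    Requires strictly positive, non-constant data.
    """
    m = mean(sample)
    # A = ln(mean) - mean(ln x); non-negative by Jensen's inequality.
    a = math.log(m) - mean(math.log(x) for x in sample)
    shape = (1.0 + math.sqrt(1.0 + 4.0 * a / 3.0)) / (4.0 * a)
    scale = m / shape
    return shape, scale


def gamma_log_likelihood(sample, shape, scale):
    """Sum of the log Gamma densities over the sample."""
    return sum(
        (shape - 1.0) * math.log(x)
        - x / scale
        - shape * math.log(scale)
        - math.lgamma(shape)
        for x in sample
    )


# Hypothetical monthly means (°C); NOT the paper's data.
temps = [27.1, 27.8, 28.0, 27.4, 28.3, 27.6, 27.9, 28.1, 27.3, 27.7, 28.4, 27.5]

shape, scale = fit_gamma_thom(temps)
loglik = gamma_log_likelihood(temps, shape, scale)
print(f"shape = {shape:.1f}, scale = {scale:.5f}, log-likelihood = {loglik:.3f}")
```

For data with small relative dispersion, such as monthly mean temperatures, the estimated shape is very large and the fitted Gamma density is nearly symmetric.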
The Weibull distribution is used in hydrological analyses for extreme events. However, its use in historical series of climatic and biological variables is still little known (Catalunha et al., 2002). The Gumbel probability density distribution is another model widely used in the representation of climatic data, such as for solar radiation and temperature (Assis et al., 2004).
The Log-Pearson Type III distribution is used to represent mean and extreme variables (Sansigolo, 2008).
When fitting a series of data to a probability density distribution, we work with the hypothesis that the distribution suits that dataset well. Some non-parametric analyses provide criteria for testing this hypothesis. We used the following tests to assess goodness of fit: Chi-square (χ²), Kolmogorov-Smirnov, Cramer-von Mises, Kuiper, Anderson-Darling, and the Logarithm of Maximum Likelihood (Worley et al., 1990; Cooke, 1993). These statistics can discriminate lack of fit that other tests fail to detect (Shapiro & Brain, 1981; Cooke, 1993; Campos, 1983; Siegel, 2006).
Several ways of estimating the parameters of the Gamma distribution have been developed, contributing, along with its flexibility of shapes, to its use in several areas. However, Maximum Likelihood is the primary method for estimating parameters, because its estimators have the four desirable properties of a good estimator: unbiasedness, consistency, efficiency, and sufficiency, which must satisfy the condition α > 0 (by definition) (Thom, 1958; Catalunha et al., 2002; Bussab & Morettin, 2017; Casella & Berger, 2018).
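For reference, the Cramer-von Mises and Anderson-Darling statistics and the log-likelihood of a fitted model can be computed from the sorted sample as sketched below. These are the textbook computing formulas, shown here for a Normal fit. The function names and sample values are illustrative assumptions, and the critical values needed to turn the statistics into decisions are not included.

```python
import math
from statistics import NormalDist, mean, pstdev


def cramer_von_mises(sample, cdf):
    """W^2 = 1/(12n) + sum_i (F(x_(i)) - (2i - 1)/(2n))^2."""
    xs = sorted(sample)
    n = len(xs)
    return 1.0 / (12 * n) + sum(
        (cdf(x) - (2 * i - 1) / (2 * n)) ** 2 for i, x in enumerate(xs, start=1)
    )


def anderson_darling(sample, cdf):
    """A^2 = -n - (1/n) sum_i (2i - 1) [ln F(x_(i)) + ln(1 - F(x_(n+1-i)))]."""
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for i in range(1, n + 1):
        total += (2 * i - 1) * (
            math.log(cdf(xs[i - 1])) + math.log(1.0 - cdf(xs[n - i]))
        )
    return -n - total / n


def log_likelihood(sample, pdf):
    """Sum of log densities of the fitted model over the sample."""
    return sum(math.log(pdf(x)) for x in sample)


# Hypothetical monthly means (°C); NOT the paper's data.
temps = [27.1, 27.8, 28.0, 27.4, 28.3, 27.6, 27.9, 28.1, 27.3, 27.7, 28.4, 27.5]

# Normal maximum-likelihood estimates: sample mean and population standard deviation.
fitted = NormalDist(mean(temps), pstdev(temps))

w2 = cramer_von_mises(temps, fitted.cdf)
a2 = anderson_darling(temps, fitted.cdf)
ll = log_likelihood(temps, fitted.pdf)
print(f"W^2 = {w2:.4f}, A^2 = {a2:.4f}, log-likelihood = {ll:.3f}")
```

Unlike the Kolmogorov-Smirnov statistic, which records only the largest gap, both statistics accumulate squared differences over the whole sample, and the Anderson-Darling statistic gives more weight to the tails.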

Results and Discussion
Seven probability distribution models (Normal, Log-Normal, Beta, Gamma, Log-Pearson (Type III), Gumbel, and Weibull) were fitted and evaluated using the Kolmogorov-Smirnov, Chi-square, Cramer-von Mises, Anderson-Darling, and Kuiper tests and the Logarithm of Maximum Likelihood, according to Campos (1983) and Cooke (1993). The results indicated a good fit to these probability distribution functions, most of which were tested at a p-value of 0.10. Similar results were obtained by Araújo et al. (2010) and Assis et al. (2004). Considering a significance level of 0.05, that is, a probability of type I error of 0.05 (the incorrect rejection of a true null hypothesis), few datasets fitted the distributions. With the Logarithm of Maximum Likelihood, no data series fitted the proposed models (Tables 1, 2, 3 and 4). This result probably occurred because of the small amount of data in the historical series, since the larger the historical series, the higher the value of the statistic; that is, the higher the logarithm of the maximum likelihood function, the better the goodness of fit obtained (Shapiro & Brain, 1981; Cooke, 1993; Sansigolo, 2008). According to Campos (1983), the Kolmogorov-Smirnov test is more suitable for extensive series. Moreover, this test does not require grouping the data into class intervals, as the Chi-square test does. So in our study, the Kolmogorov-Smirnov test is the most suitable. Thus, we may infer that, among the seven distributions tested, the parameters of any of them could be used to represent the mean fluctuation of the monthly temperature. However, the estimation of the parameters of these distributions and the estimation of the probabilities differ in degree of difficulty. Therefore, it is convenient to verify which of the functions studied has the best fit, which may be the one whose parameters are less difficult to obtain and whose probabilities are easier to estimate (Cargnelutti Filho et al., 2004).
Table 1. Relative frequency of the number of fits of historical series of monthly average temperature to seven models of probability density distribution, Mossoró, northeastern Brazil
Note. ¹AJ = Adjusted distribution; - = Unadjusted distribution (significant at 10% probability). 5% ≤ p value ≤ 10%.
Among the seven distributions tested under a p-value of 0.10, the Normal distribution showed the highest number of fits. This was evident in the Kolmogorov-Smirnov goodness-of-fit test, with the parameters of this distribution (mean and standard deviation) estimated for probability calculations within desired intervals (Sansigolo, 2008). Araújo et al. (2010a, 2010b, 2011) and Almeida et al. (2010) obtained similar results. On the other hand, Araújo et al. (2010b, 2011) obtained temperature fits with the Gumbel, Normal, and Log-Normal distributions.
Using the Normal distribution, we then estimated the monthly average temperatures at the 25, 50, 75, and 95% levels of cumulative probability of occurrence (Table 5), that is, the probability of values equal to or less than those calculated. According to Table 5, there is a 95% probability that the monthly average for January is not higher than 34.8 °C. This value can be interpreted as follows: in every twenty years, nineteen are expected to present a value less than or equal to 34.8 °C.
Since the reason for a precise diet is known, the choice of a favorable area can be made, that is, the average control of annual and diurnal temperature differences are almost all parts of the world. The soil surface, with or without vegetation, is the main receiver of solar radiation and atmospheric energy, and is also a radiation emitter. The net radiation balance, which varies throughout the day and year, drives daily and annual variations in soil and air temperatures. Among the climatic elements, temperature has a great influence on the propagation of species and on the physiological processes occurring in animals and, mainly, plants (Valeriano & Picini, 2000). Plant growth is strongly influenced by this element, since agricultural activity is closely related to air temperature (Valeriano & Picini, 2000). Temperature is essential for climate and is basic information for the climatic zoning of all crops, as well as for the characterization of local climates. The diversity index is directly linked to the temperature of a given region, for example, when rainfall occurs differently but is negatively influenced by agricultural production in the region. By means of the climatic water balance, with parameters of correctness, temperature and evapotranspiration, an available effect can be estimated in the soil and, therefore, the potential of the region for the agricultural crops (Hillel, 2003).
In medium-sized cities such as Mossoró-RN, the densification caused by verticalization may, depending on the locality, imply mesoclimatic changes, mainly in relation to the urban heat island phenomenon.
Because climate is one of the natural components most altered by the process of urbanization, and Mossoró has peculiar climatic characteristics, such as high annual temperatures owing to its location in an area of equatorial tropical climate that is also classified as semiarid, an investigation relating local urban growth to its effects on the thermal field is necessary (Mendonça & Dani-Oliveira, 2007).
According to Saraiva (2010), in his study on the climate of Mossoró, it has been proven that the city generates its own climate, resulting from the interference of all factors (little vegetation, large flow of motor vehicles, constructions, etc.) that act on the urban boundary layer and change the climate on a local scale. In the semi-arid region, this configuration becomes even more pronounced, since the high incidence of solar radiation causes quite high temperatures in its cities.

Conclusions
The average monthly temperature data of Mossoró, RN, Brazil, fitted the Normal, Log-Normal, Beta, Gamma, Log-Pearson (Type III), Gumbel, and Weibull probability density distributions. The Normal distribution showed the best goodness of fit. The Kolmogorov-Smirnov, Chi-square, Cramer-von Mises, Anderson-Darling, and Kuiper fitting criteria gave similar results, so in this case all of them can be used as indicators for model ranking. The Logarithm of Maximum Likelihood was a poor indicator of fit. The Normal probability distribution performed well in estimating the occurrence of the average monthly air temperature. The construction of probability tables is a valuable tool in the study of the thermal behavior of this region. The city of Mossoró has a peculiar climate: a dry season with high temperatures predominates for a great part of the year, and another part of the year (autumn) has cooler temperatures during the rainy period. The behavior of mean air temperature did not vary significantly over the 38-year series. No change was observed in the temperature values, which remain at a mean of 27.6 °C.

Table 3. P-values of the Chi-square and Kolmogorov-Smirnov tests for the fitting of seven distributions of probability density to the monthly average temperature series, Mossoró, northeastern Brazil
Column headers: Month; Probability distributions and respective P-values

Table 4. Classification of the fitting according to the Chi-square (χ²) and Kolmogorov-Smirnov (KS) tests to seven models of probability density distribution of the monthly average temperature series, Mossoró, northeastern Brazil

Table 5. Different levels of cumulative probability of occurrence of the average monthly air temperature, quantile value of the distribution, and return period according to the Normal or Gaussian distribution for Mossoró, RN, Brazil
Note. Return period or recurrence time (T): Average time measured in years when a given atmospheric event must be matched or exceeded at least once.
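The quantities described in the caption of Table 5 can be sketched as follows for a fitted Normal distribution. The return period is taken here as T = 1 / (1 − p), where p is the cumulative probability; this is the usual exceedance definition and is assumed rather than confirmed to be the formula used for the table. The mean of 27.5 °C is the annual value reported for the region; the standard deviation of 1.0 °C is purely hypothetical.

```python
from statistics import NormalDist


def quantile_table(mu, sigma, levels=(0.25, 0.50, 0.75, 0.95)):
    """Return rows of (cumulative probability, quantile, return period in years)."""
    dist = NormalDist(mu, sigma)
    rows = []
    for p in levels:
        quantile = dist.inv_cdf(p)          # value not exceeded with probability p
        return_period = 1.0 / (1.0 - p)     # assumed definition: 1 / P(X > quantile)
        rows.append((p, quantile, return_period))
    return rows


# mu from the regional annual mean; sigma is a hypothetical value for illustration.
rows = quantile_table(mu=27.5, sigma=1.0)

for p, q, t in rows:
    print(f"P(X <= {q:.2f} °C) = {p:.0%}, return period = {t:.1f} years")
```

Under this definition, a cumulative probability of 95% corresponds to a return period of 20 years, that is, the value is expected to be exceeded on average once in twenty years.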