A Methodological Proposal Based on Artificial Neural Networks for Evapotranspiration Assessment

Evapotranspiration is the combined process by which water is transferred to the atmosphere from the soil by evaporation and through plants by transpiration. It is therefore a central parameter in agriculture, since it expresses the amount of water to be returned by irrigation. To standardize evapotranspiration estimates, the term “reference crop evapotranspiration (ETo)” was coined as the rate of evapotranspiration from a hypothetical grass surface of uniform height, actively growing, completely shading the ground and well watered. ETo can be measured with lysimeters or estimated by mathematical approaches. Although Penman-Monteith FAO 56 (PM) is the recommended method to estimate ETo, it requires records of maximum and minimum temperatures (°C), insolation (sunshine hours), relative humidity (%) and wind speed (m/s). Some of these parameters are missing in the historical meteorological registers. Here, Artificial Neural Networks (ANNs) can aid traditional methodologies. ANNs learn, recognise patterns and generalise complex relationships among large datasets to produce meaningful results even when input data are wrong or incomplete. The aim of this study is to assess the capability of ANNs to estimate ETo values. We built and tested several architectures trained with the Levenberg-Marquardt algorithm, with the 5 above-mentioned parameters as inputs, from 1 to 50 hidden nodes and 1 parameter as output. Architectures with 10, 15 and 20 nodes in the hidden layer yielded outstanding r values (0.935, 0.937 and 0.937) along with the highest intercept and the lowest slope values, which demonstrates that the ANN approach was an efficient method to estimate ETo.


Evapotranspiration
Evapotranspiration (ET) is an important component of the hydrologic cycle. Its estimation plays a central role in different fields related to hydrology, such as water balance, assessment of the impact of land uses, water resources planning and management, and irrigation system design. Evapotranspiration is the physical process by which water is transferred to the atmosphere both by evaporation from land and by transpiration from plants, so it is a combined process through which moisture returns to the atmosphere.
ET has always been a widely used concept in water resources management. There was, however, still some ambiguity in the use of such terms as potential ET and reference crop ET. To resolve this issue, the Food and Agriculture Organization (FAO) of the United Nations published the "FAO Irrigation and Drainage Paper No. 56" in 1998, which shed some light on the problem and helped users apply the terminology uniformly.
The ET rate from a well-watered reference surface was called "reference crop ET" or "Reference Evapotranspiration" and denoted ETo (Allen & FAO, 1998). For the FAO, a reference surface must be a hypothetical surface "closely resembling an extensive surface of green grass of uniform height, actively growing, completely shading the ground and with adequate water". Doorenbos and Pruitt (1977) defined the reference crop evapotranspiration rate as "the rate of evapotranspiration from an extensive surface of 8 to 15 cm tall, green grass cover of uniform height, actively growing, completely shading the ground and not short of water".
A widespread method to estimate ET from a well-watered agricultural crop is to first estimate ETo and then apply the appropriate empirical crop coefficient (Kc), which accounts for the difference between the reference surface and the actual crop. ETo can be measured directly with lysimeters, by measuring the different water balance components, but at high construction and maintenance costs (Allen & FAO, 1998). The use of lysimeters is mostly considered time-consuming and requires previous experience, so it is not a common choice in field research.
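In symbols, the two-step approach described above is commonly written as follows. The symbol ET_c (crop evapotranspiration) is notation assumed here for illustration and does not appear in the text.

```latex
ET_c = K_c \cdot ET_o
```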
A more affordable alternative is the application of mathematical approaches based on several climate parameters, which are divided into empirical and physical models. The former are based on statistical approximation functions between meteorological parameters and ETo values (Blaney & Criddle, 1950; Hargreaves & Samani, 1985; Jensen & Haise, 1963; Thornthwaite, 1948).
On the other hand, physical models are based on physical principles associated with the three most influential factors for ET: the amount of energy, the water vapor flux and the supply of moisture (Chow, Maidment, & Mays, 1988). The most representative of these methods is the Penman combined process (Penman, 1948). This approach relates evaporation dynamics to net radiation flux and the aerodynamic transport characteristics of a natural surface. Observing that latent heat transfer through plants is not only influenced by abiotic factors, Monteith introduced a surface conductance term that accounted for the response of leaf stomata to their hydrologic environment. This modified form of the Penman equation is widely known as the Penman-Monteith evapotranspiration model (Monteith, 1965).
The Penman-Monteith FAO 56 equation has basically two advantages: first, it is applicable to a wide range of climates and local conditions with no need for calibration; second, it has previously been validated using lysimeters. It also has a drawback, though: ET is a complex and nonlinear phenomenon owing to its dependency on several interconnected climatological parameters, such as air temperature (Tmax, Tmin), mean relative humidity (RHm), wind speed (U2) and insolation as sunshine hours (Ins). A very common problem among researchers in this field is that only temperatures have been broadly recorded by weather stations since the fifties. In other words, historical meteorological datasets are usually incomplete. To address this issue, we propose the use of Artificial Neural Networks (ANNs) as a high-performance tool for estimating ETo values, both when all inputs are available and when the dataset is incomplete.

ANNs and Their Application in the Water Resources Field
Artificial Neural Networks (ANNs) are mathematical models whose architecture is inspired by biological neural networks, and they are considered very appropriate tools for modelling nonlinear processes. This is the case of evapotranspiration, a process influenced by several climatic parameters that does not behave linearly, which justifies the methodology proposed in this paper. ANNs are capable of learning from examples, recognising repeated patterns and generalising complex relationships among a large amount of data to produce meaningful results, even when input data contain errors or are incomplete, which is the problem this study expects to solve. The main advantage of ANNs is their ability to solve problems that are difficult to formalize (Sudheer, Gosain, & Ramasastri, 2005). ANNs allow us to capture deep, complex characteristics of data (Galvão, Becerra, Calado, & Silva, 2004).
In terms of internal structure and operations, an ANN basically consists of three layers: an input, a hidden and an output layer, each composed of an array of processing elements (PE). A PE is a model whose components are analogous to the elements of an actual neuron. In an ANN, every component in a given layer is interconnected with all components in the next layer, but not with elements within the same layer. Input data are held in the input layer: each input parameter enters the ANN through its own neuron or node. That is to say, each parameter is represented by one neuron.
The function of this first layer is to provide information to the network. At the entrance of the hidden layer, each input parameter is assigned a random initial weight; the weights rank the inputs in terms of importance according to the model the ANN is trying to simulate. This represents the first part of a processing element; the second part consists of a nonlinear filter, usually called the "transfer function". Its aim is to limit the output values between two asymptotes.
The most common transfer function is the sigmoidal function. It varies gradually between two asymptotic values, typically 0 and 1 or -1 and +1. It is the hidden layer that allows the network to model complex functions. The number of nodes composing the hidden layer is determined by trial and error. There could be more than one hidden layer, but a single hidden layer with a sigmoidal transfer function is widely recognised as the most frequent network topology (Cybenko, 1989; Hornik, Stinchcombe, & White, 1989).
Finally, the output layer can be understood as "the exit door" for the values predicted by the network, and it is composed of only one node, the output. The type of network topology described here is known as the Multilayer Perceptron (MLP) (Fausett, 1994). Within the ANN, as the signal spreads forward layer by layer (feed-forward), the error values propagate backwards (backpropagation) for a better adjustment of the weights (synaptic weight adjustments). Three different algorithms can be applied: Quasi-Newton (Q-N) (Haykin, 1994); Levenberg-Marquardt (L-M) (Hagan & Menhaj, 1994); and Backpropagation with variable learning rate (BPVL) (Soares & Nadal, 1999).
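The layer structure described above can be illustrated with a minimal feed-forward sketch. It is not the authors' Matlab implementation: the bias terms, the linear output node, the weight range and the example input values are assumptions made for illustration, and the hyperbolic tangent stands in for the sigmoidal transfer function bounded by -1 and +1.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random initial weights and biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, biases


def forward(x, hidden, output):
    """Feed-forward pass: inputs -> tanh hidden layer -> single output node."""
    w_h, b_h = hidden
    w_o, b_o = output
    # Each hidden node takes a weighted sum of all inputs, then the
    # transfer function limits the result to the interval (-1, 1).
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w_h, b_h)]
    # Single output node: weighted sum of the hidden activations
    # (a linear output is an assumption, not stated in the text).
    return sum(w * hi for w, hi in zip(w_o[0], h)) + b_o[0]


rng = random.Random(0)
hidden = init_layer(5, 10, rng)   # 5 inputs, 10 hidden nodes
output = init_layer(10, 1, rng)   # 10 hidden nodes, 1 output
# Hypothetical, already normalised inputs: Tmax, Tmin, RHm, U2, Ins
y = forward([0.2, -0.1, 0.5, -0.3, 0.4], hidden, output)
```

Training would then adjust the weights to reduce the error between this output and the target value; that step is omitted here.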
All these studies are related to our proposal, since they point to ANNs as useful tools for estimating ETo. Consequently, we also expect to obtain accurate ETo values when applying ANNs to our incomplete dataset.
Here, the scope of our study is to demonstrate that ANNs are an efficient, high-performance methodology to estimate ETo values, giving accurate results even when the dataset is incomplete, as it is in our case.

Study Area and Climatic Dataset
In order to fulfil the aims of this study, a daily meteorological dataset for a 16-year period, from January 1st, 2000 to December 31st, 2015, was obtained from a weather station belonging to INMET (the National Institute of Meteorology of Brazil), located in Juiz de Fora (Minas Gerais), in the southeast of Brazil (latitude 21°45′S, longitude 43°20′W, elevation 939.96 m). The location is shown in Figure 1.
The climate of the area has been classified as Humid Subtropical (Cwa) according to the Köppen climate classification (Geiger, 1961; Köppen, 1884, 1918, 1936, 2011). The weather station, corresponding to World Meteorological Organization code 83692 and INMET code A518, is located within the Federal University of Juiz de Fora (UFJF), where an average annual rainfall of 1536 mm has been registered over the last decades. Based on INMET information and the registered dataset, January is the wettest month in Juiz de Fora, with roughly 20% (298 mm) of the accumulated rainfall, whereas August usually registers precipitation of around 16.5 mm. As far as temperatures are concerned, February is the hottest month, with average temperatures around 26 °C. Conversely, July is the coldest month, with average temperatures dropping to 20 °C. The daily mean relative humidity does not oscillate much during the year, as it ranges from 74% (August) to 86% (July), with a high annual average of 82%. Wind speed is usually lowest during the first part of the year (January-June) and ranges from 2.6 to 27 m/s, with an average of 8 m/s.
In this study, we applied the PM equation to daily measures of the following climatic parameters: maximum air temperature (Tmax), minimum air temperature (Tmin), mean relative humidity (RHm), wind speed (U2) and insolation as sunshine hours (Ins). It is necessary at this point to draw the reader's attention to what the use of daily measures implies for soil heat flux (G). The soil heat gained and lost over a 24-hour period is assumed to be approximately equal, so that, on a daily basis, G = 0. Net radiation (Rn), the slope of the vapour pressure curve (Δ), saturation vapour pressure (es) and actual vapour pressure (ea) were calculated using the equations given by Allen and FAO (1998) in the FAO Irrigation and Drainage Paper No. 56; RHm, Tmax and Tmin were collected from the INMET weather station located within the study area and were the basis for calculating ea and es; the psychrometric constant (γ) was calculated from the altitude of the weather station.
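For readers unfamiliar with the PM calculation, the daily FAO 56 Penman-Monteith equation can be sketched as below. This is an illustrative sketch, not the authors' Matlab code. The function names are hypothetical, net radiation Rn is taken as a ready-made input (in the study it is derived from sunshine hours through the FAO 56 radiation equations, which are omitted here), and ea is derived from mean relative humidity.

```python
import math


def sat_vapour_pressure(t_c):
    """Saturation vapour pressure (kPa) at air temperature t_c (deg C)."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))


def psychrometric_constant(elevation_m):
    """Psychrometric constant gamma (kPa/deg C) from station elevation (m)."""
    pressure = 101.3 * ((293.0 - 0.0065 * elevation_m) / 293.0) ** 5.26
    return 0.665e-3 * pressure


def eto_penman_monteith(t_max, t_min, rh_mean, u2, rn, elevation_m, g=0.0):
    """Daily reference evapotranspiration ETo (mm/day), FAO 56 form.

    t_max, t_min: deg C; rh_mean: %; u2: wind speed at 2 m (m/s);
    rn: net radiation (MJ m-2 day-1); g: soil heat flux, 0 for daily steps.
    """
    t_mean = (t_max + t_min) / 2.0
    es = (sat_vapour_pressure(t_max) + sat_vapour_pressure(t_min)) / 2.0
    ea = rh_mean / 100.0 * es  # actual vapour pressure from mean RH
    delta = 4098.0 * sat_vapour_pressure(t_mean) / (t_mean + 237.3) ** 2
    gamma = psychrometric_constant(elevation_m)
    numerator = (0.408 * delta * (rn - g)
                 + gamma * 900.0 / (t_mean + 273.0) * u2 * (es - ea))
    denominator = delta + gamma * (1.0 + 0.34 * u2)
    return numerator / denominator
```

Because all five measured inputs appear in the equation, a gap in any one of them (for example RHm or U2) prevents the calculation for that day.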
Here, the PM method has been presented as the standard and most widely used methodology to estimate ETo, but we propose ANNs as an alternative method to estimate ETo starting from the same amount of data. In other words, ETo values estimated by the PM method (ETo(PM)) were used as the reference for comparison with ETo values estimated by ANNs (ETo(ANN)).

Estimating ETo through ANNs
The same weather dataset, covering January 1st, 2000 to December 31st, 2015, was fed to the different ANNs built in order to test their potential for estimating ETo. Each daily measure, which enters the network through the input layer, is considered a pattern of the climatic behaviour within the study area, and it is from this dataset that the network learns.
Missing measures are common in large datasets. In our case, 14.55% of the total record was missing. Despite this, a total of 4994 daily measures (85.45% of the entire period) were available to train and evaluate the networks. The dataset was split into three subsets. 70% of it was allocated to training; during this phase, the network is trained to associate outputs (ETo(PM)) with input patterns (Tmax, Tmin, RHm, U2, Ins). This is usually known as the Learning Process, as it requires memorizing a wide variety of input patterns and their associated outputs.


The remaining 30% was divided into two equal parts, used for validation and testing respectively. During validation, the network applies what it learned before and tries to perform as well as possible on new patterns. It is here that the most important application of neural networks takes place: Pattern Recognition. During this step, the network identifies the input pattern and tries to produce the associated output. The power of neural networks becomes apparent when a pattern that has no output associated with it is given as an input (testing phase). In this case, the network gives the output that corresponds to the taught input pattern that is least different from the given pattern. That is to say, during the testing phase the network estimates ETo values based on what it learnt during the Learning Process.
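The 70/15/15 partition described above can be sketched as follows. The random shuffling and the fixed seed are assumptions for illustration; the text does not state how patterns were assigned to each subset.

```python
import random


def split_indices(n_patterns, train_frac=0.70, val_frac=0.15, seed=0):
    """Return (train, validation, test) index lists for n_patterns samples."""
    idx = list(range(n_patterns))
    random.Random(seed).shuffle(idx)  # random assignment is an assumption
    n_train = round(train_frac * n_patterns)
    n_val = round(val_frac * n_patterns)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]      # whatever remains goes to testing
    return train, val, test


# 4994 daily patterns were available in the study.
train, val, test = split_indices(4994)
print(len(train), len(val), len(test))
```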

Building and Running an ANN
Firstly, we built ANNs with an input layer composed of 5 input parameters: maximum air temperature (Tmax), minimum air temperature (Tmin), insolation as sunshine hours (Ins), mean relative humidity (RHm) and mean wind speed (U2). Latitude (decimal degrees) and altitude (meters above sea level) were excluded from the study, as they are constants. The function of this first layer is to provide information to the network.
Secondly, as far as the nodes composing the hidden layer (Hi in Figure 2) were concerned, networks with 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45 and 50 nodes were trained. The number of nodes in the hidden layer is a recurrent issue, as there is no "absolute truth" about it (Coulibaly, Anctil, & Bobee, 2000). Indeed, some studies on the issue revealed that larger-than-necessary networks tend to over-fit the training samples and perform poorly. Some research has demonstrated that one hidden layer is enough to represent the non-linear relationship between the climatic parameters and ETo (Arca, Beniscasa, & Vincenzi, 2001; Kumar et al., 2002). At the entrance of the hidden layer, each input parameter received a random initial weight (Wij in Figure 2); the weights ranked the inputs in terms of importance according to the model the ANN is trying to simulate.
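To give a sense of how network size grows across the hidden-layer sizes listed above, the sketch below enumerates the topologies in the In_Hi_Op notation used for Table 1 and counts their adjustable parameters. The helper names are hypothetical, and the count assumes one bias per hidden and output node, which the text does not state.

```python
HIDDEN_SIZES = [1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50]


def topology_label(n_in, n_hidden, n_out=1):
    """Label in the In_Hi_Op form, e.g. '5_30_1'."""
    return f"{n_in}_{n_hidden}_{n_out}"


def n_parameters(n_in, n_hidden, n_out=1):
    """Weights plus one bias per hidden and output node (biases assumed)."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out


for h in HIDDEN_SIZES:
    print(topology_label(5, h), n_parameters(5, h))
```

Under these assumptions a 5_50_1 network has several times more parameters than a 5_10_1 network, which is one reason larger-than-necessary networks are prone to over-fitting.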
Finally, the output layer was composed of a single node, with the daily ETo values estimated by the PM method (ETo(PM)) considered as the target values. The complete architecture just described is depicted in Figure 2. Matlab R2015a was used both for applying the Penman-Monteith method to estimate ETo values and for building and running the ANNs. To assess the performance of the ANNs when estimating ETo values (ETo(ANN)), the following statistical indices were applied: the Mean Squared Error (MSE), the determination coefficient (r²), and the Slope (m) and Intercept (b) of the line of best fit between ETo(PM) and ETo(ANN).
This study began with the aim of estimating ETo by means of a methodology other than the standard Penman-Monteith method, as missing climatic parameters did not allow the calculation of ETo.
INMET weather stations usually present unrecorded periods or isolated gaps for some of the monitored meteorological parameters, which can last from a few days to entire years. Here, different ANN architectures were built, trained and tested with missing inputs. Once some inputs (RHm, U2) were removed, the ANN performances were compared with ETo(PM) by means of statistical indices that quantify the level of agreement between target values (ETo(PM)) and predicted values (ETo(ANN)): c = performance index (Camargo & Sentelhas, 1997); d = adjustment coefficient (Willmott, 1981); r = correlation coefficient.
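The indices named above can be computed as in the following sketch. The formulas are the commonly used forms (MSE as the mean squared difference, Pearson's r, Willmott's index of agreement d, and c taken as the product r·d); they are assumed here, since the equations themselves are not reproduced in the text.

```python
import math


def mse(target, pred):
    """Mean squared error between target (PM) and predicted (ANN) values."""
    return sum((x - y) ** 2 for x, y in zip(target, pred)) / len(target)


def pearson_r(target, pred):
    """Pearson correlation coefficient r."""
    n = len(target)
    mx, my = sum(target) / n, sum(pred) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(target, pred))
    sx = math.sqrt(sum((x - mx) ** 2 for x in target))
    sy = math.sqrt(sum((y - my) ** 2 for y in pred))
    return cov / (sx * sy)


def willmott_d(target, pred):
    """Willmott's index of agreement d (assumed standard form)."""
    mx = sum(target) / len(target)
    num = sum((y - x) ** 2 for x, y in zip(target, pred))
    den = sum((abs(y - mx) + abs(x - mx)) ** 2 for x, y in zip(target, pred))
    return 1.0 - num / den


def performance_c(target, pred):
    """Performance index c, taken here as r multiplied by d."""
    return pearson_r(target, pred) * willmott_d(target, pred)


# Hypothetical values in mm/day, for illustration only.
eto_pm = [1.0, 2.0, 3.0, 4.0]
eto_ann = [1.5, 2.5, 3.5, 4.5]
print(mse(eto_pm, eto_ann), willmott_d(eto_pm, eto_ann))
```

In this toy case the predictions are perfectly correlated with the targets (r = 1) yet biased by a constant offset, which d penalises; this is why r alone is not sufficient to judge agreement.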

Results and Discussion
Different networks were built with a varying number of nodes in the hidden layer (ANN architecture, Table 1), and the statistical indices described above are shown in Table 1 for each of them. MSE values account for the average squared difference between the target and the predicted values, and so express the quality of an estimator, which here is the ANN.
The closer to 0, the better the estimator and its performance. Learning cycles refers to the number of iterations the network performs to reduce these differences to a minimum. Architectures with 15 and 20 nodes in the hidden layer presented the highest r² value, 0.937, followed by a value of 0.935 for networks with 10 and 25 nodes in the hidden layer, as shown in Table 1. In other words, networks containing 10, 15, 20 and 25 hidden nodes could be considered as those reaching the minimum MSE values along with the highest r² values. In fact, except for networks with 1 and 2 nodes in the hidden layer, all r² values obtained were above 0.90, which is considered by most authors to be an excellent ANN performance.
According to MSE values, networks with 2, 3, 4 and 5 neurons would apparently be the most appropriate for ETo estimates, as they present the lowest MSE values: 0.089, 0.074, 0.083 and 0.079, respectively. Furthermore, several authors have used MSE as the criterion to assess ANN topology (Jain et al., 2008; Kumar et al., 2002; Zanetti et al., 2007).
However, focusing only on MSE values can be misleading, since the same networks showing the minimum MSE values in Table 1 also needed a greater number of learning cycles and gave lower r² values. In contrast, the 10, 15 and 20 hidden-node networks yielded MSE values of 0.146, 0.157 and 0.156, respectively, along with fewer learning cycles and much higher r² values, so these architectures were considered the most suitable to estimate ETo; their performances are shown in Figure 3. Still focusing on MSE values, Table 1 shows that an extraordinary value of 0.145 was obtained for a 30-hidden-node network. In line with this, the number of learning cycles required to reach the minimum error at the validation stage gave us a rough idea of which ANN architectures neither over-fit the data nor underestimate the results.
As far as the number of learning cycles is concerned, networks with 10, 15, 30, 40, 45 and 50 hidden nodes were ranked as those completing the fewest learning cycles: 10, 7, 6, 5, 6 and 7, respectively. Again, the 30-hidden-node network gave an unexpected result: only 6 learning cycles were required to reach such a low MSE value, as shown in Table 1 and clearly visible in Figure 4, A1. In addition, Slope (m) and Intercept (b) are also displayed in Table 1 to confirm the outstanding behaviour of the 30-node network, which delivered remarkable values of 0.95 and 0.19, corresponding to the maximum slope and the minimum intercept among the architectures assessed. On the other hand, the r² value for this network (0.929) was actually the poorest among the architectures under study. Further analyses were carried out to determine whether the 30-hidden-node network was, indeed, the best architecture or just an outlier.
After this unexpected performance of a 5_30_1 network, we carried out some further analyses. Training, validating and testing the same 5_30_1 topology again gave different results: now the network needed 36 learning cycles (6 times more) to reach the minimum MSE, which changed from 0.145 (Figure 4, A1) to 0.195 (Figure 4, B1), the highest among the architectures under study. Taking these results into account, and although deeper analyses are probably required, we cannot consider a 30-hidden-node architecture a trustworthy network, owing to the poor Slope and Intercept values, the high MSE values and the large number of learning cycles needed to reach them.
At this point, it is worth recalling that the main problem this study set out to solve was to estimate ETo values using ANNs even when the dataset is erroneous or incomplete. Until now, all the analyses were applied to architectures with 5 input parameters (Tmax, Tmin, RHm, Ins and U2), so no parameters were missing.
As mentioned in the introduction, RHm and U2 were frequently lacking in the historical meteorological register used in this study. The lack of these two climatic parameters throughout our dataset was actually the reason why we considered ANNs as an approach to estimate ETo, since the Penman-Monteith method cannot be applied when there are missing inputs. As a secondary analysis, U2 and RHm were removed from the input layer, one at a time, in order to check whether it was still possible to estimate ETo with a high level of resemblance.
The comparison yielded outstanding c and d values: 0.929 and 0.976, respectively, for a 4U2_15_1 architecture; 0.909 and 0.968 for a 4RHm_30_1 architecture; and 0.890 and 0.961 for a 3_30_1 architecture. The 1_10/15/20/30_1 networks were rejected due to their low r, c and d values. This means that an ANN with only three input parameters (Tmax, Tmin and Ins) can offer excellent adjustments (Camargo & Sentelhas, 1997), as is visible in Figure 5C.
According to what can be inferred from Figure 5 and confirmed by the c and d values in Table 2, ETo values estimated by ANNs closely resemble those estimated by the PM method, even when both U2 and RHm are not available. These results, correctly read, demonstrate that the ANN approach can be an efficient tool to estimate ETo, which can also be applied to water resources management, irrigation scheduling or the design of crop irrigation systems. Since ETo is the focus of this study and is entirely related to water, we considered that the results shown above could be expressed and interpreted more clearly in terms of difference in water column height. With this, we meant to demonstrate that the amount of evapotranspired water predicted by the ANN method was outstandingly similar to that estimated by the PM method. Those differences can be checked in Table 3.
Table 3. Mean daily target height: mean daily millimeters of water estimated by the PM method for the whole study period. Mean daily predicted height: mean daily millimeters of water estimated by ANNs for the whole study period. Mean daily difference: difference in millimeters of water between Mean daily target height and Mean daily predicted height. Total difference: accumulated difference in millimeters of water for the whole study period.
As we can check in Table 3, Mean daily predicted height barely varies (± 0.0015 mm/day) from ETo(PM) values, which is a negligible difference. Even the total difference is low, with a mean accumulated difference of 74.51 mm for the whole period under study, which encompassed 16 years; in other words, an annual mean difference of 4.657 mm. This makes the high accuracy of the method proposed in this paper easier to perceive.

Conclusion
The scope of this study was to assess how accurate Artificial Neural Networks could be at estimating reference crop evapotranspiration (ETo). To accomplish this task, different ANNs were built as previously explained and fed with the same number of inputs as those required by the Penman-Monteith method, which is considered the standard method to estimate ETo. After training, validating and testing the ANNs, their results were compared with those estimated by the PM method, and the different statistical indices pointed to the ANN approach as a highly accurate tool to estimate ETo.
The neural networks giving the most accurate results were composed of a single hidden layer with no more than 20 neurons, fitted with a hyperbolic tangent sigmoid transfer function and trained with the Levenberg-Marquardt algorithm. This architecture was sufficient to reach outstanding results, with a resemblance above 93%, considered an excellent performance according to the classification presented by Camargo and Sentelhas (1997), and an average MSE of 0.13 mm/day. These results confirmed our main hypothesis: ANNs were able to estimate ETo values with high accuracy. Previous studies in the field reached the same conclusion (Chauhan & Shrivastava, 2012; Jain et al., 2008; Khoob, 2008; Kumar et al., 2002; Landeras, Ortiz-Barredo, & López, 2008; Zanetti et al., 2007).
The issue that motivated us to conduct this study was the recurrent problem among researchers of facing incomplete data records. We were no exception. ANNs were therefore also built with some parameters missing in the input layer. The results obtained were also outstanding, with correlation coefficients above 0.92 even when two parameters were missing. This confirmed the unexpected capability of an ANN to reproduce the target model, in our case the PM model, with high resemblance even with incomplete datasets. Translating these results into water column height, the average difference between predicted ETo and target values was ± 0.0015 mm/day and 4.657 mm/year, negligible differences in both cases.
Despite the excellent results obtained throughout this research, some other questions arose, such as:


The possibility of training, validating and testing different ANNs with a particular dataset and then estimating new patterns from a different dataset, in those cases where the weather station has been out of order for long periods and its dataset presents large gaps.


Testing to what extent climatic conditions could limit the application of ANNs. Different locations will probably point to different climatic parameters as the most influential on ETo, owing to differences in altitude, radiation, wind speed or humidity conditions. In those cases, Local Sensitivity Analysis would be a useful approach to confirm such influence.


As the ANN approach proved to be an efficient tool to estimate ETo accurately, further analyses may focus on forecasting, as an irrigation schedule planning and management tool for those public institutions in charge of dealing with water demand, licensing, infrastructure or services.


The chance of validating the ANN approach by feeding the forecast ETo values into other methodologies that are based on a water balance and allow us to model other physical processes where ETo plays a significant role.
Thus, deeper analyses and different approaches may be carried out in further studies.

Figure 2. Basic architecture of an Artificial Neural Network
Where MSE = Mean Squared Error (mm/day); n = number of observations; x_i = ETo (mm/day) estimated by PM (ETo(PM)); y_i = ETo (mm/day) estimated by the ANNs (ETo(ANN)); r² = determination coefficient; x̄ = mean of x_i; ȳ = mean of y_i; σx = standard deviation of x_i; σy = standard deviation of y_i; Y = estimated values (ETo(ANN)); X = target values (ETo(PM)); m and b = Slope and Intercept of the line of best fit between ETo(PM) and ETo(ANN).

Figure 3. Performances of 10, 15 and 20-hidden-nodes architectures. Performance in terms of MSE, number of learning cycles needed to reach the minimum MSE value and determination coefficient (r²) from correlation analyses for (A) 5_10_1, (B) 5_15_1 and (C) 5_20_1 topologies
On retraining, the MSE of the 5_30_1 network changed from 0.145 (Figure 4, A1) to a value of 0.195 (Figure 4, B1), the highest among the architectures under study. The difference between ETo(PM) and ETo(ANN) can be checked in Figure 4, A3 and B3. It is presented as a histogram where all patterns were clustered in 20 classes or bins around the line of zero error. It gave us an idea of how accurate ANNs can be, since most bins yielded a difference below 0.1 mm/day.

Figure 4. Unexpected behaviour of a 30-hidden-nodes architecture and results of further analyses. Comparison between (A) the first set of results of a 5_30_1 network and (B) further analyses. 1) Number of learning cycles required to reach the minimum MSE value; 2) Correlation analysis showing r², Slope and Intercept values; 3) ETo(PM)-ETo(ANN) histograms divided into 20 bins at three stages (train, validation and test), pointing out the zero error point

Figure 5. General overview of the performance of networks with missing inputs. Effect of removing some input parameters on the performance of (A) 4U2_15_1; (B) 4RHm_30_1; (C) 3_30_1 topologies in terms of MSE, correlation analyses between ETo(PM) and ETo(ANN) and Error, which accounts for the difference between the target ETo(PM) values and the predicted ETo(ANN) values

Table 1. ANNs performance indices
Note. * ANN architecture: all networks were built on the topology In_Hi_Op, where In = parameters in the input layer; Hi = nodes in the hidden layer; Op = output. ETo(PM)-ETo(ANN): difference between ETo values estimated by the PM method and ETo values predicted by the ANN.

As Table 2 confirms, ETo values estimated by ANNs closely resemble those estimated by the PM method, even when both U2 and RHm are not available.

Table 2. Performance indices from architectures with missing inputs
Note. c: performance index; d: adjustment coefficient; r: correlation coefficient.