Height-Diameter Models for Araucaria angustifolia (Bertol.) Kuntze in Natural Forests

The height-diameter relationship of Araucaria angustifolia trees in different sociological positions (dominant, codominant, dominated) was evaluated in a native forest in the south of Brazil, aiming to improve the accuracy of its estimation and its use as a component of forest description, growth and yield. The total number of trees across the three sociological positions was 657. A subset of the trees in each sociological position was used to estimate the model parameters, and the remainder was used for model evaluation. The objective of this work was to find the best height estimate using nonlinear models, linear models with dummy variables, principal component with nonlinear regression, and principal component with mixed nonlinear regression. The criteria for accuracy of fit were the adjusted coefficient of determination, root mean square error and mean error. The fit using principal component with mixed nonlinear regression gave consistent results and the best accuracy. The results also showed that height growth capacity depends on the sociological position.


Introduction
The tree height-diameter relationship is an important component in yield estimation, stand description, and damage appraisals (Zhang, 1997). Total height (h) and diameter at breast height (1.3 m above ground), including bark, are the two most essential variables in most forest inventories (Huang et al., 2000). However, it is difficult to measure these variables for all trees in a forest, and it is common to fit hypsometric equations to estimate the height of non-measured trees.
The height-diameter curve is not a strong biological relationship in either native or planted forests, because in many cases tree trunks present an unusable or deformed portion, which generates high diameter values and low height values, and the opposite is also true (Hess et al., 2014).
Growth and yield models are generally used to predict the temporal development of forest stands. Knowing the diameter at breast height (DBH) and total tree height is fundamental for both developing and applying many growth and yield models. The DBH of a tree can be quickly, easily, and accurately measured, but the measurement of total tree height is relatively complex, time consuming, and expensive. Furthermore, tree, stand, and site conditions may prevent accurate height measurements on all trees measured for DBH, as it may not be possible to unambiguously observe a given tree or reach an appropriate vantage point. Therefore, with many permanent and temporary sample plot systems, DBH is conventionally measured for all trees sampled, but height is measured for only a sub-sample of trees selected across the range of diameters observed (Sharma & Parton, 2007).
The hypsometric curve will show great variability in height for the same diameter when a curve is constructed for different sites or ages. The hypsometric relation usually results in estimates with a low coefficient of determination and a high root mean square error due to the high variability in height for the same diameter class in stands or forests of old age (Machado et al., 1994).
The development of simple and accurate height-diameter models based on easily obtainable tree and stand characteristics is a common precursor to the use of inventory and sample plot data to calculate volume and other stand attributes. However, the relation between the diameter of a tree and its height varies among stands (Calama & Montero, 2004) and depends on the growing environment and stand conditions (Sharma & Zhang, 2004). Thus, it is a challenging task to choose a fitting technique and a model that can accurately describe the variability of height-diameter relationships for Araucaria angustifolia (Bertol.) Kuntze trees in a native forest. An accurate fit of this variable is necessary to estimate forest volumes and describe the forests and their development over time (Huang et al., 2000). Some functions may produce significant errors by extrapolation when applied beyond the range of the model development data. This is aggravated when the sampled trees are from young or middle-aged stands. Therefore, the models' predictive capabilities (accuracy, precision, time dependence, biological realism, and flexibility) must be carefully evaluated and validated before a long-term forest growth and yield simulator is used (Zhang, 1997).
The objective of this work was to find the best equation to describe the height of individual Araucaria angustifolia trees of dominant, codominant, and dominated sociological positions, using different model-fitting techniques, and, based on precision statistics, to indicate which model describes the height-diameter relationship most precisely.

Study Area
The study was conducted on a rural property in Lages, Santa Catarina, Brazil (27°48′S; 50°19′W, altitude of 987 m) with A. angustifolia trees in competition in a natural forest. The local climate is humid subtropical, without a dry season and with temperate summers (Cfb), according to the Köppen classification. The local average annual temperature is 15.2 °C, and the average annual precipitation is 1685.7 mm (Alvares et al., 2013). The predominant soils in the region are Haplic Nitosol (Oxisol) and Humic Cambisol (Inceptisol), developed from basaltic rocks (Embrapa, 1999).

Data Collection
Data from 286 of the 657 A. angustifolia trees collected in a natural forest were used to fit the hypsometric models, as in Costa et al. (2014). Statistical techniques intended to improve the fit and precision of existing height-diameter models were applied, and both the earlier model and the new models were validated with a separate set of 259 trees. External data from 112 trees collected in a natural forest by Klein (2017) (SP1: dominant, N = 53; SP2: codominant, N = 41; SP3: dominated, N = 18) were then used to evaluate the performance of the height-diameter model developed in the validation step. This procedure was carried out to ascertain how well the model generalizes to height estimation at other sites near the study area (Figure 1). The data showed large variation in diameters and heights among the sociological positions (Table 1).

Nonlinear [NL]
The Michailoff nonlinear model [NL] (Equation 1) was chosen because it explains the height-diameter trend of A. angustifolia trees well (Costa et al., 2016). The model was fitted separately for each sociological position group (SP1: dominant, SP2: codominant, and SP3: dominated) (Costa et al., 2014).
Where h is total tree height (m), DBH is tree diameter at breast height (cm), and β0 and β1 are the estimated parameters.
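Equation 1 is not reproduced in this text. As a minimal sketch, the example below assumes the form commonly given for the Michailoff model, h = 1.3 + β0·exp(-β1/DBH). The β1 values are those reported in the text for each sociological position; the β0 value and the function name are hypothetical and serve only as illustration.

```python
import math

def michailoff_height(dbh_cm, beta0, beta1):
    """Total height (m) under the ASSUMED form h = 1.3 + beta0 * exp(-beta1 / DBH)."""
    if dbh_cm <= 0:
        raise ValueError("DBH must be positive")
    return 1.3 + beta0 * math.exp(-beta1 / dbh_cm)

# beta1 by sociological position, as reported in the text (Table 3, [NL]).
BETA1_BY_SP = {1: 17.7633, 2: 11.6296, 3: 7.9829}

# Hypothetical level parameter, for illustration only (not a fitted value).
EXAMPLE_BETA0 = 25.0

for dbh in (20.0, 40.0, 60.0):
    h = michailoff_height(dbh, EXAMPLE_BETA0, BETA1_BY_SP[1])
    print(f"DBH = {dbh:.0f} cm -> h = {h:.2f} m")
```

Under this form the curve rises with DBH and approaches 1.3 + β0 as DBH grows, so β0 acts as the level and β1 controls how quickly that level is reached.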

Linear With Dummy Variable [LDV]
The linear model with dummy variable [LDV] (Equation 2) applied in the present study was used by Costa et al. (2014) in a natural forest to stratify the sociological position of trees.
Where h is total tree height (m), DBH is tree diameter at breast height (cm), and D1 and D2 are dummy variables that stratify trees by the codominant and dominated sociological positions, respectively. The equation has R²adj. = 69.86% and root mean square error (RMSE) = 1.9567.
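The dummy coding described above can be sketched as follows. The text states only that D1 marks codominant trees and D2 marks dominated trees; treating dominant trees as the reference class (D1 = D2 = 0) and the function name are assumptions of this example, and the form of Equation 2 itself is not shown here.

```python
def dummy_codes(sp):
    """Return (D1, D2) for a sociological position coded 1, 2, or 3.

    D1 = 1 for codominant trees (SP2), D2 = 1 for dominated trees (SP3).
    Dominant trees (SP1) are assumed to be the reference class (0, 0).
    """
    if sp not in (1, 2, 3):
        raise ValueError("sociological position must be 1, 2, or 3")
    return (1 if sp == 2 else 0, 1 if sp == 3 else 0)

for sp, label in ((1, "dominant"), (2, "codominant"), (3, "dominated")):
    d1, d2 = dummy_codes(sp)
    print(f"SP{sp} ({label}): D1 = {d1}, D2 = {d2}")
```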

Principal Component With Nonlinear Regression [PC-NLR]
The principal component with nonlinear regression [PC-NLR] (Equation 5) was developed in three stages. The first stage was the optimization of the value of coefficient η0 (Equation 3), which typifies the level (β0) of Equation 1, fitted according to the sociological position.
Where h is total tree height (m), DBH is tree diameter at breast height (cm), β1 is the slope coefficient defined by sociological position (see Table 3, [NL]: SP1 = 17.7633; SP2 = 11.6296; and SP3 = 7.9829), and η0 is the value obtained for each tree.
The second stage was the determination of the linear regression with principal component (Equation 4; see Table 3, PC-NLR coefficients of regression: α0 = 23.1129; α1 = 1.9362; α2 = -0.6472; α3 = -3.7318). The use of linear regression with principal component can be beneficial where modeling is used with quantitative and qualitative variables simultaneously. In addition, a multicollinearity problem can be solved using principal component regression (Montgomery, 2006). The modeling of regression coefficients in h/d models (β0, level, and β1, slope) was tested at plot level using multiple linear regression as a function of stand variables (Adame et al., 2008), with the complete model fitted afterwards. Similarly, in the present study the value obtained for each tree (η0, according to the first stage in Equation 3) was modeled by principal component regression as a function of tree-level variables. We evaluated the influence of the following variables on the level (η0) of the model: diameter at breast height (DBH, quantitative), height to crown base (HCB, quantitative), and social position (SP, qualitative).
For the third stage, we used the result of Equation 4 (η0) obtained from the trees, symbolized by δ0, in Equation 5. Thus, Equation 5 was fitted and new regression parameters were defined (see Table 3, PC-NLR estimated coefficients: ξ0 = 1.1512 and ξ1 = 9.4520). This was designated principal component with nonlinear regression [PC-NLR]. Therefore, a unique model with biological realism of estimated coefficients of regression was obtained, since the linearization of a nonlinear model generates changes in its mathematical structure and adds bias.
Where K1, K2, K3 are eigenvectors; e.g. see Table 3  and K3 followed the same procedure; SD = standard deviation; α0, α1, α2, α3 are the estimated coefficients of linear regression with principal component (α0 = 23.1129; α1 = 1.9362; α2 = -0.6472; α3 = -3.7318); h is total tree height (m); DBH is tree diameter at breast height (cm); HCB is the height to crown base (m); SP is the sociological position of the tree in the forest; δ0 is the result of the regression with principal component; and ξ0 and ξ1 are the estimated coefficients.
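Equation 3 is not reproduced in this text. A minimal sketch of the first stage follows, assuming Equation 3 keeps the Michailoff form h = 1.3 + η0·exp(-β1/DBH) with β1 fixed by sociological position. With a single observation per tree, the η0 that reproduces the measured height can be obtained by inverting that expression. The example trees and the function name are hypothetical; only the β1 values come from the text.

```python
import math

# beta1 by sociological position, as reported in the text (Table 3, [NL]).
BETA1_BY_SP = {1: 17.7633, 2: 11.6296, 3: 7.9829}

def tree_level(height_m, dbh_cm, sp):
    """Solve h = 1.3 + eta0 * exp(-beta1 / DBH) for eta0 (ASSUMED form of Equation 3)."""
    beta1 = BETA1_BY_SP[sp]
    return (height_m - 1.3) / math.exp(-beta1 / dbh_cm)

# Hypothetical trees: (total height in m, DBH in cm, sociological position).
trees = [(20.0, 45.0, 1), (16.0, 30.0, 2), (11.0, 18.0, 3)]

for h, dbh, sp in trees:
    eta0 = tree_level(h, dbh, sp)
    print(f"SP{sp}: h = {h} m, DBH = {dbh} cm -> eta0 = {eta0:.2f}")
```

Each tree then carries its own level value, which is the response variable modeled in the second stage.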

Principal Component With Mixed Nonlinear Regression [PC-MNLR]
In this model, we used the result of Equation 4 (η0) obtained from the trees in stages 1 and 2 of the PC-NLR procedure, symbolized by δ0, in Equation 6. Thus, Equation 6 was fitted and new regression parameters were defined (see Table 3, PC-MNLR estimated coefficients: θ0 = 1.0062 and θ1 = 12.3262; where SP1: u_k1 = 5.1354; SP2: u_k2 = -0.2625; SP3: u_k3 = -4.8947). This was designated principal component with mixed nonlinear regression [PC-MNLR]. The PC-MNLR assigns random coefficients to stratify the trees according to their sociological position, an approach chosen because it is easy to calibrate. The introduction of random coefficients in the slope of the model (θ1) was also tested. However, in the Plenterwald system, the height curve does not change because the social position of trees of certain diameter classes remains the same (Prodan, 1944). The height-diameter trend in A. angustifolia trees was investigated using the nonlinear model with the inclusion of two quantitative variables, diameter at breast height (DBH) and height to crown base (HCB), and a qualitative variable represented by the sociological position of the trees, since the height curve is very similar to the growth development curve, and the height curves of normal-management classes are similar to those of uneven-aged forests (Loetsch et al., 1973).
Where h is total tree height (m), DBH is tree diameter at breast height (cm), δ0 is the value of the regression with principal component, θ0 and θ1 are the estimated coefficients of fixed effects, and u_ki is the estimated coefficient of random effects that specifies the sociological position.

Criteria for Height-Diameter Model Evaluation
The adjusted coefficient of determination (R²adj.), root mean square error (RMSE), and mean error (e_mean) were evaluated (Table 2). The graphical residual distribution (%) as a function of DBH was assessed. The accuracy of the estimation models for the data reserved for validation was based on the t-test (α = 0.05). All statistics were processed with the Statistical Analysis System (SAS-9.2) (SAS Institute, 2011).
Table 2. Criteria used to evaluate the height-diameter estimation models
Adjusted coefficient of determination: $R^2_{adj.} = 1 - \frac{n-1}{n-p} \cdot \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$
Root mean square error: $RMSE = \sqrt{\frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{n-p}}$
Mean error: $e_{mean} = \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)}{n}$
Note. $y_i$ = observed value for the i-th observation; $\hat{y}_i$ = predicted value for the i-th observation; $\bar{y}$ = mean of the $y_i$; n = number of observations in the dataset; p = number of estimated parameters.
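The three criteria can be computed as in the sketch below. It assumes the usual definitions with n - p degrees of freedom for R²adj. and RMSE and a divisor of n for the mean error; the function name and the example numbers are hypothetical and are not taken from the study data.

```python
import math

def fit_statistics(observed, predicted, n_params):
    """Return (R2_adj, RMSE, mean error) for paired observed and predicted values."""
    n = len(observed)
    if n != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if n <= n_params:
        raise ValueError("need more observations than estimated parameters")
    mean_obs = sum(observed) / n
    residuals = [y - y_hat for y, y_hat in zip(observed, predicted)]
    sse = sum(r * r for r in residuals)                # residual sum of squares
    sst = sum((y - mean_obs) ** 2 for y in observed)   # total sum of squares
    r2_adj = 1.0 - (n - 1) / (n - n_params) * sse / sst
    rmse = math.sqrt(sse / (n - n_params))
    e_mean = sum(residuals) / n
    return r2_adj, rmse, e_mean

# Hypothetical heights (m), for illustration only.
obs = [10.0, 12.0, 14.0, 16.0, 18.0]
pred = [11.0, 12.0, 13.0, 16.0, 18.0]
r2_adj, rmse, e_mean = fit_statistics(obs, pred, n_params=2)
print(f"R2adj = {r2_adj:.4f}, RMSE = {rmse:.4f}, mean error = {e_mean:.4f}")
```

A mean error near zero indicates little systematic over- or underestimation, while RMSE summarizes the typical size of the residuals in the units of height.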

Results
The h and HCB variability among the sociological positions and their means are shown in Table 1. This variability indicates the importance of separating the data for fitting the hypsometric function (Costa et al., 2014), since the mean for the set of trees is close to the mean of dominant trees. For the other sociological positions, however, the variable will be overestimated when using the mean of the set of trees. Differences in h and HCB between sociological positions showed a standard deviation of ±3 m, denoting differences in height growth, amount of light entering the canopy, density and competition of trees in the forest.
When the models were fitted for each sociological position (NL, Equation 1), the NL model explained less of the variability in the codominant and dominated sociological positions and had greater RMSE (less precision), whereas it explained 71.0% of the variance in the dominant position. These results denote the difference in height growth associated with competition for light and ability to compete, thus indicating a great difference in the height-diameter ratio. The height growth of trees in the dominant position stabilized and reached the same level in the canopy (similar heights), giving a better fit and more precise estimation of the variable. The fit of the NL model was satisfactory regarding accuracy with the validation data (Table 3) for each sociological position: the estimates did not differ from the observed values by the t-test (SP1: p = 0.378635 ns; SP2: p = 0.368907 ns; and SP3: p = 0.754448 ns).
The model with dummy variables (LDV, Equation 2) explained 70.0% of the variance in the data used for fitting and 42.0% in the validation data (see Table 3), but its p-value was significant by the t-test (SP1, SP2 and SP3: p < 0.011326), showing that the predicted values differed from those observed. This confirms the results in Figures 2 and 3, which show the low performance of the model. The use of a linear model with categorical dummy variables to stratify the sociological positions (Costa et al., 2014) showed a variance inflation factor (VIF) higher than 5, indicating that the model presents collinearity among the estimators (Montgomery et al., 2006). It is worth noting that the multicollinearity problem can be solved by other statistical methods, such as principal component analysis, nonlinear models, and ridge regression, among others (Hoerl & Kennard, 1970; Montgomery, 2006).
The effect of social position on the intercept and slope of the regression curves, assessed by covariance analysis based on the dummy variables (LDV) in Table 3 and following the study by Costa et al. (2014), showed significant values for the variables D1 (codominant) (p = 0.0256) and D2 (dominated) (p < 0.0001), indicating differences between the levels of the codominant and dominated sociological positions in relation to dominant trees. The model with dummy variables should have restricted use because of the occurrence of collinearity. In this sense, considering the mean error of the individual models, it is preferable to use the NL model to estimate height instead of the model with dummy variables (LDV).
The use of principal component analysis in regression models can be beneficial where modeling is used with quantitative and qualitative variables simultaneously. In regression models with principal components, the components are explanatory variables that are not correlated, which eliminates the problem of multicollinearity (Draper & Smith, 1981). A high correlation between DBH and HCB was evidenced by the correlation matrix, indicating that the larger the diameter, the greater the height to crown base (Table 3). The negative SP value indicates that SP decreases with increasing height (β0, the level in Equation 4), because trees are coded 1 for dominant (SP1), 2 for codominant (SP2), and 3 for dominated (SP3). DBH and HCB are the variables with the greatest weight, being the most influential for the first principal component (PC1). Likewise, SP is the most influential variable for the second principal component (PC2). With PC1 (62.93%) and PC2 (27.92%), it is possible to explain 90.85% of the total variance of the data. The fit by linear regression with principal components in Equation 4 (see Table 3, PC-NLR coefficients of regression: α0 = 23.1129; α1 = 1.9362; α2 = -0.6472; α3 = -3.7318) showed values of R²adj. = 0.7488 and RMSE = 1.9392.
The inclusion of the quantitative variable HCB and the qualitative variable SP in the principal component with nonlinear regression model (PC-NLR, Equation 5) gave a fitting performance similar to that of the dummy variable model (LDV) in Table 3. The PC-NLR explained 68.0% of the variance of the data used for fitting and 52.0% of the validation data (see Table 3), but its p-value was significant by the t-test (p < 0.000042), evidencing low accuracy in the estimates, with the predicted values differing from those observed.
The best estimates were obtained with the principal component with mixed nonlinear regression (PC-MNLR, Equation 6), which explained 86.0% of the variance in the fitting data and 73.0% in the validation data (Table 3). According to the t-test, the estimates did not differ from those observed in the validation data (p = 0.658524 ns). The graphical residual distribution of the PC-MNLR model was better than that of LDV, both in the data used for fitting (Figure 2) and in the data used for validation (Figure 3). The PC-MNLR model had lower RMSE and a homogeneous residual distribution for all sociological positions, concentrating its distribution around 20.0% and reducing discrepant points.
The results showed that the LDV model is biased and that the PC-MNLR model is the most suitable for estimating the height of araucaria trees for this forest typology. The values estimated with the PC-MNLR equation using external data did not differ from the observed values by the t-test (p = 0.95886), which evidences the high accuracy of the equation in predicting the total height of araucaria trees (Figure 4). Therefore, including the quantitative and qualitative variables through principal components produced uncorrelated coefficients (avoiding multicollinearity), and the PC-MNLR model has good generalization capacity, with better accuracy in describing the data. When the independent variables required by the PC-MNLR are not measured, it is recommended to use the individual NL models. These do not show significant differences by the t-test between observed and estimated data for the positions SP1, SP2 and SP3.
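For context on the threshold mentioned above, the variance inflation factor of a predictor j is conventionally defined as VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing that predictor on the other predictors. The sketch below uses this standard definition; the function name and the example R²_j values are hypothetical and are not results from the study.

```python
def variance_inflation_factor(r_squared_j):
    """VIF for a predictor whose regression on the other predictors has R^2 = r_squared_j."""
    if not 0.0 <= r_squared_j < 1.0:
        raise ValueError("R^2 must be in [0, 1)")
    return 1.0 / (1.0 - r_squared_j)

# Hypothetical R^2_j values, for illustration only.
for r2 in (0.0, 0.5, 0.8, 0.9):
    print(f"R2_j = {r2:.1f} -> VIF = {variance_inflation_factor(r2):.2f}")
```

Under this definition, a VIF above 5 corresponds to more than 80% of a predictor's variance being explained by the other predictors.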
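The explained-variance figures reported above can be checked with a short calculation. The sketch assumes the principal components were extracted from the correlation matrix of the three standardized variables (DBH, HCB, SP), in which case the eigenvalues sum to 3. That assumption, the implied PC3 share, and the derived eigenvalues are not stated in the text.

```python
# Explained variance (%) reported in the text for the first two components.
explained_pct = {"PC1": 62.93, "PC2": 27.92}
n_variables = 3  # DBH, HCB, SP

cumulative = round(sum(explained_pct.values()), 2)
explained_pct["PC3"] = round(100.0 - cumulative, 2)  # remainder left for PC3

# ASSUMPTION: correlation-matrix PCA, so eigenvalue = share of variance * number of variables.
eigenvalues = {name: pct / 100.0 * n_variables for name, pct in explained_pct.items()}

print(f"PC1 + PC2 explain {cumulative:.2f}% of the total variance")
for name, value in eigenvalues.items():
    print(f"{name}: {explained_pct[name]:.2f}% -> eigenvalue = {value:.4f}")
```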

Discussion
This study compared different techniques for fitting regression models, aiming to improve the estimation of the height-diameter relationship for different sociological positions of A. angustifolia in a natural forest. The PC-MNLR (Equation 6) predicted the height of a tree in the forest with high accuracy (Table 3), requiring the measurement of DBH and HCB and the classification of the tree by SP.
The variable HCB represents crown recession (Russell et al., 2014) and can be easier to measure, depending on the density of the forest. Thus, this model has improved the height-diameter relationship to define the vertical and horizontal structure of the forest, which develops differently according to age, density, sociological position, competition and site.
Several factors affect the tree height-diameter relationship, especially competition, age, and site (Adame et al., 2008). High-accuracy models for tree height estimates include variables that characterize the forest stand, such as age, quadratic mean diameter, basal area, density, and dominant height.
At early ages, codominant and dominated trees of A. angustifolia invest more in height growth due to the lack of light in the crown and, consequently, present smaller increment in diameter.As they age, height growth stabilizes, causing a decrease in their vigor, which is more evident in codominant and dominated individuals (Costa et al., 2014).
In the PC-MNLR model, the change in height for a given diameter is not the same for all sociological positions and heights to crown base. Based on this evaluation, the height-diameter relationship depends on the sociological position of the tree in the forest. Therefore, estimating the height of trees using a single simple height-diameter model for the whole data set will lead to errors.
The determination of fixed and random effects in a model is a flexible decision, subject to debate, and all parameters in the model should first be considered mixed when convergence is possible (Sharma & Parton, 2007). According to Calama and Montero (2004), predicting random parameters for a plot using complementary observations of height increases the predictive ability of the model for all trees. Fang and Bailey (2001) suggest that the parameters with high variability and less overlap in confidence intervals, obtained by fitting each individual plot separately, should be considered mixed if convergence is not achieved when all parameters are treated as mixed.
In mature forests, competition among trees creates different strata, sociological positions, and variation in diameter and height, allowing a more stable hypsometric curve over time by associating different diameters with the same height, since the height-diameter relationship can be affected by external factors such as density and sociological position (Machado et al., 2008).
The weak biological relationship between height and diameter is evident in the variability of height with sociological position, which generates incoherent fits depending on forest typology, site conditions, and forest density; thus, testing different models and regression techniques is necessary to increase the accuracy in estimating this variable. PC-MNLR is flexible and can be used at other sites and locations with the same forest typology and species. This was observed in the residual distribution of the model, since the residuals are more concentrated near zero (see Figures 2, 3 and 4). This is important when the total tree height variable is used for production planning and forest management, as it gives greater precision to volume estimates.
In general, the model shows biological coherence and might help reduce prediction bias, particularly at the extremes of any dataset (Rijal et al., 2012).The PC-MNLR model outperformed LDV in all sociological positions, especially in the codominant and dominated classes, since the greatest variation for a given height was due to the variability among sociological positions.

Conclusion
The data showed the relation between dendrometric and categorical variables. Categorical variables were used in the model to describe the social position of trees and the variation in their height as diameter increases.
The proposed PC-MNLR model showed better accuracy for height estimates according to the sociological position of the trees. The use of random effects for the sociological position made the model easy to apply in the field. The model showed good generalization in predicting tree heights using external data, confirming its potential for this purpose.
There is a need for silvicultural interventions in trees that grow at low diameter rates, and density regulation of trees in forests with selection of promising trees of better social position and with well-formed and vigorous crowns.

Figure 2. Residual distribution of the best-fit equations for the total height data of Araucaria angustifolia trees

Note. a Coefficients estimated and defined as described by Costa et al. (2014); b definition of dummy variable, D1 = codominant trees; D2 = dominated trees. ns = non-significant (α = 5%); SP = sociological position of tree in the forest; SD = standard deviation.

Figure 3. Residual distribution of the best-fit equations using the validation data for the variable of total height of Araucaria angustifolia trees
Figure 4.

Table 1. Summary statistics for model fitting and validation data

Table 3. Performance of different models and fitting techniques to estimate the height-diameter relationship of Araucaria angustifolia trees