Selection of Morphoagronomic Descriptors in Physalis angulata L. Using Multivariate Techniques

This study aimed to select determinant morphoagronomic descriptors to characterize and evaluate Physalis angulata L. germplasm. Twelve quantitative and twenty-two qualitative descriptors were analyzed in six accessions of P. angulata from the physalis germplasm collection of the State University of Feira de Santana-BA. The selection and discard of quantitative descriptors were based on direct selection and on the Singh method, while qualitative descriptors were analyzed through entropy. The statistical analyses were carried out using the GENES and R programs. Ten quantitative descriptors were indicated for discard through direct selection and five through the Singh method. However, only four descriptors were considered redundant by both methods: east-west fruits, weight of five ripe fruits, leaf blade width and total soluble solids. Although the total soluble solids descriptor was indicated for discard, it was included in the group of selected descriptors due to its importance in the characterization of physalis fruit. The minimum list of descriptors to describe physalis accessions comprised 15 descriptors, nine quantitative and six qualitative: plant height, stem diameter, north-south fruits, number of fruits per plant, leaf blade length, internode length, fruit longitudinal length, fruit transversal length, total soluble solids, growth habit, stem color, leaf margin shape, unripe calyx color, fruit shape and unripe fruit color. Discarding 55.88% of the descriptors did not cause significant loss of information and might reduce the time and resources spent to characterize and evaluate physalis germplasm.


Introduction
The organization and maintenance of species genetic resources, considering the activities of collection, characterization and conservation, are essential to sustain the genetic basis of genetic improvement programs (Silva, Moura, Neto-Farias, Ledo, & Sampaio, 2017). Characterization is a vital activity to generate knowledge about germplasm, preserved in bases or collections, for providing better management of accessions and When the number of characters is high, it is possible that some of them are redundant and contribute little to the discrimination of the individuals under analysis, since these properties are usually correlated (Daher, Moraes, Cruz, Pereira, & Xavier, 1997), or might be disregarded because they account for a negligible fraction of the total variation (Alves, Garcia, Cruz, & Figueira, 2003). Therefore, the use of a large number of descriptors might increase the workload without improving the accuracy of the characterization, in addition to making data analysis and interpretation more complex. Moreover, eliminating redundant descriptors that are difficult to measure is desirable in order to facilitate studies and reduce experimentation time and costs without affecting the reliability and quality of the results (Pereira, Vencovsky, & Cruz, 1992).
Multivariate analyses are useful tools to identify the more informative and the less relevant descriptors in the characterization and evaluation of germplasm to be used in genetic improvement programs, since they supply information to exclude descriptors that contribute little to the total variation (Cruz, Carneiro, & Regazzi, 2004). The use of more than one procedure to discard redundant characters has been adopted to make the selection of descriptors more reliable (Castro, Neves, Jesus, & Oliveira, 2012; Afonso et al., 2014; E. Oliveira, F. Oliveira, & Santos, 2014; Silva, Carvalho, & Duarte, 2013; Silva et al., 2017). Thus, other methodologies have been used to evaluate the efficiency of the discard, such as the Singh (1981) method, based on the Mahalanobis distance (D²), which considers less important the characteristics that present lower variability (Chagas, Alexandre, Schimildt, Bruckner, & Faleiro, 2016).
Physalis angulata L., family Solanaceae, is an annual herbaceous species that reaches up to one meter in height, and a calyx formed by five sepals surrounds its fruit (N. Sultana, Hassan, Begum, & M. Sultana, 2008). This plant produces fruits with high nutraceutical value and quality characteristics that favor its fresh consumption, as well as its potential for commercial cultivation and use in the food industry (Oliveira, Martins, Vasconcelos, Pena, & Carvalho, 2011; P. Silva, S. Silva, J. Silva, Mendonça, & Pereira, 2018). In addition, the species stands out for the production of secondary metabolites with great medicinal potential (Tomassini, Barbi, Ribeiro, & Xavier, 2000).
Techniques to select descriptors have been used with species of the Solanaceae family such as tomato (Gonçalves, Rodrigues, Amaral Júnior, Karasawa, & Sudré, 2009) and Capsicum spp. (Ortiz, Flor, Alvarado, & Crossa, 2010; Sudré et al., 2010; Silva et al., 2013). However, even with the evident importance of P. angulata, no studies were found in the literature about the discriminating capability of the descriptors or the suitable number of descriptors able to discriminate the accessions. Therefore, more information is needed to guide further studies, both in the area of genetic resources and in the breeding of this species.
For this reason, the objective of this study was to select relevant morphoagronomic descriptors to characterize and evaluate P. angulata germplasm.

Plant Material
Six P. angulata accessions were used, from the germplasm collection of the State University of Feira de Santana (UEFS), Feira de Santana, Brazil. These accessions came from collections carried out in Bahia and Piauí; the three accessions from Anguera had already been submitted by Silva (2007) and Araujo (2012) to three cycles of selection with self-fertilization (Table 1).

Study Area
The experiment was carried out from April to September 2017 at the Unidade Experimental Horto Florestal (Forest Garden Experimental Unit) (12°16′087″S; 38°56′346″W; 243 m altitude), belonging to the State University of Feira de Santana (UEFS) in Feira de Santana, BA. According to the Thornthwaite and Mather (1955) classification, the region has a sub-humid, megathermal climate (C2rA'a'), with an annual rainfall average of 848 mm and an annual mean temperature of 24 ºC. During the summer, the region can reach an average monthly temperature of 27 ºC, and during the winter, 21 ºC (INMET, 2017). The soil of this region is a Planosol Haplic Salic (Dias, Souza, Oliveira, & Santos, 2010).

Morphoagronomic Characterization
The experiment was carried out in an experimental field, in a randomized complete block design, with three replications and a useful plot of six plants, totalling 18 plants per treatment. The spacing was 1.0 m between rows and 0.8 m between plants. To form the seedlings, three seeds were sown in 300 mL disposable plastic cups containing the commercial substrate TechnsVivato. Thinning was carried out 15 days after sowing, keeping only the most vigorous plants.
Later on, the plants were kept in a screen house with daily manual watering in the early morning and late afternoon. When the seedlings were around 20 cm high, they were transplanted to open field conditions, with drip irrigation. Fertilization was adjusted based on the chemical analysis of the soil (Appendix A) and following the recommendation for the tomato crop. For pest control, neem oil was applied with a manual sprayer at the beginning of plant development.
The evaluations started two months after transplanting, at flowering. Morphoagronomic descriptors proposed by the National University of Colombia (González, Torres, Cano, Arias, & Arboledo, 2008) were used with adjustments for the species. The physiologically mature fruits were randomly collected from each accession.
The mature stage was evidenced by the intensification of the yellow color of the calyx. The fruits were packed in plastic bags, labeled and taken to the Laboratory of Molecular Genetics at the Horto Florestal Experimental Unit in Feira de Santana. At the laboratory, the fruits were separated from the calyxes, washed in running water, and dried. Then, the evaluations were carried out.
Twelve quantitative and twenty-two qualitative descriptors were used (Table 2). For the fruit number analysis, besides the descriptor "number of fruits per plant", two other descriptors were included, "fruits of the north-south axis" and "fruits of the east-west axis", due to the fact that some plants lacked branches in a specific orientation. Qualitative variables were scored based on the phenotypical classes of the descriptors for P. angulata (González et al., 2008) and according to the color catalogue of the Royal Horticultural Society (The Royal Horticultural Society, 2001). Quantitative and qualitative descriptors were analyzed in five leaves and five fruits of six plants per plot from each accession, totalling 18 plants evaluated per accession.
Table 2. Quantitative and qualitative descriptors for the characterization of six accessions of P. angulata from UEFS. UEFS, Feira de Santana, BA, 2018

Quantitative descriptors
Descriptor: Methodology
Plant height (PH): Measured from the base to the top of the main branch with a measuring tape (cm).
Stem diameter (SD): Measured at the base of the stem, 5 cm above the soil, with a digital caliper (mm).
North-south fruits (NSF): Number of fruits counted on the north-south axis in four branches.
East-west fruits (EWF): Number of fruits counted on the east-west axis in four branches.
Weight of five ripe fruits (RFW): Mean weight of 5 randomly picked ripe fruits, obtained with a precision scale (g).
Number of fruits per plant (NFP): Number of fruits counted per plant.
Leaf blade length (LBL): Mean length of 5 randomly picked leaves, from the base to the tip of the blade, measured with a millimeter ruler (cm).
Leaf blade width (LBW): Mean width of the leaf blade, at the base, of 5 randomly picked leaves, measured with a millimeter ruler (cm).
Internode length (EL): Mean length between 5 randomly picked nodes, measured with a measuring tape (cm).
Fruit longitudinal length (FLL): Mean of the longitudinal axis of 5 randomly picked ripe fruits, measured with a digital caliper (mm).
Fruit transversal length (FTL): Mean of the transversal axis of 5 randomly picked ripe fruits, measured with a digital caliper (mm).
Total soluble solids (TSS): Mean total soluble solids content of five randomly selected fruits, expressed in ºBrix, determined from a drop of fruit pulp extract with a digital refractometer with temperature correction to 20 ºC.

Qualitative descriptors
Descriptor

Selection of Quantitative Descriptors
The data obtained for the quantitative characters were analyzed using descriptive statistics: a measure of position (mean) and measures of dispersion (maximum and minimum values, standard deviation and coefficient of variation). In addition, the Shapiro-Wilk normality test was employed.
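As an illustration of the descriptive statistics named above, the following minimal Python sketch computes the mean, sample standard deviation, range and coefficient of variation (CV = 100 × standard deviation / mean) for one descriptor. The function name `describe` and the plant height values are hypothetical and are not taken from the study; the original analyses were run in GENES and R.

```python
import statistics


def describe(values):
    """Return mean, sample SD, min, max and CV (%) for one quantitative descriptor."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return {
        "mean": mean,
        "sd": sd,
        "min": min(values),
        "max": max(values),
        "cv_percent": 100.0 * sd / mean,
    }


# Hypothetical plant heights (cm), for illustration only.
heights = [40.0, 50.0, 60.0]
summary = describe(heights)
print(summary)
```

A descriptor with a high CV, such as the number of fruits per plant in this study, varies widely relative to its mean.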
Redundant descriptors were identified through two methods: (1) direct selection (Jolliffe, 1972, 1973), which discards the descriptor with the highest weighting coefficient in absolute value (eigenvector) in the principal component of lowest eigenvalue, proceeding from the last component up to the one whose eigenvalue did not exceed 0.70; (2) selection according to the Singh (1981) coefficient, which takes into consideration the relative contribution of each characteristic to the genetic divergence.
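The direct selection rule can be sketched in Python as follows. This is a simplified illustration, not the GENES implementation: it assumes the eigenvalues are already sorted in decreasing order, that one descriptor is discarded per component, and that a descriptor already discarded is skipped in favor of the next highest weight. The function name `jolliffe_discard` and all numbers are made up for the example and do not come from a real decomposition.

```python
def jolliffe_discard(names, eigenvalues, eigenvectors, threshold=0.70):
    """Sketch of direct selection.

    eigenvalues: principal component eigenvalues, in decreasing order.
    eigenvectors[k][j]: weight of descriptor j in component k.
    Returns descriptor names in the order they are indicated for discard.
    """
    discarded = []
    # Walk backwards from the last component (smallest eigenvalue).
    for k in range(len(eigenvalues) - 1, -1, -1):
        if eigenvalues[k] > threshold:
            break  # this component exceeds the threshold; stop discarding
        # Rank descriptors by absolute weight in this component.
        ranked = sorted(
            range(len(names)),
            key=lambda j: abs(eigenvectors[k][j]),
            reverse=True,
        )
        # Discard the highest-weighted descriptor not yet discarded.
        for j in ranked:
            if names[j] not in discarded:
                discarded.append(names[j])
                break
    return discarded


# Hypothetical three-descriptor example (illustrative values only).
names = ["A", "B", "C"]
eigenvalues = [2.4, 0.5, 0.1]
eigenvectors = [
    [0.60, 0.60, 0.53],   # component 1
    [0.10, -0.80, 0.60],  # component 2
    [0.80, -0.60, 0.10],  # component 3
]
result = jolliffe_discard(names, eigenvalues, eigenvectors)
print(result)
```

In this toy case the two components with eigenvalues of 0.70 or less each flag one descriptor, which mirrors how the number of low-eigenvalue components determines the number of discards.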
The final discard was based on the information coincident between the two techniques, excluding the descriptors indicated as redundant by both methods. Pearson correlation coefficients were calculated among all descriptors to verify the association between the discarded descriptors and the remaining ones, supporting the discard decision. The significance of the correlation coefficients was verified through the t test.
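The correlation step can be illustrated with a short Python sketch. It assumes the usual test statistic for a Pearson coefficient, t = r·sqrt(n − 2)/sqrt(1 − r²) with n − 2 degrees of freedom; the text does not spell out the formula used. The function names and the data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: rho = 0, to be compared with a t table at n - 2 df."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical paired measurements of two descriptors.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)
t = t_statistic(r, len(x))
print(round(r, 2), round(t, 2))
```

A discarded descriptor that is significantly correlated with a retained one carries information that the retained descriptor largely preserves.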

Selection of Qualitative Descriptors
For the qualitative characters, the percentage frequency of each class and the entropy level of the descriptors were calculated using the Renyi entropy coefficient (Renyi, 1961). The higher the number of phenotypical classes and the more balanced the frequency of accessions across those classes, the higher the entropy value of a descriptor (Vieira, Fialho, & Faleiro, 2007). Descriptors with an entropy level of 0.00 were discarded.
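A minimal sketch of the entropy calculation is given below. It assumes the form H = −Σ p·ln(p), where p is the proportion of accessions in each phenotypical class; this is the Shannon form, which is the order-1 limit of the Rényi entropy. The text does not state the order or the logarithm base, so both are assumptions here, and the class counts are hypothetical.

```python
import math


def entropy(class_counts):
    """Entropy H = -sum(p * ln p) over the phenotypical classes of one descriptor."""
    total = sum(class_counts)
    h = 0.0
    for count in class_counts:
        if count > 0:  # empty classes contribute nothing
            p = count / total
            h -= p * math.log(p)
    return h


# Six accessions, as in this study; the class splits are illustrative.
monomorphic = entropy([6])      # all accessions in one class
two_classes = entropy([3, 3])   # evenly split between two classes
three_classes = entropy([2, 2, 2])
print(monomorphic, two_classes, three_classes)
```

A descriptor for which all accessions fall in one class has zero entropy and cannot discriminate accessions, which is the discard criterion stated above; more classes with balanced frequencies give higher values.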
The statistical analyses were aided by the programs GENES (Cruz, 2013) and R (R Core Team, 2013).

Results
Table 3 shows the range of values from the descriptive statistics of the quantitative characters investigated. The coefficient of variation (CV) ranged from 8.78% to 68.21%, corresponding to fruit transversal length (FTL) and number of fruits per plant (NFP), respectively. Restrepo and Vallejo (2003) estimated the genetic variability in tomato accessions and obtained coefficients of variation of 82.34%, 30.82% and 30.04% for the number of fruits per plant, average fruit weight and plant height, respectively. The high coefficient of variation recorded for NFP indicates high genetic variability in the evaluated material. Thus, the CV values found suggest wide variation between the results, with variability and heterogeneity between the accessions that could be explored in breeding programs.
The variable number of fruits per plant (NFP) showed the greatest variation, with a difference of 436.57 between the maximum and minimum values and a mean of 188.96, while leaf blade width (LBW) presented the least variation between minimum and maximum values (2.96 cm and 5.12 cm, respectively), with a mean of 3.94 cm (Table 3). Moreno, Fischer and Sánchez (2012) obtained results that agree with the ones found in this study, with the greatest variation for the variable number of fruits per plant (NFP) in 54 accessions of a Physalis peruviana L. germplasm collection originating in the central and northeast regions of Colombia.
Regarding the normality test, most of the variables presented normal distribution, since they were not significant in the Shapiro-Wilk test at the 5% and 1% significance levels (Table 3). Based on the Singh (1981) coefficient, the characters that provided the greatest relative contributions to the genetic diversity of the accessions were north-south fruits (NSF), with 20.14%, followed by stem diameter (SD), with 17.23%, internode length (EL), with 11.60%, leaf blade length (LBL), with 10.92%, plant height (PH), with 9.15%, number of fruits per plant (NFP), with 6.98%, and fruit transversal length (FTL), with 6.49%. These seven characteristics accounted for 82.51% of the total contribution (Table 4). Sudré, Rodrigues, Riva, Karasawa, and Amaral Júnior (2005), when evaluating the genetic divergence between 56 accessions of the Capsicum spp. germplasm collection of the State University of Norte Fluminense (UENF) using eleven quantitative descriptors, found a different result from the one in this work. They reported fruit length, stem diameter, number of seeds per fruit and fruit mean weight as the descriptors that most contributed to the genetic divergence (32%, 32%, 13% and 12%, respectively). Therefore, it seems very relevant to carry out studies in specific collections, as well as to consider different environmental conditions at the time of germplasm characterization and evaluation through morphoagronomic descriptors (Miguel, 2017).
The descriptors with the least contribution were: weight of five ripe fruits (RFW), with 0.48%; east-west fruits (EWF), with 1.95%; total soluble solids (TSS), with 4.14%; leaf blade width (LBW), with 5.28%; and fruit longitudinal length (FLL), with 5.60% (Table 4). Together, these characteristics accounted for only 17.45% of the relative importance and were therefore considered of little relevance in the characterization of the physalis accessions. Thus, these variables might be initially identified for discard, since according to E. Rego, M. Rego, Cruz, and Cecon (2003) characteristics that contribute a very low percentage, or nothing at all, to the variability found should be discarded. As a criterion for this analysis, variables with a relative contribution below 6% were discarded. These were five descriptors; that is, 41.67% of the quantitative descriptors were discarded using this method (Table 4).
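The 6% cut-off can be checked against the relative contributions reported in the text. The short Python sketch below uses only those reported percentages; the variable names are illustrative and the Singh (1981) contributions themselves are taken as given, not recomputed.

```python
# Relative contributions (%) to genetic divergence, as reported in the text.
contribution = {
    "NSF": 20.14, "SD": 17.23, "EL": 11.60, "LBL": 10.92,
    "PH": 9.15, "NFP": 6.98, "FTL": 6.49, "FLL": 5.60,
    "LBW": 5.28, "TSS": 4.14, "EWF": 1.95, "RFW": 0.48,
}

THRESHOLD = 6.0  # discard criterion stated in the text (%)

# Descriptors below the cut-off are indicated for discard.
discard = sorted(name for name, pct in contribution.items() if pct < THRESHOLD)

# Summed contribution of the descriptors that pass and fail the cut-off.
kept_share = sum(pct for pct in contribution.values() if pct >= THRESHOLD)
discarded_share = sum(pct for pct in contribution.values() if pct < THRESHOLD)

print(discard, round(kept_share, 2), round(discarded_share, 2))
```

The reported values sum to 99.96% rather than 100%, which is consistent with rounding of the individual contributions.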
The eigenvalue estimates associated with the principal components and their respective relative and accumulated variances, obtained for the 12 quantitative characters, are presented in Table 5. The first two principal components explained 89.49% of the accumulated total variation, and most of the variation was concentrated up to the fourth principal component, corresponding to 98.35% of the whole variation available in the germplasm collection. A percentage close to the one observed in this work was reported by Moreira (2012), who analyzed the first two principal components and obtained 91.07% when characterizing lines of Capsicum annuum L. using quantitative and qualitative descriptors. According to Cruz et al. (2004), for the study of diversity using principal components, the first few components must retain most of the total variation, in general over 80%. According to Pereira et al. (1992), the variance distribution is associated with the nature and number of characteristics employed in the analysis, and it is only concentrated in the first few components when few characteristics of agricultural interest are evaluated or when they belong to specific parts of the plant.
According to the preliminary discard through the direct method proposed by Jolliffe (1972, 1973), 10 out of 12 characteristics (83.33%), those presenting the highest weighting coefficients in absolute value starting from the last principal component, might be discarded, given the number of components that presented eigenvalues below 0.7 (Table 6).
According to Hair Junior, Black, Babin, Anderson, and Tatham (2009), the principal components technique is advantageous because it enables the evaluation of the importance of each characteristic investigated to the total variation between the accessions evaluated, allowing the elimination of less informative characters, either because they are already correlated with the remaining variables or because of their invariance. The first descriptor indicated for discard through direct selection was the variable "east-west fruits", which presented the highest weighting coefficient in absolute value in the last principal component (0.56), followed by the characteristics "internode length", "stem diameter" and "north-south fruits", whose highest eigenvector weights in absolute value occurred in principal components 11, 10 and 9, respectively (Table 6).
Using this procedure, ten characters were found redundant, in the following discard sequence: EWF, EL, SD, NSF, NFP, RFW, LBL, LBW, TSS and FTL. The procedure might be considered too strict, since it eliminated 10 of the 12 quantitative characters used as descriptors for physalis. Similar results were reported by Alvares (2011), who evaluated the genetic divergence between 137 Capsicum chinense Jacq. accessions and verified that 14 out of the 25 morphoagronomic descriptors should be discarded based on the method proposed by Jolliffe (1972, 1973).
The information obtained using the two methodologies was in disagreement, since the procedure proposed by Jolliffe (1972, 1973) indicated 10 characters for discard against only five in the Singh analysis.
Therefore, to reduce inconsistencies in the elimination of descriptors, it is common to adopt two procedures to indicate the most relevant descriptors. For example, Afonso et al. (2014) and Castro et al. (2012) employed the same direct selection criterion proposed by Jolliffe (1972, 1973) and the Singh coefficient to select morphoagronomic descriptors for Manihot esculenta Crantz and yellow passion fruit, respectively.
Based on the simultaneous analyses by both procedures, four characters coincided in relation to discard: east-west fruits (EWF), weight of five ripe fruits (RFW), leaf blade width (LBW) and total soluble solids (TSS) (Table 7). Although the descriptor total soluble solids (TSS) was indicated for discard by both the Singh (1981) and Jolliffe (1972, 1973) methods, it was kept in the group of selected descriptors because it did not present significant correlations with the other selected descriptors (Table 8). This descriptor is also highly relevant to the characterization of accessions whose fruits are used in agroindustry, since physalis fruit is a product to which value can be added. Therefore, the descriptors included in the final discard were east-west fruits (EWF), weight of five ripe fruits (RFW) and leaf blade width (LBW) (Table 7). Combining the information from both methods, 25% of the quantitative descriptors were eliminated. This percentage was higher than the 21% found for Euterpe oleracea Mart. (Oliveira, Ferreira, & Santos, 2006) and lower than the 40% found in Carica papaya L. (Oliveira, Dias, & Dantas, 2012). The correlations were significant and positive for most of the descriptors investigated and showed that discarding the indicated characteristics would not result in significant loss of information in subsequent evaluations of the physalis collection, since the characteristics indicated for discard (east-west fruits, weight of five ripe fruits and leaf blade width) were correlated with at least one of the selected descriptors (Table 8).
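The combination of the two procedures amounts to a set intersection, which can be reproduced from the descriptor lists given in the text. The following Python sketch uses the abbreviations of Table 2; the variable names are illustrative.

```python
# Descriptors indicated for discard by each method, as listed in the text.
jolliffe = {"EWF", "EL", "SD", "NSF", "NFP", "RFW", "LBL", "LBW", "TSS", "FTL"}
singh = {"RFW", "EWF", "TSS", "LBW", "FLL"}

# Descriptors flagged by both methods.
both = jolliffe & singh

# TSS was retained by the authors despite being flagged by both methods.
final_discard = both - {"TSS"}

N_QUANTITATIVE = 12
discard_pct = 100 * len(final_discard) / N_QUANTITATIVE

print(sorted(both), sorted(final_discard), discard_pct)
```

Requiring agreement between methods is the conservative choice: a descriptor flagged by only one procedure, such as FLL (Singh only) or SD (direct selection only), is kept.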
A similar result was observed by Oliveira et al. (2006), where four relevant characters (fruit weight, fruit width, fruit mass per bunch and fruit yield) were discarded for the evaluation and selection of Euterpe oleracea Mart. with very little loss of information, since the discarded characteristics were strongly associated with the remaining characteristics investigated.
A positive and significant correlation, although of moderate magnitude, was also found between the characters weight of five ripe fruits and fruit transversal length (r = 0.55*). The character leaf blade width presented positive and highly significant correlations of strong magnitude with plant height (r = 0.80**), stem diameter (r = 0.83**), number of fruits per plant (r = 0.79**) and leaf blade length (r = 0.89**), and of moderate magnitude with north-south fruits (r = 0.68**) and internode length (r = 0.67**). Leaf blade width also presented a positive, significant and moderate correlation with fruit transversal length (r = 0.54*) (Table 8).
Table 9 presents the qualitative descriptors, their phenotypical classes, the percentage frequency of the accessions in each class and the Renyi entropy level (Renyi, 1961). Any variable that presented a 0.00 entropy level was discarded.
Table 9. Qualitative descriptors evaluated, phenotypical categories (classes), percentage frequency and entropy level of the P. angulata accessions under study. UEFS, Feira de Santana, 2018

The descriptors initially discarded for not being able to differentiate the accessions appear in the entropy analysis with a 100% frequency; that is, all accessions were concentrated in the same class of the descriptor, giving an entropy level equal to zero and indicating that they were monomorphic for the characteristic under evaluation. The descriptors discarded were: stem pubescence (SP), stem anthocyanin (SA), leaf blade shape (LBS), leaf apex shape (LAS), leaf base shape (LBS), leaf bundle pubescence (LBP), pubescence around the leaves (PAL), leaf vein anthocyanin (LVA), flower position (Pflow), corolla color (Ccor), color of the corolla stains (CCS), pedicel color (PC), ripe fruit color (RFC), calyx type (Tcal), calyx shape (Scal) and calyx division (Dcal) (Table 9).
Different results were found by Padilha, Sosinski, and Barbieri (2016) when evaluating genetic divergence and the entropy of descriptors in twenty-one pepper accessions from the Active Germplasm Bank of Embrapa Clima Temperado (Temperate Climate Embrapa). They verified that the descriptors with the highest entropy values were flower position (1.48), stigma exsertion (1.47), ripe fruit color (1.34), pungency (1.30), plant height (1.29), node anthocyanin (1.27), fruit shape in the pedicel (1.26), persistence between pedicel and fruits (1.24) and fruit shape (1.21), while stem shape, leaf shape, length of placenta and seed color presented entropy values equal to zero (0.00). These differences in the entropy values of the descriptors might have been caused by the smaller number of accessions used in this study. Therefore, one may infer that, although some characteristics present genetic variability, high entropy values were not found because of the small number of accessions used in the research.
The following descriptors were selected: growth habit (GH), stem color (SC), leaf margin shape (LMS), unripe calyx color (UCC), fruit shape (FS) and unripe fruit color (UFC). Among the 22 qualitative descriptors proposed by the National University of Colombia (González, Torres, Cano, Arias, & Arboledo, 2008), with adjustments for the species (Table 2), only six were relevant to the discrimination of genetic divergence between the accessions studied (Table 9).
Further studies with a larger number of accessions and using different methodologies to discard qualitative descriptors (Oliveira et al., 2012; Castro et al., 2012; Oliveira et al., 2014) should be carried out to confirm the discard of the qualitative descriptors indicated in this work.

Conclusion
The minimum list of descriptors to characterize P. angulata accessions might contain nine quantitative descriptors (plant height, stem diameter, north-south fruits, number of fruits per plant, leaf blade length, internode length, fruit longitudinal length, fruit transversal length and total soluble solids) and six qualitative descriptors (growth habit, stem color, leaf margin shape, unripe calyx color, fruit shape and unripe fruit color).
The results of this work led to the conclusion that discarding 55.88% of the descriptors does not result in significant loss of information, minimizes costs and makes the characterization of P. angulata germplasm more efficient.
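The overall discard percentage follows directly from the descriptor counts given in the text, as the short Python tally below shows. The variable names are illustrative.

```python
# Descriptor counts reported in the text.
quantitative_total, qualitative_total = 12, 22
quantitative_kept, qualitative_kept = 9, 6

total = quantitative_total + qualitative_total   # all descriptors evaluated
kept = quantitative_kept + qualitative_kept      # minimum descriptor list
discarded = total - kept                         # 3 quantitative + 16 qualitative

discarded_pct = round(100 * discarded / total, 2)
print(total, kept, discarded, discarded_pct)
```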

Table 1. List of the accessions characterized, with their respective denominations, origins, geographical coordinates and year of collection. UEFS, Feira de Santana, BA, 2018

Table 3. Descriptive statistics and Shapiro-Wilk normality test for the quantitative characters. UEFS, Feira de Santana, BA, 2018
Note. Plant height (PH) in cm; Stem diameter (SD) in mm; North-south fruits (NSF) in units; East-west fruits (EWF) in units; Weight of five ripe fruits (RFW) in kg plant⁻¹; Number of fruits per plant (NFP) in units; Leaf blade length (LBL) in cm; Leaf blade width (LBW) in cm; Internode length (EL) in cm; Fruit longitudinal length (FLL) in mm; Fruit transversal length (FTL) in mm; Total soluble solids (TSS) in ºBrix. ** significant at 1%, * significant at 5% probability in the Shapiro-Wilk test. ns not significant. Normality test (W), Coefficient of variation (CV).

Table 5. Estimates of eigenvalues associated with the principal components and their total and accumulated variance, obtained from the twelve descriptors evaluated in the six P. angulata accessions under study. UEFS, Feira de Santana, BA, 2018

Table 6. Estimates of the weighting coefficients (eigenvectors) associated with the principal components with eigenvalues below 0.70 and identification of the characteristics to be discarded in each component, through direct selection, in the six P. angulata accessions under evaluation. UEFS, Feira de Santana, BA, 2018
Note. Plant height (PH); Stem diameter (SD); North-south fruits (NSF); East-west fruits (EWF); Weight of five ripe fruits (RFW); Number of fruits per plant (NFP); Leaf blade length (LBL); Leaf blade width (LBW); Internode length (EL); Fruit longitudinal length (FLL); Fruit transversal length (FTL); Total soluble solids (TSS).