Towards the Selection of Superior Sesame Lines Based on Genetic and Phenotypic Characterisation for Uganda

Understanding agricultural biodiversity is critical for formulating breeding strategies for crop improvement, and it affects both conservation and collection activities. Germplasm collections in particular serve as valuable resources, so their adequate characterisation is of utmost importance. Although Uganda ranks seventh in African sesame production, little research has been conducted to determine the current genetic diversity of its germplasm. Therefore, in the present study, part of the sesame germplasm conserved at the National Semi-Arid Resources Research Institute (NaSARRI) in Uganda, comprising 85 established lines, was genetically and phenotypically characterised. Population genetic and structure analyses revealed a rather low extent of genetic diversity, with expected heterozygosity (HE, or gene diversity [D]) ranging from 0 to 0.38 per entry, but a high extent of admixture within and between entries. This reduced heterozygosity is supported by a fixation index (FST) of 0.530, indicating medium genetic differentiation among entries. The analysis of quantitative and qualitative agromorphological traits revealed great inter-trait variability among the entries and further indicated that some traits were conserved in a way that reflects the geographic origin of the analysed entries. Based on both the genetic and the phenotypic characterisation, a selection of 26 superior entries is proposed, which may form a valuable basis for both farmers and breeders.


Introduction
Sesame (Sesamum indicum L.) is a predominantly self-pollinated diploid with 2n = 26 chromosomes. It belongs to the Pedaliaceae (order Lamiales), a small family of 15 genera and 70 species characterised by annual and perennial growth forms. Sesame is an important and ancient crop cultivated in hot, dry climates for its oil- and protein-rich seeds (Bedigian et al., 1986). Domesticated in the Indian subcontinent (Bedigian, 2003), sesame is currently grown throughout the tropical and subtropical regions of the world. Sudan, China, India, and Myanmar were the top producers in 2014, together accounting for 46% of world production (FAO, 2015). On the African continent, Uganda ranks seventh in sesame production, with an annual production of 124,300 tonnes (FAO, 2015). Sesame, commonly known as simsim in Uganda, was introduced from Kenya in 1910 and has since been distributed, cultivated, and used (Rubaihayo et al., 1997). Its adaptability to harsh climatic conditions, including heat and drought, makes it a favourable crop in north-eastern Uganda. Particularly over the last decade, sesame has experienced a worldwide boom, with its production increasing to 158 per cent between 2004 and 2014 (FAO, 2015). Although sesame accounted for 83 per cent of total agricultural sales in Uganda in 2014 (Proctor, 2015), neither its production nor its productivity has increased markedly since 2005 (FAO, 2015). […] environments. However, to utilise the wealth of diversity in germplasm collections, their genetic and phenotypic characterisation is indispensable. Genetic diversity is currently measured using morphological, biochemical, and molecular markers, and molecular markers have become the most attractive of these in recent years (Govindaraj et al., 2015). Nevertheless, phenotypic characterisation is the first step in the classification and description of any germplasm. Several studies have exploited the high genetic diversity in sesame populations by analysing morphological traits only, thereby providing valuable information for cultivar selection in different breeding programmes (Arriel et al., 2007; Ansah et al., 2015; Falusi et al., 2015). Other studies have used a wide range of molecular markers, such as AFLP, ISSR, SSR, and RAPD markers, for germplasm diversity analysis and the construction of genetic maps (Laurentin et al., 2006; Sharma et al., 2009; Cho et al., 2011; Kumar et al., 2012; Alemu et al., 2013; Zhang et al., 2013; Dossa et al., 2016). Combining morphological and molecular markers is now becoming increasingly popular for analysing sesame diversity (Parsaeian et al., 2011; Pandey et al., 2015; Sehr et al., 2016).
Germplasm characterisation not only produces valuable agronomic and breeding data but is also useful for identifying duplicates within and between collections. Furthermore, when genetic resources are kept ex situ, seeds are frequently regenerated to maintain their viability and to replenish seed stocks. During this process, a certain extent of gene flow may occur as a result of cross-pollination and also through the physical mixing of seed lots. As a result, the quality and integrity of the germplasm may be severely reduced. Especially when handling cross-pollinating species, additional planning, care, and special techniques are therefore needed to ensure the physical or reproductive isolation of accessions that is required to preserve their genetic identity. For sesame, contradictory outcrossing rates ranging from less than 1 to nearly 70 per cent have been reported (Yermanos, 1980; Pathirana, 1994; Andrade et al., 2014), yet sesame is still mainly described as a self-pollinated crop. Determining its regional outcrossing potential is therefore of utmost importance, not only for breeding but also for conservation and collection activities and strategies. Despite the increasing number of studies characterising sesame germplasm collections, knowledge of the molecular genetic diversity of entries assembled on the African continent remains scarce (Gebremichael et al., 2011; Alemu et al., 2013; Nyongesa et al., 2013; Woldesenbet et al., 2015; Sehr et al., 2016). These studies commonly found a high level of genetic diversity within accessions, especially those of local origin, and a certain extent of admixture between accessions, which could probably be attributed to cross-pollination and local seed exchange among farmers. Hence, the present study had two main objectives: i) to analyse and categorise the existing variation in the 85 sesame germplasm entries assembled in Uganda on the basis of their phenotypic and SSR-based genotypic characteristics, and ii) to select superior lines as a valuable basis for both farmers and breeders. Both objectives are intended to inform sesame breeding and conservation strategies and, in the long run, to improve sesame performance and use for farmers.

Plant Material and DNA Extraction
A total of 85 sesame entries was planted in the first rainy season (May) of 2010 in a randomised complete block design with three replications. These entries comprised germplasm accessions and breeding lines derived from genotypes and crosses from different countries of origin (China, Ethiopia, Kenya, Korea, Tanzania, Uganda, USA, and Zimbabwe) and are conserved at the National Semi-Arid Resources Research Institute (NaSARRI) in eastern Uganda (Table 1). Seeds from selfed flowers of each entry were planted in a single-row plot 2 m long. To control border effects, border rows of the purple-coloured variety Sesim 2 were planted at the beginning and the end of each replication. Several flowers on each of five plants per entry were self-pollinated, and two capsules per entry were randomly chosen for further analyses. Seeds from the two capsules were germinated separately in Petri dishes. Eight seedlings from each capsule were picked for DNA extraction, resulting in 1,360 samples (85 entries × 2 capsules × 8 seedlings). Genomic DNA was extracted from the aerial parts of the seedlings following the protocol described by van der Beek et al. (1992), with minor modifications for high-throughput handling using robotics. The extracted genomic DNA is deposited at the Repository Centre of the AIT Austrian Institute of Technology and is available upon request (Stierschneider et al., 2016). Detailed sample information is given in Appendix 1.
The resulting PCR products were diluted and mixed with Hi-Di Formamide and GeneScan 350/500 ROX dye Size Standard according to the manufacturer's protocols (Life Technologies). Fragment sizes were resolved by capillary electrophoresis on an ABI 3110 XL Genetic Analyzer, and alleles were called using GeneMapper® Software 5 (Applied Biosystems). Non-amplified loci were scored as missing data.

Genetic Data Analysis
To avoid allele-frequency bias due to full or half sibship, and to be able to infer population genetic structure over the entire germplasm collection, clonality within the dataset was determined in silico by counting 100 per cent multilocus matches. Repeated matching multilocus genotypes were removed from the dataset for subsequent analyses. Genetic variation was investigated on the entire dataset and on the reduced dataset using standard genetic diversity estimates per locus and per entry. These included expected heterozygosity (HE, or gene diversity [D]), observed heterozygosity (HO), the F-statistics F, FIS, and FST, and gene flow (Nm). In addition, an analysis of molecular variance (AMOVA) among and within the countries of origin was performed with 999 permutations. All computations were done using GenAlEx v. 6.502 (Peakall et al., 2012). The population structure of the reduced dataset without repeated matching multilocus genotypes was examined using the Bayesian model-based approach implemented in Structure 2.3.4 (Pritchard et al., 2000; Anderson et al., 2008). The number of subgroups (K) evaluated ranged from 1 to 30, with five replicate runs per K value, a burn-in period of 10,000, and a run length of 50,000. The no-admixture model was used to determine the correlated cluster. The R package pophelper (Francis, 2016) was used to determine the final K value based on the delta K method (Evanno et al., 2005). A neighbor-joining (NJ) tree was constructed in MEGA 6 (Tamura et al., 2013) from the Nei pairwise genetic distance matrix of the entire dataset to visualise genetic diversity and relationships among the genotypes.
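The delta K criterion used to choose the final K can be illustrated with a short sketch. It assumes that the ln P(D) values from the five replicate Structure runs per K have already been collected into a dictionary. The function name and input layout are hypothetical, and this is not pophelper's code. The sketch takes the absolute second-order difference of the mean ln P(D) across runs and divides it by the standard deviation of ln P(D) at that K, a common way of implementing the Evanno statistic.

```python
import statistics


def evanno_delta_k(lnp_by_k: dict[int, list[float]]) -> dict[int, float]:
    """Delta K (Evanno et al., 2005) from replicate ln P(D) values per K.

    lnp_by_k maps each K to the ln P(D) values of its replicate runs
    (at least two runs per K are needed for a standard deviation).
    """
    mean_l = {k: statistics.mean(v) for k, v in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        # Delta K needs K-1 and K+1, so the smallest and largest K are skipped
        if k - 1 not in mean_l or k + 1 not in mean_l:
            continue
        sd = statistics.stdev(lnp_by_k[k])
        if sd == 0:
            continue  # undefined when all replicate runs agree exactly
        # |L''(K)| = |L(K+1) - 2 L(K) + L(K-1)| on mean ln P(D) values
        second_diff = mean_l[k + 1] - 2 * mean_l[k] + mean_l[k - 1]
        delta[k] = abs(second_diff) / sd
    return delta


# Hypothetical ln P(D) values for K = 1..4, five replicate runs each
runs = {
    1: [-5000, -5001, -4999, -5000, -5000],
    2: [-4000, -4010, -3990, -4005, -3995],
    3: [-3900, -3920, -3880, -3910, -3890],
    4: [-3880, -3900, -3860, -3890, -3870],
}
delta_k = evanno_delta_k(runs)
best_k = max(delta_k, key=delta_k.get)
print(best_k)  # 2
```

The K with the largest delta K is taken as the most likely number of clusters. Because delta K is undefined at the smallest and largest K tested, the range of K (here 1 to 30) must extend beyond the expected optimum.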

Phenotyping and Trait Statistical Analysis
Seeds from the remaining selfed flowers that had formed capsules on the plants used for genotyping were planted in the first rainy season of 2011 (mid-April). They were phenotyped during the second season of 2011 (September) to evaluate the agromorphological diversity of the total germplasm set (85 entries) at NaSARRI, Uganda.
According to the official descriptors for sesame (IPGRI et al., 2004), the following 10 quantitative traits were measured and used for further diversity analysis: days to flowering (DTF), days to maturity (DTM), plant height (PH [cm]), plant height to first capsule (HFC [cm]), plant height to first branch (HFB [cm]), number of branches (NB), length of capsule zone (LCZ [cm]), number of capsules on main stem (NCMS), number of capsules on branches (NCB), and total number of capsules (TNC). The mean values across three replicates are shown in Appendix 2. Entry 025, however, was measured only once per trait and showed extreme values compared with all other lines. Because these values could not be validated by replicate measurements, this entry was excluded from further statistical analysis. To examine the overall phenotypic diversity across all 10 traits, box plots were generated for each trait using R (R Core Team, 2015). Furthermore, individual trait values were standardised by z-transformation to a mean of zero and a standard deviation of one. The standardised data were then subjected to principal component analysis (PCA), and principal component scores were determined for each line after varimax rotation using SPSS statistical software (PASW Statistics 18, IBM Corp., Armonk, NY, USA). In addition, 14 qualitative traits were scored per entry on the basis of the official descriptors for sesame (Table 2, Appendix 3).
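The standardisation, PCA, and varimax steps described above can be sketched in NumPy as a rough stand-in for the SPSS procedure. It rests on several assumptions. PCA is run on the correlation matrix of the z-scored traits. The number of retained components is a free parameter, because the retention criterion is not stated here. Varimax uses Kaiser row normalisation, which SPSS applies by default. The data are random placeholders, not the measured traits, and all function names are our own.

```python
import numpy as np


def zscore(X):
    """Standardise each column (trait) to mean 0 and standard deviation 1."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation with Kaiser normalisation; returns (rotated, R)."""
    p, k = loadings.shape
    h = np.sqrt((loadings ** 2).sum(axis=1))  # square roots of the communalities
    A = loadings / h[:, None]                 # Kaiser row normalisation
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        L = A @ R
        u, s, vt = np.linalg.svd(A.T @ (L ** 3 - L @ np.diag((L ** 2).sum(axis=0)) / p))
        R = u @ vt
        d_old, d = d, s.sum()
        if d_old != 0 and d / d_old < 1 + tol:
            break
    # Row scaling commutes with R, so de-normalising gives loadings @ R
    return loadings @ R, R


def pca_varimax_scores(X, n_components):
    """PCA of z-scored traits, then varimax-rotated loadings and standardised scores."""
    Z = zscore(X)
    corr = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:n_components]  # largest eigenvalues first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(eigvals)
    rotated, R = varimax(loadings)
    # Unit-variance component scores, rotated with the same orthogonal matrix
    scores = ((Z @ eigvecs) / np.sqrt(eigvals)) @ R
    return rotated, scores


# Hypothetical data: 84 lines (entry 025 excluded) x 10 traits
rng = np.random.default_rng(0)
X = rng.normal(size=(84, 10))
rotated, scores = pca_varimax_scores(X, n_components=3)
print(rotated.shape, scores.shape)  # (10, 3) (84, 3)
```

For a PCA extraction, these rotated, unit-variance scores should correspond closely to the regression-method scores that SPSS saves, up to the sign of each component.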

Outcrossing Test
To assess the rate of outcrossing under natural conditions, a sibship analysis was performed. For this purpose, a single farmer's field was chosen in which Sesim 2 had been cultivated for several generations and in which off-type individuals used to appear. Capsules of individual plants were collected randomly within an area of approx. 100 m². Seeds from each capsule were germinated, and DNA was extracted from the seedlings as well as from the capsule tissue, which reflects the maternal genotype. In this way, genomic DNA of seven different mother plants (A–G) and 128 seedlings was investigated (Table 3). Each plant was represented by one sampled capsule. Eight seedlings per capsule were analysed from plants A, C, D, and E, and 32 seedlings per capsule from plants B, F, and G. SSR analysis was performed as described above.
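The logic of detecting outcrossing in such a sibship analysis can be sketched as follows. This is a minimal illustration under our own assumptions, not necessarily the exact procedure behind the reported outcrossing values. Under strict selfing, a seedling can only carry alleles that are present in its mother plant, here represented by the capsule tissue. Any non-maternal allele is therefore taken as evidence of outcrossing or, alternatively, of a genotyping error. A pollen donor that shares the maternal alleles at every locus leaves no trace, so the resulting proportion is a lower bound. Function names and allele sizes are hypothetical.

```python
def has_non_maternal_allele(mother, seedling):
    """True if the seedling carries, at any scored locus, an allele absent from the mother.

    Genotypes are dicts {locus: (allele1, allele2)}; None marks missing data.
    Under strict selfing, every seedling allele must also be present in the mother.
    """
    for locus, m_geno in mother.items():
        s_geno = seedling.get(locus)
        if m_geno is None or s_geno is None:
            continue  # skip loci that are missing in either genotype
        if not set(s_geno) <= set(m_geno):
            return True
    return False


def apparent_outcrossing_rate(mother, seedlings):
    """Proportion of seedlings with at least one non-maternal allele (a lower bound)."""
    flagged = sum(has_non_maternal_allele(mother, s) for s in seedlings)
    return flagged / len(seedlings)


# Hypothetical allele sizes (bp) at two loci for one mother plant and four seedlings
mother = {"locus1": (150, 150), "locus2": (200, 204)}
seedlings = [
    {"locus1": (150, 150), "locus2": (200, 200)},  # consistent with selfing
    {"locus1": (150, 156), "locus2": (204, 204)},  # 156 is non-maternal
    {"locus1": (150, 150), "locus2": (200, 204)},  # consistent with selfing
    {"locus1": None, "locus2": (200, 208)},        # 208 is non-maternal
]
print(apparent_outcrossing_rate(mother, seedlings))  # 0.5
```

Applied per mother plant, this yields one apparent outcrossing proportion per capsule. Larger progeny arrays, such as the 32 seedlings of plants B, F, and G, give more precise estimates than arrays of eight.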

Entry Selection
To assemble a selection of well-performing entries, entries were chosen on the basis of their quantitative traits, overall hairiness, and high genetic diversity. For each quantitative trait (n = 10), the mean and the standard deviation were calculated. Only entries with at least five traits above the single positive standard deviation value (i.e. the mean plus one standard deviation) were chosen for the selection. For hairiness, the qualitative scores for hairiness of stem, corolla, and capsule were summed per entry, and the mean and standard deviation of these sums were calculated. Entries with sums above the single positive standard deviation value were added to the selection. The same procedure was applied to genetic diversity: entries with HE values above the single positive standard deviation value were considered for the selection.
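The three selection rules can be sketched in a few lines of Python. Two points are assumptions, because the text does not specify them: the threshold uses the sample standard deviation (n − 1 denominator), and "above" is read as strictly greater than. The data layout, function names, and toy values are hypothetical. PH and TNC are used only as example trait labels.

```python
import statistics


def cutoff(values):
    """Mean plus one sample standard deviation (n - 1 denominator assumed)."""
    values = list(values)
    return statistics.mean(values) + statistics.stdev(values)


def select_entries(traits, hairiness_sum, he, min_traits=5):
    """traits: {trait: {entry: value}}; hairiness_sum and he: {entry: value}."""
    counts = {}
    for by_entry in traits.values():
        cut = cutoff(by_entry.values())
        for entry, value in by_entry.items():
            if value > cut:
                counts[entry] = counts.get(entry, 0) + 1
    good_traits = {e for e, n in counts.items() if n >= min_traits}
    hair_cut = cutoff(hairiness_sum.values())
    hairy = {e for e, v in hairiness_sum.items() if v > hair_cut}
    he_cut = cutoff(he.values())
    diverse = {e for e, v in he.items() if v > he_cut}
    # The proposed selection is the union of the three groups
    return good_traits, hairy, diverse, good_traits | hairy | diverse


# Hypothetical toy data for five entries; two traits and min_traits=2 keep it small
traits = {
    "PH": {"a": 1, "b": 2, "c": 3, "d": 4, "e": 10},
    "TNC": {"a": 1, "b": 2, "c": 3, "d": 4, "e": 10},
}
hairiness_sum = {"a": 9, "b": 3, "c": 3, "d": 3, "e": 3}
he = {"a": 0.1, "b": 0.2, "c": 0.2, "d": 0.2, "e": 0.5}
good, hairy, diverse, selection = select_entries(traits, hairiness_sum, he, min_traits=2)
print(sorted(selection))  # ['a', 'e']
```

With the real data, min_traits would be left at its default of 5, and trait values from the excluded entry 025 would be omitted before the thresholds are computed.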

Genetic Diversity and Germplasm Structure
Obtaining unbiased estimates of genetic diversity is particularly critical for the management and conservation of species. Sampling full siblings has been shown to affect estimates of population genetic parameters, to an extent that also depends on the software used (Anderson et al., 2008; Goldberg et al., 2010; Peterman et al., 2016). Thus, to infer the population genetic structure of the entire germplasm collection while ruling out possible bias due to consanguinity, a reduced dataset was created by removing repeated matching multilocus genotypes, leaving 666 samples (Appendix 1).
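Removing repeated multilocus genotypes (MLGs) amounts to keeping one representative of each distinct genotype. The text does not state whether repeats were removed within each entry or across the whole dataset, nor how non-amplified loci entered the comparison. In the following sketch, both choices are assumptions: the scope is exposed as a `per_entry` flag, and missing loci are treated as part of the genotype. Sample identifiers and allele sizes are hypothetical.

```python
def normalise(genotype):
    """Order alleles within each locus so that (156, 150) and (150, 156) compare equal."""
    return tuple(None if pair is None else tuple(sorted(pair)) for pair in genotype)


def remove_repeated_mlgs(samples, per_entry=True):
    """Keep the first sample of each distinct multilocus genotype (MLG).

    samples: list of (entry, sample_id, genotype), genotype = tuple of allele pairs,
    with None for a non-amplified locus. Missing loci are treated as part of the MLG.
    """
    seen = set()
    kept = []
    for entry, sample_id, genotype in samples:
        mlg = normalise(genotype)
        key = (entry, mlg) if per_entry else mlg
        if key not in seen:
            seen.add(key)
            kept.append((entry, sample_id, genotype))
    return kept


# Hypothetical two-locus genotypes: three seedlings of entry 002 and one of entry 036
samples = [
    ("002", "002-1-1", ((150, 150), (200, 200))),
    ("002", "002-1-2", ((150, 150), (200, 200))),
    ("002", "002-2-1", ((150, 156), (200, 200))),
    ("036", "036-1-1", ((150, 150), (200, 200))),
]
print(len(remove_repeated_mlgs(samples)))                   # 3
print(len(remove_repeated_mlgs(samples, per_entry=False)))  # 2
```

Deduplicating within entries keeps identical genotypes that occur in different entries. Deduplicating across the dataset would also collapse such cross-entry duplicates, which matters when the collection itself contains duplicated accessions.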
Heterozygosity and polymorphism were calculated on the reduced dataset separately for each locus (Appendix 2) and for each entry (Appendix 3). Per locus, the number of alleles ranged from 2 to 15, HO values were very low (0.00–0.31), and HE values ranged from 0.03 to 0.83. Per entry, mean HE values (gene diversity, D) varied from 0 to 0.378 (grand mean = 0.219), and HO values from 0 to 0.489 (grand mean = 0.137). The following entries showed no gene diversity at all (HE = HO = 0): 002 (Sesim 2, Uganda), 036 (China), 067 (USA), 078 and 079 (Kenya), and 118 (Local Sesim 2, Uganda). This is in line with previous studies, which described gene diversity (HE, D) between 0 and 0.440 in African sesame lines (Gebremichael et al., 2011; Nyongesa et al., 2013; Sehr et al., 2016). Exonic SSRs contain less allelic variability than intronic SSRs because they are subject to stronger selection pressure owing to their functional significance (Li et al., 2004). Apart from this, low HE values can be explained by genetic isolation, historical population bottlenecks, founder effects, inbreeding, or selection. In the germplasm subset analysed here, the last two factors, inbreeding and selection during breeding, may have played a major role in reducing heterozygosity. This is reflected by an inbreeding coefficient (FIS) of 0.329 and a fixation index (FST) of 0.530, indicating a high proportion of homozygous individuals and a medium genetic differentiation among entries, respectively. It agrees with the general observation that the more strongly a crop has been domesticated, the narrower its genetic diversity (Tanksley et al., 1997; Flint-Garcia, 2013). The relative measure of migration between the entries (Nm) was 0.229, which falls within the range of previously described gene flow values for self-pollinated plant species (Govindaraju, 1989). However, gene flow has also been described to occur to a certain extent in germplasm collections (de Vicente, 2005). It therefore remains unclear whether the Nm values of our dataset reflect recent gene flow (e.g. due to cross-pollination or local seed exchange among farmers) or result from the fixation of alleles during breeding over evolutionary time.
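The per-locus quantities discussed above follow standard formulas. For readers who want to check them outside GenAlEx, they can be sketched in Python. This is our own sketch, not GenAlEx code. HE is the uncorrected gene diversity 1 − Σ pᵢ², F = 1 − HO/HE, and Nm is the island-model approximation (1 − FST)/(4 FST). GenAlEx additionally reports an unbiased HE and means over loci, so inserting the overall FST of 0.530 into `nm_from_fst` need not reproduce the reported Nm of 0.229 exactly. The genotypes are hypothetical.

```python
from collections import Counter


def locus_stats(genotypes):
    """HE, HO and F = 1 - HO/HE at one locus for one group of diploid individuals.

    genotypes: list of (allele1, allele2) pairs; None marks a non-amplified sample.
    HE is the uncorrected gene diversity 1 - sum(p_i^2).
    """
    scored = [g for g in genotypes if g is not None]
    if not scored:
        raise ValueError("no scored genotypes at this locus")
    counts = Counter(allele for g in scored for allele in g)
    n_alleles = 2 * len(scored)
    he = 1.0 - sum((c / n_alleles) ** 2 for c in counts.values())
    ho = sum(a1 != a2 for a1, a2 in scored) / len(scored)
    f = 1.0 - ho / he if he > 0 else float("nan")  # undefined at monomorphic loci
    return he, ho, f


def nm_from_fst(fst):
    """Island-model approximation Nm = (1 - FST) / (4 FST)."""
    return (1.0 - fst) / (4.0 * fst)


# Hypothetical allele sizes at one locus; the last sample failed to amplify
he, ho, f = locus_stats([(150, 150), (150, 156), (156, 156), (150, 150), None])
# HE = 1 - (5/8)^2 - (3/8)^2 = 0.46875, HO = 1/4, F = 1 - 0.25/0.46875 = 7/15
print(he, ho, f)
```

Entries such as 002 or 036, which are monomorphic at every locus, return HE = 0 and an undefined F. This is why F cannot be averaged meaningfully over such entries.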
After grouping the entries according to their country of origin (n = 8; cf. Table 1), genetic diversity (HE, D) within a country of origin ranged from 0.18 to 0.48 (Table 4). The mean number of alleles ranged from 1.44 to 4.78, and the highest allelic richness was found in entries from Kenya and the USA. Entries from Korea, Tanzania, and Zimbabwe showed fewer alleles, which might be because each of these countries of origin is represented by only one entry. AMOVA was used to evaluate the diversity components within and among individuals grouped by country of origin. Variation among individuals accounted for the majority (57 per cent) of the total variation, whereas seven per cent was attributed to differences among countries of origin (Appendix 6). Similar results have been described previously, with differences among geographical regions accounting for only five per cent of the total variation in sesame lines (Laurentin et al., 2006; Woldesenbet et al., 2015). To resolve the relationships among the entries, an NJ tree based on a pairwise population matrix of Nei's unbiased genetic distances was generated (Figure 1). The entries in the Ugandan germplasm collection showed little or no grouping according to their country of origin. Five entries from Kenya maintained their genetic identity and relationship, whereas the remaining entries were well intermixed. The Sesim 2-related local selection 188 and entry 016 showed identical marker alleles. In contrast, the related entries 098, 114, and 191 were highly divergent from their supposed ancestor, Sesim 2.
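Nei's unbiased genetic distance, on which the NJ tree is based, can be written down compactly. The study presumably took the distance matrix from GenAlEx. The following is our own sketch of Nei's (1978) estimator and does not reproduce any program's internals. Within-group gene identities are corrected for the number n of diploid individuals scored, as (2n Σx² − 1)/(2n − 1). The between-group identity is Σ xᵢyᵢ. Both are averaged over loci, and D = −ln(Jxy / √(Jx·Jy)). The allele frequencies are hypothetical. Building the NJ tree itself from the resulting matrix, done in MEGA 6 in the study, is not shown.

```python
import math


def nei_unbiased_distance(pop_x, pop_y):
    """Nei's (1978) unbiased genetic distance between two groups.

    pop_x, pop_y: lists over the same loci of (freqs, n), where freqs maps
    allele -> frequency and n is the number of diploid individuals scored.
    """
    jx, jy, jxy = [], [], []
    for (fx, nx), (fy, ny) in zip(pop_x, pop_y):
        sx = sum(p * p for p in fx.values())
        sy = sum(p * p for p in fy.values())
        # Sample-size-corrected within-group gene identities
        jx.append((2 * nx * sx - 1) / (2 * nx - 1))
        jy.append((2 * ny * sy - 1) / (2 * ny - 1))
        # Between-group gene identity
        jxy.append(sum(p * fy.get(allele, 0.0) for allele, p in fx.items()))
    JX, JY, JXY = (sum(v) / len(v) for v in (jx, jy, jxy))  # means over loci
    if JXY == 0:
        return math.inf  # no shared alleles at any locus
    return -math.log(JXY / math.sqrt(JX * JY))


# Hypothetical single-locus example: X fixed for allele "a", Y with a and b at 0.5
x = [({"a": 1.0}, 10)]
y = [({"a": 0.5, "b": 0.5}, 10)]
print(nei_unbiased_distance(x, y))  # equals -ln(0.5 / sqrt(9/19))
```

Because of the sample-size correction, the estimator can return slightly negative values for nearly identical groups. Such values are usually interpreted as zero distance.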

Proposed Selection of Well-Performing Entries
The generated genetic and phenotypic datasets will serve as a valuable knowledge base for the selection of superior genetic material.In order to assemble a selection, the entries were chosen as described above based on their quantitative traits, their overall hairiness, and their genetic diversity.
Taking all 10 quantitative traits into consideration, the best-performing entries are 008, 010, and 017 from Uganda, 004 from Ethiopia, 035 from Tanzania, 087 from Zimbabwe, and 044 from China. Each is characterised by at least five traits above the mean plus one standard deviation (Appendix 2) and is marked with an asterisk in Figure 1. Based on the same scheme, the least-performing entries, with at least five traits below the mean minus one standard deviation, come from China (040, 042, and 050) and the USA (057, 067, 068, and 070). The 13 hairiest entries, with hairiness sums above the positive standard deviation value (sum > 6.5), comprise six entries from the USA, four from Uganda, two from China, and one from Ethiopia (marked with a triangle in Figure 1). The genetically most diverse entries, with HE values above the positive standard deviation value (HE ≥ 0.31), are 010 and 012 from Uganda, 043 and 055 from China, 058 from the USA, 073, 077, 082, and 085 from Kenya, and 087 from Zimbabwe (marked with an upward arrow in Figure 1). The entries 020 (Uganda), 055 (China), and 058 and 060 (both USA) combine hairiness with high genetic diversity, whereas entries 010 (Uganda) and 087 (Zimbabwe) combine good quantitative trait performance with high genetic diversity. Entries 004 (Ethiopia) and 008 (Uganda) combine hairiness with good quantitative trait performance. In summary, a core selection of 26 entries is proposed (Table 7).
Note. *Values above the positive standard deviation (HE > 0.31; hairiness sum > 6.5).
¹ Only traits with values above the positive standard deviation are shown: number of branches (NB), number of capsules on main stem (NCMS), number of capsules on branches (NCB), total number of capsules (TNC), days to flowering (DTF), days to maturity (DTM), plant height (PH), plant height until first capsule (HFC), plant height until first branch (HFB), and length of capsule zone (LCZ). + Entry 025: only one measurement was made per trait, so the values of this entry should be treated with caution.

Conclusion
The presence of genetic variability in a crop is essential for its further improvement, because it gives breeders the opportunity to develop new varieties and hybrids. Existing variation in part of the sesame germplasm conserved at NaSARRI in Uganda, comprising 85 lines from eight countries of origin, was categorised through phenotypic (quantitative and qualitative) and genetic characterisation. Despite rather low genetic diversity (HE grand mean = 0.219), we detected strong admixture within and between the entries. This admixture could result from several concurrent causes, such as differing ancestry (most likely due to the breeding process itself, but also due to cross-pollination) or material exchange between locations. Thus, if the genetic integrity of the germplasm is to be maintained, sources of gene flow must be prevented wherever possible. On the basis of the phenotypic and genetic characterisation, we defined a core selection of 26 superior entries characterised by high genetic diversity, hairiness, and overall good performance in quantitative agromorphological traits. These entries form a valuable repertoire of sesame germplasm for use by breeders and farmers in Uganda.
Figure 1. Neighbor-joining tree of the analysed sesame entries based on pairwise Nei unbiased genetic distances; entries with the best quantitative trait performance (asterisk), highest hairiness (triangle), and highest genetic diversity (upward arrow) are marked.

Table 1.
List of countries of origin and the corresponding sesame entries

Table 3.
Samples from farmer's field analysed for outcrossing by sibship analysis

Table 4.
Mean values of population genetic parameters per country of origin: number of individuals (N), number of different alleles per locus (NA), number of effective alleles per locus (NE), expected and observed heterozygosity (HE and HO), and the fixation index (F)

Table 6.
Outcrossing values by testing 128 individuals from seven mother plants at the nine microsatellite loci

Table 7.
Proposed selection of 26 entries
Note. Sample size (N), number of alleles (NA), number of effective alleles (NE), observed heterozygosity (HO), expected heterozygosity (HE), and the fixation index (F), calculated as mean values using GenAlEx.