Codon Usage Bias of the Wheat Flower Development Gene WAG-2 and Other AGAMOUS Group Genes

Analyzing the codon usage bias of the WAG-2 gene in the wheat three-pistil (TP) mutant may provide a basis for selecting appropriate host expression systems to improve the expression of target genes. In the present study, we analyzed the codon bias of the complete coding sequence (CDS) of the WAG-2 gene in TP using the CodonW program and compared the results with those of AGAMOUS (AG) group genes from other plant species. The results showed that WAG-2 in TP and other monocot AG group genes preferentially used codons ending in G or C, whereas Arabidopsis thaliana, Nicotiana tabacum, and other dicots were biased toward synonymous codons ending in A or T. Clustering based on codon bias was consistent with clustering based on the CDS of the AG group genes, indicating that differences in codon preference among AG group genes are closely associated with the genetic relationships of the species. The Euclidean distance coefficients of WAG-2 with A. thaliana and N. tabacum were 9.255 and 5.730, respectively, indicating that N. tabacum may be more suitable for the expression of WAG-2. Compared with the yeast and Escherichia coli genomes, WAG-2 showed distinct usage differences in 37 and 23 codons, respectively, suggesting that E. coli is the superior protein expression system. These results may improve our understanding of codon usage bias and support functional studies of WAG-2.


Introduction
The common wheat (Triticum aestivum L.) line three-pistil (TP) is a novel flower development mutant that Peng (2003) selected from the "tri-grain" wheat cultivar. The TP mutant stably carries three pistils and three normal stamens in each floret, making it valuable material for studying wheat flower development.
The wheat AGAMOUS (AG) ortholog WAG-2 is a C-class MADS-box gene. AG group genes play a central role in floral organ differentiation, formation, and development, and studies of their genetic structure, expression, and function have potential applications in seed plant breeding. Previous real-time PCR and in situ hybridization studies revealed that WAG-2 might be associated with the development of the pistil and ovule and with the homeotic transformation of stamens into pistil-like structures (Mizumoto et al., 2009). However, the specific function of WAG-2 in floral organs, especially in pistil and ovule development, remains unknown.
A total of 61 sense codons encode the 20 amino acids, and three codons terminate translation. Except for Met and Trp, every amino acid is encoded by two to six synonymous codons. Codon bias refers to the nonrandom usage of synonymous codons in an organism; over long-term evolution, each species has come to favor a specific set of codons. Synonymous codon usage bias is widely observed across species and genomes, and even among different genes of the same genome (Ingvarsson, 2008; Liu, 2010). In recent years, synonymous codon usage in bacteria, yeast, and higher eukaryotes has been analyzed extensively. Brinkmann, the first to systematically analyze codon bias in monocot and dicot genes, revealed that codon usage in the chloroplast Gap 2 genes differs markedly between maize and dicots (Brinkmann et al., 1987). A study of 207 plant gene sequences confirms that codon usage in …
Codon usage bias within genes of a single species appears to be related to the expression level of the encoded protein, and it is most extreme in highly expressed proteins of E. coli and yeast. Hoekema (1987) reported that replacing preferred codons with minor codons at the 5' end of the highly expressed yeast gene PGK1 decreased both protein and mRNA levels. Biased codon choice in highly expressed genes enhances translation and is required for maintaining mRNA stability in yeast, so the degree of codon bias is a factor to consider when engineering high expression of heterologous genes in yeast and other systems. If an exogenous gene contains many codons that are rare in the expression system, and its codon preference differs markedly from that of the host, its transcription and translation levels in the host will decrease.
In the present study, the codon usage bias of the WAG-2 gene in TP was evaluated and compared with that of other AG group genes. Our findings provide a basis for selecting appropriate recipient plants and protein expression systems for functional studies of WAG-2.

Codon Usage Bias Analysis
The effective number of codons (ENC) is a simple measure of codon bias in a gene and the best estimator of absolute synonymous codon usage bias. ENC values range from 20, when only a single codon is used for each amino acid (extreme bias), to 61, when all synonymous codons are used equally (no bias); a low ENC value therefore indicates strong codon usage bias (Wright, 1990). Genes with ENC values < 30 are regarded as highly expressed, whereas those with ENC values > 55 are regarded as poorly expressed (Biro, 2008). In addition, the GC and GC3 contents of WAG-2 and other AG group genes were calculated using CodonW 1.4 (http://codonw.sourceforge.net).
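The ENC definition above can be illustrated with a minimal Python sketch of Wright's (1990) estimator. It assumes the standard genetic code, an in-frame CDS containing only A, C, G, and T, and Wright's rule of substituting the mean of the two- and four-fold classes when the three-fold class (Ile) is absent. The function names are hypothetical, and CodonW 1.4 may treat edge cases differently.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Synonymous codon families for the 20 amino acids (stop codons excluded).
FAMILIES = {}
for _codon, _aa in CODE.items():
    if _aa != "*":
        FAMILIES.setdefault(_aa, []).append(_codon)


def split_codons(cds):
    """Split an in-frame CDS into codons, ignoring a trailing partial codon."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def enc(cds):
    """Effective number of codons (Wright, 1990): 20 = extreme bias, 61 = no bias."""
    counts = Counter(split_codons(cds))
    f_values = {}  # degeneracy class -> per-amino-acid homozygosity values F
    for aa, family in FAMILIES.items():
        k = len(family)
        n = sum(counts[c] for c in family)
        if k == 1 or n < 2:
            continue  # Met/Trp carry no information; F needs >= 2 observations
        sum_p2 = sum((counts[c] / n) ** 2 for c in family)
        f_values.setdefault(k, []).append((n * sum_p2 - 1) / (n - 1))
    mean_f = {k: sum(v) / len(v) for k, v in f_values.items()}
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2  # Ile absent: Wright's substitution
    n_per_class = {2: 9, 3: 1, 4: 5, 6: 3}  # amino acids in each degeneracy class
    if any(mean_f.get(k, 0.0) <= 0.0 for k in n_per_class):
        raise ValueError("sequence too short to estimate every degeneracy class")
    # The constant 2 accounts for the single-codon amino acids Met and Trp.
    value = 2 + sum(n_per_class[k] / mean_f[k] for k in n_per_class)
    return min(value, 61.0)  # ENC is conventionally capped at 61
```

A CDS that uses exactly one codon per amino acid scores 20. A CDS that uses every synonymous codon equally scores 61 after the cap.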
To characterize synonymous codon usage independently of amino acid composition, we calculated the relative synonymous codon usage (RSCU) values of the 59 informative codons (excluding Met, Trp, and the three termination codons) in each CDS of the AG group genes using CodonW 1.4. The RSCU value of a codon is its observed usage divided by the usage expected if all synonymous codons for the same amino acid were used equally (Yang et al., 2010). If all synonymous codons for an amino acid are used equally, their RSCU values are close to 1.0, indicating a lack of bias; a codon with an RSCU value greater than 1.0 is used more frequently than expected (Sau et al., 2006).
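As an illustration of the RSCU calculation described above, here is a minimal Python sketch. It assumes the standard genetic code and an in-frame CDS. The function name `rscu` is hypothetical, and the output is not meant to reproduce CodonW's exact formatting.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def rscu(cds):
    """Relative synonymous codon usage for the informative codons of a CDS.

    RSCU = observed count / (family total / family size), so a codon used
    exactly as often as expected under equal synonymous usage scores 1.0.
    """
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    families = {}
    for codon, aa in CODE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)
    values = {}
    for aa, family in families.items():
        total = sum(counts[c] for c in family)
        if len(family) == 1 or total == 0:
            continue  # skip Met/Trp and amino acids absent from the CDS
        expected = total / len(family)
        for codon in family:
            values[codon] = counts[codon] / expected
    return values
```

For example, in the Ala-only sequence `GCTGCTGCTGCC`, GCT has an RSCU of 3.0 and GCC has 1.0. The RSCU values within a family always sum to the family size.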
Codon usage frequency was used to measure codon usage differences between WAG-2 and candidate hosts: for each codon, we calculated the ratio of its usage frequency in WAG-2 to that in the host genome. Ratios between 0.5 and 2.0 indicate that the codon is used similarly in the two, whereas ratios ≤ 0.5 or ≥ 2.0 indicate a distinct difference in usage. Choosing a host is important in transgenic research, and codon compatibility is one of the most important factors affecting expression in the host. If a foreign gene contains many codons that are rare in the host expression system, expression can be extremely low or translation can terminate prematurely, especially when rare codons occur consecutively.
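The ratio criterion above can be sketched in Python. This sketch assumes that gene and host frequencies are on the same scale (for example, per thousand codons) and that codons with zero host frequency are skipped because the ratio is undefined. The frequency values in the usage line are hypothetical and serve only to show the cutoffs.

```python
def usage_ratios(gene_freq, host_freq):
    """Ratio of each codon's usage frequency in the gene to that in the host."""
    return {
        codon: gene_freq.get(codon, 0.0) / host
        for codon, host in host_freq.items()
        if host > 0  # ratio undefined when the host never uses the codon
    }


def divergent_codons(gene_freq, host_freq, low=0.5, high=2.0):
    """Codons whose ratio is <= low or >= high, i.e. distinctly different usage."""
    ratios = usage_ratios(gene_freq, host_freq)
    return sorted(c for c, r in ratios.items() if r <= low or r >= high)


# Hypothetical per-thousand frequencies, for illustration only.
gene = {"AAA": 10.0, "AAG": 30.0, "GCT": 5.0, "CTG": 40.0}
host = {"AAA": 30.0, "AAG": 20.0, "GCT": 5.0, "CTG": 10.0}
print(divergent_codons(gene, host))  # AAA (ratio 0.33) and CTG (ratio 4.0)
```

Counting the codons returned for a whole gene against a whole host table gives the kind of totals reported for E. coli and yeast.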

Statistical Analysis
RSCU values were compared with 1.0 using a one-sample t-test. Differences in ENC, GC, and GC3 between monocot and dicot plants were tested using the Kruskal-Wallis test.
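Both tests can be sketched with the Python standard library. This is a minimal sketch that assumes two groups (monocots versus dicots), so the Kruskal-Wallis H statistic is referred to a chi-square distribution with 1 degree of freedom. The one-sample function returns only the t statistic; its p-value would come from the t distribution with n - 1 degrees of freedom. In practice, `scipy.stats.ttest_1samp` and `scipy.stats.kruskal` provide both tests directly.

```python
import math
from collections import Counter
from statistics import mean, stdev


def one_sample_t(values, mu=1.0):
    """t statistic for H0: mean(values) == mu (e.g. RSCU of one codon vs 1.0)."""
    return (mean(values) - mu) / (stdev(values) / math.sqrt(len(values)))


def average_ranks(data):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(data)), key=lambda i: data[i])
    ranks = [0.0] * len(data)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and data[order[j + 1]] == data[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_two_groups(a, b):
    """Tie-corrected Kruskal-Wallis H and its chi-square (1 df) p-value."""
    data = list(a) + list(b)
    n = len(data)
    ranks = average_ranks(data)
    r_a, r_b = sum(ranks[:len(a)]), sum(ranks[len(a):])
    h = 12 / (n * (n + 1)) * (r_a ** 2 / len(a) + r_b ** 2 / len(b)) - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(data).values())
    if ties == n ** 3 - n:
        raise ValueError("all values are tied; H is undefined")
    if ties:
        h /= 1 - ties / (n ** 3 - n)
    h = max(h, 0.0)  # guard against tiny negative rounding error
    # Chi-square survival function with 1 df equals erfc(sqrt(H / 2)).
    return h, math.erfc(math.sqrt(h / 2))
```

Calling `kruskal_wallis_two_groups(monocot_enc, dicot_enc)` on two lists of ENC values returns H and p. The list names are hypothetical placeholders.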

Clustering Analysis
First, we performed clustering analysis using the squared Euclidean distance on the standardized RSCU values of WAG-2 and other AG group genes. The Euclidean distance coefficient (Dab) of codon usage bias between two genes a and b was calculated as follows: (1)
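Here is a minimal Python sketch of this distance calculation. It assumes that Dab is the ordinary Euclidean distance between the standardized RSCU vectors of genes a and b (one entry per informative codon). It also assumes that "standardization" means z-scoring each codon across genes with the sample standard deviation. Because the text mentions both the Euclidean distance and its square, both forms are given. The function names are hypothetical.

```python
import math
from statistics import mean, stdev


def standardize_columns(rows):
    """z-score each codon column (RSCU values across genes) with the sample SD."""
    columns = list(zip(*rows))
    scaled = []
    for col in columns:
        m, s = mean(col), stdev(col)
        # A codon with identical RSCU in every gene carries no information.
        scaled.append([(x - m) / s if s > 0 else 0.0 for x in col])
    return [list(r) for r in zip(*scaled)]


def squared_euclidean(a, b):
    """Sum of squared differences between two RSCU vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    """Euclidean distance, i.e. the square root of the squared distance."""
    return math.sqrt(squared_euclidean(a, b))
```

Each row passed to `standardize_columns` is one gene's RSCU vector, with codons in a fixed order. The distance between two rows of the result then compares the two genes' codon usage.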

Synonymous Codon Usage of WAG-2 Genes
The ENC, GC, and GC3 contents of the AG group genes are shown in Table 2. Synonymous codon usage was remarkably similar among the four WAG-2 genes in TP. The ENC values of WAG-2l, WAG-2m, WAG-2n, and WAG-2o were significantly lower than 55 (41.54, 42.22, 41.71, and 41.78, respectively), suggesting that the WAG-2 genes in TP have a moderate codon usage preference and moderate expression levels. The GC contents of WAG-2l, WAG-2m, WAG-2n, and WAG-2o were 0.559, 0.554, 0.559, and 0.562, respectively, with a mean of 0.558. The GC3 contents were significantly greater than 0.5 (mean 0.764), indicating that the WAG-2 genes preferentially used codons ending in G or C.
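The GC and GC3 values above can be computed along the following lines. This is a minimal Python sketch. CodonW's own GC3s index is restricted to synonymous third positions (excluding Met, Trp, and stop codons). For simplicity, this sketch uses the third position of every complete codon, so its values may differ slightly from CodonW output.

```python
def gc_content(cds):
    """Fraction of G + C over all nucleotides of the CDS."""
    cds = cds.upper()
    return sum(cds.count(base) for base in "GC") / len(cds)


def gc3(cds):
    """Fraction of complete codons whose third base is G or C."""
    cds = cds.upper()
    thirds = [cds[i + 2] for i in range(0, len(cds) - 2, 3)]
    return sum(base in "GC" for base in thirds) / len(thirds)
```

For `ATGGCCTAA`, the GC content is 4/9. The GC3 value is 2/3, because ATG and GCC end in G or C and TAA does not.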
The RSCU values of the 59 codons in the four WAG-2 genes were analyzed. Because codon usage was highly consistent among the four genes, only the mean RSCU values are listed in Table 3. In WAG-2l, WAG-2m, WAG-2n, and WAG-2o, 23, 23, 24, and 24 codons, respectively, had RSCU values significantly higher than 1.0 (p < 0.01); these were therefore regarded as the optimal codons of the WAG-2 genes. Except for GAT and CAT, the remaining 21 codons ended in G or C. The high fraction and frequency values of these codons further confirmed that the WAG-2 genes prefer G- or C-ending codons.
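The tally of preferred codons and their third bases can be sketched in Python. For brevity, this sketch uses a simple cutoff of mean RSCU > 1.0 rather than the one-sample t-test described in Statistical Analysis. The RSCU values in the usage line are hypothetical.

```python
def optimal_codons(mean_rscu, threshold=1.0):
    """Codons with mean RSCU above the threshold, and the subset ending in G or C."""
    preferred = sorted(c for c, v in mean_rscu.items() if v > threshold)
    gc_ending = [c for c in preferred if c[-1] in "GC"]
    return preferred, gc_ending


# Hypothetical mean RSCU values, for illustration only.
preferred, gc_ending = optimal_codons(
    {"GAT": 1.3, "GAC": 0.7, "GCC": 1.8, "GCT": 0.9, "CTG": 2.1}
)
print(preferred, gc_ending)
```

In this toy input, GAT is preferred but ends in T, which mirrors the exceptions noted above.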

Comparison with Other AG Group Genes on Codon Usage Bias
Because the ENC values of AG group genes in all monocot species were lower than 55, the expression levels of these genes were considered moderate. In dicots, only one AG group gene, CAG1 (C. sativus), had a lower ENC value (47.35). The ENC values of the other dicot genes were higher than 50, with a mean of 56.46, suggesting no obvious codon usage preference and low expression levels of AG group genes in these species.
Notably, the GC3 content of monocot AG group genes (mean 0.7165) was significantly higher than that of dicot genes (mean 0.4649), and GC contents were > 0.5 in monocots but < 0.5 in dicots. In monocots, G+C content was highest at the third codon position, reaching a maximum of 94.30%. In dicots, by contrast, the G+C and A+T contents were 69.46% and 62.25%, respectively.
The Kruskal-Wallis test showed that ENC values differed significantly between monocot and dicot plants (p < 0.05), with WAG-1 and OsMADS58 as exceptions, and that GC and GC3 contents were significantly higher in monocots than in dicots (p < 0.05). These results indicate significant differences in ENC, GC, and GC3 contents and in G/C preference between monocot and dicot plants: AG group genes in monocots preferred codons ending in C or G at synonymous positions, whereas those in dicots preferred codons ending in A or T.
Note. ENC, effective number of codons; GC, G+C content; GC3, G+C content at the third codon position; A3, A content at the third codon position; C3, C content at the third codon position; G3, G content at the third codon position; T3, T content at the third codon position.

Comparison with Genomes of E. coli and Yeast on Codon Usage Frequency
Differences in codon usage between WAG-2 and a host can affect expression levels, so codon preference must be considered when genes are expressed in heterologous hosts. The usage frequencies of the 64 codons in WAG-2 were compared with those in E. coli and yeast (Table 4). The number of codons with ratios > 2.0 or < 0.5 was 23 for E. coli and 37 for yeast, suggesting that the E. coli expression system may be superior to the yeast expression system for WAG-2.

Clustering Analysis
A. thaliana and N. tabacum are two common model plants widely used in studies of plant gene expression and function. To determine whether WAG-2 could be expressed efficiently in these two model plants, we conducted a clustering analysis based on codon bias using the squared Euclidean distance method (Figure 1). The Euclidean distance coefficients of WAG-2 with A. thaliana and N. tabacum were 9.255 and 5.730, respectively, indicating that the codon usage bias of WAG-2 is more similar to that of the N. tabacum NAG gene (Appendix A). N. tabacum may therefore be more suitable as a heterologous expression system for WAG-2. In addition, the resulting dendrogram clearly separated the genes into monocot and dicot clades according to their codon usage bias. The monocot clade included T. aestivum, H. vulgare, B. distachyon, O. sativa, Z. mays, and S. bicolor, and the dicot clade included A. thaliana, A. majus, and P. hybrida. The monocot genes were subdivided into three subclades: WAG-2, HvAG1, and MADS3 in the first; ZMM2, AG (S. bicolor), and OsMADS3 in the second; and OsMADS58, ZAG1, WAG-1, and HvAG2 in the third. The phylogenetic tree based on CDS also classified the AG group into monocot and dicot clades (Figure 2). The two trees were highly similar and differed only in the positions of CAG1 and CAG2 within the dicot clade.
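The clustering step can be sketched as agglomerative clustering on squared Euclidean distances between (standardized) RSCU vectors. This sketch assumes average linkage, in which each merge joins the two clusters with the smallest mean pairwise distance. The linkage method is not stated in the text, and the vectors in the usage line are hypothetical.

```python
from itertools import combinations
from statistics import mean


def average_linkage(points):
    """Agglomerative clustering; points maps gene name -> RSCU vector.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where distance is the mean squared Euclidean distance between members.
    """
    def d2(x, y):
        return sum((u - v) ** 2 for u, v in zip(points[x], points[y]))

    def cluster_distance(pair):
        a, b = pair
        return mean(d2(x, y) for x in a for y in b)

    clusters = [(name,) for name in sorted(points)]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=cluster_distance)
        merges.append((a, b, cluster_distance((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical two-codon vectors, for illustration only.
print(average_linkage({"A": [0.0, 0.0], "B": [0.0, 1.0], "C": [5.0, 5.0]}))
```

The merge history is enough to draw a dendrogram. Genes that merge early, like A and B here, have the most similar codon usage.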

Discussion
Codon usage … (Guo et al.). Genes with different functions also possess distinct codon usage patterns in plants (Rota-Stabelli et al., 2013).
Genes of the same family with high and low ENC values can differ in function (Liu et al., 2015). In the present study, the mean ENC value of WAG-2 (41.78) was lower than that of WAG-1 (53.36). Previous studies indicated that AG group C-function genes fall into two AG ortholog groups, with wheat WAG-1 and WAG-2 belonging to the AG1 and AG2 orthologs, respectively. During floral organ development, WAG-2 expression was minimal in transformed and developing stamens but abundant in the marginal region of developing ovules and in the central region of pistils (Mizumoto et al., 2009). In contrast, WAG-1 was associated with pistil and stamen development and with pistillody caused by nuclear-cytoplasm interactions in alloplasmic wheat (Meguro et al., 2013). Thus, the wheat AG orthologs WAG-1 and WAG-2 show functional differentiation during floral organ development. This suggests that the difference in codon usage bias between WAG-1 and WAG-2 may be linked to their functions, which requires further investigation.
Understanding codon usage bias reveals the codon usage pattern of a species and provides evidence about the evolution of organisms. As in other organisms, each higher plant species has a unique codon bias, and plants of the same taxonomic class maintain a similar codon usage pattern (Campbell & Gowri, 1990); closely related species tend to share similar codon usage frequencies and preferences. In the present study, we analyzed the codon usage of AG group genes in 10 monocots and 10 dicots and found that the more closely related the species, the more similar the codon usage patterns of their AG group genes. The monocots were closely related, and their AG group genes had similar codon usage; the same held for the dicots. Clustering based on codon usage bias was consistent with clustering based on the CDS of the AG group genes (Figures 1 and 2), indicating that differences in codon preference among AG group genes are closely associated with the genetic relationships of the species. Codon usage bias analysis can therefore serve as an important supplementary method in phylogenetic research and in investigating the evolutionary relationships of species.
Among dicots, A. thaliana and N. tabacum are two commonly used recipient plants in studies of gene expression and function, and efficient expression of exogenous genes in E. coli or yeast lays a foundation for identifying gene function. Biased codon usage in highly expressed genes enhances translation and is required for maintaining mRNA stability in yeast, so the degree of codon bias should be considered when engineering high expression of heterologous genes in yeast and other systems. Because species within the same taxonomic class exhibit similar codon usage patterns, and closely related species share similar codon usage frequencies and preferences, the finding that the codon usage bias of WAG-2 was more similar to that of the N. tabacum NAG gene suggests that N. tabacum may be the more suitable heterologous expression host, although this requires further study. Compared with the yeast and E. coli genomes, WAG-2 showed distinct usage differences in 37 and 23 codons, respectively, indicating that E. coli may be the superior protein expression system. Achieving high expression of WAG-2 in yeast would likely require modifying some of the codons whose usage differs.
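The codon modification mentioned above could take many forms. One simple, hypothetical strategy (not a method described in this study) replaces each codon that is rare in the host with the host's most frequently used synonymous codon. The `min_fraction` threshold and the host frequency table are illustrative assumptions. Practical codon optimization tools also weigh other factors, such as GC content and mRNA secondary structure.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def recode_for_host(cds, host_freq, min_fraction=0.5):
    """Swap host-rare codons for the host's most used synonym (hypothetical rule).

    A codon counts as rare when its host frequency is below min_fraction times
    the host frequency of the best synonymous codon. The encoded protein is
    unchanged because only synonymous codons are substituted.
    """
    cds = cds.upper()
    out = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        synonyms = [c for c, aa in CODE.items() if aa == CODE[codon]]
        best = max(synonyms, key=lambda c: host_freq.get(c, 0.0))
        if host_freq.get(codon, 0.0) < min_fraction * host_freq.get(best, 0.0):
            codon = best
        out.append(codon)
    return "".join(out)


def translate(cds):
    """Translate an in-frame CDS with the standard code ('*' marks stops)."""
    cds = cds.upper()
    return "".join(CODE[cds[i:i + 3]] for i in range(0, len(cds) - len(cds) % 3, 3))
```

With a hypothetical host table in which AAG is used at one third the rate of AAA, `recode_for_host("AAGGCCAAA", host)` changes only the first codon. A check with `translate` on the output confirms that the protein is unchanged.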

Conclusion
In summary, the codon usage patterns and phylogenetic information provided in this study may help in selecting an appropriate expression system for exogenous genes and in investigating the function of the WAG-2 genes in TP.

Table 1. The WAG-2 gene and other AG group genes

Table 2 shows the codon usage bias of AG group genes in monocots and dicots. In monocots, the ENC values of WAG-1 (T. aestivum) and OsMADS58 (O. sativa) were slightly lower than 55 (53.36 and 53.80, respectively), suggesting no obvious codon usage preference. The ENC values of the other monocot genes ranged from 42.41 to 49.79, with a mean of 46.21, indicating that these AG group genes were moderately biased. The ENC values of AG group genes in all monocot species were lower than 55, suggesting that the expression levels of AG group genes in these species were moderate.

Table 2. The ENC values and GC contents of the WAG-2 genes and other AG group genes

Table 3. The RSCU values of the WAG-2 gene
Note. *Termination codon. Underlined values indicate RSCU > 1.0.