Investigating Grammatical Colloquial Features in EFL Learners’ Theses by Chinese English Learners

Research into colloquialisation in academic writing has become increasingly popular in recent years. However, little work has addressed the dimension of grammar. Using corpus-based quantitative and qualitative analysis, the present study compiled three corpora drawn from Chinese MA theses, Chinese PhD dissertations and international journals in order to explore the grammatical colloquial and non-colloquial features of Chinese EFL learners’ theses. Compared with international journals, both MA theses and PhD dissertations displayed a strong colloquial tendency. The similarities between MA theses and PhD dissertations outweigh their differences, and doctoral dissertations are no less colloquial than MA theses. The statistical evidence suggests that EFL learners in China lack register awareness in academic writing and fail to comply with the conventional pragmatic paradigm of academic discourse. To deepen EFL learners’ stylistic awareness and reduce their colloquial tendency, the study offers some suggestions and pedagogical implications for English academic writing.


Background of the Study
Colloquialisation, the increasing acceptance of colloquial features, especially in more formal genres, has been a major grammatical change in English since the mid-twentieth century (Collins, 2013). Collins & Yao (2013) examined grammatical colloquialisation across a range of registers in ten World Englishes, including British, American and Australian English. Their findings indicated that the ten Englishes displayed different levels of colloquial tendency. This suggests that a number of colloquial expressions have been accepted in formal registers, which provides practical and indicative information for subsequent studies.
Academic writing, as the most formal kind of writing, has its typical paradigm. Texts of the same genre share a standardized mode and the same prescriptive conventions, which require language users to observe the conventional paradigm when communicating. Academic discourse, categorized as written language, is characterized by a particular paradigm of verifiability, predictability and reproducibility. Expressions in academic writing are required to be formal, standard and non-colloquial.
Research on colloquialisation in China has mainly concentrated on the colloquial features of EFL (English as a Foreign Language) learners' written language, and lexical studies have been the most productive. However, few empirical studies have been conducted on grammatical colloquialisation in academic writing. In the present study, grammatical colloquialisation is defined as "the overuse of several grammatical items which are used significantly more in spoken language than academic language".

Purpose and Significance of the Study
Studies of recent diachronic change in British and American English (e.g., Leech et al., 2009) suggested that colloquialisation had played a role in the rising popularity of several grammatical categories (Collins, 2013b). Nevertheless, such developments have rarely been investigated beyond British and American English, let alone in Chinese EFL learners' academic writing. This gap has prompted the present exploration of the impact of colloquialisation on Chinese EFL learners' academic writing, as represented by MA theses and PhD dissertations.
According to Pan (2012), authors must present their studies in a way that international scholars accept so that their theses can be acknowledged and adopted by the international academic community. However, inappropriate use of grammatical items fails to meet the requirements of fluency, idiomaticity and readability in academic writing. By comparing the frequencies and ratios of three spoken grammatical categories between Chinese EFL learners and international scholars, this study offers a comprehensive analysis of the grammatical colloquial features of Chinese learners' writing. Based on the empirical evidence, some pedagogical implications regarding material design and classroom teaching are provided for the improvement of English academic writing.

Previous Studies on Colloquialisation
Since Biber (1988), the issue of variation within a language has caught the attention of many scholars. Biber (1988) set up the multi-dimensional/multi-feature (MD/MF) model and identified six dimensions that distinguish language variation from spoken to written.
Subsequent research mainly focused on lexical variation between spoken and written genres. Leech, Rayson, & Wilson (2001), for example, were the first to reveal the lexical differences between spoken and written language on the basis of frequencies. In recent years, however, increasing attention has been paid to the grammatical dimension of spoken and written language. Collins (2004) reported the findings of a corpus-based study of let-imperatives in English. The results showed that, unlike ordinary imperative clauses with the lexical verb let meaning "allow", those with the special grammaticalised let (first person inclusive let-imperatives and open let-imperatives) have been bleached of propositional content, and that let serves merely to mark illocutionary meaning in Modern English. Collins (2007) provided the first comprehensive corpus-based semantic description of can, may, could and might in three parallel corpora of contemporary British, American and Australian English. The statistical results reflected the respective trends of these three modals as three markers of epistemic possibility. In addition, the study revealed that, although the modals investigated can express the same meanings, there is paradoxically little semantic overlap between them. Collins (2009) reported evidence from an analysis of the frequency and distribution of a set of modals and quasi-modals in nine matching corpora covering British, American, Australian, New Zealand, Philippine, Singapore, Hong Kong, Indian and Kenyan English. The evidence showed that quasi-modals flourished in speech, whereas their modal counterparts were more common in written registers. Among the nine Englishes analyzed, American English played the leading role in the rise of most quasi-modals and the decline of modals.
Collins & Yao (2013) conducted corpus-based research on three grammatical categories known to be undergoing a colloquialism-related rise in contemporary English, across a range of registers in ten World Englishes. The findings showed several strong tendencies for each grammatical category that differed from genre to genre. The results also yielded clear patterns of regional differentiation: colloquialism is a stronger driver of grammatical change in the IC (inner circle, such as American, British and Australian English) than in the OC (outer circle, such as Hong Kong English and Kenyan English). Collins (2013) conducted a diachronic study based on a set of parallel contemporary corpora to explore the impact of colloquialisation on several grammatical features (quasi-modals, get-passives, first person plural imperatives, there-existentials, and progressives) across a range of World Englishes. The results indicated that colloquialisation may be the major cause of the rise of the grammatical categories investigated.
The GET construction in Modern Irish and Irish English was explored from a synchronic perspective by Nolan (2008). The study found that the GET constructions of Irish English and Modern Irish reflect a unique cognitive perspective on bilingual lexicon architecture and the role of constructions. A functional account of the constructions in both languages, one that mediates the relationship between lexicon and construction, best explained the relationship between these constructions across the two languages.
Research in China has taken two main perspectives: the written and the oral output of Chinese EFL learners. Chinese research on colloquialisation has mainly concentrated on the lexical colloquial features of EFL learners' written language. Ma (2002) made a contrastive analysis of the linguistic features of compositions written by Chinese college EFL learners and by native-speaker students at American colleges. The findings suggested that Chinese learners used significantly more second person pronouns, discourse articles, conjunctions and adjectives than native speakers, while American learners resorted to because-adverbial clauses, that-attributive clauses, that-object clauses and persuasive verbs. In general, according to Ma, the compositions of Chinese EFL learners were characterized by informativity and formality, while the compositions of American learners were more colloquial and more individualised. Wen et al. (2003) compared writer/reader visibility between Chinese EFL learners and native speakers. They pointed out that one of the linguistic features of advanced English learners' interlanguage was a colloquial tendency in their written language, and that the gap between Chinese EFL learners and native speakers was observable. Moreover, the findings indicated that the colloquial tendency in the compositions of English majors declined as learners' L2 proficiency developed, and that the tendency differed across grades, most notably between freshmen and sophomores.
Sun (2012) made a comparative study by building six corpora consisting of MA theses, PhD dissertations and journal papers by both Chinese EFL learners and international scholars. The study concluded that Chinese EFL learners tended to overuse colloquial lexical items that are more typical of speech than of academic writing and were inclined to avoid expressing personal opinions. The findings also confirmed the conclusion drawn by Wen et al. (2003) that the colloquial features in the academic writing of Chinese EFL learners and international scholars declined significantly with the development of writers' L2 proficiency. Dai (2011) conducted a data-based investigation of writing by Chinese English majors at different levels in order to identify colloquialisation at the lexical, syntactic and discourse levels. Her answer to the question of whether there are colloquial features in Chinese EFL learners' writing agreed with that of Wen et al. (2003) and Sun (2012). She also found that colloquialisation at the morphological and syntactic levels is the most salient, and that this phenomenon is closely linked to the present English teaching situation in China.
The studies above examined the written language of Chinese EFL learners. Several investigations have also been conducted on oral output. Ma (2004) collected more than 300 spoken texts to analyse the spoken English of native speakers. The results indicated that spoken English contained much repetition and many short sentences. The simple present and simple past tenses were used heavily, as were personal pronouns and abbreviations.
By applying the MD/MF model put forward by Biber, Pan (2012) found that the oral output of Chinese EFL learners as a whole was not as colloquial as that of native speakers, but the utterances of learners with high L2 proficiency were similar to those of native speakers. The results reflected the weak register awareness of Chinese EFL learners.
Of some relevance to the current study is recent research exploring grammatical colloquialisation in World Englishes and the linguistic features that underpin register differences (Collins, 2013). However, empirical research on grammatical colloquialisation in academic writing has not yet been undertaken. This gap has prompted the present exploration of grammatical colloquialisation in Chinese EFL learners' academic writing.

Research Questions
The overall aim of the present study is to investigate the features of grammatical colloquialisation and non-colloquialisation in Chinese EFL learners' theses. The comparison is carried out between the MA theses and doctoral dissertations written by Chinese EFL learners and the papers published in international journals.
Two research questions are addressed:
(1) Compared with international journals, is there a conspicuous grammatical colloquial tendency in the MA theses and PhD dissertations of Chinese EFL learners?
(2) What are the similarities and differences between the MA theses and PhD dissertations of Chinese EFL learners with regard to the degree of grammatical colloquialisation?

Corpus of the Research
In this study, three corpora were compiled to identify the grammatical colloquial features of the academic writing of Chinese EFL learners by comparison with that of international scholars.
On the one hand, the corpus of international journals (IJ) served as the reference corpus, providing comparative information on the occurrences of specific language features. The collection was drawn from papers published between 2011 and 2014 in three of the top 10 international journals in the field of linguistics, namely Journal of Memory and Language, Language Learning and English for Specific Purposes. With 60 papers from each of the three journals, the reference corpus totals 1,594,141 tokens.
On the other hand, the learner corpora of MA theses and doctoral dissertations were compiled as the research corpora. They consist of MA theses and PhD dissertations collected from a number of prestigious universities in Shanghai, Beijing, Jiangsu and Guangdong from 2011 to 2013. All the writers majored in foreign linguistics and applied linguistics. With 104 MA theses and 46 PhD dissertations, the MA corpus and the doctoral corpus contain 1,553,639 tokens and 2,538,791 tokens respectively.
All three corpora are specialized corpora confined to the field of linguistics, so that the linguistic features of the target register and discipline can be illustrated more clearly and the validity of our expectations can be tested. An overview of the three corpora is presented in Table 1.

Instruments
Four instruments were applied in the present study.
(1) Treetagger was adopted to annotate the part of speech (POS) of all the texts in the three corpora.
(2) Antconc was employed for complex search routines using regular expressions so that all the tokens of grammatical items in both spoken and academic registers could be extracted exactly.
(3) Normalization was used to convert the raw frequencies from PowerGrep into normalized frequencies so as to guarantee reliable comparability across the three corpora. The raw frequency in each corpus was normalized by the following formula: normalized frequency = raw frequency / corpus size × 1,000,000.
(4) Log-likelihood Calculator was used to measure whether the differences obtained are significant or merely due to chance.
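The normalization formula in (3) and the significance test in (4) can be sketched in a few lines of Python. This is a minimal illustration, not the tools the study used: the function names are hypothetical, the log-likelihood statistic follows the common two-corpus formulation (observed versus expected frequencies), which is assumed here to match the calculator, and the raw counts in the usage lines are invented purely for demonstration. Only the corpus sizes come from the study.

```python
import math

# Corpus sizes in tokens, as reported for the three corpora in this study.
CORPUS_SIZES = {"IJ": 1_594_141, "MA": 1_553_639, "PhD": 2_538_791}


def normalized_frequency(raw_frequency, corpus_size, base=1_000_000):
    """Normalized frequency = raw frequency / corpus size * 1,000,000."""
    return raw_frequency / corpus_size * base


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Two-corpus log-likelihood (G2) for one item (assumed formulation)."""
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    # Expected frequencies if the item were spread evenly across both corpora.
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


# Hypothetical raw counts, invented for illustration only.
ma_raw, ij_raw = 120, 40
print(normalized_frequency(ma_raw, CORPUS_SIZES["MA"]))
print(log_likelihood(ma_raw, CORPUS_SIZES["MA"], ij_raw, CORPUS_SIZES["IJ"]))
```

With one degree of freedom, a G2 value above 3.84 corresponds to p < 0.05, so a value below that threshold would be read as a difference that may be due to chance.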

Theoretical Framework
First, a preliminary research framework was drafted on the basis of Collins (2013) and the Longman Grammar of Spoken and Written English (Biber, 1999).
Second, the Corpus of Contemporary American English (COCA) was used to retrieve all the grammatical items in the preliminary research framework. The comparisons pursued in the study required that our selection of colloquial features be sensitive to genre variation. Of the five genres in COCA, the spoken and academic registers were employed. If a grammatical item retrieved from COCA appeared significantly more often in the spoken genre than in the academic genre, it was categorized as a spoken grammatical item (SGI). Similarly, if a grammatical item appeared significantly less often in the spoken genre than in the academic genre, it was classified as an academic grammatical item (AGI).
Last, on the basis of these statistics, the final research framework was confirmed. It includes three spoken grammatical categories, namely contractions, quasi-modals and get-passives (get+Ved), and three academic grammatical categories, namely full forms, modals and be-passives (be+Ved) (see Table 2).

Research Procedure
First, all of the data were tagged using Treetagger (Rayson, 2008).
Second, tokens of the grammatical items were extracted with Antconc, which allows for complex search routines using regular expressions.
Last, manual post-editing was carried out when it was difficult to tell from form and POS tags alone whether a token was relevant.
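As a rough illustration of the extraction step, the sketch below searches POS-tagged text for get-passive candidates with a regular expression. It is written in Python rather than Antconc, and several details are assumptions: the `word_TAG` token format, the tag names (`VV...` for forms of lexical verbs, `VVN` for past participles, `RB` for adverbs) and the function name are all hypothetical choices, and the actual search patterns used in the study are not given.

```python
import re

# Assumed input format: space-separated tokens written as word_TAG.
# Assumed tags: VV* = lexical verb forms, VVN = past participle, RB* = adverb.
GET_PASSIVE = re.compile(
    r"\b(?:get|gets|got|getting|gotten)_VV\w*"  # a form of GET tagged as a verb
    r"\s+(?:\w+_RB\w*\s+)?"                     # optionally one intervening adverb
    r"\w+_VVN\b",                               # followed by a past participle
    re.IGNORECASE,
)


def find_get_passives(tagged_text):
    """Return every get+Ved candidate found in a POS-tagged string."""
    return GET_PASSIVE.findall(tagged_text)


sample = "The_DT paper_NN got_VVD rejected_VVN and_CC we_PP forget_VVP it_PP"
print(find_get_passives(sample))
```

A pattern of this kind only finds candidates by form and tag, which is why the manual post-editing described in the last step remains necessary.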

Results and Discussion
A comparative analysis of grammatical colloquialisation between Chinese EFL learners and international scholars is presented here. The findings for the three grammatical categories are presented in terms of the overall frequency of SGIs (spoken grammatical items) in the three corpora, the specific frequency of the three grammatical categories, and the ratios for SGIs retrieved in the three corpora. Table 3 exhibits the overall frequency of SGIs in international journals, MA theses and doctoral dissertations. As the table shows, PhD dissertations displayed the highest frequency, MA theses came in between, and international journals displayed the lowest. The differences between the three corpora were significant: SGIs were significantly more frequent in MA theses and PhD dissertations than in international journals, and significantly more frequent in PhD dissertations than in MA theses.

Overall Comparisons of SGIs in Three Corpora
Generally speaking, the statistical evidence suggests that EFL learners in China lack stylistic awareness in academic writing and fail to comply with the conventional pragmatic paradigm of academic discourse. In addition, rather than weakening, the colloquial tendency increased significantly in PhD dissertations compared with MA theses. The colloquialisation in Chinese learners' academic writing did not tend to decrease progressively as their academic capability improved. On the contrary, in comparison with MA theses, the PhD dissertations even regressed to some extent, which exposes drawbacks in the current teaching of second language academic writing in China. In our view, two main reasons may contribute to this phenomenon: first, the requisite training in academic writing instruction is in short supply, which probably gives rise to EFL learners' vague stylistic awareness in academic texts; second, fossilization in the development of interlanguage may also explain it. This study calculated the ratios for the three grammatical categories by the formula: ratio = SGI frequency / (SGI frequency + AGI frequency).

Contrastive Analysis of Three Grammatical Categories
The higher the ratios are, the greater the colloquial tendency is.
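The ratio defined above is simple enough to state as a short function. The sketch below is illustrative only: the function name is hypothetical and the frequencies passed in are invented, not taken from the study's tables.

```python
def colloquial_ratio(sgi_frequency, agi_frequency):
    """ratio = SGI frequency / (SGI frequency + AGI frequency)."""
    total = sgi_frequency + agi_frequency
    if total == 0:
        raise ValueError("ratio is undefined when neither form occurs")
    return sgi_frequency / total


# Invented frequencies: a corpus with no spoken forms at all yields 0.0,
# while one that uses the spoken form a quarter of the time yields 0.25.
print(colloquial_ratio(0, 500))   # 0.0
print(colloquial_ratio(25, 75))   # 0.25
```

Because the ratio is bounded between 0 and 1 and does not depend on corpus size, it can be compared directly across corpora of different lengths.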
The ratios in Table 4 above exhibit broad similarities with the frequencies calculated. Therefore, the ratios can reflect the overall colloquialisation of academic writing in a fairly objective way. To sum up, the comparative analysis of the ratios for the three grammatical categories suggests that the colloquial tendency of EFL learners in China is much higher than that of international scholars.
Compared with international journals, MA theses and PhD dissertations significantly overused contractions. Meanwhile, PhD dissertations used significantly more contractions than MA theses. It is worth noting that the international scholars never resorted to contracted forms, whereas the learners used several of them. This phenomenon sheds light on the fact that contractions are not part of the canonical paradigm universally acknowledged in international academic writing. Explicit teaching of this aspect therefore needs to be strengthened so as to nurture the academic literacy of EFL learners and eliminate the occurrence of contracted forms in academic writing. There are two possible reasons why the Chinese learners overused contractions: a lack of effective input of stylistic knowledge during their language learning process, and negative transfer from their first language, in which contracted forms do not mark a colloquial tendency in writing.
As for quasi-modals, the statistical results plainly indicate that PhD dissertations used significantly more of them than international journals, with no significant differences between international journals and MA theses or between MA theses and PhD dissertations. In terms of frequency, then, there are still great differences between PhD dissertations and international journals but several similarities between MA theses and international journals. Thus master's degree candidates have met the requirements of the academic discourse standard for quasi-modals to a certain extent.
When it comes to get-passives, both MA and PhD writers used them more than international scholars. However, there is no significant difference between MA theses and PhD dissertations. As discussed above, the grammatical variables of World Englishes have in recent years widely displayed a colloquial tendency. Since the texts in the present study were extracted from international journals of the recent three years, the materials are bound to be considerably affected by the global colloquialisation of English. This may be the cause of the small number of instances of the SGI "get+Ved" structure appearing in international journals. Therefore, we should emphasize the cautious use of these colloquial grammatical items in the teaching of second language academic writing.
In a word, both MA theses and PhD dissertations overused the three SGIs, and PhD dissertations used significantly more contractions than MA theses. By contrast, the frequencies of quasi-modals and get-passives are fairly similar in MA theses and PhD dissertations. This shows that the significant overuse in PhD dissertations relative to MA theses is concentrated mainly in contractions. The findings of this section suggest that there is not only fossilization but even regression during interlanguage development.

Comparisons of Contractions
Table 4 shows that no contracted form was used in international journals, which is not unexpected given that all the IJ texts were drawn from top 10 SSCI international journals in the field of linguistics. All the papers were written by professionally trained scholars and were strictly reviewed, so such low-level mistakes were successfully avoided. Table 5 above also suggests that, in comparison with international journals, both MA theses and PhD dissertations used a large number of contracted forms. Moreover, MA theses and PhD dissertations used significantly more contracted be-verbs and negation than international journals, while PhD dissertations used significantly more negation (n't) than MA theses.
Moreover, among the four sub-categories of contractions, negation was used most (MA = 77.75%, PhD = 90.85%), followed by be-verbs (MA = 19.44%, PhD = 8.61%). The statistical results indicate that we should pay more pedagogical attention to the four sub-categories of contractions, especially negation and be-verbs.

Comparisons of Quasi-modals
Table 6 shows that there were broad similarities between the ratios for quasi-modals and their frequencies. In terms of frequency in the three corpora, international journals used all the quasi-modals to different degrees, with the exception of had better. This phenomenon casts light on the fact that even international journals show varying degrees of colloquial tendency. Although deeply restricted by grammatical prescriptivism, academic texts still present colloquial features, which should probably be attributed to the decline of grammatical prescriptivism and to the ceaseless emergence of new communication media such as mass media, email and messaging. The normativity and stability of language viewed synchronically are relative, whereas diachronic texts are the fruit of historical evolution and the crystallization of collective linguistic wisdom. Genres are sensitive to social reforms. Reform alongside convention, and prescriptivism alongside its violation, are the major causes of stylistic evolution.
What surprised us was that international journals used significantly more have to than MA theses and PhD dissertations, and even more than its equivalent must. This, however, contradicted the results retrieved from the Corpus of Contemporary American English (COCA), in which have to was used far more in spoken English than in academic writing and must was used more frequently in academic writing than in spoken English. This finding further corroborates that stylistic prescriptivism, often affected by social reforms, is not invariable. Swan once suggested that with have to the source of the obligation is generally external to the speaker, as where the deontic source is a rule, regulation or order, whereas with must there is usually a subjective deontic source, the speaker or listener. This difference may provide the key to the increasing popularity of have to at the expense of must for expressing a relatively objective will.

Comparisons of Get-passives
Table 7 indicates that MA theses showed the highest frequency of get+Ved and international journals the lowest, with doctoral dissertations in between. Log-likelihood ratios showed that MA theses and PhD dissertations used significantly more get-passives than international journals. However, the MA theses and doctoral dissertations exhibited broad similarities in frequency. The statistics in Table 7 verify that the colloquial ratios for get-passives in MA theses and PhD dissertations are evidently higher than that in international journals. Thus, MA and PhD candidates used the structure get+Ved significantly more.
Chinese EFL learners' weak register awareness, in particular their not knowing that get-passives are scarcely used in formal writing, may account for their significantly greater use of get-passives. In international journals, as mentioned before, the small number of get-passives is likely to rise owing to the influence of communication media. But for us, the EFL learners, get-passives should be used with discretion when writing academic articles.

Conclusion
Taking a corpus-based, data-driven approach, this study has made a tentative attempt to explore grammatical colloquialisation in Chinese EFL learners' MA theses and PhD dissertations, and the developmental tendency of colloquial features as learners' L2 proficiency develops, from the perspective of three spoken grammatical categories: contractions, quasi-modals and get-passives (get+Ved). As the corpus evidence indicated, the overall frequency of SGIs in MA theses and PhD dissertations is significantly higher than in international journals, with PhD dissertations using significantly more SGIs than MA theses. The findings suggest that the similarities between MA theses and PhD dissertations outweigh their differences. Doctoral dissertations, even with some regression, are no less colloquial than MA theses. Great gaps did exist between Chinese EFL learners' writing and international journals in colloquial tendency. The study did not yield a clear pattern of improvement with proficiency, which is consistent with the fossilization phenomenon in the grammar acquisition of Chinese EFL learners. The empirical evidence indicates that Chinese EFL learners lack register awareness in academic writing, which further verifies earlier linguistic findings. Several possible reasons may account for their strong colloquial tendency: first, the input of register knowledge in second language teaching is insufficient, which probably leads to EFL learners' vague stylistic awareness in academic texts; second, negative transfer from their first language may also play an essential role; third, the spread of new communication media such as mass media, email and messaging may also contribute to the decline of grammatical prescriptivism.
The results of the present study seek to supply some empirical evidence for the teaching of Chinese EFL learners. Teachers and students need to give due weight to the colloquialisation of academic writing in material design, classroom teaching and writing training so as to meet the grammatical paradigm of academic writing.
The results above shed light on the status quo of Chinese EFL learners' academic writing and offer a comprehensive analysis of the grammatical colloquial features of Chinese learners in the field of second language acquisition. Nevertheless, the present study focused only on the grammatical colloquial tendency of Chinese EFL learners specializing in foreign linguistics and applied linguistics, whose English literacy is relatively good in comparison with learners of other majors. Because grammatical colloquialisation is a discipline-bound linguistic feature, the results can hardly be applied to other disciplines. Therefore, further research on other disciplines is needed.

Table 1. Source and capacity of three corpora

Table 4. Ratios for three grammatical categories

Table 5. Frequency and proportions of four contractions sub-categories in three corpora

Table 6. Ratios for quasi-modals in three corpora

Table 7. Frequency and ratios of get-passives in three corpora