A Corpus-based Study on the Use of Three-word Lexical Bundles in the Academic Writing by Native English and Turkish Non-native Writers

The use of recurrent English word combinations, known as lexical bundles, plays a fundamental role in academic prose (Karabacak & Qin, 2013). Very little research has compared Turkish non-native and native English writers' use of lexical bundles in academic prose in terms of frequency, structure and function (Bal, 2010; Karabacak & Qin, 2013; Öztürk, 2014). The present study was therefore conducted to identify the most frequently used lexical bundles in the published academic articles of Turkish non-native and native speakers of English and to investigate whether there was a significant difference between native and non-native scholars with respect to the frequency, structures and functions of English lexical bundles. The data were collected from two corpora: 15 scientific articles by native speakers and 15 scientific articles by advanced Turkish writers. The investigation included a quantitative analysis of the use of three-word lexical bundles and a qualitative analysis of their structures and functions. To be conservative, only three-word sequences that occurred at least 40 times per million words and appeared in at least 5 different texts were counted as lexical bundles in this study. The findings revealed that Turkish non-native writers showed underuse and less variation in the use of lexical bundles in their academic prose compared to native speakers.


Introduction
Corpus-based studies have widely agreed that lexical bundles are necessary building blocks of written discourse (Biber & Conrad, 1999; Cortes, 2006; Hyland, 2008a; Li & Schmitt, 2009). Analyses of academic corpora have demonstrated that lexical bundles are widespread in written registers (Biber et al., 2004; Biber & Barbieri, 2007). In one study, lexical bundles were found to constitute 52.3% of the written discourse (Erman & Warren, 2000). The acquisition of these recurrent word combinations is therefore significant for the development of academic writing skills for at least three reasons: first, lexical bundles are usually repeated and are an essential part of the structural material; second, because they are frequently used, lexical bundles are defining markers of successful writing; finally, these bundles combine grammar and vocabulary and are thereby the lexicogrammatical underpinnings of a language (Coxhead & Byrd, 2007).
According to some scholars, the frequent use of lexical bundles in academic writing signals a competent language user, whereas the absence of these bundles marks a novice writer (Haswell, 1991; Cortes, 2004; Hyland, 2008a; Chen & Baker, 2010). In this respect, Cortes (2004) argues that a particular use of lexical bundles is an indication of a competent language user. Similarly, Ellis, Simpson-Vlach and Maynard (2008) state that the frequent use of lexical bundles results in natural-sounding language.
However, the majority of corpus-based studies have demonstrated that learners' use of recurrent multi-word combinations is often problematic (Cortes, 2004; Hyland, 2008b; Li & Schmitt, 2009; Chen & Baker, 2010; Wei & Lei, 2011; Adel & Erman, 2012). According to this research, although non-native learners can produce a number of native-like formulaic sequences, their limited repertoire of such sequences causes them to overuse the ones they know, which makes their writing seem non-native (Li & Schmitt, 2009). Similarly, some studies have shown that non-native learners overuse or underuse some lexical bundles in their writing and use a more limited and less varied range of lexical bundles (Allen, 2009; Adel & Erman, 2012). Even advanced non-native English learners and second language learners have substantial problems acquiring lexical bundles (Bishop, 2004; Karabacak & Qin, 2013). To the researcher's knowledge, relatively few corpus-based studies have focused on Turkish writers' use of lexical bundles (Bal, 2010; Karabacak & Qin, 2013; Öztürk, 2014). The present study was therefore conducted to identify the most frequently used lexical bundles in the published academic articles of Turkish non-native and native speakers of English and to investigate whether there was a significant difference between native and non-native scholars with respect to the structures and functions of English lexical bundles.

Literature Review
The term 'lexical bundle' was coined by Biber et al. (1999) in the thirteenth chapter of the Longman Grammar of Spoken and Written English (LGSWE). Biber et al. (1999, p. 990) describe lexical bundles as "recurrent expressions, regardless of their idiomaticity, and regardless of their structural status" and as "simply sequences of word forms that commonly go together in natural discourse". Cortes (2004, p. 400) gives a consistent definition of lexical bundles as "extended collocations of three or more words that statistically co-occur in a register". Biber & Conrad (1999, p. 183) identify lexical bundles as "the most frequent recurring lexical sequences …, which can be regarded as extended collocations: sequences of three or more words that show a statistical tendency to co-occur". Biber et al. (1999) set a minimum frequency cut-off of ten occurrences per million words for a sequence to be regarded as a lexical bundle, whereas Biber et al. (2004) took a more conservative approach, setting a relatively high cut-off under which a sequence must recur at least forty times per million words to be considered a lexical bundle. Another prominent characteristic of lexical bundles is that they are different from idioms. A final distinguishing feature is that lexical bundles are usually incomplete structural units.
Among corpus-based studies focusing on native and non-native academic writing, Chen and Baker (2010) compared the use of lexical bundles in native and non-native speakers' academic writing in order to identify potential trouble spots in second language acquisition (SLA). The learner corpus was made up of writing by L1 Chinese learners of L2 English, whereas the other two corpora were made up of L1 writing by native academics and university students. The findings revealed both significant differences and similarities between native and non-native academic writing. The use of lexical bundles in native and non-native students' academic writing was similar when compared with that of native academics: student writing included more VP-based bundles and discourse markers than native academic writing, "which appears to be a sign of immature writing" (Chen & Baker, 2010, p. 44). Moreover, non-native writing underused some lexical bundles that were highly frequent in native academic writing and overused certain lexical bundles that were rarely used in native writing. Another study on non-native academic writing was conducted by Wei and Lei (2011), who investigated the use of lexical bundles in the academic writing of advanced Chinese EFL learners. Their findings demonstrated that advanced learner writers used many more lexical bundles, and more varied lexical bundles, in their academic writing than professional writers. Adel and Erman (2012) investigated the use of English lexical bundles in advanced learner writing by L1 speakers of Swedish and native speakers who were undergraduate students of linguistics. The results showed that non-native speakers tended to use a more limited and less diverse set of lexical bundles than native speakers.
Nevertheless, corpus-based research on Turkish writers' use of lexical bundles has been quite limited. The first study, conducted by Bal (2010), investigated the use of four-word lexical bundles in the research articles of Turkish writers. The most frequent lexical bundles in TSRAC were 'on the other hand', 'the end of the', 'as well as the', 'in the case of' and 'one of the most'. The researcher classified these bundles structurally and functionally. Öztürk (2014) compared Turkish and native English postgraduate students and native writers in a specific academic discipline with regard to the structures, functions and frequency of lexical bundles, using a control corpus. The results showed that Turkish postgraduate students used lexical bundles more frequently than native students and writers. Indeed, Turkish postgraduate students overused most of the lexical bundles. Lastly, Karabacak and Qin (2013) compared the use of lexical bundles in the argumentative papers of three groups of university writers: Turkish, Chinese and American. Their findings indicated that even advanced English learners had difficulty acquiring some lexical bundles through simple exposure. As very few studies have addressed Turkish writers' use of lexical bundles (Bal, 2010; Karabacak & Qin, 2013; Öztürk, 2014), the current study attempts to answer the following research questions:
1) What are the most frequently used three-word lexical bundles in the published academic articles of Turkish non-native and native speakers of English?
2) Are there any significant differences between native and non-native scholars with respect to the structures and functions of English three-word lexical bundles?

Method
This section describes the research corpora (expert and learner corpora) and how they were compiled. The structural and functional taxonomies of lexical bundles are then described in detail.

Expert Corpus
The expert corpus was made up of 15 scientific articles written by native speakers of English in the disciplines of Theoretical and Applied Linguistics and English Language Teaching (144,451 running words), published over the 11 years between 2005 and 2016. The articles were gathered from the following distinguished journals: Journal of Pragmatics, Lingua, English for Specific Purposes, System, Teaching and Teacher Education, Learning and Individual Differences, Procedia-Social and Behavioral Sciences and Cognition.

Learner Corpus
The learner corpus was made up of 15 scientific articles published by Turkish non-native writers in the same disciplines of Theoretical and Applied Linguistics and English Language Teaching (124,250 running words) over the 11 years between 2005 and 2016. The articles were collected from the following distinguished journals: Journal of Second Language Writing, Lingua, Journal of Pragmatics, Journal of English for Academic Purposes, System, Procedia-Social and Behavioral Sciences, Computer Assisted Language Learning and Teaching and Teacher Education. The criteria used to compile the learner and control corpora were the particular fields of linguistics and English language teaching and the writers' native languages.
Table 1 shows the number of running words and scientific articles in the learner and control corpora. After the articles had been collected, all tables, references, figures and charts were removed from the texts to prepare them for analysis. The present study focused on three-word lexical bundles because they are more frequently used in academic writing than longer lexical bundles. To be conservative, the researcher adopted Biber et al.'s (2004) frequency approach: three-word sequences that occurred at least 40 times per million words and appeared in at least 5 different texts were counted as lexical bundles in this study. The AntConc 3.4.4 programme was used to identify the lexical bundles. The programme generated a list of the three-word sequences that met the cut-off points of at least 40 occurrences and 5 different texts in the corpus. Furthermore, the learner and expert corpora were compared to identify differences in the structures, frequencies and discourse functions of the most frequent lexical bundles used in native and non-native academic writing.
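The selection procedure described above can be illustrated with a short sketch. It is not the AntConc implementation; it is a minimal Python approximation that assumes a naive lowercase tokeniser and reads the frequency cut-off as a rate per million words. The names `tokenize`, `extract_bundles`, `min_per_million` and `min_texts` are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: a naive tokeniser that keeps lowercase word forms,
    # allowing internal apostrophes and hyphens (e.g. "corpus-based").
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())


def extract_bundles(texts, n=3, min_per_million=40, min_texts=5):
    """Return {bundle: (raw_frequency, text_range)} for n-grams meeting both cut-offs."""
    freq = Counter()                 # raw frequency of each n-gram in the corpus
    text_range = defaultdict(set)    # ids of the texts in which each n-gram occurs
    total_words = 0
    for text_id, text in enumerate(texts):
        tokens = tokenize(text)
        total_words += len(tokens)
        # n-grams never cross text boundaries, but punctuation is ignored,
        # so they may cross sentence boundaries in this simplified version.
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            freq[gram] += 1
            text_range[gram].add(text_id)
    if total_words == 0:
        return {}
    bundles = {}
    for gram, count in freq.items():
        per_million = count / total_words * 1_000_000
        if per_million >= min_per_million and len(text_range[gram]) >= min_texts:
            bundles[gram] = (count, len(text_range[gram]))
    return bundles
```

Calling `extract_bundles(list_of_article_texts)` returns each qualifying three-word sequence with its raw frequency and the number of texts it appears in; the range criterion guards against sequences that are frequent only because of one author's idiosyncratic usage.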

Structural Taxonomy of Lexical Bundles
Biber et al.'s (1999) structural taxonomy, presented in the Longman Grammar of Spoken and Written English, was adopted by the researcher as it was the first and only structural taxonomy developed by these authors (shown in Table 2).

Functional Taxonomy of Lexical Bundles
Following the structural classification of lexical bundles, Biber et al. (2004) developed a functional classification of lexical bundles for conversation and academic prose. Lexical bundles were found to serve three primary functions: stance bundles, discourse organizers and referential bundles. These functions were defined as follows (Biber et al., 2004, p. 384): "Stance bundles express attitudes or assessments of certainty that frame some other proposition. Discourse organizers reflect relationships between prior and coming discourse. Referential bundles make direct reference to physical or abstract entities, or to the textual context itself, either to identify the entity or to single out some particular attribute of the entity as especially important."
In the current study, the lexical bundles were categorized functionally according to these three functions, and, when necessary, concordance lines were checked in order to determine the functions of the lexical bundles.
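To illustrate what checking concordance lines involves, the following minimal sketch prints each occurrence of a bundle with a window of surrounding context. It is only an approximation of the key-word-in-context display that a concordancer provides; the function name `concordance` and the `width` parameter are hypothetical.

```python
import re


def concordance(text, bundle, width=40):
    """Return key-word-in-context lines for every occurrence of `bundle` in `text`."""
    # Match the bundle as whole words, ignoring case and allowing
    # any run of whitespace between its words.
    pattern = r"\b" + r"\s+".join(re.escape(w) for w in bundle.split()) + r"\b"
    lines = []
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the bundles line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines
```

Reading the aligned lines side by side makes it easier to judge whether a given bundle is framing a proposition (stance), linking stretches of discourse (discourse organizer) or pointing to an entity (referential) in each context.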

Overall Frequencies between the Corpora
First, the overall number of lexical bundles in native and non-native writing was identified. Table 3 shows the overall frequencies of lexical bundles in the published academic articles of Turkish non-native and native speakers of English. As can be seen in Table 3, Turkish non-native academic writing contained more three-word lexical bundle tokens (n = 523) than the writing of native professional writers (n = 513). However, the Turkish non-native articles showed less varied lexical bundles (7 bundle types) than the academic writing of English native speakers (10 bundle types). It can therefore be concluded that although Turkish non-native writers tend to employ a higher number of the three-word lexical bundles identified in academic writing, they use a less varied set of bundles than professional writers.
Furthermore, frequencies per million words and per text were calculated in order to compare standardized findings across the corpora. As shown in Table 4, Turkish non-native speakers used three-word lexical bundles more often (4209 occurrences per million words) than English native speakers (3551 occurrences per million words). The results showed that native professional writers employed these lexical bundles an average of 25.65 times per text, while Turkish non-native writers used them an average of 26.15 times per text.
In terms of three-word lexical bundle type and range, Table 5 presents the most frequent bundles in the academic texts of native and non-native writers. According to Table 5, the most frequently used three-word lexical bundle in the Turkish academic texts was 'the use of', which occurred 170 times; it was also the most frequent bundle in the English native texts, with a frequency of 81. Comparing the two corpora with regard to the most frequently used lexical bundles, four bundles were shared by native and non-native writers: 'the use of', 'in terms of', 'in order to' and 'as well as'. However, with the exception of 'as well as', the token counts of these bundles were much higher in the Turkish non-native texts than in those of native professional writers.
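The per-million-word figures can be reproduced from the raw bundle counts and corpus sizes reported in this study. The sketch below is a simple illustration of that normalization; the function name `per_million` is hypothetical.

```python
def per_million(raw_count, corpus_size):
    # Normalized frequency: occurrences per one million running words.
    return raw_count / corpus_size * 1_000_000


# Raw bundle tokens and corpus sizes as reported for the two corpora.
learner = per_million(523, 124_250)   # Turkish non-native corpus
expert = per_million(513, 144_451)    # English native corpus

print(round(learner), round(expert))  # prints: 4209 3551
```

Normalizing in this way is what makes the two corpora comparable despite their different sizes: the raw counts differ by only ten tokens, but the learner corpus is about 20,000 words smaller.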

Structures of Lexical Bundles
The lexical bundles were categorized structurally depending on the structural taxonomy of Biber et al. (1999).
Table 6 shows the distribution of structures in Turkish non-native and English native texts. According to Table 6, English native writing included slightly more NP with of-phrase fragments (use of the, one of the, part of the), while Turkish non-native academic writing tended to use more other PP fragments (with respect to, in order to, of the participants) than native texts. Nevertheless, four structural types of lexical bundles were used by both native and non-native writers: NP with of-phrase fragment (the use of, use of the, one of the, part of the), PP with of-phrase fragment (in terms of), other PP (in order to, with respect to, of the participants, on the other (hand)) and other expressions (as well as).

Functions of the Lexical Bundles
The three-word lexical bundles were classified functionally based on the functional taxonomy of Biber et al. (2004). Figure 1 shows the functional distribution of lexical bundles in native and non-native texts.
Figure 1. Functions of three-word lexical bundles
According to Figure 1, English native writers employed slightly more referential bundles than their non-native counterparts in order to make direct reference to physical or abstract entities, or to the textual context itself. Native texts also employed the other bundle types (stance bundles and discourse organizers) more than Turkish non-native texts. For example, the stance bundle "the fact that" was used only by native writers:
"Particularly in view of the fact that the analyses in this paper are not corpus-based, it is beyond the scope of this paper to consider in detail ways in which institutional setting, text-type, or other text-categorizations…"
"On the other (hand)" was a discourse organizer bundle employed by English native writers:
"Biologists, on the other hand, made considerable use of resultative markers, bundles which introduce writer's interpretations and understandings of research processes"
As for the referential bundles, "one of the" and "part of the" were employed in native texts but not in non-native texts:
"As the analysis shows, one of the primary characteristics, and (perhaps) goals of the quiz game is that students perform as student-contestants."
"High pitch level in student answer bids can be considered one part of the physical and discursive practice of students in the co-construction…"
On the other hand, two of the bundles (with respect to, of the participants) were used only by Turkish non-native writers:
"…with a strong tendency to appeal to people's inner, true selves both with respect to their emotions and their inner wishes and aspirations."
"…and clarify why some of the participants attempted categorizations or commented on groupings as displayed in the excerpts."
"The initial comments of the participants indicate the power of the writing teacher and her expectations from the students."

Discussion
The present study was conducted to identify the most frequently used lexical bundles in the published academic articles of Turkish non-native and English native speakers and to investigate whether there was a significant difference between native and Turkish non-native scholars with respect to the frequency, structures and functions of English lexical bundles. The data were collected from two corpora: 15 scientific articles by native speakers and 15 scientific articles by advanced Turkish writers. The investigation included a quantitative analysis of the use of three-word lexical bundles and a qualitative analysis of their structures and functions. To be conservative, only three-word sequences that occurred at least 40 times per million words and appeared in at least 5 different texts were counted as lexical bundles in this study.
The findings demonstrated that although Turkish non-native writers employed a higher number of the three-word lexical bundles identified in academic writing, they used a less varied set of bundles than English professional writers. Regarding the most frequent three-word lexical bundles in native and non-native academic writing, the most frequently used bundle in the Turkish academic texts was 'the use of', which occurred roughly twice as often as in the native texts. Four of the most frequent bundles were shared by native and non-native writers: 'the use of', 'in terms of', 'in order to' and 'as well as'. As for the structural taxonomy of lexical bundles, English native writing included slightly more NP with of-phrase fragments (use of the, one of the, part of the), while Turkish non-native academic writing used more other PP fragments (with respect to, in order to, of the participants) than native texts. Lastly, regarding the functions of lexical bundles, Turkish non-native writers used more referential bundles in their academic writing than other bundle types. However, they used less varied bundles than native writers.
The results of the present study are consistent with previous studies showing that non-native learners overuse or underuse some lexical bundles in their writing and use a more limited and less varied range of lexical bundles (Allen, 2009; Adel & Erman, 2012; Li & Schmitt, 2009). Adel and Erman (2012) investigated the use of English lexical bundles in advanced learner writing by L1 speakers of Swedish and native speakers who were undergraduate students of linguistics. Their findings showed that non-native speakers tended to use a more limited and less diverse set of lexical bundles than native speakers. Another study, conducted by Bal (2010), demonstrated that the most frequent four-word lexical bundles in Turkish academic writing included 'on the other hand', 'as well as the' and 'one of the most', which is consistent with the current study even though the current study focused on three-word lexical bundles. Öztürk (2014) likewise concluded that Turkish non-native writers used lexical bundles more frequently than native writers, which is in line with the findings of the current study.

Pedagogical Implications
Studies have demonstrated that lexical bundles are not acquired naturally; even simple exposure to lexical bundles is not enough for learners to use them actively (Cortes, 2004, 2006; Karabacak & Qin, 2013; Wei & Lei, 2011). Even advanced learners have substantial problems with lexical bundles (Bishop, 2004; Karabacak & Qin, 2013). Therefore, explicit teaching of lexical bundles, which entails a deep level of processing, might be one solution that language instructors can use to foster learners' acquisition of lexical bundles in their writing.
It is also clear that lexical bundles are acquired incrementally, just like single words (Schmitt, 2000; Nation, 2001; Schmitt et al., 2004; Li & Schmitt, 2009; Čolović-Marković, 2012). Learners therefore need a large number of repeated exposures to acquire lexical bundles. In this respect, noticing, retrieval and generative activities such as rephrasing (Peters & Pauwels, 2015), substitution tasks (Salazar, 2014), writing activities (Nation, 2001) or Google search applications, techniques and tasks (Zengin, 2009; Zengin & Kaçar, 2015) are some of the many resources that writing instructors can draw on in the EFL classroom to enhance learners' successful acquisition and retention of these multi-word combinations.
Material developers and writing course designers can include multi-word combinations in the textbooks of writing classes in language programs, with limited or extended contexts taken from COCA, to enhance in-depth knowledge of the uses and functions of lexical bundles.

Limitations
There are several limitations to the current study. The results need to be treated with some caution because the corpora cover only one academic discipline and are small in size; the findings cannot be generalized to all disciplines. Further research could be conducted with more disciplines and larger corpora.

Table 1. Number of words and articles in learner and control corpora

Table 3. Total frequencies of three-word lexical bundles in each corpus

Table 4. Raw frequencies and frequencies per million words and per text for the most frequent lexical bundles in the corpora

Table 5. The most frequent three-word lexical bundles in the corpora

Table 6. Structures of three-word lexical bundles in the corpora