Prevalence of Prefabricated Structures in Academic Discourse: A Corpus-Based Study

Multiword structures that occur in a text more frequently than expected are called lexical bundles. These prefabricated structures vary in length, but the most common are four-word lexical bundles, which have been explored by many scholars worldwide. The current study aimed to explore five-word lexical bundles, which have been less researched. For this purpose, a corpus of about 4.7 million words was compiled, consisting of PhD dissertations written in the Pakistani context. The dissertations were selected from three different disciplines to make the study cross-disciplinary. The corpus was analyzed according to the taxonomy of lexical bundles given by Biber et al. (1999). The analysis shows that lexical bundles are a predominant feature of PhD dissertations in the Pakistani context. Moreover, the frequency of lexical bundles varies from discipline to discipline, and structural variation in lexical bundles is also found across disciplines. The dominant structure is not fixed across disciplines: Prepositional Phrase Fragments is the dominant category in the corpora of English Studies and Social Sciences, whereas Verb Phrase Fragments is the dominant category in the corpus of Bio Sciences.


Introduction
English is a very important language for academics because it is widely understood around the world. Moreover, new knowledge is either produced in English in the first place or translated into English to reach a wider audience. This has benefited scholars across the world, as they can address a wide audience without being restricted by geographical or, more specifically, linguistic boundaries. In addition, they have easy access to knowledge produced anywhere in the world without the trouble of having it translated themselves. This makes English the lingua franca of academia (Hyland, 2009a). Along with all these advantages, English in the academic world has a disadvantage in the form of a divide between the discourse of native and non-native speakers and writers. This discourse privileges those who have native-like proficiency in English. In this scenario, many scholars and students are left out because their expression is non-native. To overcome this difficulty, scholars have been striving over the last few decades to understand why learners lack native-like proficiency in English and to find ways of helping them achieve native-like expression.
Although there has been a great deal of research in this area, advances in computing have revolutionized the field by enabling researchers to compare large bodies of data from native and non-native contexts. Among many other findings, it has emerged that language structures are not novel every time; instead, many of them are prefabricated. After years of research, scholars have concluded that these prefabricated structures enhance the accuracy and fluency of a language's speakers. Native speakers and writers of English use these structures frequently and know how and when to use them, which is lacking in non-native expression. Moreover, studies (such as Hyland, 2008; Chen & Baker, 2010) show that the use of prefabricated structures varies not only between native and non-native expression but also from discipline to discipline and from one level of education to another. Research on the use and variation of prefabricated structures has been carried out around the world, but no comparable effort has been made in the Pakistani context. Hence, this study is an attempt to explore the prefabricated structures called lexical bundles in the Pakistani academic setting.
The study is limited to PhD dissertations from three disciplines of academic discourse: English Studies, Social Sciences and Bio Sciences. The research questions set for the current study are as follows:
1) To what extent does academic discourse rely on five-word lexical bundles?
2) What are the structural similarities and differences of lexical bundles across disciplines?
3) Which structural categories are dominant in different disciplines?

Literature Review
Over the last few decades there has been increasing interest in co-occurring words, which have been explored and discussed from different perspectives. Among the first to draw attention to such combinations was Firth (1957), who observed that they occur more frequently than expected in a given text and termed them collocations. Despite naming them, however, he did not give a proper definition of these word combinations. Interest in word combinations kept growing, and Halliday et al. (1964) defined collocation as a group of words that tend to co-occur. Similarly, other scholars (such as Yorio, 1980; Cowie, 1988; Moon, 1992; Cortes, 2004) described these expressions as conventionalized language forms, speech formulas, ready-made expressions, multi-word units and fixed expressions. Just as the terms used for these expressions vary, these multiword units have been studied from different aspects and perspectives, but researchers have not agreed on a single definition for grouping similar word combinations or distinguishing between their types.
Research interest grew further with the work of Biber et al. (1999), who used computer software to explore a large amount of data, something that was not possible before advances in information technology. Their computer tools made it possible to explore word patterns of varying lengths in a large body of data. As described above, such word combinations had various names prior to Biber et al. (1999, p. 990), such as prefabricated structures, fixed expressions, formulaic structures and lexical phrases, but Biber et al. called them lexical bundles. According to Biber et al. (1999), lexical bundles are recurrent word combinations that co-occur in natural language use, regardless of their idiomaticity and their lexical status. Moreover, they are frequency-driven word combinations of three or more words that function as building blocks of a text (Biber et al., 2004). Similarly, Nattinger and DeCarrico (1992) explored multiword expressions and called them prefabricated language. According to them, such language can be retrieved easily because it exists ready-made in the mind, and this readiness helps speakers speak fluently (Yorio, 1980). Such expressions spare speakers the effort of constructing a new structure every time they engage in speech, and they also help the listener interpret the message quickly. Hence, communication becomes fluent, spontaneous and easy for both participants. Moreover, speakers find prefabricated chunks of language helpful because they save the time needed to find appropriate words and word forms; Cortes (2002) likewise states that lexical bundles aid fluent speaking and writing and accelerate language acquisition. In addition, lexical bundles help establish contact in communication because they perform a phatic function (Drazdauskienė, 1981). A study by Yu & Kim (2017) states that learners of different proficiency levels show significant improvement in error control when they are taught with the help of lexical bundles. The most frequent error by learners in that study was the omission of articles within bundles, which can be avoided if learners are taught with the help of lexical bundles. The results also highlight that bundles, as article-including expressions, can help teach article use in context.
It is important to note the differences between types of multiword units. A case in point is idioms, which have fixed structures and meanings, whereas lexical bundles are defined by their frequency counts, which distinguish them from other fixed structures. Apart from frequency, the other difference is the method of their extraction (Cortes, 2002; Biber, 1999).
To be identified as a lexical bundle, a multiword unit needs to occur at least ten times per million words and in at least five different texts of the same genre. Hence, word sequences that meet both cut-offs are known as lexical bundles (Biber et al., 1999).
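The two cut-offs can be expressed as a simple check. The sketch below is illustrative only: the function names are hypothetical, and the corpus size of 4,700,000 words is the approximate figure given for this study's corpus, used here simply to show that, at that size, a sequence needs about 47 raw occurrences to reach 10 per million.

```python
def per_million(raw_freq: int, corpus_size: int) -> float:
    """Normalise a raw count to a rate per million words."""
    return raw_freq * 1_000_000 / corpus_size


def meets_bundle_criteria(raw_freq: int, text_range: int, corpus_size: int,
                          min_per_million: float = 10.0,
                          min_texts: int = 5) -> bool:
    """Hypothetical helper: True only if a word sequence reaches the
    normalised frequency cut-off AND occurs in at least `min_texts` texts."""
    return (per_million(raw_freq, corpus_size) >= min_per_million
            and text_range >= min_texts)


# Illustrative corpus size (approximate figure for the whole corpus).
CORPUS_SIZE = 4_700_000
print(meets_bundle_criteria(47, 5, CORPUS_SIZE))   # True: exactly 10.0 per million, 5 texts
print(meets_bundle_criteria(46, 10, CORPUS_SIZE))  # False: about 9.79 per million
print(meets_bundle_criteria(60, 4, CORPUS_SIZE))   # False: frequent, but only 4 texts
```

Both conditions must hold, so a very frequent sequence confined to a handful of dissertations (for example, one writer's idiosyncratic phrasing) is still excluded.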
After the bundles have been identified, the next stage is to categorize them according to their structures and functions. They can be categorized into polyword units, sentence building expressions, phrasal constraints and institutionalized expressions (Nattinger & DeCarrico, 1992). Among these types, polyword units function like individual words, while sentence builders are short phrases that form the basis of a sentence. The third type, phrasal constraints, consists of phrases of varied length that most often begin sentences, and the last type, institutionalized expressions, consists of fixed short sentences. Polyword units and phrasal constraints can be both canonical and non-canonical, and both are continuous. In terms of variability, however, polyword units show no variability, whereas phrasal constraints allow variation. Institutionalized expressions are canonical and invariable, whereas sentence builders can be canonical or non-canonical and continuous or discontinuous; sentence builders also show a great deal of variation. Although this categorization is an important contribution to the explanation and investigation of lexical bundles, it is not a clear scheme, since a bundle may sometimes belong to multiple categories or to none of them.
A further problem in placing prefabricated structures into categories is that discourse function is excluded from their defining features, even though discourse is decisive in shaping a language variety in terms of its form and function (Bhatia, 1993). For example, research on lexical bundles shows that bundles found in science texts occur in assertive sentences; similarly, Hyland (1998) shows that they describe scientific procedures and results, and they also display the features of academic discourse highlighted by other scholars (such as Nattinger & DeCarrico, 1992). The majority of the lexical bundles found in academic discourse are noun phrase fragments (Cortes, 2004).
The functional aspect of lexical bundles in academic discourse shows that they work as discourse devices that connect different parts of a text and help in the meaning-making process. Biber (1999) explored conversation and academic prose for lexical bundles, whereas Conrad and Biber (2005) researched students' writing in the disciplines of history and biology. They proposed that lexical bundles function as building blocks of academic discourse. Wang & Ying (2017) conducted a qualitative analysis of the functional variations of the bundle "I don't know if". Their study shows that genre and discipline are both important factors in understanding academic ELF communication and idiomaticity and cannot be ignored; moreover, the study of lexical bundles provides useful insight into genre and disciplinary variation.
The above discussion shows that a great deal of research has been carried out worldwide, but little or no attention appears to have been paid to lexical bundles in Pakistani academic discourse. Therefore, the current study investigates lexical bundles in Pakistani academic discourse. It is limited to PhD dissertations from three disciplines, namely English Studies, Social Sciences and Bio Sciences, and it focuses only on five-word lexical bundles.

Corpus
For the purpose of this study, a corpus of PhD dissertations consisting of about 4.7 million words has been compiled. Three disciplines, i.e. English Studies, Social Sciences and Bio Sciences, were selected for data collection. Thirty dissertations from each discipline (ten from each of three subjects) were included in the corpus, giving 90 dissertations in total. The dissertations selected were the ten most recent dissertations in each subject area. The details of the corpus data are as follows:

Tools for Corpus Analysis
AntConc 3.4.4w (Windows, 2014) was used to analyze the corpus. This software helped identify lexical bundles because it shows both the frequency of a structure and its range (the number of texts in which it occurs), both of which are necessary for identifying and analyzing lexical bundles.

Theoretical Framework
The lexical bundles extracted from the corpus were categorized according to Biber et al.'s (1999) taxonomy. Bundles that did not fall into any of the existing categories were placed in new categories, which allowed a more in-depth analysis. The details of the categorization are provided in the Results and Discussion section.

Results and Discussion
This section is divided into four parts: the first three present and discuss the results for lexical bundles in each of the three disciplines, and the final part compares lexical bundles across disciplines.
To identify five-word lexical bundles, the n-gram length was fixed at 5, the minimum range at 5 texts, and the minimum frequency at 10 occurrences per million words. These settings produced a list of lexical bundles, which were then categorized by structure according to Biber et al.'s (1999) taxonomy. A detailed discussion follows.
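The extraction settings described above (n-gram length 5, range 5, 10 per million) can be approximated outside AntConc. The following is a minimal sketch under stated assumptions, not a reproduction of AntConc's N-Gram tool: the tokeniser is a simplified regular expression, n-grams may cross sentence boundaries because punctuation is discarded, and `tokenize` and `extract_bundles` are hypothetical names.

```python
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Simplified tokeniser (an assumption): lower-case word forms, keeping
    # internal apostrophes and hyphens; AntConc's token definition may differ.
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())


def extract_bundles(texts: dict[str, str], n: int = 5,
                    min_per_million: float = 10.0, min_range: int = 5):
    """Count every n-word sequence, record which texts it occurs in, and keep
    sequences passing both the normalised-frequency and range cut-offs.
    Returns {ngram: (raw count, range, rate per million)}, most frequent first."""
    freq = Counter()
    found_in = defaultdict(set)
    corpus_size = 0
    for text_id, text in texts.items():
        tokens = tokenize(text)
        corpus_size += len(tokens)
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            freq[gram] += 1
            found_in[gram].add(text_id)
    bundles = {}
    for gram, count in freq.items():
        rate = count * 1_000_000 / corpus_size
        if rate >= min_per_million and len(found_in[gram]) >= min_range:
            bundles[gram] = (count, len(found_in[gram]), round(rate, 2))
    return dict(sorted(bundles.items(), key=lambda kv: -kv[1][0]))


# Tiny illustrative corpus: one sentence repeated in six texts, plus one other text.
sample = {f"text{i}": "On the other hand, the results were clear." for i in range(6)}
sample["text6"] = "At the end of the day."
for gram, (count, text_range, rate) in extract_bundles(sample).items():
    print(gram, count, text_range)
```

In a toy corpus this small every sequence easily exceeds 10 per million, so it is the range cut-off that removes "at the end of the", which occurs in only one text; in a multi-million-word corpus both cut-offs do real filtering.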

Five-Word Lexical Bundles in English Studies
Table 2 shows that all three subjects included in the discipline of English Studies have five-word lexical bundles. The number of lexical bundles is not very high, unlike the case with four-word lexical bundles (Yousaf, 2018), but lexical bundles are still represented in most of the categories. The largest category of lexical bundles in Linguistics is Prepositional Phrase Fragments, which consists of 17 lexical bundles, i.e. 50% of the total lexical bundles in the corpus of Linguistics. The same category is dominant in the corpus of Literature, where it has 5 lexical bundles, i.e. 45.5% of the total lexical bundles in that corpus. Similarly, Prepositional Phrase Fragments is the largest category in the corpus of ELT, with 9 lexical bundles, i.e. 31% of the total five-word lexical bundles in the ELT corpus. The second largest category in the corpus of Linguistics is Verb Phrase Fragments, with 5 lexical bundles (14.70% of the five-word lexical bundles in the corpus). In the corpus of Literature, by contrast, the second largest category is Other Expressions, with 3 five-word lexical bundles (27.27% of the total), while in the corpus of ELT it is Noun Phrase Fragments, with 8 five-word lexical bundles (27.59% of the total). This shows that, unlike the dominant category, the second largest category differs across the three subjects, which confirms disciplinary variation. The third largest category in the corpus of Linguistics is Anticipatory It, That/There/To Clause Fragments, which consists of 4 bundles (11.76% of the total). The corpus of Literature has only 1 bundle each in Verb Phrase Fragments, Adverbial/Adjectival Phrase/Clause Fragments and Anticipatory It, That/There/To Clause Fragments (9.09% per category), a very small number compared with the four-word lexical bundles in the same corpus. The third largest category in the corpus of ELT, Verb Phrase Fragments, has 6 lexical bundles, i.e. 20.69% of the total five-word lexical bundles in that corpus, a comparatively high percentage. Lastly, Table 2 shows that the remaining categories have very few five-word lexical bundles or none at all. In particular, Verb/Adjective/Noun + to/that Clause Fragments has only 3 lexical bundles in the corpus of Linguistics and none in the other two corpora. Similarly, there are no lexical bundles in Noun Phrase Fragments in the corpus of Literature.
Graph 1 below shows the comparison of five-word lexical bundles in the three corpora of English Studies. The heights of the bars show that the number of lexical bundles in each category differs across the subjects of English Studies. The graph shows that Prepositional Phrase Fragments has the highest number of lexical bundles in all three subjects, which makes this category very significant for the corpus of English Studies.
ijel.ccsenet.

Table 1. Corpus for the study

Table 2. Five-word lexical bundles in English Studies
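Each percentage in the English Studies discussion is a category's count divided by the total number of five-word bundles in that subject corpus. A short sketch of the arithmetic follows; the helper name is hypothetical, and the subject totals (34 for Linguistics, 11 for Literature, 29 for ELT) are inferred from the reported figures (for example, 17 bundles making 50% implies 34) rather than stated in the text.

```python
def category_shares(counts: dict[str, int], total: int) -> dict[str, float]:
    """Percentage of a subject corpus's five-word bundles in each structural
    category, rounded to two decimal places."""
    return {category: round(100 * n / total, 2) for category, n in counts.items()}


# Category counts as reported for Table 2; the totals are inferred, not stated.
subjects = {
    "Linguistics": ({"Prepositional Phrase Fragments": 17,
                     "Verb Phrase Fragments": 5,
                     "Anticipatory It, That/There/To Clause Fragments": 4}, 34),
    "Literature": ({"Prepositional Phrase Fragments": 5,
                    "Other Expressions": 3}, 11),
    "ELT": ({"Prepositional Phrase Fragments": 9,
             "Noun Phrase Fragments": 8,
             "Verb Phrase Fragments": 6}, 29),
}

for subject, (counts, total) in subjects.items():
    print(subject, category_shares(counts, total))
```

With conventional rounding, 5 of 34 bundles comes to 14.71% and 5 of 11 to 45.45%, so small differences from the reported figures reflect truncation or one-decimal rounding rather than different counts.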

Table 3

Five-Word Lexical Bundles Across the Disciplines
Table 5 shows that all three disciplines included in this study have five-word lexical bundles. The largest category of lexical bundles in English Studies is Prepositional Phrase Fragments, which consists of 31 lexical bundles, i.e. 41.89% of the total lexical bundles in the corpus. Prepositional Phrase Fragments is also the dominant category in the corpus of Social Sciences, where it has 24 lexical bundles (38.09% of the total). The corpus of Bio Sciences, on the other hand, has Verb Phrase Fragments as its largest category, with 8 lexical bundles, i.e. 30.77% of the total five-word lexical bundles in the corpus. This shows that English Studies and Social Sciences share the same dominant category, whereas Bio Sciences differs. The second largest category in the corpus of English Studies is Verb Phrase Fragments, with 12 lexical bundles (16.22% of the five-word lexical bundles in the corpus). In the corpus of Social Sciences, the second largest category is Noun Phrase Fragments, with 22 five-word lexical bundles (34.92% of the total), whereas in the corpus of Bio Sciences it is Prepositional Phrase Fragments, with 5 five-word lexical bundles (19.23% of the total). The third largest category in the corpus of English Studies is Noun Phrase Fragments, which consists of 10 five-word lexical bundles (13.51% of the total). In the corpus of Social Sciences, the corresponding category is Adverbial/Adjectival Phrase/Clause Fragments, with 6 bundles (9.52% of the total). In the corpus of Bio Sciences, Noun Phrase Fragments and Anticipatory It, That/There/To Clause Fragments have 4 lexical bundles each (15.38% of the total per category), so both stand as the third largest category. Lastly, the table shows that the remaining categories have fewer five-word lexical bundles or none at all. The only exceptions are Anticipatory It, That/There/To Clause Fragments in the corpus of English Studies, with 9 lexical bundles (12.16%), and Other Expressions in the corpus of Bio Sciences, with 3 lexical bundles (11.54% of the total). All other remaining categories account for less than 10% of the lexical bundles in their respective corpora. For instance, Verb/Adjective/Noun + to/that Clause Fragments has only 3 five-word lexical bundles in the corpus of English Studies and none in the corpora of Social Sciences and Bio Sciences.
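Identifying the dominant and second largest category in each discipline amounts to sorting the category counts. The sketch below does this for the counts reported in the cross-discipline discussion; the function name and the abbreviations (PP, VP and NP for Prepositional, Verb and Noun Phrase Fragments) are illustrative, only the categories whose counts are reported are included, and the discipline totals (74, 63 and 26) are inferred from the reported percentages.

```python
def rank_categories(counts: dict[str, int], total: int) -> list[tuple[str, int, float]]:
    """Order structural categories from largest to smallest, returning
    (category, count, percentage of all bundles in the corpus)."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(category, n, round(100 * n / total, 2)) for category, n in ranked]


# Reported counts per discipline, paired with the inferred corpus totals.
BUNDLES = {
    "English Studies": ({"PP": 31, "VP": 12, "NP": 10,
                         "Anticipatory It": 9, "V/A/N + to/that": 3}, 74),
    "Social Sciences": ({"PP": 24, "NP": 22, "Adv/Adj": 6}, 63),
    "Bio Sciences": ({"VP": 8, "PP": 5, "NP": 4,
                      "Anticipatory It": 4, "Other": 3}, 26),
}

for discipline, (counts, total) in BUNDLES.items():
    top, second = rank_categories(counts, total)[:2]
    print(f"{discipline}: dominant {top[0]} ({top[2]}%), "
          f"second {second[0]} ({second[2]}%)")
```

Because Python's sort is stable, tied categories (NP and Anticipatory It in Bio Sciences, with 4 bundles each) keep their listed order. With two-decimal rounding, 3 of 26 bundles comes to 11.54%, and 24 of 63 to 38.1%.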

Table 5. Five-word lexical bundles in the corpus
Graph 4 below shows the comparison of five-word lexical bundles in the three disciplines under study. The number of lexical bundles in each category differs across the three disciplines. The graph shows that, overall, Prepositional Phrase Fragments has the highest number of lexical bundles, which makes this category very significant for the corpus. Noun Phrase Fragments is also a significant category in the corpus of Social Sciences, whereas in the corpus of Bio Sciences, Verb Phrase Fragments has the highest number of lexical bundles and is the dominant category.