Lexical Bundles in Contract Law Texts: A Corpus-Based Exploration and Implications for Legal Education

This paper reports on a study that explores lexical bundles in Contract Law, a key subdivision of legal discourse. Based on a corpus of full-length texts, a total of 117 patterns are retrieved, refined and subjected to structural and functional analyses. The results show that text authors make use of a wide range of lexical bundles, most of which are structurally phrasal and functionally research-oriented. Text-structuring sequences and participant-oriented bundles also appear in the corpus, but they are used far less frequently. The analysis also establishes the domain-specific nature of patterns that revolve around the concept of contract. The paper concludes by discussing these findings and their implications for language learning and teaching and for ESP/EAP pedagogy.


Introduction
Several studies maintain that academic speech and writing involve a large number of recurrent multiword constructions that can be located, retrieved and analyzed for their structural forms and discourse functions (Biber & Barbieri, 2007; Biber, Conrad, & Cortes, 2004; Biber, Johansson, Leech, Conrad, & Finegan, 1999; Cortes, 2004; Hyland, 2008a, 2008b). These patterns have been studied under a range of labels, the most common of which is lexical bundles (e.g., Breeze, 2013; Durrant, 2017; Esfandiari & Barbary, 2017). Lexical bundles are defined as "words which follow each other more frequently than expected by chance, helping to shape text meanings and contributing to our sense of distinctiveness in a register" (Hyland, 2008b, p. 4). The pervasive use of such bundles is not restricted to a particular genre, register or discipline. Evidence shows that domains of various types, with dissimilar communicative purposes, employ a wide range of structurally different and functionally distinct bundle types.

Academic Writing and Lexical Bundles Research
Written academic texts have been the subject of several studies aimed at unveiling their rhetorical structures, linguistic features and communicative purposes. Biber et al. (2004, p. 374) argue that textbooks and classroom teaching are "arguably the two most important registers in the academic lives of university students". Hyland (2009) maintains that "textbooks are indispensable to academic life, facilitating the professional's role as a teacher and constituting one of the primary means by which the concepts and analytical methods of a discipline are acquired" (p. 68). Surveying the differences and similarities across registers, genres and styles, Biber and Conrad (2009) concluded that written textbooks are produced to inform and educate rather than to disseminate fresh ideas. The communicative focus of textbooks is usually on laying out well-established facts rather than announcing previously unknown findings.
Academic textbooks have been examined for their use of lexical bundles in a range of contexts. Grabowski (2015) examined a corpus of information leaflets, product summaries, clinical trial protocols and textbook chapters for both keywords and lexical bundles. Compared with the other three sub-corpora, textbooks contain the greatest number of keywords but the fewest lexical bundles. The author attributes the concentration of keywords in this sub-corpus to the discipline-specific nature of textbooks. He attributes the paucity of lexical bundles to the nature of textbook prose, which is lexically dense and less formulaic.

In a related study, Breeze (2013) explored the distribution of lexical bundles across four legal sub-registers: academic law, case law, legislation and documents. Drawing on a two-million-word corpus, the researcher carried out a structural and functional analysis of the recurrent formulaic patterns identified. The legislation and documents sub-corpora display the widest range of bundles, whereas academic law and case law contain the fewest. Structurally, the author adopts a lexico-grammatical approach, dividing bundles into four categories: content noun phrases, prepositional phrases, adjectival phrases and fragments containing a verb phrase. In all sub-registers except case law, the largest share of bundles consists of content noun phrases denoting agents, institutions and documents. Most bundles in academic law refer to either abstract or action entities.

Similarly, Biber and Barbieri (2007) contrasted the use of bundles across a wide range of registers and academic domains, and found that the corpus of academic textbooks has the smallest range of bundles. Functionally, the textbook corpus is dominated first by referential bundles and then by discourse organizers, while stance expressions are the least frequent bundle type.
The use of lexical bundles by nonnative or novice writers has been contrasted with that of native or expert writers. The results have been inconsistent and, to some extent, contradictory (Ädel & Erman, 2012; Bychkovska & Lee, 2017; Chen & Baker, 2010; Cortes, 2004; Esfandiari & Barbary, 2017; Llanes & Muñoz, 2009; Pan et al., 2016). Some studies maintain that native and professional writers demonstrate command of a wider range of recurrent patterns than nonnative and less experienced writers do (e.g., Ädel & Erman, 2012). Others point to the opposite, namely that student- or novice-produced writing incorporates a greater number of lexical patterns than writing produced by native speakers or professionals (e.g., Bychkovska & Lee, 2017). These discrepancies likely arise from differences in study design, in the discipline under study and in the genre investigated.

Overview of the Legal Discourse
Legal language has been the subject of many research studies over the past decades. Much research into legal discourse revolves around the syntax and semantics of legal prose, with particular attention to the challenges that novices and non-experts face in understanding legal content. Legal statements are relatively long, densely nominal and distinctly complex, as they comprise archaic and semi-archaic forms (e.g., hereinafter), rare expressions (e.g., annul) and opaque formulae (e.g., corporate veil). Legal texts also incorporate:

• many familiar terms carrying unfamiliar meanings (e.g., distress and find);
• passive constructions;
• unusual prepositional phrases;
• performative markers;
• a wide range of law-specific concepts of Latin origin (Cao, 2007; Haigh, 2015; Trosborg, 1997).

Another source of difficulty is that legal language is "system-bound": "terms denoting concepts derive their meanings from a particular legal system" (Northcott, 2012, p. 218). Consequently, a term widely used in one legal system may have no equivalent in another. Vass (2017) adds a further difficulty: a growing number of law students and professionals come from non-English-speaking backgrounds, where legal concepts, terms and rhetorical conventions are learned and delivered in the students' native language. The inherently complex nature of legal writing has given rise to what is now known as the Plain English Movement (Hartig & Lu, 2014). The movement calls for a far clearer, less archaic and more reader-friendly writing style accessible to a wider readership.
Over the past few years, there has been a significant amount of research on legal discourse from an ESP perspective. Vass (2017) examines verb hedges in a one-million-word corpus of journal articles, supreme court agreements and supreme court disagreements. He concludes that lexical verbs serving a hedging function are more pervasive in journal articles than in the other two genres. Cheng and Cheng (2014) investigate epistemic modality in a corpus of civil cases from Hong Kong and Scotland. They find no differences between the two legal systems in the distribution of epistemic expressions that signal degrees of probability and possibility. In a survey of existing pedagogical resources for legal education, Candlin, Bhatia, and Jensen (2002) conclude that the writing materials available to students on how to approach legal prose lack a clear pedagogical purpose. These materials fail to meet learners' writing needs, ignore advances in linguistic theory and practice, and are mainly delivered in an inaccessible manner.
In a corpus-based attempt to draw lines between disciplines, Durrant (2017) maintains that law is closely aligned with history, politics and English, as the distribution of patterns shows that these disciplines share a large number of lexical bundles. Law, however, uses a rather distinctive set of recurrent patterns when compared with disciplines such as physics, food sciences and chemistry.

Methodology
In this section, I outline the corpus on which this study draws. A discussion of bundle selection and refinement follows, focusing on the criteria applied when extracting bundles from the corpus and the measures taken to refine the resulting set of bundles.

Study Corpus
A study corpus was created to elicit lexical bundles meeting the predetermined frequency and distribution parameters outlined in the Bundle Selection Criteria and Refinement section below. The texts making up the corpus are drawn from a variety of contract law subtopics, such as mistakes in contract law, the theory of contract law, the modern law of contract and Chinese contract law (see Appendix A for a full list of books). Several sections were removed prior to corpus processing: publication information, copyright warnings, acknowledgements, appendices, references, footnotes, endnotes, and tables of figures, cases and statutes. There is no way to ascertain the authors' language backgrounds, but the fact that the texts are published by key publishers attests to their authors' expertise and scholarship. Table 1 gives a comprehensive description of the corpus used in this study.

Bundle Selection Criteria and Refinement
Although the criteria for selecting bundles from a corpus of naturally occurring language differ from one study to another, researchers seem to agree on three requirements. A target lexical bundle should contain a specific number of words, recur above a particular frequency threshold and appear in a predetermined number of the texts making up the corpus under scrutiny. Given the exploratory nature of this study, the process of locating and extracting bundles is determined by the length of the bundle, its frequency of occurrence and its distribution across the corpus. A further refinement step then removes overlapping and subsumed bundles. The Cluster function in WordSmith Tools (Scott, 2016) is used to extract four-word bundles from the corpus. The Concordance function is then used to retrieve the concordance lines needed to determine the meanings and functions of the selected bundles.
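The cluster-extraction step can be illustrated with a short sketch. The Python below is not WordSmith Tools and will not reproduce its counts. It assumes a deliberately simple tokenizer: words are lowercased runs of letters and apostrophes, and no sequence may cross sentence punctuation. The function names `tokenize` and `extract_clusters` are hypothetical. The sketch shows the two figures each candidate bundle needs: its total frequency and its range, that is, the number of texts in which it occurs.

```python
import re
from collections import Counter

def tokenize(text):
    """Split text at sentence punctuation, then into lowercase word tokens.

    Simplifying assumption: a word is a run of letters or apostrophes,
    and no sequence may cross ., !, ?, ; or :.
    """
    segments = re.split(r"[.!?;:]+", text.lower())
    return [re.findall(r"[a-z']+", segment) for segment in segments]

def extract_clusters(texts, n=4):
    """Return (frequency, text_range) Counters for all n-word sequences.

    frequency: total occurrences of each sequence across the corpus.
    text_range: number of texts in which each sequence occurs at least once.
    """
    frequency = Counter()
    text_range = Counter()
    for text in texts:
        seen_in_text = set()
        for tokens in tokenize(text):
            grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
            frequency.update(grams)
            seen_in_text.update(grams)
        # Each text adds at most one to a sequence's range.
        text_range.update(seen_in_text)
    return frequency, text_range
```

Because range is counted with a set per text, a bundle repeated many times within one book still adds only one to its range. This is what allows the range criterion to filter out author-specific habits.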
As for bundle length, it is common practice in previous research to focus on four-word bundles. Three-word bundles are unmanageably numerous and are sometimes embedded in four-word bundles. Longer bundles, such as five-, six- and seven-word bundles, do exist, but they occur so rarely that they are of little interest to researchers (Cortes, 2013; Esfandiari & Barbary, 2017).

With respect to frequency of occurrence, bundles are selected if they occur at least 40 times per million words, a normalized rate corresponding to a raw frequency of 133. This conservative threshold (see Esfandiari & Barbary, 2017; Pan et al., 2016) ensures that only frequently recurring bundles are selected for analysis. A total of 150 bundles meet the frequency criterion, and all of them were copied into an Excel spreadsheet for further refinement.

The third step removes bundles that occur in fewer than five texts (25% of the texts in the corpus). This minimum range score is intended to exclude patterns that are idiosyncratic to a particular text or author. Since this study draws on a limited set of full-length texts, it is methodologically appropriate to analyze only bundles that tend to occur across a range of texts. Six bundles occurring in fewer than 25% of the texts are removed, reducing the overall number of bundles to 144.
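The frequency and range criteria reduce to simple arithmetic. A normalized rate is the raw count multiplied by one million and divided by the corpus size. The two reported figures therefore imply a corpus of roughly 3.3 million words (133 × 1,000,000 / 40 ≈ 3.33 million). Likewise, five texts being 25% of the corpus implies 20 texts. Both values are inferences from the numbers stated above, not figures taken from Table 1. The sketch below applies both filters. The function names are hypothetical, and the 3,325,000-word corpus used in the example is illustrative.

```python
import math

PER_MILLION_THRESHOLD = 40  # normalized cut-off used in the study
MIN_RANGE_SHARE = 0.25      # minimum share of texts a bundle must occur in

def per_million(raw_count, corpus_size):
    """Normalize a raw count to occurrences per million words."""
    return raw_count * 1_000_000 / corpus_size

def min_texts(n_texts, share=MIN_RANGE_SHARE):
    """Smallest whole number of texts that satisfies the range share."""
    return math.ceil(n_texts * share)

def select_bundles(frequency, text_range, corpus_size, n_texts,
                   threshold=PER_MILLION_THRESHOLD):
    """Keep only bundles that pass both the frequency and the range criteria.

    frequency and text_range map each bundle to its total count and to
    the number of texts it occurs in, respectively.
    """
    needed = min_texts(n_texts)
    return {
        bundle: count
        for bundle, count in frequency.items()
        if per_million(count, corpus_size) >= threshold
        and text_range.get(bundle, 0) >= needed
    }
```

With an illustrative 3,325,000-word, 20-text corpus, `per_million(133, 3_325_000)` returns 40.0 and `min_texts(20)` returns 5. A bundle must therefore occur at least 133 times and in at least five texts to be retained.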
ijel.ccsenet.org International Journal of English Linguistics Vol. 9, No. 2; 2019

The list of bundles that results from applying the length, frequency and range criteria shows considerable overlap between bundles. Chen and Baker (2010) identified two types of overlap: complete overlap and complete subsumption. For instance, principles of international commercial and of international commercial contracts are two parts of the extended bundle principles of international commercial contracts. The two bundles share the same frequency and dispersion profiles. They are therefore combined into a single string, with the word contracts enclosed in parentheses: principles of international commercial +(contracts). Complete subsumption occurs when "two or more 4-word bundles overlap and the occurrences of one of the bundles subsume those of the other overlapping bundle" (Chen & Baker, 2010, p. 33). Examples include pairs such as to the terms of and the terms of the, which share the three-word string the terms of.
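One way to operationalize the two overlap types is sketched below. It is offered as an assumption, not as Chen and Baker's (2010) exact procedure. The sketch examines pairs of four-word bundles in which the last three words of one are the first three words of the other. It then counts how often the five-word string they jointly form occurs:

• If that count equals the frequency of both bundles, every occurrence of each falls inside the longer string (complete overlap).
• If it equals the frequency of only one bundle, that bundle's occurrences are contained in the other's (complete subsumption).

The input is a list of token lists, for example one per sentence. All function names are hypothetical.

```python
from collections import Counter

def ngram_counts(token_lists, n):
    """Count every n-word sequence across a list of token lists."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts

def classify_overlaps(bundles, token_lists):
    """Label pairs of 4-word bundles that chain into one 5-word string.

    Returns (first, second, label) triples, where first[1:] == second[:-1].
    """
    four = ngram_counts(token_lists, 4)
    five = ngram_counts(token_lists, 5)
    bundles = list(bundles)
    results = []
    for first in bundles:
        for second in bundles:
            if first == second or first[1:] != second[:-1]:
                continue
            joint = five[first + second[-1:]]  # occurrences of the 5-word string
            if joint == 0:
                continue
            if joint == four[first] == four[second]:
                results.append((first, second, "complete overlap"))
            elif joint in (four[first], four[second]):
                results.append((first, second, "complete subsumption"))
    return results

def merge_label(first, second):
    """Render a completely overlapping pair in the study's +( ) notation."""
    return " ".join(first) + " +(" + second[-1] + ")"
```

The sketch only flags candidate pairs. Deciding which member of a subsumed pair to keep remains a manual judgment, as in the study.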
Another procedure removes bundles that refer to specific judicial bodies, such as the British House of Lords and the supreme court of a particular state (e.g., the Supreme Court of Michigan). Three such bundles are eliminated because they are highly context-dependent (Chen & Baker, 2010). In total, 27 bundles are removed because of overlap and context-dependency, reducing the number of bundles to 117.

Results
A general overview of the items on the final list (see Appendix B) reveals some interesting aspects of the vocabulary characteristic of Contract Law. Unsurprisingly, lexical expressions co-occurring with the word contract dominate the list, reflecting the topic-specific nature of this register.

The recurrent use of sequences such as for breach of contract, in breach of contract and the breach of contract mirrors a serious concern within the legal community. That concern is the possible failure of one or both parties to honor the binding nature of contractual agreements. Other patterns co-occurring with the term contract address what constitutes a contract as a legal document: term(s) of the contract, terms in consumer contract, contents of the contract and matter of the contract.

Another interesting pattern emerging from the data is the use of lexical bundles that transcend register boundaries and thus occur in distinct contexts. Expressions such as in the case of, on the other hand, in the context of, in respect of the and on the basis of do not seem to be tied to a specific register. The following two sections discuss the structural forms and discourse functions of the bundles, with examples taken from the corpus.

Structural Patterns of Lexical Bundles
One objective of the current study is to account for the grammatical structures of lexical patterns emerging from the corpus analysis. Drawing on the framework developed by Biber et al. (1999), lexical bundles can be broadly classified into noun-based, preposition-based and verb-based groups, each of which can be further classified into subgroups (see Table 2).
Noun-based bundles fall mainly into two groups: a noun phrase followed by an embedded of-phrase fragment, or a noun phrase with other post-modifying fragments.

• Thirty-five bundles begin with a noun or noun phrase followed by a post-modifying of-phrase fragment.
• A second noun-based subcategory involves a noun phrase with either a post-nominal clause fragment (e.g., the fact that the, the way in which) or a prepositional phrase fragment (e.g., party to the contract, remedies for breach).
• A third noun-based subcategory consists of a noun head premodified by nouns, adjectives or both (e.g., the parole evidence rule, the unfair contract terms).
• The bundle the contract and the does not belong to any of the subgroups outlined above and is therefore treated as a fragment.

The second major category of lexical bundles in the Contract Law texts comprises forty-three preposition-based bundles. Nearly half of these take an of-phrase fragment as a post-modifier (e.g., for breach of contract, at the time of, in the case of). A third major structural group consists of lexical bundles with a verb component. Three verb-based bundles begin with anticipatory it followed by a copular verb and then either an adjective (e.g., it is clear that, it is

In some cases where functional boundaries blur, an inductive approach (Biber & Barbieri, 2007) is pursued, relying on concordance lines to determine the function served by the target lexical bundle.
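The inductive, concordance-based step can be sketched as a key-word-in-context (KWIC) display. A KWIC display lines up each occurrence of a bundle with a few words of context on either side, so that the bundle's function can be judged in use. This is a simplified stand-in for WordSmith's Concordance function. It assumes the corpus is already a single list of lowercase tokens, and `kwic` and its `width` parameter are hypothetical names.

```python
def kwic(tokens, bundle, width=5):
    """Return key-word-in-context lines for every occurrence of bundle.

    tokens: the corpus as one list of lowercase word tokens.
    bundle: the target sequence as a tuple of words.
    width: number of context words shown on each side.
    """
    n = len(bundle)
    lines = []
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) == bundle:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + n:i + n + width])
            # Right-align the left context so that the bundles line up vertically.
            lines.append(f"{left:>40} [{' '.join(bundle)}] {right}")
    return lines
```

Aligning every occurrence in a single column mirrors how concordance lines are read in practice. The analyst scans the surrounding words rather than the bundle itself to decide, for example, whether it marks time, an entity or an agent.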

Research-Oriented Bundles
As Figure 2 shows, bundles serving a research-oriented function can be divided into subgroups, each containing a number of distinct recurrent expressions. The greatest number of bundles is found in the topic-based category, whereas the smallest range of bundles occurs in the description-based category.

Time, Entity and Agent Markers
According to Hyland (2008b), research-oriented bundles "help writers to structure their activities and experiences of the real world" (p. 13). Within this category, bundles can be used to mark time, place or entity. Two patterns refer to time: at the time of and in the course of. Five bundles refer to a particular judicial entity: the court of appeal, the house of lords, by the house of, by the court of and of the court of. The widest range of bundles in this subcategory refers to agents, with patterns such as one of the parties, party to the contract, the other party to and the parties to the. The following examples from the data illustrate bundles referring to time, entities and agents:
• "The contract was illegal at the time of its formation." (time marker)
• "The Court of Appeal held that the creditor was bound to be consistent." (entity reference marker)
• "It is possible for either both or only one of the parties to intend illegal performance." (agent marker)

Procedure
Several bundles in the list help to account for specific procedures, such as the ruling of a court or the intention of parties to enter into a contractual agreement. These include patterns such as it was held that, the court held that and to create legal relations.
• "It was held that it was unreasonable for the defendant to exclude liability for breach of both express and implied terms."
• "The status of 'intent to create legal relations' has become disputed."

Description
The third research-oriented sub-category includes bundles used for describing a particular law-related action or legislation.
• "It would perhaps have been different had the purpose of the hire been specifically advertised in these terms."
• "The mistake about the application of the Rent Acts was not a ground for declaring the lease void."

Intangible Framing Attributes
Some bundles within the research-oriented group highlight the real or abstract nature of an entity. Bundles such as the nature of the, the way in which and the value of the help to describe the characteristics and qualities of a specific entity:
• "The lawyer must explain the nature of the transaction."
• "The repairs would have cost twice the value of the ship."

Topic-Oriented Bundles
The largest group of bundles is domain-specific; that is, these bundles convey meanings typical of contract law. Most of them revolve around the word contract: for breach of contract, terms of the contract, the law of contracts, performance of the contract and matter of the contract. A second set of domain-specific bundles refers to particular legislation and legal rules, such as the statute of frauds, of the civil code, the uniform commercial code and the parole evidence rule.
• "The terms of the contract stated that the contract could be performed by the use of either of two named vessels."
• "The law of contract is fundamental to any legal study."

documents and judgments".

Text-
The third, and by far the smallest, functional group consists of participant-oriented bundles. These can be further divided into stance and engagement markers, and they represent approximately 10% of all expressions identified in this study. The limited number of stance and engagement patterns in the list seems to lend further credence to Bhatia's observation. He describes legal language as "highly impersonal and decontextualized, in the sense that its illocutionary force holds independently of whoever is the 'speaker' (originator) or the 'hearer' (reader) of the document" (Bhatia, 1993, p. 188). The paucity of participant-oriented bundles can also be interpreted from a genre perspective, as written texts involve minimal interaction between author and reader (Biber & Conrad, 2009).

Implications
This study has key methodological and pedagogical implications. On a methodological level, future researchers may find the analytical frameworks adopted here easy to emulate when designing studies with similar goals. The steps for corpus compilation, extraction and refinement are described in enough detail to allow replication. The items identified here may also be compared with those elicited from texts in other disciplines (e.g., history, English) or in related sub-disciplines (e.g., common law, labor law). Such studies are expected to deepen our understanding of the rhetorical practices that shape arguments in both distinct and similar disciplines.
Pedagogically, this study has two important implications. First, the purpose of the current study is not to generate a definitive list of bundles in contract law. Even so, it is hoped that language instructors, materials writers and textbook compilers will find some of the patterns in the list useful for their ESP/EAP students. In a short classroom activity, for example, students can be asked to examine the language of a legal contract with the help of the recurrent items in the list. The aim is to determine how these items are used to serve key communicative purposes. Second, instructors can draw on the corpus-derived examples presented in the Results section when explaining the meanings and functions of the patterns in the list. In this way, learners not only encounter the patterns as they occur in real contexts but can also identify the different senses each pattern conveys, based on authentic examples.
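For the classroom activity described above, a teacher might pre-mark the listed bundles in a contract so that students can focus on their functions. The sketch below is one hypothetical way to do this in Python. It wraps each case-insensitive match of a listed bundle in square brackets, trying longer bundles first. The function name `highlight_bundles` is an assumption, and the sample sentence in the usage note is invented for illustration; only the bundles come from the study.

```python
import re

def highlight_bundles(text, bundles):
    """Wrap each case-insensitive occurrence of a listed bundle in [brackets].

    At any given position, longer bundles are tried before shorter ones,
    so a longer listed bundle wins over a shorter one starting there.
    """
    if not bundles:
        return text
    ordered = sorted(bundles, key=len, reverse=True)
    # Escape each word and allow any run of whitespace between words.
    alternatives = [r"\s+".join(re.escape(word) for word in b.split()) for b in ordered]
    pattern = re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda match: "[" + match.group(0) + "]", text)
```

For example, `highlight_bundles("The Court of Appeal held that the terms of the contract were void.", ["the court of appeal", "terms of the contract"])` returns `"[The Court of Appeal] held that the [terms of the contract] were void."`.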

Conclusion
In conclusion, the role played by language in academic settings is indisputably great. Hyland neatly encapsulates this when he maintains that "educating students, demonstrating learning, disseminating ideas and constructing knowledge rely on language" (Hyland, 2009, p. 1).
The research reported here is an attempt to explore contract law, a key subdivision of the legal register, with the aim of unveiling recurrent multiword patterns. These patterns are then subjected to structural and functional analyses based on approaches and frameworks from corpus linguistics and genre studies. It is hoped that the findings as well as the discussion of these findings will increase our knowledge of the legal discourse in general and the law of contracts in particular.