A Corpus-based Comparative Study of Learn and Acquire

As an important yet intricate linguistic feature of the English language, synonymy poses a great challenge for second language learners. Using the 100-million-word British National Corpus (BNC) as data and the software Sketch Engine (SkE) as the analytical tool, this article compares the usage of learn and acquire in natural discourse through analyses of concordances, collocations, word sketches and sketch differences. The results show that the different functions of SkE make different contributions to the discrimination of learn and acquire. Pedagogical implications of introducing the results into the classroom are discussed.


Introduction
One day, a student asked me: "The verbs learn and acquire share the following similar meaning: to develop or gain knowledge and skill. Why do we say acquire knowledge instead of learn knowledge, and why do we say learn to drive instead of acquire to drive?" I answered: "Learn and acquire are synonyms. They share similar meanings and usages, but they also differ in collocational and colligational patterns." The student continued: "What are the collocational and colligational patterns of learn and acquire respectively?" Being a second language learner myself, I found it hard to give her a satisfactory answer. Therefore, I went to the library and tried to find the answer in reference books. In Merriam-Webster's Dictionary of Synonyms, learn and acquire are not classified as synonyms. The Longman Synonym Dictionary lists the synonyms of learn and acquire respectively, but offers no further explanation of the similarities and differences between the two verbs. Unable to find a satisfactory answer, I decided to conduct a corpus-based comparative study of learn and acquire to address this perplexing question.
The paper is structured as follows. Section 2 gives an overview of related work by introducing corpus studies of collocation and colligation, and their relevance to the study of synonyms. Section 3 introduces the corpus data and tools used in this study. The results are presented and analyzed in Section 4, where I show the success of Sketch Engine in researching synonyms. The final section summarizes the major findings and pedagogical implications of this study.

Corpus Studies of Collocation and Colligation
Collocations are pervasive in texts of all genres and domains. Although the study of collocations can be traced back to ancient Greece, the notion of collocation was first brought up by Palmer (1933) in English language teaching and later introduced to the field of theoretical linguistics by Firth (1957). The often-cited definition of collocations is "statements of the habitual and customary places of that word" (Firth, 1957, p. 181).
Nevertheless, Firth's research on collocation is largely intuition-based, which is in sharp contrast with most corpus linguists' belief that the only way to reliably identify the collocates of a given word is to study patterns of co-occurrence in a corpus. For example, Hunston (2002, p. 68) argues, "Collocation may be observed informally in any instance of language, but it is more reliable to measure it statistically, and for this a corpus is essential." Firth's idea was operationalized in the early work of Sinclair and his associates in the 1970s, and collocation later became one of the most important concepts in corpus linguistics. Collocations pose a great challenge for second language learners. Numerous studies indicate that learners' language is problematic in the idiomatic usage of English, which can be mainly attributed to misrepresented collocations.
A collocation is a co-occurrence pattern that exists between two items that frequently occur in proximity to one another, but not necessarily adjacently or, indeed, in any fixed order. Closely related to collocation are the notions of node and collocate. A node is an item whose total pattern of co-occurrence with other words is under examination, and a collocate is any one of the items which appears with the node within a specified span (Sinclair et al., 2004, p. 10). Collocates are also determined within particular spans: "Two other terms ... are span and span position. In order that these may be defined, imagine that there exists a text with types A and B contained in it. Now, treating A as the node, suppose B occurs as the next token after A somewhere in the text. Then we call B a collocate at span position +1. If it occurs as the next but one token after A, it is a collocate at span position +2, and so on" (Sinclair et al., 2004, p. 34). In order to test whether two words are significant collocates, four pieces of data are required: the length of the text in which the words appear, the number of times each of the two words appears in the text, and the number of times they occur together (Sinclair et al., 2004, p. 28). Building on Sinclair's work, Hoey (2005, p. 5) defines collocation as "a psychological association between words (rather than lemmas) up to four words apart and is evidenced by their occurrence together in corpora more often than is explicable in terms of random distribution".
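The notions of node, collocate and span position can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name collocates_in_span, the whitespace tokenization and the toy sentence are all hypothetical and are not part of the study or of SkE.

```python
from collections import Counter


def collocates_in_span(tokens, node, left=5, right=5):
    """Count collocates of `node` by span position.

    Negative positions are to the left of the node, positive ones to the
    right; position 0 (the node itself) is skipped.
    """
    by_position = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        for offset in range(-left, right + 1):
            j = i + offset
            if offset == 0 or j < 0 or j >= len(tokens):
                continue
            by_position[(tokens[j], offset)] += 1
    return by_position


# Toy text, invented for illustration only (not BNC data).
text = "children learn to read and adults learn to drive".split()
positions = collocates_in_span(text, "learn", left=1, right=2)
```

With this toy text, to is counted as a collocate of learn at span position +1 for both occurrences of the node, while children and adults each appear once at position -1.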
The notion of colligation is closely related to that of collocation. The term colligation was introduced by Firth (1968, p. 181) in order to distinguish lexical interrelations from those holding between grammatical categories: The statement of meaning at the grammatical level is in terms of word and sentence classes or of similar categories and of the interrelation of those categories in colligations. Grammatical relations should not be regarded as relations between words as such - between watched and him in 'I watched him' - but between a personal pronoun, first person singular nominative, the past tense.
Hoey provides a straightforward definition: 'colligation can be defined as "the grammatical company a word keeps and the positions it prefers"; in other words, a word's colligations describe what it typically does grammatically' (Hoey, 2005). Thus, colligation is a similar idea to collocation, but with a different emphasis. For example, 'verb + to infinitive' is a colligation, while dread + think is a collocation which exemplifies the colligation. Irrespective of the definition adopted, colligation, like collocation, is a probabilistic relation.

Corpus Approaches to Synonyms
Synonymy, or semantic equivalence, is an important yet intricate linguistic feature in the field of lexical semantics. Synonyms are not completely interchangeable; rather, they differ in shades of meaning and vary in their connotations, implications, and register (DiMarco et al., 1993). Any natural language contains a considerable number of synonymous words. English is particularly rich in synonyms for historical reasons, which enables English speakers "to convey meanings more precisely and effectively for the right audience and context" (Liu & Espino, 2012, p. 198), but which also constitutes a thorny area for EFL (English as a Foreign Language) learners because of the subtle nuances and variations in meaning and usage among synonyms.
It thus comes as no surprise that an important aspect of English linguistics is to find proper measures for automatically identifying and extracting synonyms (Peirsman, Geeraerts & Speelman, 2015) and for distinguishing one word from its synonyms or near-synonyms (Hanks, 1996; Biber et al., 1998; Gries, 2001; Xiao & McEnery, 2006; Divjak, 2006; Gries & Otani, 2010; Liu, 2010; Hu & Yang, 2015). Although the two orientations of researching synonyms are equally important, this paper focuses on the second one. I aim to identify the relative strengths and weaknesses of using Sketch Engine to research synonyms, and its scope of applicability.
The past decades have witnessed significant advances in studies of synonymy, which were boosted by the advent of the computer era and the central ideas of corpus semantics. Based on the Brown Corpus, Miller & Charles (1991) find that the more two words are judged to be substitutable in the same linguistic context (i.e. the same location in a sentence), the more synonymous they are in meaning. Church et al. (1994) employ a "lexical substitutability" test in a corpus study of the near-synonyms ask for, request and demand, which produced the same finding: the substitutability of lexical items in the same linguistic context constitutes a good indicator of their semantic similarity. Gries (2001) quantifies the similarity between English adjectives ending in -ic or -ical (like economic and economical) on the basis of the overlap between their collocations. Gilquin (2003) investigates the difference between the English causative verbs get and have, and Glynn (2007) compares intra- and extralinguistic factors in the contexts of hassle, bother and annoy. Gries and Otani (2010) study the synonyms big, great and large and their antonyms little, small and tiny. Other sets of synonyms that have attracted attention include strong and powerful (Church et al., 1991), absolutely, completely and entirely (Partington, 1998), big, large and great (Biber et al., 1998), quake and quiver (Atkins & Levin, 1995), principal, primary, chief, main and major (Liu, 2010), and actually, genuinely, really, and truly (Liu & Espino, 2012).

Corpus Data: BNC
The British National Corpus (BNC) is a 100-million-word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of British English from the later part of the 20th century (Aston & Burnard, 1998). The written part of the BNC (90%) includes extracts from regional and national newspapers, specialist periodicals and journals for all ages and interests, academic books and popular fiction, published and unpublished letters and memoranda, and school and university essays, among many other kinds of text. The spoken part (10%) consists of orthographic transcriptions of unscripted informal conversations and spoken language collected in different contexts, ranging from formal business or government meetings to radio shows and phone-ins.
The BNC is monolingual, synchronic, general and sample-based by nature. It deals with modern British English, covers British English of the late twentieth century, includes many different styles and varieties rather than being limited to any particular subject field, genre or register, and contains many samples, which allows for a wider coverage of texts within the 100-million-word limit. The corpus is encoded according to the Guidelines of the Text Encoding Initiative (TEI) to represent both the output from CLAWS (an automatic part-of-speech tagger) and a variety of other structural properties of texts (e.g. headings, paragraphs, lists etc.). Full classification, contextual and bibliographic information is also included with each text in the form of a TEI-conformant header.

Corpus Tool and Analysis Procedure
The Sketch Engine (SkE) is a leading corpus tool, widely used in lexicography, language teaching, translation and the like (Kilgarriff et al., 2004, 2014). The name covers two different things: the software and the web service. The web service includes, as well as the core software, a large number of corpora pre-loaded and 'ready for use', and tools for creating, installing and managing users' own corpora. Corpora in SkE are often annotated with additional linguistic information, the most common being part-of-speech information (for example, whether something is a noun or a verb), which allows large-scale grammatical analyses to be carried out.
SkE has a number of core functions: Thesaurus, Wordlist, Concordance, Collocation, Word Sketches, and Sketch Diff. The present study uses the Concordance, Collocation, Word Sketches and Sketch Diff functions. The span (the number of words to the left and right of the search word) is (-5, 5), the minimum frequency of each collocate in the corpus is set to 10, and the minimum frequency in the given range (in this case -5, 5) is set to 5. Of seven measures to calculate the strength of collocation (T-score, MI, MI3, log likelihood, min. sensitivity, and logDice), I choose the default one, logDice, which is considered more reliable than the frequently used MI (mutual information) measure.
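To make the contrast between logDice and MI more concrete, the following Python sketch computes three of the measures named above from raw counts. It is not SkE's implementation: it assumes the formulas as they are commonly defined for these measures, and the function names and the counts in the usage lines are hypothetical, not taken from the BNC.

```python
import math


def log_dice(f_xy, f_x, f_y):
    """logDice: 14 + log2 of the Dice coefficient 2*f_xy / (f_x + f_y)."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def mutual_information(f_xy, f_x, f_y, n):
    """MI: log2 of observed over expected co-occurrence, with corpus size n."""
    return math.log2(f_xy * n / (f_x * f_y))


def t_score(f_xy, f_x, f_y, n):
    """T-score: (observed - expected) divided by the square root of observed."""
    return (f_xy - f_x * f_y / n) / math.sqrt(f_xy)


# Hypothetical counts for a node, a collocate and their co-occurrences.
f_node, f_collocate, f_together, corpus_size = 1000, 1000, 50, 1_000_000

score_log_dice = log_dice(f_together, f_node, f_collocate)
score_mi = mutual_information(f_together, f_node, f_collocate, corpus_size)
score_t = t_score(f_together, f_node, f_collocate, corpus_size)
```

Under these assumed formulas, logDice does not take the corpus size as an argument, whereas MI and T-score do; this is one reason commonly given for treating logDice as the more stable measure when corpora of different sizes are compared.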

The Frequencies of Learn and Acquire
Concordance enables researchers to compare the frequencies of synonymous words. As shown in Table 1, the frequency of learn is nearly three times that of acquire.

Other collocates such as about, from, through, how and that have much to do with the grammatical relations which will be analyzed in the next section. Besides, pronoun, adverb and adjective collocates are also salient. Of the 50 collocates there are 3 indefinite pronouns: lot, more and much; 2 adverbs: quickly and to; 2 adjectives: new and hard.

As shown in Table 5, the dominant right collocates of acquire can be grouped into two categories:
• Abstract nouns: skill(s), knowledge, reputation, asset(s), title, status, qualifications, expertise, understanding, taste, infection, rights, syndrome, significance, language, information, citizenship, ownership, competence, deficiency, meaning, interest, habit, momentum, power, wealth
• Individual/collective nouns: shares, stake, property, premises, properties, software, company, goods, weapons, collection
In addition to the above categories, adjective, proper noun and preposition collocates are also quite salient. Of the 50 collocates there are 4 adjectives: additional, sufficient, immune and necessary; 4 proper nouns: Target, Newco, Inc and Museum; 2 prepositions: during and through.

The Syntactic Patterns of Learn and Acquire
The syntactic patterns of the two verbs are based on the Word Sketch function of SkE. In order to present a fine-grained comparison, I summarized the 18 patterns of learn and the 14 patterns of acquire in Table 6 and Table 7. In the first example of Table 6, the underlined word beginners functions as the subject of learn. It has to be noted that although the syntactic patterns of the two verbs are similar in many ways, there are also clear differences, which can be easily shown using the Sketch-Diff function of SkE.

Direct Comparison of Lexical and Grammatical Collocates
The Sketch-Diff function of SkE allows users to visually compare and contrast synonymous words according to their salient collocational context. Figure 1 shows part of the result obtained by clicking 'Show Diff'. In the figure, the greener a word is, the more closely it relates to learn. The redder a word is, the more closely it relates to acquire. For example, it is more usual to say pupil learns than buyer acquires. Evidently, although the two verbs learn and acquire share a number of syntactic patterns, the collocates in each pattern differ considerably. In the 'and/or' pattern, there are 374 collocation tokens for learn and 150 for acquire, indicating that more words are used in the 'and/or' pattern with learn. Many words frequently collocate with learn but are never used with acquire (such as listen, practice, teach, learn, explore, grow, understand, remember, live, read, watch, try, work, think, and go). On the other hand, sell occurs 6 times with acquire, but there is no occurrence of sell with learn.
In the 'modifier' pattern, there are 2664 collocation tokens for learn and 955 for acquire, indicating that more words are used as modifiers of learn. Some words only collocate with learn (such as deal, lot, fast, enough, much, readily, ago and ever), some only collocate with acquire (such as somehow, thereby, subsequently and illegally), and some others can be used with both learn and acquire (such as never, quickly, later, soon, yet, early, gradually, rapidly, thus and recently). As is shown in Figure 1, some words are more likely to collocate with learn than with acquire (such as never, quickly, later, soon, yet, early and gradually), while some others are more likely to collocate with acquire than with learn (rapidly, thus and recently).
In the 'object' pattern, there are 4989 collocation tokens for learn and 4945 for acquire, indicating that similar numbers of words are used as objects of learn and acquire. Some words only collocate with learn (such as Friend, lesson, disability, craft, lot, trick, outcome, English, basics, French, secret and rope), some only collocate with acquire (such as title, stake, land, volatile, qualification, share, infection, asset, knowledge and reputation), and some others can be used with both learn and acquire (such as language, technique and skill). It is worth noting that the objects of learn are more about knowledge or skill (such as lesson, craft, trick, English, basics, French, rope, language, technique and skill), while the objects of acquire are more about qualifications or assets (such as title, qualification, reputation, stake, land, share and asset).
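The three-way grouping used in this section (collocates found only with learn, only with acquire, or with both) can be sketched in a few lines of Python. This is a simplified illustration and not the Sketch-Diff algorithm itself, which also takes collocation scores into account. The function name sketch_difference is hypothetical, and the frequencies are invented; only the collocate words come from the discussion above.

```python
def sketch_difference(freq_a, freq_b):
    """Split two collocate-frequency tables into A-only, B-only and shared."""
    words_a, words_b = set(freq_a), set(freq_b)
    only_a = sorted(words_a - words_b)
    only_b = sorted(words_b - words_a)
    shared = {w: (freq_a[w], freq_b[w]) for w in sorted(words_a & words_b)}
    return only_a, only_b, shared


# Invented frequencies for a handful of object collocates (not BNC counts).
learn_objects = {"lesson": 120, "trick": 30, "language": 200, "skill": 150}
acquire_objects = {"stake": 90, "asset": 60, "language": 40, "skill": 210}

only_learn, only_acquire, shared = sketch_difference(learn_objects, acquire_objects)
```

For shared collocates, the paired frequencies indicate which verb a word leans towards, which is the kind of information the green-to-red colouring in Figure 1 conveys.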

Conclusion
In view of its importance and intricacy, researching synonymy is a crucial task in the field of lexical semantics. This paper has introduced the leading corpus tool SkE and its advantages in investigating synonymous verbs.
The results show that different functions of SkE can make different contributions to the discrimination of learn and acquire.
This study has a number of pedagogical implications. First, studies in second language acquisition show that native speakers memorize not only words in isolation, but also chunks of words. These chunks are viewed as the building blocks of language. They are available to speakers as ready-made or prefabricated units, and therefore contribute to the fluency and naturalness of their utterances. Thus, if EFL teachers aim to help their students achieve a high degree of fluency and accuracy, they may encourage students to use the collocational and colligational patterns presented in Tables 1 to 7. Second, there is a huge number of synonyms in English, so it would be impractical for teachers to teach each pair of them to students. It might be more promising to teach students how to use SkE to conduct their own research.

Figure 1. Comparison of learn and acquire in terms of collocational patterns

Table 1. Frequency of learn and acquire in BNC (per million)

Table 2 and Table 3 list the top 50 left and right collocates of learn automatically generated by the software. Table 4 and Table 5 list the top 50 left and right collocates of acquire automatically generated by the software.

Table 2. The top 50 left collocates of learn in BNC

As shown in Table 2, the dominant left collocates of learn can be grouped into four categories:
• Abstract nouns: lesson(s), opportunity, teaching, skills, language, experience, thing(s), learning, surprise
• Individual/collective nouns: children, child, pupils, student(s), people
• Personal pronouns: we, you, they, I, my, she
• Auxiliary and modal verbs: 've, have, had, will, 'd, has, having, must, can, need, should
In addition to the above categories, pronoun, adverb and adjective collocates are also quite salient. Of the 50 collocates there are 4 adverbs: soon, quickly, never and to; 4 interrogative and indefinite pronouns: what, how, lot and much; 2 adjectives: young and surprised. Besides, collocates such as hon. and right appear in the phrase (as) my right hon.

Table 3. The top 50 right collocates of learn in BNC

Table 4. The top 50 left collocates of acquire in BNC

In addition to the above categories, proper noun, notional verb, auxiliary verb and material noun collocates are also quite salient. Of the 50 collocates there are 6 proper nouns: Newco, Inc, AIDS, Group, Museum and Corp; 4 notional verbs: managed, enable, able and prevent; 2 auxiliary verbs: has and had; 2 material nouns: land and volatiles.

Table 5. The top 50 right collocates of acquire in BNC

Table 6. The syntactic behavior of learn in BNC