An AWE-Based Diagnosis of L2 English Learners’ Written Errors

While Automated Writing Evaluation (AWE) can perform an error diagnosis (Chen & Cheng, 2008), previous studies have tended to exclude it from the process of error analysis. This study aimed to examine how Grammarly Premium responded to the English writings of a group of night school students at a Taiwanese technical university. The participants produced 175 essays, which the researcher checked with the AWE program. In total, 1042 errors were detected and classified into 40 types. The 40 types of errors fell into three hierarchical levels: a word and phrase level, a sentence level, and a discourse level. This study suggests that future studies view AWE’s functions from a new perspective and find a place for it in the process of error diagnosis.


Error Analysis
As one of the most critical theories in the field of second language acquisition, error analysis scrutinizes the types, causes, evaluation, and correction of learners' errors (Bussmann, 2006). Brown (2000) defined it as "the study of learners' ill-formed production (spoken or written) in an effort to discover systematicity" (p. 324). Sobahle (1986) indicated that it focuses on the errors made by a group of L2 English learners who share the same first language. Furthermore, error analysis can be conducted stage by stage. Corder (1982) divided the process of error analysis into three stages. In the first stage, researchers need to describe and identify learners' errors. Mistakes have to be excluded at the outset because, as Corder (1967) stated, "mistakes are of no significance to the process of error analysis" (p. 167). Mistakes are performance errors resulting from lapses such as a slip of the tongue, whereas errors are made systematically by learners and reflect their transitional competence (Brown, 2000). Learners' sentences are presumed ungrammatical until they are proven otherwise. Note that overt and covert errors need to be differentiated from the beginning. Overt errors are ungrammatical at the sentence level, while covert ones are grammatical at the sentence level but ungrammatical in context or at the discourse level (Corder, 1982).
In the second stage, reconstructed sentences, that is, native English speakers' versions of learners' idiosyncratic dialects, are produced for comparison with the erroneous ones. If researchers have plausible explanations for errors, they need to come up with reconstructed sentences or, in Corder's (1982) terms, translation equivalents for them. If not, they might have to refer to the learners' mother tongue and translate literally: they translate the mother-tongue sentences back into well-formed target-language sentences and compare them with the learners' ill-formed versions. However, if researchers do not know the learners' mother tongue, the sentences in question are put aside until more is known about the learners' idiosyncratic dialects. Corder (1982) suggested that "the third stage and ultimate object of error analysis is explanation" (p. 24). In this final stage, researchers need to provide an interpretation of learners' errors. Unlike contrastive analysis, error analysis considers, in addition to interference from the mother tongue, other sources of errors such as intralingual transfer, the context of learning, communication strategies, word coinage, false cognates, and prefabricated patterns (Brown, 2000).

Automated Writing Evaluation
A writing teacher in Asia is usually faced with more than fifty students and might be reluctant to give them more assignments because reading and grading their writings is time-consuming (Wang, 2013). Automated writing evaluation (AWE) is computer software that provides scores and feedback for essays (Stevenson, 2016). As noted by Chen and Cheng (2008), AWE was initially developed in the 1960s to reduce the workload and time spent on grading students' writings. Thanks to advances in artificial intelligence technology, its program design has been improving since the mid-1990s.
While evaluating learners' compositions might be time-consuming, Chen, Chiu, and Liao (2009) indicated that "with the help of these new computer programs, writing teachers can reduce the time spent correcting and commenting on students' compositions" (p. 3). Besides, AWE increases opportunities for writing practice and assists writing instruction (Roscoe et al., 2017). As pointed out by Vojak, Kline, Cope, McCarthey, and Kalantzis (2011), AWE provides not only comprehensive summative scores but also feedback that goes beyond the mechanical level and addresses writing components such as content and development, organization, and so on.
While Automated Writing Evaluation might be helpful to students' writing, its use in writing classes is still a controversial issue. As indicated by Stevenson (2016), its scoring accuracy and feedback capabilities remain to be improved, and its status as a non-human audience remains a concern. Furthermore, AWE does not recognize writing as an activity that is social, contextual, and multi-modal. In the same vein, the Conference on College Composition and Communication (CCCC, 2004) questioned the validity of AWE because it is a non-human audience. CCCC argued that "writing-to-a-machine" diminished the importance of human communication, which is the purpose of writing.
Previous studies on AWE have centered on its primary functions, namely providing feedback and scores for learners' compositions. As shown by Stevenson (2016), the majority of AWE studies examined whether AWE feedback could improve learners' writings. Such research has investigated its effects on learners' written products and writing processes and their perceptions of the usefulness of AWE (Zhang & Hyland, 2018). Different AWE systems have also been compared in terms of the quality of their feedback (Chen et al., 2009). In reviewing studies on AWE scores, Li, Link, and Hegelheimer (2015) classified the studies of this nature into three strands: correlational studies between AWE's ratings and human raters', instructors' perceptions of AWE scores, and learners' perceptions of AWE scores. While AWE can perform an error diagnosis (Chen & Cheng, 2008), this function seems to have been treated as minor because few studies have employed AWE for an error analysis of learners' writings.
Grammarly is one of the AWE systems currently available. Its premium version claims to check grammatical and spelling errors, evaluate the correctness and readability of writings, and offer vocabulary-enhancement suggestions. Besides, it can go beyond the sentence level to check genre-specific styles, and it even has a built-in plagiarism detector (Grammarly, 2019). However, Nova (2018) pointed out that "there is a limited number of studies yet taking its claim into account and evaluating the process of evaluation given by Grammarly program" (p. 82). More recently, comparing Grammarly Premium with human raters, Park (2019) suggested that future studies should include writing genres other than argumentative essays.

Research Questions
Given the limited number of studies on how Grammarly Premium responds to L2 English writers' errors, the specific research questions to be addressed are as follows:
1. What types of errors does Grammarly Premium identify in the participants' writings?
2. To what hierarchical levels do the participants' errors belong?

Participants
71 students participated in the study. The participants came from two English writing classes, Applied English Writing One and Applied English Writing Two, offered in the night school program of a technical college in Taiwan. They all majored in Applied Foreign Languages. 38 students were enrolled in Applied English Writing One and 33 in Applied English Writing Two. Their proficiency levels ranged from beginning to low intermediate. Table 1 illustrates the topics and the number of the writing assignments. As indicated by Table 1, the total number of assignments was 8. Although the students in each class completed four writing assignments, the total number of collected essays was only 175. 5 out of the 8 topics called for descriptive writing: My Favorite Movie, My Best Vacation, Free Writing: Please Write a Descriptive Paragraph, My Family, and an email: A Personal Description.

Data Analysis Procedure
The participants of the study produced 175 essays. After collecting the compositions, I pasted them into Grammarly Premium and checked them with it. Finally, I recorded the detected errors and used the Statistical Package for the Social Sciences (SPSS/PC, version 22.0) to calculate descriptive statistics for the data set.

Results
Table 2 provides the answer to the first research question. As indicated by the table, Grammarly Premium detected 40 types of errors. The total number of errors committed by the participants was 1042. The top ten high-frequency errors were: spacing, punctuation, spelling, articles, vocabulary, verb form and tense, repeated words, prepositions, sentence fragments, and wordiness.
To answer the second research question, I further categorized the participants' written errors into three levels: a word/phrase level, a sentence level, and a discourse level. 19 types of errors fell into the word and phrase level, including spelling, articles, vocabulary, prepositions, pronouns, conjunctions, dialect-specific spelling, infinitives, number in nouns, comparatives, gerunds, demonstratives, adverbs, quantifiers, determiners, collocation, squinting modifiers, qualifiers, and auxiliary verbs. Besides, 12 types of errors belonged to the sentence level: spacing, punctuation, verb forms and tense, sentence fragments, subject-verb agreement, capitalization, voice, wrong word order, comma splices, missing verbs, article-noun agreement, and missing subjects. Finally, 9 types of errors were at the discourse level: repeated words, wordiness, monotonous passages, unclear antecedents, fluency, tautology, confidence, formality, and too long a paragraph. As defined by Suri and McCoy (1993), discourse-level errors affect the comprehension of the subsequent text and need to be identified with reference to the preceding text. The following are examples of the participants' high-frequency errors. In the running text, I will use italics to highlight their errors.
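The three-level classification described above can be written down as a small lookup table. The following Python sketch is only an illustration: the study itself used Grammarly Premium and SPSS, and the names `LEVELS` and `level_of` are hypothetical. The error types are transcribed from the paragraph above, and the final check confirms that the three levels add up to the 40 reported types.

```python
# Illustrative sketch only; the error types are transcribed from the text,
# while the names LEVELS and level_of are hypothetical.
LEVELS = {
    "word/phrase": [
        "spelling", "articles", "vocabulary", "prepositions", "pronouns",
        "conjunctions", "dialect-specific spelling", "infinitives",
        "number in nouns", "comparatives", "gerunds", "demonstratives",
        "adverbs", "quantifiers", "determiners", "collocation",
        "squinting modifiers", "qualifiers", "auxiliary verbs",
    ],
    "sentence": [
        "spacing", "punctuation", "verb forms and tense",
        "sentence fragments", "subject-verb agreement", "capitalization",
        "voice", "wrong word order", "comma splices", "missing verbs",
        "article-noun agreement", "missing subjects",
    ],
    "discourse": [
        "repeated words", "wordiness", "monotonous passages",
        "unclear antecedents", "fluency", "tautology", "confidence",
        "formality", "too long a paragraph",
    ],
}


def level_of(error_type):
    """Return the hierarchical level of an error type, or None if unknown."""
    for level, types in LEVELS.items():
        if error_type in types:
            return level
    return None


# Number of error types per level, and the overall total.
type_counts = {level: len(types) for level, types in LEVELS.items()}
total_types = sum(type_counts.values())

for level, n in type_counts.items():
    print(f"{level:<12} {n:>2} types")
print(f"{'total':<12} {total_types:>2} types")
```

Note that spacing is looked up at the sentence level here, following the classification in the text, even though it is a mechanical error.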

Incorrect Spacing
The largest number of errors observed in the study involved improper spacing around punctuation, a type of mechanical error. It is the most common error in the participants' writing samples, amounting to 210 tokens and accounting for 20.15% of all the error tokens. As indicated by Hevny (2013), English written errors involve two aspects: grammatical and mechanical errors. Grammatical errors include incorrect use of sentence patterns, articles, singular and plural forms, and tenses. Mechanical errors refer to mistakes in spelling, punctuation, spacing, and capitalization. Note that although incorrect spacing is a type of mechanical error, it falls into the sentence level.

Misuse of Punctuation
As indicated by Hirvela, Nussbaum, and Pierson (2012), punctuation receives little attention both in the classroom and in the academic arena. As a result, students have trouble punctuating in English because of the insufficient amount of time teachers spend instructing them. The misuse of punctuation is the second most frequent error in the participants' writings. 128 out of 1042 error tokens are this type of mechanical error. In some cases, the participants missed a comma before a coordinating conjunction in a compound sentence. In other cases, they missed commas with interrupters. Examples with italicized errors are below.
a. My father is a civil official and my mother is a house wife.
b. I love my quiet time, for example reading magazines, watching TV.
c. It is not good for her healthy so I adopted a stray dog, Luby, five years ago.
d. I guess he was not like this but after his life changed for some reason to be a stray dog, he realized life is not easy so he turned so fake to survival.
e. She is a leader in her company but at home she is a lazy woman.
f. She does not do anything at home and she calls us to do it.

Examples of Spelling Errors
Spelling errors are the third most frequent error in the participants' writings. 104 out of 1042 error tokens are this type of mechanical error. The participants' spelling errors include misspelled words, commonly confused words, and unknown words. Misspelled words are italicized in the following examples.
a. A place I would like to vist one day Greece is the place which I wnat to vist one day.
b. Although now I haven't tje time to go see those beautiful place because the children
c. It has many culture, salient features, delicous snacks and even the weather.
d. No matter which place we are, we can also taste the diffent snacks and local products in Taiwan.
e. My dad Leo, he forety year old.
f. My mom Alice, she threety-eight year old.

Examples of Errors in the Use of Articles
The participants' misuse of English articles includes missing articles and incorrect article use. In some cases, they used the wrong articles in set expressions. 92 out of 1042 (8.9%) error tokens fall into this category. Chinese L2 English learners have difficulty learning English articles because Mandarin Chinese has no counterpart of the English article system (Robertson, 2000). As indicated by Fen-Chuan Lu (2001)
c. We need to learn second language so I will choose English.
d. English is international language that should all of us need to study English.
e. I hope I can good at English in future.
f. In a distance, there is a beautiful hose by the water, so you can see the navy blue ocean every day.

Vocabulary Misuse
67 out of 1042 (6.43%) error tokens involve an odd choice of words. The participants might use confusing or miswritten words that do not fit the context. In other cases, their choice of words is considered weak, overused, overly complicated, or repetitive. Italicized words are misused vocabulary items in the following examples.
a. When we first sight at him, he was shy and had slowly stepped forward to us when others were energetic jumping and barking for drawing our attentions, we almost decided right away he is the one, because we live in the apartment, Mom is old cannot handle the dog who is over activity.
b. My father is very strong in my family.
c. [She] is very industrious.
d. [He] is the most powerful [lesson].
e. It is not good for her healthy so I adopted a stray dog, Luby, five years ago.
f. That was my awesome memory.

Ungrammatical Verb Form and Tense
Ungrammatical verb form and tense ranks sixth among all the error tokens. 63 out of 1042 (6.05%) error tokens belong to this category. The grammatical mistakes in this category include inaccurate verb forms after modals, wrong verb forms, and incorrect verb forms with subjects.
As indicated by Sun (2014), "Chinese EFL learners are often found to struggle a lot with different usages of various English tenses" (p. 179). Likewise, Chou and Wu (2007)
f. [She] can chatting anything with anyone at anywhere.

Repeated Words
While repeated words may not be ungrammatical, the automated writing evaluation system, Grammarly Premium, identifies them as errors. Its pop-up screen noted that "a written work that uses the same words over and over is less interesting for the reader than a work that uses a rich vocabulary." 55 out of 1042 (5.2%) error tokens fall into this category, ranking seventh among all the error tokens. Repeated words are a type of discourse error that might be the result of language transfer (Bao, 2015). They are a cohesive device in Chinese but are considered redundant in English. Likewise, Yang (2014) stated that "Chinese tends to employ repetitions in a discourse, while English tends to use synonyms to replace the repeated word" (p. 121). Examples in which errors are italicized are as follows.
a. I think my favorite type of movie is action. Why action is my favorite type of movie? I feel tension, stimulate and very excited when I see this kind of movie. I very enjoy the movie time, and I image that I am the actor in movie.
b. And he is also like to make many kinds of friends. Because he want to have different kinds of friends to chat with him and play with him together.
c. From far distance the ocean is a deep navy blue. At nights, you can only hear the waves of the ocean and the light wind which comes from the huge palm trees.
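The percentage shares quoted in the preceding subsections follow from dividing each category's token count by the 1042 total error tokens. The sketch below recomputes them in Python from the counts stated in the text. It is an illustration, not the procedure used in the study (which relied on SPSS), and the names `REPORTED_COUNTS` and `share` are hypothetical. Recomputed values may differ slightly in the last digit from the rounded figures reported in the text.

```python
# Illustrative sketch only; counts are those stated in the text,
# while the names REPORTED_COUNTS and share are hypothetical.
TOTAL_TOKENS = 1042

REPORTED_COUNTS = {
    "spacing": 210,
    "punctuation": 128,
    "spelling": 104,
    "articles": 92,
    "vocabulary": 67,
    "verb form and tense": 63,
    "repeated words": 55,
}


def share(count, total=TOTAL_TOKENS):
    """Return a category's share of all error tokens, in percent (2 decimals)."""
    return round(100 * count / total, 2)


for name, count in REPORTED_COUNTS.items():
    print(f"{name:<20} {count:>4} {share(count):6.2f}%")
```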

Other Top-Ten High-frequency Written Errors
Other top-ten high-frequency written errors include misuse of prepositions, sentence fragments, and wordiness. Grammarly Premium detected the participants using confusing or redundant prepositions and writing incomplete and wordy sentences. Wordiness might be an interlingual error because Chinese L2 English writers prefer to employ lexical repetition to achieve lexical cohesion, as shown in the last example (Jin, 2001). Examples in which errors are italicized are as follows.
a. Although we have many trouble on our life, we can learn anything on those problem.
f. This winter vacation is much happy! I must bear in mind well it, make a happy recollection!

Discussion and Conclusion
The AWE system in question, Grammarly Premium, detected 1042 errors in the participants' compositions and classified them into 40 types. The 40 types of errors fell into three hierarchical levels: a word and phrase level, a sentence level, and a discourse level. 19 types of errors fell into the word/phrase level, 12 types belonged to the sentence level, and 9 types were at the discourse level. Vojak et al. (2011) indicated that AWE programs "were generally unable to construct valid targeted feedback on text beyond the sentence level" (p. 103). In contrast to that earlier finding, this study suggests that Grammarly Premium can identify errors beyond the sentence level.
The participants of the study produced a corpus of 175 essays, which were analyzed by Grammarly Premium. The 175 English essays seemed to be a relatively large corpus compared with previous studies of error analysis. For example, Sun (2014) provided an account of Chinese L2 English learners' written errors with a corpus of 30. Wu and Garza (2014) analyzed Taiwanese L2 English learners' writings with a corpus of 40. Likewise, Sawalmeh (2013) conducted a case study of Arabic L2 English learners' written errors with a corpus of 32. The time-consuming error analysis processes might impact the size of their written samples. Thus, in comparison with typical error analysis, conducting an error analysis with an AWE program may be advantageous in terms of speed when dealing with a large corpus of written samples.
Error analysis is a three-stage process that includes identification, reconstruction, and interpretation of learners' errors (Corder, 1982). Based on Corder's proposed procedure, Wu and Garza (2014) divided the process of error analysis into five steps: collection, identification, classification, quantification, and analysis of errors. As indicated by this study, Grammarly Premium could perform the first three steps of the procedure, although researchers may need to enumerate and analyze errors to complete the process. The analyzed data showed that interlingual errors might be a significant source of the participants' errors. For instance, five of the top-ranking errors (spacing, articles, verb form and tense, repeated words, and wordiness) might fall into this category, accounting for 42.3 percent of the total errors. CCCC (2004) considered that the risks of AWE outweigh its benefits and that writers should write to a human reader, not a machine. Learners might indeed know that AWE is one of their readers when it grades and corrects their compositions. However, the function of error analysis is to analyze learners' written and oral errors (Brown, 2000). When AWE helps researchers conduct an error analysis, it acts as a detached second analyst rather than as a reader of students' writings. Error analysis conducted by human researchers alone is time-consuming, but AWE can identify learners' errors almost instantaneously. While AWE could reduce researchers' workload in error analysis, previous studies have tended to exclude it from the process of error analysis. This study illustrates that AWE is a useful tool for diagnosing L2 English writers' characteristics at different stages of the writing process.