Online Open-Source Writing Aid as a Pedagogical Tool

The present research assesses online open-source tools used in teaching and learning L2 academic writing. Because relatively little research has examined how best to use online automated proofreaders for educational purposes, this study examines the potential of such tools. Unlike most studies, which focus on Automated Writing Evaluation (AWE), this research concentrates only on online, open-source writing aids: grammar, spelling and writing-style improvement tools available either free or as paid versions. Their accessibility and their ability to check language mistakes in academic writing, such as college-level essays, in real time motivate both teachers and students. The findings of this empirical study indicate that, despite some bias, computerized feedback facilitates language learning, helps improve the quality of writing, and increases student confidence and motivation. The study can help teachers understand students' needs in writing, as well as their perception of automated feedback.


Introduction
Writing in English as a second (or foreign) language (L2) has a significant place in academia. From this perspective, teaching writing skills well and evaluating students' drafts are demanding tasks. Moreover, research writing is the foundation of graduate students' academic success and future careers. Learning how to write and, possibly, to publish in reputable journals makes them competitive in the job market (Cotos, 2014).
Essentially, the assessment of writing assignments aims to support and improve student learning. Assessment, as a term, stems from a movement towards accountability. According to Peha (2011), assessment is the collection of information for the purpose of guiding future instruction, and it is closely related to evaluation, which he defines as any decision made on the basis of information already gathered through assessment.
Students aim to become better writers, or at least to perform well enough in their classes to achieve the level of success they desire, and they want their writing performance to be evaluated consistently. As Blikstad-Balas, Roe and Klette (2018) note, "…student development as writers requires a supportive environment in which they receive sustained opportunities to write" (p. 119). Based on Beatty and Gerace's study (2009, p. 152), within the conceptual changes in pedagogy, the main role of classroom discourse is to direct students' thinking and to provide an environment that enhances thinking, cognitive skills and learning as a whole. Currently, two main scoring systems are used to assess writing for academic purposes: automated writing evaluation (AWE) and human rating.

AWE versus Human Rating
AWE (or AEE, automated essay evaluation) is defined as "the process of evaluating and scoring written prose via computer programs" (Shermis & Burstein, 2013, p. 1). It has attracted growing interest in the field of L2 writing over the past years (Chen & Cheng, 2008; Warschauer & Ware, 2006). Up to now, the hands-on benefits of automated feedback for teachers, such as time saving and pedagogical effectiveness, have been researched in specific instructional contexts. Automated feedback is the most promising point of contact between AWE and L2 writing, and some studies recommend using it to complement teachers' comments (Shermis & Burstein, 2013; Warschauer, 2000).
Research on automated essay scoring by Blood (2011) focuses on operationalizing and scoring writing assignments, as well as on defining what counts as 'excellent writing'. Training raters, like marking itself, is time-consuming, and online tools shorten the time needed to obtain scores (Livingston, 2009).
In addition, maintaining the reliability and construct validity of scores is challenging, as instructors or testers differ in how they interpret candidate performances and in their tendencies towards severity or leniency (Blood, 2011). Although machines do not understand and evaluate writing with the same cognitive abilities as humans, concerns about AWE might be allayed, as various systems seem to have achieved similar scoring prediction outcomes (Shermis & Hamner, 2012).
Previous research has documented that employing well-trained human raters is not always feasible. Notably, biased ratings usually appeared when candidates showed very high or very low writing ability; in these cases, raters were not consistent in applying the rating criteria. Kondo-Brown (2002) states that rater training may improve inter-rater reliability and the consistency of ratings, although it may have little influence on other unwanted rater characteristics, such as increased or decreased severity.
Likewise, Shi (2001) concludes that although raters did not differ significantly in the scores they gave, they differed considerably in the justifications they provided for their ratings. Different raters weighted certain essay categories more heavily than others (Weigle, 1999). Native speakers tended to focus more on content and language, while non-native speakers focused on organization and length. Similarly, Wolfe, Song and Jiao (2016, p. 1) highlight the difficulty of scoring essays accurately for both experienced and inexperienced raters.

Benefits and Challenges of AWE
The question of whether automatic essay scoring tools apply scoring rules accurately and consistently has not been a matter of great professional discussion, because internal reliability is an inherent trait of such systems. Many critics of automatic essay scoring have not succeeded in contextualizing their criticism within "an interpretive argument for the validity of a specific testing context, thus relegating their objections to the realm of the abstract" (Blood, 2011, p. 53). For example, Ericsson and Haswell's (2006) anthology questions the implementation of AWE in writing classes. Its contributors doubt computers' ability to understand texts and, consequently, to evaluate the quality of writing. On the other hand, intelligent feedback features, such as an individualized student approach, identification of the error type and explicit explanation of the error, appear to lead to student self-correction. For these reasons, AWE is often perceived as an objective solution.
Although some studies strongly reject machine-scoring software, others argue that using a variety of software packages benefits 21st-century academic settings because it helps build student knowledge in EFL courses.

AWE Potentials for the Learning Process
One reason to explore the use of AWE in and out of the classroom is its potential to provide students with more feedback, independent revision and learning opportunities. This type of feedback can often lead to better overall writing quality. It also provides a space in which writers can learn more about evaluation criteria and improve their skills as writers and revisers (MacArthur, 2007, 2012).
Another reason to study AWE is that this technology is poised to take on a more significant role in academic writing education. In addition, writing feedback from human raters is often inconsistent because of human error. Even when writing is peer-reviewed, educators may not recognize the same problems in a student paper each time they evaluate it, and this inconsistency can bewilder students. In contrast, AWE provides consistent feedback on the same types of errors for every draft a student submits (Moore & MacArthur, 2016).
Existing research focuses on the technical qualities of various AWE tools and on students' writing performance as perceived by faculty. Nevertheless, it has not fully explored the possible impact on student learning and perception (Burstein, Elliot, & Molloy, 2016).
In this research, I analyze Saudi female students' responses to human and AWE rating systems. Regarding the situation in Saudi Arabia, Alkhatib (2015) points out that instructors use rubrics to assess L2 writing, but female students at Saudi universities are rarely satisfied with the Written Corrective Feedback (WCF) they receive, and some of them take teacher comments 'personally'. According to Salebi (2004), female students often comment on their writing errors, explaining that they made mistakes because of test anxiety, a concentration on content rather than form, and the limited time allotted to the test. This type of anxiety can cause dissatisfaction and demotivation among students learning English as an L2. Students agree with instructors that written responses may affect student writing and attitude. Although written comments are time-consuming, teachers use them because they want to help improve the writer's performance (Leki, 1990).

Research Focus
More exploration and discussion are required to examine how online AWE tools can support teaching and learning English as a foreign language. AWE should be used to achieve more desirable learning outcomes while avoiding potential harm from the technology. The present study, therefore, sought answers to the two central research questions listed at the end of this paper.

Method
This classroom-based study investigated the impact of AWE use on the writing performance of two groups of English as a foreign language (EFL) students in academic settings. The study followed the process-product research approach (Warschauer & Ware, 2006) with a mixed-method design. Both quantitative and qualitative methods were used to generate data supporting the evidence from the tools studied.
The qualitative part examined students' perceptions of AWE feedback on essays in a natural setting (the classroom), interpreting and exploring participants' experience. The research sought to explain whether and why AWE promotes student motivation and learning in this environment. The qualitative methods were (a) open-ended survey questions (see Appendix A; question 5 is the only fully open-ended question, while questions 10 and 15 required students to clarify and justify their opinions and answers), (b) participant observation, and (c) informal interviews. The aim was to discover how students reacted to feedback from the human raters and from the AWE tools. In the informal interviews at the end of semesters 161 and 162, 61 students from four classes answered the interview questions listed at the end of this paper.
Quantitative data from the 15-question survey and the marks for the student samples were expressed numerically, analyzed, and presented in figures (see Appendix B, Figures 1 and 2).

The Educational Context of the Study
The research was conducted with four student groups at Prince Sultan University (PSU), Saudi Arabia, in the 2016/2017 academic year (semesters 161 and 162). The university is one of the leading private universities in the Middle East and aims to provide quality education to the highest international standards.
Overall, there were 43 students in the senior group (group A) and 18 students in the sophomore group (group B). The language of instruction is English. Before starting their studies, students must either complete the Preparatory Year Program (which has three levels, from beginner to pre-intermediate) or enter the university with an IELTS band of 5.5.
Female students from the TEFL and Translation Programs of the College of Humanities took part in the research. There are various writing courses throughout BA (Bachelor of Arts) studies, and all of them are mandatory for both Programs. Students from the intermediate writing course (group B) and the advanced writing course (group A) took part in this research. In both groups, the course objective was to learn how to argue and reason in a piece of research. While the sophomore group (B) is introduced to the basics of research, the senior students' semester assignment (A) is to conduct a comprehensive mini research project or case study.
The students in all four participating writing classes were second- and fourth-year English majors. Both groups shared some common features in their writing instruction. Their course objectives aimed to develop students' academic research writing skills, and they covered similar content in different depths. A similar writing approach was applied in both: model essay reading activities followed by pre-writing, drafting and revising activities.

Data Collection
Both groups had to complete and submit two drafts (a first and a final draft) of an assignment: an argumentative essay for group B or a research paper for group A. Essay samples 1 and 2 used in this study were written by sophomore students, and essays 3 and 4 by senior students; they served as examples of the students' performance. Essays were chosen for this part of the research on the basis of student performance: one essay below expectations (poor), one excellent essay, and two average essays (meets expectations). The other students' essays served as control essays and were evaluated by both human raters and AWE tools.
Four sample essays (two from semester 161 and two from semester 162) were chosen for the research, and ten human markers, chosen randomly from two different institutions, evaluated them. Six markers were native speakers and four were non-native speakers. The essays were then assessed by online AWE tools. The markers (four male and six female) marked the essays using a holistic rubric (see Appendix C), and their marks were compared with those of the online grammar and spelling checkers (see Appendix D).

Participants Profile
Most students at the university, aged 18 to 21, are Saudi nationals; 10-15% are non-Saudis. Students come from either local or international schools, and a few have studied abroad.

Description of AWE Tools
1) The paid AWE tool 'A' tested in this study combines several characteristics in its feedback. It is an immediate (feedback within 60 seconds of submission) and intelligent (feedback generated automatically by an engine based on natural language processing) online marking tool. It offers individualized feedback on each student's submission, tailored to the relevant discipline. The tool goes beyond fixing misspelled words: it also identifies correctly spelled words used in the wrong context, so that users are less likely to overlook aspects of writing that can diminish its quality. Its checks are both technical and contextual: it helps correct grammar, punctuation and spelling mistakes, and it improves vocabulary by pinpointing and fixing contextual errors. It also recommends improvements to writing style.
2) The free AWE tool 'B' tested in this study is a marking tool that checks students' writing for language and style problems before submission. It works best with academic writing, including reports and essays. As the program's designers state, although it does not include the sophisticated techniques of full AWE software, it gives users formative feedback and an instant evaluation of their submissions.
Both tools can also serve as formative learning tools because they allow students to revise and edit their drafts. Analytic assessment and feedback were provided for each essay draft submitted to the AWE tools.

Designing a Classroom Performance Task
It is not always easy for an instructor to design a classroom performance task. With this caveat in mind, I took the following into consideration:
1) The course objectives, and the skills and knowledge I wanted students to gain by completing the task.
2) A task specification (see Appendix E) related to the students' major (in this case, Applied Linguistics and Translation).
3) Explicit performance criteria measuring the extent to which students had mastered the skills and knowledge. The holistic approach focuses on writers' overall performance rather than identifying their individual weaknesses and errors. During a standardization session, the ten human markers received basic training in applying the holistic rubric used for this research (see Appendix C). This unified their understanding of the rubric, and a common marking style was agreed.

Results
The study set out to examine how both automated proofreaders could support the teaching and learning of academic writing. The findings indicated that computerized feedback facilitated language learning, helped improve the quality of writing, and increased students' confidence. The results suggested that both AWE tools generated feedback that encouraged students to revise throughout the writing process, supported students' sense that the feedback was objective and fair, acted as an intermediary between instructor and student, and made teaching and learning more student-centered. Because group 'A' students learned from the beginning how to use AWE tools to improve their writing, their submissions and re-submissions contained fewer mechanical errors than the other group's. This was especially true of orthography (spelling and capitalization), punctuation, and grammar and usage mistakes (such as word-for-word rendering into English). Group 'B' students submitted their first draft without using AWE tools. After the instructor's WCF, the AWE tools were introduced, and students could re-submit the first draft after using them, after which they again received written and AWE feedback.

Data Analysis and Findings
In Tables A1 and A2 (see below), I summarize the scores given by the markers and the AWE tools and compare the grades as percentages. Apart from Essay 2, there was a wide discrepancy between the marks awarded by human raters and those awarded by AWE tool 'B.' Tables A1a, A1b and A2 show that only Essay 1 (written by a sophomore student) was marked similarly by AWE tool 'A' and the human markers. AWE tool 'A' was easier to compare with the human markers because it marked in a similar way: it gave an overall percentage for each evaluated essay, provided holistic comments, and focused on genre-specific writing style (see Appendix F and Table A2). Tool 'B,' by contrast, uses a 12-point scale. It evaluates areas such as paragraphs, sentences, vocabulary, academic style, grammar and punctuation more analytically and adds a general writing-style rating complemented by smiley faces (see Appendix G). This analytical approach detects and highlights academic-style issues such as vague language, informal phrases, overused expressions, comma splices and sentence fragments, so agreement between this program and the human raters' grades was very low. However, if we look at the types of mistakes in Essay 3, both tools detected and marked similar academic-style mistakes despite their different scores (AWE 'A' 80%, AWE 'B' 59%). A closer analysis of the mistakes in Essay 1 shows that Tool 'A' seemed to 'understand' that the essay was argumentative and focused on personal rather than academic issues. Tool 'B,' in contrast, marked all personal pronouns as academic-style mistakes and therefore gave much lower grades (see Appendix B).
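The percentage comparison above can be illustrated with a minimal Python sketch. It is not the procedure used in this study. It assumes that a Tool 'B' score on its 12-point scale maps linearly to a percentage, and that an essay's human mark is the mean of the ten markers' percentages. The function names and the ten marker scores are hypothetical. Only the Essay 3 figures (80% and 59%) come from the text.

```python
from statistics import mean


def twelve_point_to_percent(score: float) -> float:
    """Linearly rescale a 0-12 score to 0-100 (an assumed conversion, not Tool 'B''s documented method)."""
    if not 0 <= score <= 12:
        raise ValueError("score must be between 0 and 12")
    return score / 12 * 100


def gap(human_percents: list[float], awe_percent: float) -> float:
    """Absolute difference, in percentage points, between the mean human mark and one AWE mark."""
    return abs(mean(human_percents) - awe_percent)


# Essay 3 scores reported in the text: AWE 'A' 80%, AWE 'B' 59%.
print("Tool gap on Essay 3:", abs(80 - 59))

# Hypothetical percentages from ten human markers (NOT the study's data).
human = [70, 75, 72, 68, 80, 74, 71, 77, 69, 74]
print("Human mean vs AWE 'A':", gap(human, 80))
print("9/12 as a percentage:", twelve_point_to_percent(9))
```

Under these assumptions, the two tools differ by 21 percentage points on Essay 3, even though both flagged similar academic-style mistakes.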

Survey and Interview Analysis
At the end of both semesters, an informal in-class interview was conducted with all 61 students from both groups. In answering question (a), students used more positive than negative adjectives.
Among the positive descriptions, students said that using AWE tools was "exciting, new, interesting, stimulating, educational, encouraged thinking and independent work, objective and both tools helped them with the reviewing process." The negative descriptions included "challenging, scary at the beginning, more time consuming because it was easier to ask an instructor and get the answer." In their answers to question (b), students echoed their survey remarks: they felt that combining teacher and AWE feedback increased objectivity and fairness. A few sophomore students (4 out of 18) responded differently. They found AWE tools too difficult to use, preferred teacher evaluation, and considered the reviewing process time-consuming.
Most students (57 out of 61) agreed that AWE promoted learning and helped them improve their writing skills ("AWE explanation helped me to understand my mistakes"). Some students mentioned that they did not like reviewing their essays. Some even admitted ignoring the AWE review of their first draft ("I did not review my writing"). Once they saw that students who used AWE received better feedback on their first drafts, however, they used the AWE tool for their final submission. Most of the observed students (53 out of 61) did not want to purchase the paid tool 'A' and therefore used tool 'B.' They found it straightforward to use and understand and liked its 'Advice' feature and 'Report card.' The most appreciated feature was the instant advice for improvement.
Those who used tool 'A' valued its discipline-specific feedback as well as its plagiarism checker. Some sophomore students reported difficulty understanding the AWE tools' explanations of problems.
Almost all students (59 out of 61) suggested keeping these or similar tools because they found them motivating and easy to master. In particular, both tools helped them improve their vocabulary and use of prepositions and raised their awareness of their weaknesses in writing. Some students also recommended more training on how to interpret AWE tool comments.
In the 15-question survey, both groups mostly agreed that only teachers should mark most assignments (question 1) and that writing is a useful skill (question 8). In question 4, sophomore students preferred direct correction by a teacher (55.55%). Many senior students (69.76%), by contrast, thought that combining teacher and AWE feedback increased motivation and the objectivity of assessment (see Appendix A, Figures 1 and 2). Only 27.77% of sophomore students stated that AWE combined with teacher feedback was preferable. Another difference between the groups appeared in students' comments on rubrics (question 10). Senior students considered it essential to receive a rubric just before the assignment (41.86%), whereas sophomore students believed it was more important to receive it at the beginning of the course (50%).
In questions 12 and 13, which asked how students felt about feedback from an instructor and from the AWE tool, both groups agreed that the combined feedback felt more objective. More than 72% of senior students and 66.66% of sophomore students felt that teacher feedback was objective. Even so, they stated that a mix of teacher and AWE feedback was not only motivating but also more objective, with 79% of senior students and 72.22% of sophomore students in favor of this combination.
Questions 14 and 15 echoed the research questions, and students' answers to them supported the hypothesis that both tools were motivating and confidence-building and thus supported learning. Overall, 81.39% of senior students and 66.66% of sophomore students agreed that AWE tools were helpful. A further 7.41% found using the tools motivating, explaining: "You were the first instructor teaching us how to use AWE as well as how to correct our mistakes this way. It was interesting, and it motivated me to think how to do it" (question 15).
Question 5 asked students to justify their answer to question 4. Most students did not prioritize peer feedback. Instead, they expressed the belief that an AWE tool was a positive addition to an instructor's feedback because it increased objectivity, supported learning and provided a fair evaluation.
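The survey percentages above can be recomputed from raw counts. The sketch below assumes that each percentage is a share of the 43 senior or 18 sophomore respondents. Under that assumption, several reported figures (for example 55.55%, 69.76% and 81.39%) match truncation to two decimal places rather than conventional rounding. The counts used are inferred from the percentages and are not reported in the text. The function names are hypothetical.

```python
# Recompute survey percentages from counts under two reporting conventions.
# Assumption: counts are inferred from the reported percentages, not taken
# from the study's raw data; group sizes (43, 18, 61) come from the text.

GROUP_SIZES = {"senior": 43, "sophomore": 18, "all": 61}


def percent_truncated(count: int, total: int) -> float:
    """Percentage cut off (not rounded) after two decimal places, using integer arithmetic."""
    if not 0 <= count <= total:
        raise ValueError("count must be between 0 and total")
    return (count * 10_000 // total) / 100


def percent_rounded(count: int, total: int) -> float:
    """Percentage rounded to two decimal places with Python's built-in round()."""
    if not 0 <= count <= total:
        raise ValueError("count must be between 0 and total")
    return round(100 * count / total, 2)


# Hypothetical counts that would produce some of the reported figures.
examples = [
    ("sophomore", 10),  # question 4: preferred direct teacher correction (reported 55.55%)
    ("senior", 30),     # question 4: favored teacher + AWE feedback (reported 69.76%)
    ("senior", 35),     # found AWE tools helpful (reported 81.39%)
]

for group, count in examples:
    total = GROUP_SIZES[group]
    print(f"{group}: {count}/{total} -> "
          f"truncated {percent_truncated(count, total):.2f}%, "
          f"rounded {percent_rounded(count, total):.2f}%")
```

With these inferred counts, truncation reproduces 55.55%, 69.76% and 81.39%. Conventional rounding would instead give 55.56%, 69.77% and 81.40%.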

Discussion
Few studies have investigated and analyzed AWE tools designed especially for teaching purposes. Despite their accessibility, online AWE tools have received little analysis so far, so this study can start a discussion on whether these accessible tools can be used in classrooms. Overall, my students perceived the AWE tools under investigation positively, despite the tools' limitations in recognizing genre, style and context (especially in the unpaid version). However, some students echoed Ericsson and Haswell's (2006) doubts. They questioned a computer's ability to understand the content of writing and to correct mistakes accurately.
This research showed that pedagogical practices with online AWE tools influence students' perceptions of how the tools facilitate their learning of writing. The AWE implementation was received quite favorably when the program was used to help students revise their first drafts.
As mentioned previously, AWE offers instant feedback, which may vary in clarity and specificity. Such feedback draws students' attention to their linguistic inadequacies and motivates them to produce more comprehensible output. According to Beatty and Gerace (2009), within the conceptual changes in pedagogy, the main role of classroom discourse is to direct students' thinking and to provide an environment that enhances thinking, cognitive skills and learning as a whole.
The purpose of this study, based on the research questions, is to explore empirically whether automated feedback can trigger learning processes and foster student-computer interaction. Such interaction could potentially lead students to revise their output and improve their written performance. In the informal interviews conducted in both groups, most students agreed that the immediate feedback from AWE tools enhanced their self-correction techniques.
Although instructors can use AWE to operationalize the constructs of input and interaction, output has not yet been addressed by research. To address this gap, the paper investigates the learning potential of automated feedback generated by two online grammar and spelling checkers.
Most students perceived AWE as an objective solution. Some limitations in the programs' assessment methods and writing support features could have made students less positive about the usefulness of AWE online tools. Even so, both classes appreciated this new experience. Students' remarks on the suitability of both online tools centered on (a) prioritizing either human or AWE online tool feedback, (b) understanding AWE corrective feedback and scores, (c) enhancing motivation to learn, and (d) engaging with new, challenging but modern ways of learning writing via AWE tools.
As described in my literature synthesis, I use AWE tools for pedagogical purposes to enhance students' self-belief in writing, their attitudes and motivation, and learner autonomy (Smith, 2008). To conclude, the use of online AWE tools is not a simple "yes" or "no" issue. It involves a complex combination of factors, including software design, pedagogical practices, learning contexts and students' willingness to accept assistance from a non-human marker. Whatever shortcomings AWE tools may have, the present study supports using them for teaching and learning, particularly because not all institutions can integrate professional writing applications into their classes. Based on this study, using easily accessible tools promotes student learning and independence.
The development of AWE tools is intensifying, and some expect that they will soon achieve accuracy indistinguishable from that of human raters. This research also sought to show that students are more motivated to revise because of 'technology' feedback. Moreover, receiving prompt results from AWE tools motivates students to revise extensively before submitting their writing for scoring. Hopefully, this paper will encourage decision-makers and instructors to explore, and possibly implement, such tools in order to enhance student autonomy and overall writing improvement.
Research questions:
a) Can an online open-source writing aid be successfully implemented in writing classes to promote learning?
b) What are the most important factors offered by an open-source writing aid that influence student perception, motivation and learning?
Interview questions:
a) Could you use three adjectives describing your experience with AWE?
b) Could you describe your feelings about the teacher's holistic feedback and AWE feedback?
c) Do you think AWE helped you to learn and improve writing?
d) What did you like most/least about the fact we used AWE for reviewing process?
e) Would you recommend continued use? Why? Why not?
Features shared by the two AWE tools:
a) Both are accessible to the public and straightforward to use and understand.
b) They provide holistic and analytic scores along with feedback on style, content, language use and mechanics.
c) Both tools offer a wide range of writing correction ideas.

Figure 1. Group A: senior student survey results