A Review of the Washback of English Language Tests on Classroom Teaching

Scholars have long recognized the washback effect of English language tests on English teaching in the classroom. However, scholarly reports in this area remain scarce. The present study therefore reviews empirical research on the washback of several English language tests on different aspects of classroom teaching, namely course content, teaching materials, and teaching activities. Both positive and negative washback are found in these aspects and can be attributed to a number of factors, including differences in the features of test content, in how well tests align with the course syllabus, and in the teaching methods that teachers adopt. The final discussion recognizes the complicated mechanism of the washback of English language tests on classroom teaching and draws out some scholarly and pedagogical implications. On the one hand, future studies could focus more on how to bring about positive washback of English language tests on classroom teaching. On the other hand, pedagogical practice could take advantage of the latest scholarly findings to maximize that positive washback.


Introduction
Some English language tests have been known to have possible washback effects on the teaching process (e.g. Alderson & Hamp-Lyons, 1996; Shohamy, Donitsa-Schmidt, & Ferman, 1996; Hamp-Lyons, 1998). The washback effects could be so powerful that educational policymakers sometimes even rely on these tests to implement changes in curriculum and instruction (Pearson, 1988; Shohamy, 1993). Cheng (1997) noted that although teaching conventionally came before testing, English language tests often came first and influenced teaching. For example, course content could be designed for test preparation (Cheng, 1997).
In stark contrast to the significance of the washback effects of English language tests is the inadequacy of the studies on them. In other words, more research needs to be devoted to this topic (e.g. Wall, 1996; Watanabe, 1996). Thus, the purpose of the present study is to review scholars' findings on the washback effects of English language tests on different aspects of classroom teaching, namely course content, teaching materials, and teaching activities.
learning process, i.e. participants, process, and product. Participants refer to language teachers, language students, etc. Process refers to the behavior of the participants in the teaching and learning process. Product refers to the language knowledge and skills acquired by students as well as the quality of the acquisition. Washback first has its impact on the understanding and perspectives of participants who would then alter the teaching method, teaching materials, and other components of the teaching and learning process which would alter the product of learning accordingly.
Above is a brief discussion of the scholarly understanding of the washback concept. In the next section, the present study will focus on discussing the washback of English language tests on course content.

Washback on Course Content
The review of the literature revealed a complicated picture of how the washback of English language tests could influence course content. Negative and positive washback could occur, even at the same time. Additionally, the intensity of the washback of the test on course content could vary across different situations. Below is an integration of scholarly findings on the washback of English language tests on course content.

Negative Washback
Barnes (2016) observed negative washback on course content in her study of the washback of the TOEFL iBT on preparation courses for adult students in Vietnam. She argued that teachers' reliance on TOEFL iBT textbooks could explain their focus on teaching the language skills tested in the TOEFL iBT. This resulted in a course that mostly served to train students to get a good mark on a test rather than to acquire language. She suggested that the course content could be improved if the teachers focused more on engaging students in the process of "acquiring language skills" instead of going through many items on a "skills checklist". Therefore, Barnes (2016) argued that teachers' reliance on the skills checklist prevented a more holistic and process-centered way of teaching.
Kılıçkaya (2016) studied the foreign language section of TEOG (Transition Examination from Primary to Secondary Education) in Turkey. The author noted that the secondary school language teachers in the study focused on teaching grammar, vocabulary, and reading, which were the major elements of the content of the language section of TEOG. While the teachers in the study were aware of the importance of other English skills, such as speaking and listening, they confessed that teaching to the content of the test was the best way to help students prepare for it. The teachers also taught students test-taking strategies that suited the format of the language section of TEOG.
Finally, in their study of the washback of the English national examination (ENE) in Indonesia, Furaidah, Saukah, and Widiati (2015) found that at some schools, the increased focus on teaching ENE-related content came with the exclusion of the skills not tested in the ENE, namely speaking and writing. Listening and reading multiple-choice questions received much more attention because they constituted a large portion of the test. Furaidah et al. (2015) noted that this was the negative washback of teaching to the test.
Building on the findings of the above three studies, one type of negative washback of English language tests on classroom teaching could be the restriction of the language skills taught in the classroom because of a strong inclination to devote more classroom teaching to the skills tested in the exams (Barnes, 2016; Furaidah et al., 2015; Kılıçkaya, 2016). As a result, other significant language knowledge or skills are excluded from classroom teaching.

Positive Washback
Positive washback on course content was also found by scholars. The study by Wang, Yan, and Liu (2014) focused on the washback of the internet-based College English Test Band 4 (IB CET-4) in China. The old National College English Test Band 4 (CET-4) is a well-regarded "high-stakes" test in China because test-takers' performance could serve as an indication of the quality of college English education. Besides, a good CET-4 grade has become almost indispensable for successful employment for college graduates. However, the fast development of the modern IT industry demands literacy skills essential for sending business messages and engaging in interviews or negotiations, which are far beyond the traditional language skills tested in CET-4. Therefore, to enhance its validity and beneficial washback, developers and researchers have carried out reforms via information technology to replace CET-4 with IB CET-4. The old single listening section was converted into an audiovisual section, which has more test authenticity. Besides, the IB CET-4 integrates the testing of listening, speaking, and writing and focuses more on testing integrated language skills. Influenced by the construct of the IB CET-4, teachers in the study shifted their focus of instruction from teaching vocabulary and grammatical knowledge to cultivating students' communication skills with the help of materials that helped to construct a nearly authentic language context.
Cheng (1997) also discovered positive washback in her study of the washback of the Hong Kong Certificate of Education Examination (HKCEE). She argued that given the ambiguous nature of the washback effect and the limited data available, it was very difficult to determine whether positive or negative washback occurred on course content. Moreover, due to the lack of data, the extent to which course content was affected could not be determined either. Based on the observations in the study, changes to the way teachers arranged their course content could at least be attributed to the cramming effect. However, the author did stress that it was understandable that teachers modeled the course content on the content of the examination, because helping students to fare well on the examinations was conducive to their future career development. In this sense, the HKCEE could bring positive washback on course content.
In stark contrast to the previous three studies on the negative washback of English language tests on teaching content, the above two studies, especially the one by Wang et al. (2014), revealed that language teachers would incorporate more communicative language teaching into their course content because of the positive washback from a test designed to prepare students to apply English for communicative purposes in real-world settings.

Both Negative and Positive Washback
In some cases, the washback on course content could be both negative and positive, as was demonstrated in the study by Wall and Alderson (1993). Their research focus was the washback of the O-Level examination in English as a Second Language in Sri Lanka. The examination was modeled on the content of the textbook, so it was inevitable that the course content resembled the content of the examination. In the study, both positive and negative washback were observed. Teaching the content of the textbook was regarded as positive washback if the teacher intended to create more meaningful learning experiences, but as negative washback if the teacher required the students to memorize the textbook content. Moreover, negative washback on course content was observed because teaching to the exam content led to more time being spent on certain skills than on others.

Washback of Different Degrees
In other cases, scholars observed different degrees of washback on course content. Watanabe (1996) set out to investigate whether the presence of translation questions in Japanese university entrance examinations led to a focus on grammar and translation in EFL classrooms. The results demonstrated that the test content did not generate an explicit washback effect on one teacher's course, because that teacher focused on teaching translation and grammar whether or not the target examination emphasized translation questions. The other teacher, by contrast, focused on teaching translation and grammar only when the target examination contained a certain number of translation questions. Watanabe (1996) went further to point out three possible factors that led to the difference in washback on these two teachers. First, differences in how the two teachers prepared for university entrance examinations might contribute to the difference in their teaching. Second, the two teachers' different beliefs about effective teaching methods might play a role. The first teacher believed in a universal way to solve certain test questions, while the second teacher chose to teach students to pick the more appropriate problem-solving method for each context. A third factor was timing. Teacher A was observed when the examination was more than half a year away, so he taught in his own style. Teacher B, however, was observed when the examination was only two months away, so he had to place more emphasis on test-taking strategies.
Shih (2009) also observed washback of different degrees in a study on the washback of the General English Proficiency Test (GEPT) on two English teachers in Taiwan. Observation showed that one teacher's course content focused on test-taking strategies. She also incorporated test-related information into the course content, such as when to register for the test. By contrast, the other teacher experienced only superficial washback on his teaching process. This was probably due to his lack of commitment to his job, as he was not familiar with the test content and later resigned from his position.
Zou and Xu (2017) conducted a study on the washback of China's TEM8 test on language teaching for English major students. The study involved 724 college administrators with abundant English teaching experience. The findings indicated that the course content and design of English courses in the universities under study reflected the setup of TEM8. For example, the writing and translation courses in the syllabus adopted by these universities corresponded to the writing and translation parts of the TEM8 test. Furthermore, some participants reported that their universities offered students listening courses in response to the great importance attached to the listening part of TEM8. It is worth mentioning that the listening course was not part of the syllabus. Lastly, some universities offered test preparation courses. In many cases, however, their course load amounted to only a quarter of that of a normal course. This indicated limited washback of test preparation on course planning.
In short, the above studies presented a rather complex picture of the washback of English language tests on classroom teaching content. Watanabe (1996) and Shih (2009) revealed that the washback could affect individual teachers' design of course content to different degrees because of their different pedagogical philosophies. Zou and Xu (2017) investigated the issue at a broader level, where different schools responded differently to the washback of English language tests. All of these studies illustrate the complex mechanism underlying the washback of English language tests on course content.

No Salient Washback
Of course, the washback on course content could be barely observable. The research by Qi (2005) concentrated on the washback of the National Matriculation English Test (NMET) in China. According to Qi (2005), the NMET was intended to shift schools' teaching focus "from linguistic knowledge to the teaching of language use" (p. 148). But this intended washback did not happen, largely due to the teachers' inappropriate reliance on the test to design their courses. Qi (2005) discussed four aspects relevant to this issue. First, she noted that the teachers in this case inaccurately perceived the NMET's focus to be grammar and vocabulary, while the true construct of the NMET was more centered on language use. Second, Qi (2005) argued that teachers incorporated test content into course content without understanding the test designers' intention. For example, the test designers originally treated the test's proofreading section as an important step in students' writing process and hoped students could incorporate proofreading into their writing practice. But teachers regarded proofreading as an end in itself and taught it as a separate skill. Third, Qi (2005) stated that teachers' course content was often influenced by the format of test questions, such as multiple choice. This could stand in the way of promoting "authentic language use" (p. 162). Lastly, Qi (2005) pointed out that to fully prepare students for the test, teachers arranged a lot of mock tests for the students. She further argued that the time spent on these mock tests could instead be spent on teaching practical language skills.

Washback on Teaching Materials
The review of the literature also found that negative and positive washback on teaching materials occurred in studies. Below is a brief review of this matter.

Negative Washback
Hamp-Lyons (1998) analyzed TOEFL textbooks to demonstrate the negative washback TOEFL had on EFL teaching. To begin with, the prevalence of TOEFL generated the popularity of TOEFL textbooks. Hamp-Lyons (1998) noted that these books were built on the content of the test rather than on a model of language use and therefore left many teachers and students lost among decontextualized concepts and rules. This, in turn, explained why even experienced EFL teachers encountered tremendous challenges in planning TOEFL courses and why they kept coming up with unclear and incorrect explanations. Furthermore, the decontextualized textbooks also rendered the teachers incapable of constructing the underlying rationale of their course design. What they did instead was explain the items as arranged in the textbook. Thus, the goal of their course degenerated into a reproduction of textbook content. Additionally, as there was no credible syllabus for TOEFL courses, the only way to design teaching material was to identify the frequently tested items in the real TOEFL test.
In her previously mentioned study of the washback of the HKCEE, Cheng (1997) noted that the washback on teaching materials was of a high degree. This could be illustrated by the fact that after the introduction of the test, almost every secondary school in Hong Kong changed its syllabus and adopted textbooks specially designed for passing the test. The textbooks were found to provide both teaching materials and the foundation on which teachers could base the design of their classroom activities. However, some teachers did experience confusion about the objectives listed in the textbooks.
The negative washback was more severe in the study on the washback of the foreign language section of TEOG in Turkey by Kılıçkaya (2016), who observed that the teachers were given textbooks selected by the Ministry of Education. The adoption of these textbooks was mandatory, so all the teachers had to use them. But many teachers complained that these books were so inadequate that they were forced to find additional materials to use in class.
In sum, the above three studies demonstrated that part of the negative washback of English language tests could be the requirement imposed on teachers to adopt textbooks that were strictly built on the content of the test. This could either confuse the teachers because the textbooks might not be in line with the course syllabus (Cheng, 1997; Hamp-Lyons, 1998) or render the teaching materials inadequate (Kılıçkaya, 2016).

Positive Washback
However, positive washback on teaching materials was also found by scholars. Saif (2006) investigated the washback of a speaking test on classroom teaching in a TA (teaching assistant) training program in the US. The teacher in the study mainly used teaching materials that focused on the skills tested in the speaking test. She also revised some teaching materials to address aspects of her students' speaking skills that she found inadequate when analyzing their test performance. Moreover, Wang et al. (2014), in their previously mentioned study on the washback of the IB CET-4 in China, also observed that teachers adopted new materials to construct a nearly authentic language context. These materials were found to take the form of audio recordings or video clips.

Washback on Teaching Activities
The review of the literature found that positive and negative washback on teaching activities could occur. The effect tends to take the form of student-centered activities and teacher-centered activities.

Activities that are More Student-Centered
In some cases, more student-centered activities were found to be a type of positive washback on teaching activities. For example, Saif (2006) studied the washback of a speaking test on classroom teaching in a TA (teaching assistant) training program in the US. The findings revealed that the teacher's choice of classroom activities was affected by the speaking test, although she had questioned the test's validity for the purpose of the course. For example, to address the students' speaking problems exposed in the speaking test, the teacher carried out her teaching through group discussions in which students had ample opportunities for presentation and feedback. Saif (2006) suggested that since the teacher personally favored a seminar style of teaching, the test brought some positive washback to her choice of classroom activities.
In their study on the washback of the IB CET-4 test in China, Wang et al. (2014) also noted that teachers preferred a more student-centered instructional method to the more traditional direct instruction. In this way, they hoped to encourage students to develop a self-learning mode that was both effective and sustainable. Moreover, instead of teaching in a traditional classroom-based setting, the teachers started to embed their teaching activities in an Internet-based environment where students could benefit from this self-learning mode. They found this Internet-based teaching style more efficient than traditional classroom-based supervision.
In another study, Turner (2006) collected the perspectives of ESL teachers on the washback of provincial exams in Quebec. The study showed that teachers adopted the group discussion format from the exam's speaking section and applied it to their teaching activities in the classroom. Some teachers preferred to have students use relevant phrases to state their perspectives and ask questions during their group discussions. Other teachers recommended having students practice certain speaking activities to prepare for the exam.
However, student-centered activities could stem from both negative and positive washback. In their study on the washback of the EFL speaking test in Israel, Shohamy et al. (1996) found that for students who would soon take the test, teachers solely concentrated on teaching the tested skills of the speaking test through a variety of group collaboration work. For the other students, teachers relied on not only activities such as in-class discussion but also video watching.
Taken as a whole, the above four studies revealed that when the resulting activities were student-centered, the washback on teaching activities would very likely be positive and promote students' language learning. In this case, the teachers' adoption of student-centered activities could be attributed to the communicative nature of the test. However, Shohamy et al. (1996) demonstrated that even in student-centered learning activities, some teachers still chose to restrict course content to the language skills tested in the exam that the students would soon take.

Activities that are More Teacher-Centered
Washback of English language tests could also give rise to teacher-centered teaching activities. For example, the observation by Alderson and Hamp-Lyons (1996) showed that TOEFL teachers in their study did not make salient efforts to make the course more interesting and very often carried out the teaching process through a monotonous pattern. The researchers found that this pattern dominated the teaching process and thus gave students very few opportunities to raise questions or interact with either the teacher or other students.
In her study of the washback of the TOEFL iBT in Vietnam, Barnes (2016) observed that in contrast to normal English courses, in which students had opportunities to engage in in-class activities and work with fellow students, most of the TOEFL iBT courses were dominated by teacher instruction. She noted that this teacher-centered teaching method resulted from teachers' reliance on TOEFL iBT textbooks to teach test preparation courses. She further argued that such a teaching style was by no means appropriate for classroom teaching. In another study on the washback of the foreign language section of TEOG in Turkey, Kılıçkaya (2016) observed that teachers relied on direct instruction. The only occasions on which students played a more active role were when they read short conversations similar to those tested in the language section of TEOG.
In short, the above three studies revealed that when the teaching activities appeared to be teacher-centered, e.g. direct instruction, the washback on teaching activities was likely to be negative because the active participation of students could be suppressed (Alderson & Hamp-Lyons, 1996; Barnes, 2016; Kılıçkaya, 2016). Teachers in these studies probably chose direct instruction because they regarded it as an efficient method of training students for language tests.
Furaidah et al. (2015) observed both student-centered and teacher-centered activities in their study of the washback of the ENE in Indonesia. They noted that classroom activities were of two kinds: "regular teaching activities" (p. 49), which focused on non-ENE-related English teaching, and "drilling activities" (p. 49), which focused on ENE-related English teaching. Generally, students were given more opportunities to interact with their classmates in regular teaching activities. By contrast, such interactions were much less likely to occur in drilling activities, where the teacher's involvement was much more dominant during in-class interactions.

Conclusion and Discussion
To sum up, in reviewing scholars' findings on the washback of English language tests on classroom teaching, the present study focused on three aspects, namely the washback on course content, teaching materials, and teaching activities. The contributing reasons for the washback were also noted. However, the limited space of the present study does not allow for reviewing other equally important issues, such as the washback on time allotment in teaching (Furaidah et al., 2015), the washback on in-class assessment (Kılıçkaya, 2016; Turner, 2006), and the washback on teachers' attitudes towards the test (Cheng, 1997; Kılıçkaya, 2016; Turner, 2006). Their absence here does not diminish their scholarly significance.
One major finding of the present study is that the analysis of the washback of English language tests demands careful context-based interpretation. A good example is that teaching to the test was regarded as negative washback in the study by Furaidah et al. (2015) but as positive washback in the study by Cheng (1997).
Besides, the present study reveals that the mechanism of washback is highly complex. Other scholars share this view. For example, Wall (1996) sought to explain why, in the study by Wall and Alderson (1993), both positive and negative washback heavily affected teachers' course content and the design of their in-class assessments, while no salient washback was observed on teachers' classroom teaching methodology. She noted that the plethora of factors influencing washback prevented test designers from discovering an effective mechanism that could shed light on how different factors would affect the intended washback.
Other scholars interpreted the complex mechanism of washback from various perspectives. Alderson and Hamp-Lyons (1996) proposed that the variance of TOEFL's washback on teachers could not be explained by the nature of the test alone. For example, class administrators' mandates led to large class sizes, which were unfavorable to in-class interaction. Textbook designers could also be responsible because they did not specify how the teachers could use the materials. Lastly, teachers themselves could be a major factor because many of them were unwilling to update their course content. Furaidah et al. (2015) also noted many factors that contributed to the intensity of the washback of language tests, including teachers' perception of teaching, their perception of students' performance, and the quality of the school (the average capacity of its students). Besides, Shih (2009) identified three categories of factors affecting the intensity of the washback of language tests. Contextual factors involved course goals (Hayes & Read, 2004) and the support schools could provide to teachers (Hawkey, 2006), etc. Test factors involved the level of the test's stakes (e.g. Alderson & Hamp-Lyons, 1996) and the construct of the test (Shohamy et al., 1996), etc. Teacher factors involved the amount of training teachers received (Green, 2006) and teachers' proficiency in the target language (Qi, 2007), etc.
Messick (1996) argued that "one can…turn to the test properties likely to produce washback" (p. 242). However, building on the understanding of Alderson and Wall (1993), Cheng and Curtis (2004) argued that washback could have no causal links with the features of a test. Such a statement could sound somewhat extreme. But taken together with the analyses of the scholars mentioned above, the argument of Cheng and Curtis (2004) seemed more to be saying that "it is possible that research into washback may benefit from turning its attention toward looking at the complex causes of such a phenomenon in teaching and learning, rather than focusing on deciding whether or not the effects can be classified as positive or negative" (p. 11).

Implication
Given the complex nature of the washback of English language tests on classroom teaching, researchers could devote more attention to studying the causes of washback, including test properties and other factors in the educational context, and investigate the mechanisms that could bring about positive washback on classroom teaching. English language teachers could also benefit from scholarly work on the washback of English language tests on classroom teaching by using theoretical understanding to help them decide whether a test is likely to have positive washback on their teaching and to integrate test content appropriately with the teaching and learning process. The ideal scenario would be that the positive washback of English language tests works in line with course syllabuses and classroom teaching to further students' language development while also enhancing their test preparation.