A Comparison Between Teacher-Led and Online Text-to-Speech Dictation for Students’ Vocabulary Performance

Researchers have long supported the use of dictation as a test for language learners (Fountain & Nation, 2000), and dictation has been used as a test for learners of English as a foreign language (EFL). With the advantages of productive learning and reinforcing short-term memory, dictation is a commonly used technique to develop language skills, and it can be considered to be an assessment of foreign language learning (Kazazoğlu, 2013). However, the previous research has not fully explored how technology, such as text-to-speech (TTS), can be used in EFL classrooms. To address this issue, the researcher explored the use of traditional teacher-led dictation (TLD) and TTS dictation to compare the vocabulary performance of EFL learners. Forty-two college students participated in the study. The results indicated a significant difference between TTS and TLD on the participants’ vocabulary performance. Additionally, there was a correlation between the scores with TTS and TLD: the students who performed better with TLD also obtained higher grades with TTS. Based on the results, future studies and pedagogical suggestions are presented.


Introduction
The use of computers in modern education is producing gratifying and positive results with regard to improving the quality of education of the emerging and current Internet generation. Society has been subject to changes and transformations that have generated new elements in the teaching-learning process. These changes have allowed computer-assisted education to motivate emerging generations in the process of knowledge construction because these systems have a more practical dynamic for their application.
Dictation has been used as a testing device in language learning for a long time (Mohammed, 2015). Scholars have both positive and negative views of dictation. Those who consider dictation to be a useful tool view it as a way to diagnose aural perception as well as English mistakes among learners (Davis & Rinvolucri, 2002; Tang, 2012). Mohammed (2015) mentions that some people might take a stereotypical approach to dictation and consider it an old-fashioned, boring, and teacher-centered method. The teacher-led dictation (TLD) method has been widely used in English as a foreign language (EFL) classrooms (Alkire, 2002; Davis & Rinvolucri, 2002; Fountain & Nation, 2000; Habibi, Nemati, & Habibi, 2012; Kavaliauskienė & Darginavičienė, 2009; Kazazoğlu, 2013; Kiany & Shiramiry, 2002; Montalvan, 1990; Morris, 1983; Natalicio, 1979; Oller, 1979; Rahimi, 2008; Sawyer & Silver, 1972), while the text-to-speech (TTS) method is a new attempt to improve language learning among EFL students (Biancarosa & Griffiths, 2012; Eksi & Yesilcinar, 2016; Tang, 2012), especially in the area of reading comprehension (Wood, Moxley, Tighe, & Wagner, 2018). With the development of science and technology, there will always be new software inventions that can be applied to language teaching. The TTS application is one of them.
Davis and Rinvolucri (2002) define dictation as the decoding of sounds by learners and the recoding of them in writing. The dictation method is not new in the field of language learning. Dictation can be simply defined as "a person reading some text aloud so that the listeners can write down what is being said" (Mohammed, 2015, p. 207). According to the Oxford Learner's Dictionaries, dictation is "the act of speaking or reading so that somebody can write down the words." In fact, dictation is known as one of the oldest techniques used for testing progress in EFL. It can be traced to sixteenth-century textbooks, and it has been used in language learning as well as teaching for many years (Kazazoğlu, 2013). It is usually associated with the traditional grammar method, which emphasizes the translation and memorization of the target language's grammar rules (Stansfield, 1985).
The dictation technique is effective for both teachers and learners (Imene, 2016). According to Alkire (2002) and Davis and Rinvolucri (2002), the dictation technique has psychological power and is easy to manipulate, prepare, manage, and fit to all proficiency levels. For teachers, even novice teachers, dictation is a useful technique for small or large classes and mixed-ability groups, and it is also effective in providing individual attention, motivating self-correction, reviewing learning tasks, and preparing for oral communicative exercises. For learners, dictation provides the opportunity to practice note-taking skills, develops short-term memory, and improves unconscious thinking in the language-learning process, helping students to develop literacy (Montalvan, 1990). It can help the development of all language skills in the target language (Mohammed, 2015). Jafarpur and Yamini (1993) discussed dictation as a form of dual-access processing for learners to alter as well as harmonize their perception, conception, and expression. Despite the benefits of dictation, it is no longer as popular as it was in the past because it is considered to be boring, old-fashioned, mechanical, inauthentic, and teacher-centered (Davis & Rinvolucri, 2002; Kazazoğlu, 2013; Mohammed, 2015), especially in ESL or EFL classrooms (Alkire, 2002).
According to Sawyer and Silver (1972), dictation can be categorized into two parts, and each part contains two types. In total, four types of dictation have been widely used in language teaching as well as learning. The first part is phonemic dictation, and the second is orthographic dictation. Phonemic dictation can be further divided into phonemic item dictation and phonemic text dictation. The former indicates that language teachers model the individual sounds of the target language for students to transcribe to increase the students' ability to produce accurate outcomes. Similar to phonemic item dictation, phonemic text dictation extends the individual sounds into a passage. The second type of dictation is orthographic dictation, which can also be divided into orthographic item dictation and orthographic text dictation. Orthographic item dictation usually focuses on the correlation between sounds and spellings, similar to the traditional spelling test, in which teachers read individual words in isolation for transcription. Similarly, orthographic text dictation asks language learners to transcribe a unified text or passage for comprehension or grammatical correction. Dictation in this paper is considered under the category of phonemic item dictation, which is associated with word recognition. Learners write what they have heard as correctly as possible.
Huang and Liao (2015) discuss TTS programs as alternatives for reading texts aloud, on portable or desktop computing equipment as well as on smart phones and tablets. These programs help users read a text without the requirement of human intervention and sometimes with the ability to download a file to preserve audio. These applications provide good simulation of a human voice and accent; thus, language learners can use these applications to understand how to pronounce words and sentences in the learning process, thereby supporting their learning.

CALL-Based Vocabulary Learning
Vocabulary has always played an important role in English learning (Nation, 2015). Teaching in the 21st century has been refined by the use of supplementary technical means, and computers have played a prominent role due to the advantages they offer for the explanation of concepts as well as their applications. Effective methods for teaching have long been explored, especially as technology has progressed. Diverse applications have been associated with the teaching of various contents. Students are currently involved in changes and transformations that have generated new elements in the teaching and learning process. This has allowed computer-assisted education to motivate new generations in their process of knowledge construction and has given these systems a more practical dynamic for their application.
Dictation is a beneficial method for language learning and has been used in the classroom for centuries (Mohammed, 2015). Numerous studies have been conducted in the field of vocabulary learning as well as dictation; however, little research has examined the application of online TTS to EFL learning, especially in vocabulary. Therefore, this study aims to compare EFL college students' vocabulary performance between TLD and online TTS dictation. The research questions are listed below.

Research Questions
Few of the previous studies have focused on the use of TTS technology with dictation techniques to enhance students' vocabulary dictation performance in Taiwan.
1). Is there a significant difference between TLD and TTS in the performance of word recognition?
2). Is there a correlation between the scores of TLD and TTS?
3). What are students' perceptions of the use of TTS in vocabulary dictation tests compared with the TLD?

Research Null Hypotheses
To compare students' vocabulary dictation test performance when using TTS and TLD, the following null and alternative hypotheses are formulated for research question one.
H0: There is no significant difference between TTS and TLD in achievement on vocabulary dictation tests.
H1: There is a significant difference between TTS and TLD in achievement on vocabulary dictation tests.

Participants
The participants for this study were 42 low- to intermediate-level students (Table 1) who majored in English in the applied foreign language (AFL) department. The students were between eighteen and nineteen years old and had studied English for at least five years before entering the AFL department. During the research period, they were taking English at least eight hours per week. A weekly dictation test was employed in the English vocabulary and reading class to develop the students' word recognition and spelling.
Text-to-speech (TTS) technology is used to convert written text into spoken output. The selected TTS tool, which is available at https://www.naturalreaders.com/online/ (see Appendix B), provides three editions (Web Free, Web Premium, and Commercial) for users to choose from depending on their service coverage needs. The Web Free edition was the one chosen by the researcher for use throughout the study. It includes 20 minutes per day of premium voices and unlimited usage of free voices, and it supports PDF, Docx, RTF, and TXT document upload. In the Web Free edition, users can upload the selected written text into the system and choose the preferred English accent (US or UK), voice (male or female), and speed (from -4 to 9, indicating extremely slow to extremely fast).
The Text-to-Speech Perception Questionnaire (TPQ), designed by the researcher, was used as a testing material in this study and graded to assist in understanding how the participants think about the implementation of TTS on a vocabulary quiz. The TPQ includes seventeen 5-point Likert-scale questions and two multiple-choice questions related to solutions for the pronunciation of unknown words. Forty-two valid questionnaires were obtained after removing invalid questionnaires.
Vocabulary Dictation Quizzes (VDQ) were used in the classroom in this study to help the researcher gather data on the participants' performance with two dictation methods: traditional TLD and the TTS method. According to Kazazoğlu (2013), the selection of a dictation text should be appropriate to the level of the learners. Dictation reinforces basic sentence structure as well as vocabulary at the intermediate level. The overall speed, the pauses between items, and the number of times the text is presented may influence the difficulty of the text (Kazazoğlu, 2013). Oller (1979) stated that the researcher should choose the dialect and the pronunciation with which the learners are most familiar. Hence, in this study, a US accent, a female voice, a speed of minus 2, and two repetitions of each word were selected to make the TTS dictation comfortable for the participants.
The vocabulary on the VDQ was chosen from the TOEIC: Vocabulary Express 3000, from which the participants were required to acquire a certain number of words every week during the semester.

Procedure
The participants were asked to take the online English Level Test-RLT at the beginning of the class. Forty-six of the fifty-three students in the class completed the RLT. The study began in the middle of September 2017 and was completed in January 2018. Every week, the students were assigned two pages of words in the TOEIC: Vocabulary Express 3000, which included 20-24 target words along with phonetic spelling and example sentences. The students took the VDQ 12 times, in which they were asked to write the words they heard. The first six VDQs were conducted through the traditional approach of TLD, and the remaining six were conducted through a TTS method in which the students listened to each word twice, followed by an example sentence. For instance: RESERVATION, RESERVATION, I have a RESERVATION for four under the name of Chang. Ten words were selected from the 20-24 words for every test. There was a 20-second pause after each word for the students to write the vocabulary word in English along with its Chinese meaning. Finally, all ten words were played without pauses at a speed of minus 1 from beginning to end. The standard procedure for both TTS and TLD vocabulary dictation tests is shown in Figure 6, taking the word "promotion" as an example. At the end of the semester, the Text-to-Speech Perception Questionnaire (TPQ) was issued to obtain the students' opinions on taking the VDQ through traditional TLD and through TTS technology.
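The presentation format described in this procedure (the target word twice, then an example sentence with the word emphasized) can be sketched as a small script that prepares the text to be pasted into a TTS tool. This is only an illustrative sketch: the function names `dictation_line` and `dictation_script` are hypothetical, and writing the target word in capitals simply mirrors the RESERVATION example rather than any documented requirement of the TTS service.

```python
import re


def dictation_line(word: str, sentence: str) -> str:
    """Build one dictation item: WORD, WORD, example sentence with WORD in capitals."""
    target = word.upper()
    # Match the target word as a whole word, ignoring case.
    pattern = re.compile(rf"\b{re.escape(word)}\b", flags=re.IGNORECASE)
    highlighted = pattern.sub(lambda match: target, sentence)
    return f"{target}, {target}, {highlighted}"


def dictation_script(items):
    """Join several (word, sentence) pairs into one script, one item per line."""
    return "\n".join(dictation_line(word, sentence) for word, sentence in items)


example = dictation_line(
    "reservation",
    "I have a reservation for four under the name of Chang.",
)
print(example)
# RESERVATION, RESERVATION, I have a RESERVATION for four under the name of Chang.
```

The pause between items and the playback speed are settings of the TTS tool itself, so they are deliberately left out of the sketch.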

Paired Sample t-Test
There were 42 valid data samples for analysis. A paired sample t-test was used to compute and analyze the data. As presented in Table 3, the mean score for TLD was 57.04, and the mean score for TTS was 65.78. The standard errors of the mean were 3.74 and 3.19, respectively. The participants thus scored higher on the TTS vocabulary dictation tests than on the traditional TLD tests. As shown in Tables 3 and 4, the t value for the difference between the TLD and TTS scores was 2.329, with a standard deviation of 24.31. Furthermore, because p = .025 < .05, the null hypothesis is rejected, and H1 is accepted. Therefore, there is a significant difference between the mean scores with TLD and TTS.
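The reported t value can be approximately reproduced from the summary statistics alone. The sketch below assumes that the standard deviation of 24.31 refers to the paired differences between each student's TTS and TLD scores (the text does not state this explicitly); the function name `paired_t_from_summary` is hypothetical.

```python
import math


def paired_t_from_summary(mean_first, mean_second, sd_diff, n):
    """Paired-sample t statistic from summary values.

    sd_diff is ASSUMED to be the standard deviation of the paired differences.
    """
    mean_diff = mean_second - mean_first
    se_diff = sd_diff / math.sqrt(n)  # standard error of the mean difference
    t_stat = mean_diff / se_diff
    return mean_diff, se_diff, t_stat, n - 1  # degrees of freedom = n - 1


mean_diff, se_diff, t_stat, df = paired_t_from_summary(57.04, 65.78, 24.31, 42)
print(f"mean difference = {mean_diff:.2f}, SE = {se_diff:.2f}, t({df}) = {t_stat:.2f}")
# mean difference = 8.74, SE = 3.75, t(41) = 2.33
```

Under that assumption the result agrees with the reported t = 2.329 up to rounding of the published means and standard deviation.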

Pearson Correlation Analysis
For research question 2, a Pearson correlation analysis was conducted to determine whether there is a correlation between the scores with TLD and TTS. As shown in Tables 5 and 6, the participants' TLD scores were moderately correlated with their TTS scores (r = .424, p < .01). This finding indicates that the scores obtained by the participants with TLD were significantly correlated with their scores with TTS. In other words, the participants who obtained better scores on the TLD tests also received better scores on the TTS tests.
elt.ccsenet.org Vol. 12, No. 3; 2019
The findings of the questionnaire are as follows. The questionnaire aimed to investigate the participants' opinions on applying TTS technology to vocabulary dictation tests in an English class. The students were asked to rate 17 Likert-type questions about their opinions on using TTS technology in vocabulary dictation tests and to answer two other questions about the learning of vocabulary pronunciation. The results are reported in Table 7. The students rated the survey items from 1 (totally disagree) to 5 (totally agree) for questions 1 to 16 and from 1 (never) to 5 (always) for question 17. The mean scores for each item show varying degrees of agreement. The items asked about the students' opinions of traditional TLD and TTS technology in vocabulary dictation tests and their opinions about pronunciation. The statement about the teacher as the pronunciation role model in the classroom received the highest score. Although the students found TTS technology sometimes very mechanical (M=4.14) and unemotional (M=4.05), they agreed that it is interesting (M=3.95). When comparing the use of TTS technology to traditional TLD in vocabulary dictation tests, the students did not think that TTS technology was clearer than TLD (M=2.64) and did not agree that it was more understandable (M=2.48). When the students do not know how to pronounce a word, they tend to check the pronunciation (M=4.21).
The last two items on the questionnaire (Q18 & Q19) were additional multiple-choice questions. Question 18 attempted to determine what the students do when they encounter a word that they do not know how to pronounce. The case summary is shown in Table 8, and the frequency count for question 18 is reported in Table 9.
As shown in Table 9, when the students encountered unknown words, they tended to check the pronunciation in a mobile-based online dictionary (69.8%) or in a computer-based online dictionary (24.5%). Less than five percent (4.8%) of the students checked the unknown words' pronunciation in a paper-based dictionary, and only a few of them guessed the pronunciation (1.9%).
Question 19 collected information on the methods by which students sought access to improve their vocabulary pronunciation. The case summary is in Table 10, and the frequency count for question 19 is reported in Table 11.
As shown in Table 11, the most common method the students used to improve their vocabulary pronunciation was listening to music (16.9%), followed by watching movies (12.6%), checking a mobile-based online dictionary (12.6%), and reading aloud with teachers in the classroom (12.1%). Nearly 10% of the students relied on the phonetic alphabet (9.2%) and on watching YouTube (9.2%) to improve their English vocabulary pronunciation.
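Percentages for questions such as Q18 and Q19 depend on the base that is used. Assuming that students could select more than one option on these items, each count can be expressed either as a share of all selections made or as a share of the students who answered. The sketch below illustrates the two bases with invented toy data, not the study's data; the function name `tabulate_multiple_response` is hypothetical.

```python
from collections import Counter


def tabulate_multiple_response(responses):
    """Tabulate a multiple-response item.

    responses: one collection of selected options per respondent.
    Returns counts with percentages of all selections and of respondents.
    """
    counts = Counter(option for chosen in responses for option in set(chosen))
    n_cases = sum(1 for chosen in responses if chosen)  # respondents with an answer
    n_responses = sum(counts.values())                  # total selections made
    return {
        option: {
            "count": count,
            "pct_of_responses": 100 * count / n_responses,
            "pct_of_cases": 100 * count / n_cases,
        }
        for option, count in counts.most_common()
    }


# Toy data (hypothetical): four respondents, six selections in total.
toy = [
    {"music", "movies"},
    {"music"},
    {"music", "mobile dictionary"},
    {"movies"},
]
table = tabulate_multiple_response(toy)
for option, row in table.items():
    print(option, row)
```

Percentages of responses sum to 100, whereas percentages of cases can exceed 100 in total, so it helps to state which base a reported figure uses.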

Discussion
One of the initial objectives of this study was to determine whether there is a significant difference between TTS and TLD performance for vocabulary dictation. The null hypothesis was that there was no difference between TTS and TLD in participants' achievement on vocabulary dictation tests. In the review of the literature, little to no data were found to connect TTS to foreign language learning, especially in the field of vocabulary learning. The results showed that there was a significant difference between TTS and TLD with regard to the participants' dictation performance. Surprisingly, the participants strongly agreed with the statement "the teacher is the role model for pronunciation in the classroom," yet they performed better on the TTS dictation tests. This discrepancy could be attributed to the possibility that the TTS voices, which are modeled on native speakers of English, use pronunciations closer to standard English than those of English teachers who are non-native speakers. Another possible explanation might be that the participants listened to the .mp3 tracks attached to the selected book and thereby became familiar with pronunciations similar to those they heard from the TTS during the dictation tests.
It is interesting to note that the results of this study differ from those of Kazazoğlu's (2013) study, which investigated word recognition between TLD and tape-recorded dictation among 76 intermediate high school students in Turkey. Kazazoğlu's findings indicate that students made fewer word errors in TLD than in tape-recorded dictation. In other words, students seemed to perform better in TLD. Kazazoğlu (2013) concluded that speed is the key to the connection with short-term memory and that it is associated with aural competence in language learning. Although Kazazoğlu's results ran in the opposite direction from the results of this study, the emphasis on speed may explain the difference: as technology progresses, the speaker's speed can be tuned in TTS, which makes it more flexible than tape-recorded dictation.
The results also showed a correlation between the scores with TTS and TLD. It might be inferred that learners who perform better with TTS would also perform better with TLD. There are two possible explanations for this result. One possible reason is that students who perform well with both TLD and TTS may have better English abilities than others. The other reason is that students who do well with both TTS and TLD may have a strong motivation to learn English; thus, they spend more time preparing for tests.
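To put the size of this relationship in perspective, the reported correlation (r = .424, n = 42) can be converted into a proportion of shared variance and into the t statistic conventionally used to test whether a Pearson correlation differs from zero. The sketch below uses only the two values reported in the results; the function name `correlation_summary` is hypothetical.

```python
import math


def correlation_summary(r, n):
    """Return shared variance (r squared), the t statistic for H0: rho = 0, and df."""
    df = n - 2
    t_stat = r * math.sqrt(df / (1 - r ** 2))
    return r ** 2, t_stat, df


r_squared, t_stat, df = correlation_summary(0.424, 42)
print(f"r^2 = {r_squared:.3f}, t({df}) = {t_stat:.2f}")
# r^2 = 0.180, t(40) = 2.96
```

On these figures, the TLD and TTS scores share roughly 18% of their variance, which is consistent with describing the correlation as moderate: the two explanations offered above may account for part of the relationship, but most of the variation in one score is not predicted by the other.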
With regard to the students' opinions about using TTS technology, as mentioned above, the students agreed that teachers are the role model for pronunciation in the classroom. Interestingly, although the TTS voice is more native-like in its English pronunciation than English teachers whose native language is Chinese, the students seemed to rely on the classroom English teacher as their model for pronunciation. This may be because the TTS technology is sometimes still very mechanical (M=4.14) and does not sound emotional (M=4.05).

Future Studies
In future studies, learners' perceptions of different TTS voice selections and speeds should be measured. User experience might also affect the effectiveness of TTS. Additionally, the application of TTS in other types of dictation, such as phonemic text dictation, orthographic item dictation, and orthographic text dictation, in various fields of language learning is worth studying. The length of the text is also worth examining. Finally, including the variable of individual differences in the application of TTS in EFL learning might be an interesting angle worthy of further study.

Limitations
There are a few limitations that are worth noting. First, based on Sawyer and Silver's (1972) dictation categories, the research applied phonemic item dictation, in which the teacher or the TTS technology read aloud the individual target words for learners to write the words they heard. Although example sentences with target words were used, the dictation method was limited to an "item" only. Second, TTS technology has the function of speed control, from very slow to very fast (-4 to 9). In this study, a speed of minus 2 was used to present the words because the slower pace was expected to help learners listen more clearly and to reduce students' rejection of TTS technology; the effect of other speed settings was therefore not examined.

Conclusion
The study aimed to examine the use of TTS technology and TLD in English vocabulary dictation performance and to reveal students' opinions of the TTS technology compared to TLD. The results showed that there is a significant difference between TTS and TLD in students' achievement on vocabulary dictation tests. Additionally, there is a correlation between the scores with TTS and TLD. Overall, the participants seemed disinclined to agree with TTS technology for learning English vocabulary pronunciation. In other words, they seemed not to appreciate the TTS technology in the language classroom. It seems likely that these results are due to the tradition of teacher-centered stereotypes in foreign language-learning classrooms, in which students rely primarily on language teachers' modeling of pronunciation. It is recommended that instructors implement online TTS appropriately in the language-learning classroom to promote self-learning ability, especially if the students intend to learn to listen and speak in a new language. Finally, although TTS technology is native-like, there is always room for improvement by making it more like a real person's pronunciation. Educators and software designers could consider language-learning objectives with regard to TTS dynamics to make TTS an enjoyable learning environment.