The Use of Keyword Video Captioning on Vocabulary Learning Through Mobile-Assisted Language Learning

Video captioning is a useful tool for language learning. Many studies have investigated video captioning, and their results indicate that it may foster vocabulary learning. Most of these studies examined the effect of full captions on vocabulary learning. One of the key aspects of vocabulary learning is pronunciation. However, the use of mobile devices for teaching pronunciation has not been investigated conclusively. Therefore, this paper examines the effect of keyword video captioning on L2 pronunciation using mobile devices. Thirty-four Arab EFL university learners participated in this study and were randomly assigned to two groups (keyword captioned video and full captioned video). The study followed an experimental design in which pre- and post-tests were administered to both groups. The results indicated that keyword captioning is a useful mode for improving learners' pronunciation. The post-test results indicated no statistically significant difference between the two modes of captioning in their effect on vocabulary learning. However, learners in the keyword captioning group performed better than those in the full captioning group.


Introduction
Vocabulary is an essential part of mastering a language. Vocabulary is considered the building block of a language (Schmitt, Schmitt, & Clapham, 2001). Without knowledge of words and their meanings, it is impossible to convey a message in a language. Although listening, speaking, reading and writing are the basic skills that language learners need to master, vocabulary is essential to all four of these skills. Therefore, it is important to consider vocabulary learning in any program devoted to language learning. Vocabulary can be presented either traditionally or with the aid of technology. The use of technology in language learning is the topic of several studies (for a review, see Golonka et al., 2014; Mahdi, 2014). These studies have reported a number of encouraging results indicating that language learning with the help of technology can be more effective than learning through traditional means. Several studies have examined the effect of mobile devices on vocabulary learning (e.g., Chen, Hsieh, & Kinshuk, 2008; Lu, 2008). The results of these studies revealed that mobile devices are an effective tool for vocabulary learning. With the help of these technologies, vocabulary can be presented in different modes (e.g., subtitling, annotation, and captioning). Video captioning is implemented to increase vocabulary learning. There are two ways of captioning: full captions and keyword captions.
The impact of video captioning on vocabulary learning has been investigated by many studies (e.g., Aldera & Mohsen, 2013; Hsu, Hwang, Chang, & Chang, 2013; Mohsen, 2016; Stewart & Pertusa, 2004; Sydorenko, 2010). The focus of the previous studies lies on full captions. In addition, almost all of the previous studies were conducted with the help of a computer. As far as the author knows, only one study has investigated the effect of keyword captions on vocabulary learning (i.e., Yang & Chang, 2013). That study proposed three modes of captions (full, keyword-only, and annotated keyword captions) and investigated their contribution to the learning of reduced forms and overall listening comprehension. The results revealed that all three groups improved, with the annotated keyword caption group performing best and achieving the highest mean score. However, as far as the author knows, pronunciation has not yet been investigated in mobile-assisted language learning environments. Therefore, this study attempts to fill this gap by investigating the effect of keyword video captioning on pronunciation, in comparison to full video captioning, using mobile devices. The study seeks to answer the question: what is the effect of watching captioned video on the development of students' pronunciation of L2 words?

Vocabulary Learning
There are four basic skills required to master a foreign language (i.e., listening, speaking, reading and writing). However, language learners need enough vocabulary to master each one of these skills. Therefore, vocabulary learning is an essential component of language learning. Wilkins (1972) points out that without grammar very little can be conveyed, but without vocabulary nothing at all can be conveyed. Vocabulary learning involves more than knowing the meaning of a word; it involves many aspects. Nation (2001, p. 26) lists the aspects that are necessary to know a word. They are form (spoken, written and word parts); meaning (form and meaning, concept and referents, and associations); and use (grammatical functions, collocations, and constraints on use).
Vocabulary learning can occur in two different ways: intentional and incidental (Nation, 2001). Intentional vocabulary learning refers to any activity aimed at committing lexical information to memory (Robinson, 2001). Incidental vocabulary learning, on the other hand, refers to learning from context, such as from reading or listening. It is a byproduct of something else (Gass & Selinker, 2008). Both ways are required to build the vocabulary size that language learners need to communicate in a language.
Moreover, vocabulary learning can occur in two environments (i.e., technology-based and traditional). The technology-based environment refers to the use of new technologies for vocabulary learning, such as TV, computers, personal digital assistants (PDAs) and mobile devices. The traditional environment relies on tools such as word cards, dictionaries, and word lists. With the help of technology-based environments, vocabulary can be presented in different modes such as video captioning, subtitling, and annotations.

Video Captions and Vocabulary Learning
Video captioning is one of the modes that technology can provide for vocabulary learning. It is defined by Danan (2004, p. 232) as "on-screen text in a given language combined with a soundtrack in the same language". Language learners have difficulty decoding the speech of native speakers of the target language. Video captioning is an effective tool for helping language learners decode the speech of native speakers presented in videos. Captions help learners link written words to their spoken forms. Several studies were conducted to investigate the effects of video captioning on language learning (e.g., Brett, 1995; Garza, 1991; Hsu, 1994; Huang & Eskey, 1999-2000). The results indicated that video captioning increases general comprehension and helps language learners understand better. It provides visual, contextual and non-verbal input for language learners (Brett, 1995).
Regarding vocabulary learning, several studies were conducted to find out whether video captioning is useful for vocabulary learning (e.g., Aldera & Mohsen, 2013; Hsu, Hwang, Chang, & Chang, 2013; Stewart & Pertusa, 2004; Sydorenko, 2010; Yuksel & Tanriverdi, 2009). The results indicated that learners exposed to captioned videos outperformed those who watched non-captioned videos. Captioned videos foster vocabulary learning because they may contribute to a conscious focus on form and encourage attention, which is essential for language learning (Vanderplank, 1990). Video captioning is also effective for word recognition and recall (Perez, Noortgate, & Desmet, 2013). In the previous studies, there is a general consensus that captioned videos help language learning and lead to better comprehension and vocabulary learning.

Mobile Devices and Vocabulary Learning
The advent of mobile devices has influenced the lives of millions of people around the globe. According to 2016 estimates by the ITU, the UN specialized agency for information and communication technology (ICT), there are about seven billion mobile phone subscriptions worldwide. Mobile devices can be integrated into education as a tool to facilitate language learning. They are equipped with different input modalities, such as MP3 players and YouTube, which can be used to present a target language in an effective way.
Several studies were conducted to determine the effectiveness of mobile devices for language learning (see Burston, 2013 for more information). Burston (2012, p. 16) concluded that "the learning outcomes of MALL implementations are unquestionable positive in nearly 80% of the cases". Mobile devices can be used to facilitate vocabulary learning. Many studies were conducted to examine the benefits of mobile devices for vocabulary learning (Chen, Hsieh, & Kinshuk, 2008; Browne & Culligan, 2008; Kennedy & Levy, 2008; Lu, 2008; Saran, Seferoglu, & Cagiltay, 2012; Stockwell, 2007; Thornton & Houser, 2005; Wong & Looi, 2010; Zhang, Song, & Burston, 2011). The results of these studies revealed that mobile devices are a useful tool for vocabulary learning. Mobile devices can be useful for learning vocabulary both receptively and productively. However, the previous studies focused on teaching vocabulary in a receptive way (Mahdi, 2017). Most of the studies used the "meaning" aspect as the variable for measuring the effect of mobile devices on vocabulary learning, and very few examined the effect of mobile devices on the other aspects (i.e., form and use).

Captioned Videos through Mobile Devices and Vocabulary Learning
Mobile devices have great potential to provide additional activities for language learning. Current mobile devices can present the target language in different modes. In this regard, captioned videos can be displayed on mobile devices to foster vocabulary learning. Mobile devices are equipped with many programs that can present videos clearly and in an interesting way. Researchers have investigated the effects of captioned videos on vocabulary learning with the help of a computer (e.g., Aldera & Mohsen, 2013; Markham, 1999; Stewart & Pertusa, 2004; Sydorenko, 2010; Yuksel & Tanriverdi, 2009). These studies have found that captioned videos are useful in fostering vocabulary learning. However, the use of mobile devices to present captioned videos for vocabulary learning has not been fully investigated. Very few studies have dealt with this topic. For example, Hsu et al. (2013) investigated the effects of different display modes on vocabulary learning. They compared the effects of full captions, target-word captions, and no captions on vocabulary learning. Students used Personal Digital Assistants (PDAs) to play the videos. Pre- and post-tests were used. The results indicated that both caption modes (full captions and target-word captions) were good for vocabulary learning. Both caption groups outperformed the non-captioned group. The present study is therefore significant in examining the effect of mobile devices on vocabulary learning through keyword captioning. It attempts to answer the following question: Which mode of captioning is more useful for pronunciation (keyword captions or full captions)?

Methods
Previous research has shed light on how various modes and modalities may contribute to vocabulary learning (e.g., subtitling, captioning, and annotation). To investigate the effect of captions on L2 pronunciation, two levels of captions were established: 1) full-text captions and 2) keyword captions. The study was designed to explore the following hypothesis: captioning of authentic video has a positive effect on L2 pronunciation. To examine this hypothesis, an empirical study was conducted based on an experimental design using a quantitative approach.
Two video clips for all input conditions (i.e., full-text captions and keyword captions) were chosen from YouTube. The captions appeared in the centre of the video clips. In the case of full captions, the captions were divided into phrases and shown simultaneously with the spoken utterances. Each keyword caption was displayed for a default of two seconds, which is enough for the participants to read the word and listen to its pronunciation. The study was carried out in the first semester of the academic year 2017 at the University of Bisha, Saudi Arabia. The participants (N = 34) were randomly distributed between the full-text caption and keyword caption groups.

Participants
The participants in this study were 34 native speakers of Arabic enrolled in the English Department at the University of Bisha, Saudi Arabia. The students' ages ranged between 19 and 21. Due to university rules and the cultural values that support gender segregation in classes, all of the participants were male students. The participants were randomly assigned to one of the two groups, with 17 participants in each group.

Study Design
The study employed a between-subjects design in which participants viewed videos under one of two modes: keyword captions or full captions. The groups were asked to play the videos and then complete the pronunciation test. In other words, the independent variable was the L2 learning mode, which had two levels (i.e., keyword captions and full captions). The dependent variable was pronunciation, which was measured by a pronunciation test. Word pronunciation was measured prior to the intervention (i.e., pre-test) and immediately after the intervention (i.e., post-test).
A pre-test and a post-test comprising the same content were administered. The test consisted of 40 words selected from the videos sent to the participants' mobile phones. Participants were required to pronounce the target words. Each word pronounced correctly was given one point. The total number of correct words for each group was computed and analyzed.
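The scoring rule described above (one point per correctly pronounced word, totalled per group) can be sketched as follows. This is only an illustration: the function names and the demo ratings are hypothetical, the demo list is shortened to five words instead of the 40 used in the test, and the study does not describe how the correctness judgements were recorded.

```python
from statistics import mean


def score_participant(judgements):
    """Return one point for each target word judged as pronounced correctly."""
    return sum(1 for correct in judgements if correct)


def group_summary(group_judgements):
    """Return (total, mean) of the participant scores for one group."""
    scores = [score_participant(j) for j in group_judgements]
    return sum(scores), mean(scores)


# Hypothetical ratings for two participants on a shortened 5-word list
# (assumption for illustration only); True means "pronounced correctly".
demo_group = [
    [True, False, True, True, False],
    [False, False, True, False, False],
]
total, average = group_summary(demo_group)
print(total, average)
```

With the full test, each inner list would hold 40 judgements, so a participant's score would range from 0 to 40.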

Materials Used and Target Vocabulary
The participants in the two groups watched the same video clips; only the caption modes differed. It was assumed that these clips were appropriate for the participants in terms of both language level and interest. The treatment involved students in the two groups watching the videos three times. The duration of each clip was about 6 minutes, which was considered an appropriate time span for learning. Forty target words from the clips were chosen based on the proficiency level and background of the students. The main concern in selecting the target words was that they should be unknown to the students, in order to ensure that any effect was the result of the mode (full or keyword captions). These words contain the schwa vowel /ə/, which is somewhat difficult for Arab learners of English. According to Kenworthy (1987), the schwa vowel is the most important sound in English, and all learners must be made aware of it at a very early stage. This sound does not exist in Arabic. Therefore, the test words include this vowel in order to examine whether video captioning has an effect on pronouncing words that contain it.

Procedures
The study procedure consisted of the following four steps.
Step 1: Pre-test: Prior to the treatment, the participants in the two groups completed a pre-test that included 40 polysyllabic words. Their responses were recorded in a file and analyzed later to see how well they pronounced these words. The participants were allowed to practice the words before recording (mainly to reduce the anxiety of pronouncing new words and of being recorded). No correction or feedback was provided at this stage.
Step 2: Orientation and demonstration: Before the intervention, participants were shown how to pronounce words that contain the schwa vowel. During the demonstration, the participants were encouraged to read aloud some words containing this vowel.
Step 3: Intervention: Participants were divided into two groups: full video captioning and keyword video captioning. Except for the different caption modes, the sequence and the number of the clips were the same to ensure that the participants received equal input. The videos were sent to the participants' mobile phones via WhatsApp. The participants were asked to watch the videos three times. They were free to replay, rewind, and pause the videos while watching.
Step 4: Post-test: After watching the video clips, participants took the same pronunciation test that had been administered prior to the treatment. Each participant read aloud the words given in the test and recorded himself in a quiet room using his mobile phone. The recordings were sent to the instructor via WhatsApp immediately after the treatment.

Data Analysis
To answer the research question, a paired sample t-test was performed using SPSS (v14). The paired sample t-test was employed to compare the pre-test and post-test scores of the two groups. The independent variable is the group (full caption or keyword caption). The dependent variable is pronunciation.
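For readers unfamiliar with the paired-sample t-test, the statistic can be sketched in a few lines of Python using only the standard library. This is an illustration under stated assumptions, not the SPSS procedure used in the study: the function name is hypothetical, the scores are invented for demonstration, and no p-value is computed. Differences are taken as pre-test minus post-test, which appears to match the negative t values reported for groups that improved.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(pre, post):
    """Return (t, df) for a paired-sample t-test on pre/post scores.

    Differences are computed as pre - post, so an improvement from
    pre-test to post-test produces a negative t value.
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equally long lists with at least 2 scores")
    diffs = [a - b for a, b in zip(pre, post)]
    n = len(diffs)
    # t = mean difference divided by its standard error
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical scores out of 40 for five learners (NOT the study's data).
pre_scores = [12, 15, 9, 20, 14]
post_scores = [16, 18, 12, 23, 20]
t_value, df = paired_t(pre_scores, post_scores)
print(f"t({df}) = {t_value:.2f}")
```

The resulting t is then compared against the t distribution with n - 1 degrees of freedom; with 17 participants per group, as in this study, that would be 16 degrees of freedom for each within-group comparison.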

Results and Discussion
One of the objectives of this study was to examine the effectiveness of the keyword caption mode with respect to the students' pronunciation. It was therefore first necessary to ensure that the two groups had comparable pronunciation levels before beginning the treatment. This was determined by way of the pre-test, for which the mean values and standard deviations of the scores were 13.94 and 6.28 for the first group (keyword) and 14.94 and 8.97 for the second group (full caption), as shown in Table 1. The pre-test results of the two groups did not differ significantly (t = 0.418, p > .05); that is, the two groups of students had equivalent prior knowledge before the experiment. To compare the pre-test and post-test scores of each group, the paired-sample t-test was applied. Table 2 shows the results in terms of mean scores (M), the number of participants per group (N), standard deviation (SD), and t value. The results in Table 2 show that the mean score of the full video captioning group was 14.94 before the treatment and 18.05 after it. The difference between the pre- and post-test scores of this group was significant (t = -3.79, p = .002). This indicated that full video captioning is a useful environment for improving L2 pronunciation among EFL learners.
To find out the effect of keyword video captioning on L2 pronunciation, the pre- and post-test scores of the keyword group were compared. The results in Table 3 show that the mean score of the keyword video captioning group was 13.94 before the treatment and 21.35 after it. The difference between the pre- and post-test scores of this group was significant (t = -3.12, p = .002). This indicated that keyword video captioning is a useful tool for improving L2 pronunciation among EFL learners. To explore the relative effect of full and keyword captioning on L2 pronunciation, the post-test results of the two groups were compared. Table 4 shows the post-test results of both groups. The results indicated that both groups improved significantly after the experiment. In particular, the keyword caption group showed the larger improvement, reaching a post-test mean of 21.35, while the full caption group showed the smaller improvement, reaching a post-test mean of 18.05. However, the post-test comparison indicated no statistically significant difference between the two groups (t = -.953, p = .355 > .05).
Generally, captioning (either full or keyword) was found to have a positive impact on L2 pronunciation. Specifically, the keyword caption group showed a larger mean improvement than the full caption group, although the difference was not statistically significant. This may result from presenting the written word together with its pronunciation, which contributed to the students' pronunciation.
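The size of each group's improvement follows directly from the means reported above. The short sketch below only restates that arithmetic; the dictionary layout and function name are illustrative choices, and the four mean values are the ones given in this section.

```python
# Group means as reported in this section (scores out of 40).
reported_means = {
    "full": {"pre": 14.94, "post": 18.05},
    "keyword": {"pre": 13.94, "post": 21.35},
}


def mean_gain(group):
    """Return post-test mean minus pre-test mean, rounded to 2 decimals."""
    return round(group["post"] - group["pre"], 2)


gains = {name: mean_gain(m) for name, m in reported_means.items()}
print(gains)  # mean gain per group: about 3.11 (full) and 7.41 (keyword)
```

The keyword group's mean gain is therefore more than twice that of the full caption group, even though the post-test difference between the groups did not reach statistical significance.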

Conclusions
The current study explored how different types of captions (i.e., full and keyword captions) affect EFL students' pronunciation. Inspired by past studies, which proposed that captioning is useful for improving vocabulary learning, keyword captioning was used to make EFL learners aware of the pronunciation of English words.
Previous studies have demonstrated positive outcomes when captions or subtitles are introduced to enhance vocabulary learning. However, few have investigated the effect of keyword captioning on pronunciation in comparison to full captioning. This study revealed that keyword captioning is a useful way to improve learners' pronunciation. The results indicated that learners' pronunciation improved when video captioning was used. However, the results showed no significant difference between the two modes (keyword and full captioning).
The results suggested that keyword captioning has the potential to help EFL learners improve their pronunciation. Though the difference between the two modes was not significant, the study suggested that keyword captions might be more helpful than full captions. Further studies are needed to explore this issue in detail.
There are some limitations to the present study. First, given that this study employed a relatively small sample of participants, the findings on how the two types of captions differ in effect cannot readily be generalized. For future research, a larger sample and a long-term treatment may provide additional evidence and expand understanding of pronunciation learning. Second, the participants shared a homogeneous academic background in terms of their university and English proficiency level, and all were male students. In future research, participants could be grouped by variables such as gender and proficiency level before the intervention. The study found that keyword captions may be beneficial in assisting the learning of pronunciation. For future pedagogical design, keyword captioning is recommended.

Table 1.
Pre-test of full and keyword captioned words

Table 2.
Pre- and post-test of full captioned words

Table 3.
Pre- and post-test of keyword captioned words

Table 4.
Post-test of full and keyword captions