An Acoustic Investigation of Pakistani and American English Vowels

This acoustic analysis tests the hypothesis that the physical properties of Pakistani English (PaKE) vowels differ from those of vowels produced by native speakers of American English. The paper documents the acoustic behavior of English vowels produced by PaKE learners, and its primary aim is to measure the formant frequencies and durations of the vowels involved in the articulation of PaKE. The English vowels selected for this purpose are /æ/, /ɛ/, /ɪ/, /ɒ/ and /ə/. Ten participants were drawn from the Department of Computer Science at Sindh Madressatul Islam University, Karachi. The study is based on the analysis of 500 (10×5×10=500) voice samples. Five vowel minimal pairs were selected and embedded in the carrier phrase 'I say [CVC] now'. Ten speakers (five male and five female) recorded the 500 voice samples using the Praat speech processing tool and a high-quality microphone connected to a laptop in a computer laboratory with no background noise. Three parameters were considered in the analysis of PaKE vowels: the duration of the five vowels and the first two formant frequencies (F1 and F2). It was hypothesized that the properties of PaKE vowels differ from those of native English speakers' vowels. The hypothesis was supported, since the acoustic measurements of PaKE speakers and native American English speakers were found to differ.


Introduction
The study aims to analyze the acoustic properties of English vowels produced by English language learners. It examines three basic parameters of speech: the first two formant frequencies, F1 and F2, and the duration of English vowels. The experiment is based on 500 voice samples from PaKE learners, with the hypothesis that PaKE differs from native English speakers' production in the acoustic measurements of English vowels. The primary aim of the paper is to explore the frequencies and durations of the vowels involved in the articulation of Pakistani English. The English vowels selected are /æ/, /ɛ/, /ɪ/, /ɒ/ and /ə/. The study is based on the analysis of 500 (10×5×10=500) voice samples. Five vowel minimal pairs were selected and written in a carrier phrase. Ten speakers (five male and five female) recorded the 500 voice samples using the Praat speech processing tool and a high-quality microphone connected to a laptop in a computer laboratory with no background noise. Vowel duration is the length of the vowel, whereas F1 reflects the height of the vowel and F2 reflects its backness in the oral cavity.
Difference in pronunciation of native English speakers vs. PaKE speakers
Languages have different vowel systems; their speakers therefore pronounce the same words differently, depending on the vowel system available in their language. The current study presents evidence that PaKE differs from native American English in the production of English vowels. A further phonetic factor is mother tongue influence, an unconscious form of learning that shapes the production of second language speakers. This study investigates the English vowels produced by Pakistani undergraduate students and documents their frequencies and durations. Its purpose is to determine the acoustic differences in English vowels produced by PaKE speakers. Pakistani speakers of English do not perceive the differences between vowels, which causes misunderstanding among native speakers of American English. This study will benefit speakers who make such errors in the production of English vowels, and in this context it will add to the existing literature.
Hillenbrand et al. (1995) analyzed native American English speech, as illustrated in Table 1, which reports the average durations, fundamental frequencies, and formant frequencies of vowels produced by 45 men, 48 women, and 46 children. The averages are based on the subset of tokens that were well identified by listeners. Durations are given in milliseconds (ms); all other values are in Hz.
Table 1. Average duration, fundamental frequencies & formant frequencies of American vowels
Sheikh (2012) finds that PaKE is closer to the British RP accent than to other English accents and further states that PaKE is now considered an independent variety, since it reconstructs its own English sound system. The basic articulatory correlates of the F1 and F2 formants are the displacement of the body of the tongue in the mouth (height and backness) and the rounding of the lips (Ladefoged, 1993; Pfitzinger, 2003). The first two formant frequencies are sufficient for the recognition of speech (Parsons, 1987). For adult male speakers, the frequency ranges are described as follows: "F1 ranges between 200-800 Hz, F2 ranges between 600-2800 Hz, and F3 ranges between 1300-3400 Hz" (Parsons, 1987). The resonance frequencies of the mouth and throat vary from person to person because vocal tracts differ in size. The vocal tract is approximately 17 cm long for adult male speakers (Parsons, 1987).
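The 17 cm figure can be related to the quoted formant ranges through the textbook uniform-tube approximation, which is not part of the studies cited here. As a rough sketch, assuming a tube closed at the glottis and open at the lips and a speed of sound of about 350 m/s (both are assumptions made for illustration), the resonances fall at odd multiples of c/4L:

```python
# Quarter-wavelength resonator sketch: an idealization, not a measurement.
# The speed of sound below is an assumed value for warm, moist air.
SPEED_OF_SOUND_CM_PER_S = 35000.0


def tube_resonances(length_cm, count=3, speed=SPEED_OF_SOUND_CM_PER_S):
    """Return the first `count` resonances (Hz) of a uniform tube that is
    closed at one end and open at the other: F_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * speed / (4.0 * length_cm) for n in range(1, count + 1)]


if __name__ == "__main__":
    for n, hz in enumerate(tube_resonances(17.0), start=1):
        print(f"F{n} ~ {hz:.0f} Hz")
```

For a 17 cm tube this gives roughly 515, 1544 and 2574 Hz, each inside the corresponding range reported by Parsons (1987). A shorter tube raises all three values, which is consistent with the point that resonances vary with vocal tract size.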
The PaKE vowel /æ/ was analyzed as an unrounded front vowel below the half-open position, as in 'map' /mæp/. The short vowel /e/ is an unrounded front vowel between half-close and half-open. The short vowel /ɪ/ is a centralized unrounded half-close front vowel, e.g., 'sit' /sɪt/. /ɒ/ is a rounded back vowel just above the open position, e.g., 'rod' /rɒd/. /ʌ/ was analyzed as an unrounded central vowel between open and half-open (Sabir & Al-Saeed, 2014). Some PaKE vowels are articulated exactly like the English vowels: /i/, /ɪ/, /u/, /ʊ/, /ɔ/, /ɒ/ and /ʌ/ (Sheikh, 2012). Abbasi & Hussain (2015) argue that 'overall mean of English vowels duration was 75 milliseconds for stressed and 66 milliseconds for unstressed. The mean difference between stressed and unstressed was 40 milliseconds for long vowels and 9 milliseconds for English vowels in Sindhi'. Abbasi (2017) further contends that the mean F1 and F2 values were higher for stressed long vowels and English vowels and lower for unstressed long and English vowels, and that the overall averages of F1 and F2 differed significantly within and between groups. Abbasi & Hussain (2015) note that 'the evidence presented argues for analyzing Indo-Aryan language in which intonation contours appear to be independent of stress. This finding is interesting since it suggests that stress is completely orthogonal to F0 contours unlike in most stress languages in which pitch accents dock on stressed syllables. Sindhi is an Indo-Aryan language whose pitch accent rises from the first syllable in disyllable words, irrespective of syllable weight, and the rise is followed by a fall at end of the word'. In addition, the study concludes that Sindhi behaves like a stress accent language as discussed in Beckman (1986).

Speakers
Participating speakers were recruited from Sindh Madressatul Islam University, Karachi. They were native speakers of Urdu aged between 18 and 25. A total of ten Urdu speakers, five male and five female, recorded their voice samples. All participants were drawn from the university's Department of Computer Science. None of the participants self-reported any speech-related problem.

Recordings
Five hundred voice samples from 10 ESL (English as a Second Language) learners were recorded and analyzed in the Praat speech processing tool. Three parameters were measured: F1, F2 and the duration of English vowels. For recording, the 'Record mono Sound' command in the 'New' menu of Praat's Objects window was used. A list of words was read three times each in a carrier sentence in order to obtain the best possible recording. Each recording was saved with the 'Save as WAV file' option. The 'View & Edit' option was then selected, which displays the spectrogram of the token. The token was selected and the formant menu was opened, where the first-formant option returns the F1 frequency; the F2 frequency was obtained in the same way. The duration of the token vowel was measured manually by selecting the horizontal extent of the vowel, from left to right or right to left, while carefully inspecting the dark energy bands.

Procedure of Data Collection
The experiment recorded 500 voice samples of non-nasal /æ/, /e/, /ɪ/, /ɒ/ and /ə/. Monosyllabic word tokens were selected for recording. The words followed the CVC (Consonant-Vowel-Consonant) pattern, in which the vowel is preceded and followed by a consonant. Fifty words were chosen for sampling, ten for each of the five vowels. Two words were later removed from the study because the tokens did not match the required phonetic parameter; they had been selected on the basis of the misleading orthography of English. The speakers were recorded in a silent zone with no background noise, with a head-mounted microphone positioned six inches away. They were given a list of monosyllabic English words in the CVC pattern, covering all the target English vowels, and were asked to read it. The words were placed in the carrier sentence 'I say [CVC] now', which allowed the speakers to produce the target words at a constant rate. In each recording, the speaker repeated the same sentence three times for each English CVC word. Speakers were strongly encouraged to give their best possible production, and if any error occurred the whole procedure was repeated in order to record the best sound. The list of CVC words can be found in Appendix A. The samples were recorded in Praat. Monosyllabic words were chosen because they are easy for ESL learners to read and understand. Each target word was recorded in the same position in the sentence to avoid variation in frequencies.
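The elicitation design described in this section can be sketched in a few lines. This is only an illustration: the full stimulus list is in Table 2 and Appendix A, so the dictionary below contains just the three example words that appear in the text ('map', 'sit', 'rod'), and the function names are hypothetical.

```python
# Sketch of the elicitation design; the word list is a placeholder.
# Only 'map', 'sit' and 'rod' are mentioned in the text. The real
# stimuli (ten words per vowel) are listed in Table 2 / Appendix A.
CARRIER = "I say {word} now"

EXAMPLE_WORDS = {"æ": ["map"], "ɪ": ["sit"], "ɒ": ["rod"]}


def build_prompts(words_by_vowel, carrier=CARRIER):
    """Return (vowel, word, sentence) triples, one per target word."""
    return [
        (vowel, word, carrier.format(word=word))
        for vowel, words in words_by_vowel.items()
        for word in words
    ]


def planned_token_count(speakers=10, vowels=5, words_per_vowel=10):
    """Tokens under the stated design: speakers x vowels x words per vowel."""
    return speakers * vowels * words_per_vowel


if __name__ == "__main__":
    for vowel, word, sentence in build_prompts(EXAMPLE_WORDS):
        print(f"/{vowel}/ {word}: {sentence}")
    print(planned_token_count())
```

With the default arguments, `planned_token_count()` reproduces the 10×5×10 = 500 figure given for the study.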

Data Analysis
For data analysis, the 'View & Edit' option in the Objects window is used. It opens an editor window in which the recorded sound is shown as a waveform. Pressing the Tab key plays the recorded sound. Once the entire phrase has been played, the relevant part is selected and the shortcut Ctrl + N is used to zoom in to the selection, whereas Ctrl + O zooms out. The same procedure is repeated until all three repetitions of the token have been processed. Praat displays the spectrogram, from which duration, formant frequencies, etc. are read.

Duration Measurement
All vowel durations were measured manually by visual inspection of the wideband spectrographic display on a computer screen. The start and end points of the target vowels were marked on the spectrographic display, and durations were measured in milliseconds. Both the beginning and ending points were located using the speech processing tool Praat (Boersma & Weenink, 2017).
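Once the two boundaries have been located by eye, the duration itself is simple arithmetic. The sketch below assumes the boundary times are read off in seconds and converts the interval to milliseconds; the function name and the example times are hypothetical, not values from the study.

```python
def vowel_duration_ms(boundary_a_s, boundary_b_s):
    """Duration in milliseconds between two boundary times given in seconds.

    The boundaries may be passed in either order, mirroring a selection
    dragged from left to right or from right to left.
    """
    return abs(boundary_b_s - boundary_a_s) * 1000.0


if __name__ == "__main__":
    # Hypothetical onset and offset times for one vowel token.
    print(round(vowel_duration_ms(0.412, 0.491), 1))
```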

Formants Measurements
All measurements of the lowest two formants, F1 and F2, were taken manually from the formant tracks at the visually located midpoint of the target vowel in stressed and unstressed tokens. Whenever a mismatch between the tracks and the visually apparent formant band in the spectrogram was detected, the formants were checked by visual inspection of the wideband spectrographic display on a computer screen. The pitch contours were extracted using the Praat autocorrelation method and were likewise measured at the visually located midpoint of each target vowel.
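The midpoint reading can be illustrated with a small sketch. It assumes the formant track has already been exported as a list of (time, F1, F2) frames; that export format, the function name, and the numbers in the example are all assumptions made for illustration and do not come from the study's data.

```python
def midpoint_formants(track, onset_s, offset_s):
    """Return (midpoint_time, f1_hz, f2_hz) for the frame nearest the
    temporal midpoint of the vowel.

    `track` is a list of (time_s, f1_hz, f2_hz) tuples, a hypothetical
    export of a formant track.
    """
    if not track:
        raise ValueError("formant track is empty")
    midpoint = (onset_s + offset_s) / 2.0
    # Pick the analysis frame whose time stamp is closest to the midpoint.
    _, f1, f2 = min(track, key=lambda frame: abs(frame[0] - midpoint))
    return midpoint, f1, f2


if __name__ == "__main__":
    # Hypothetical frames for one token; not measured values.
    frames = [(0.40, 600.0, 1700.0), (0.45, 650.0, 1750.0), (0.50, 620.0, 1720.0)]
    print(midpoint_formants(frames, 0.412, 0.491))
```

Taking a single frame at the midpoint keeps the reading away from the consonant transitions at either edge of the vowel, which is the usual motivation for midpoint measurement.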
Figure 1. Ten bar charts of F1, F2, F0 and duration of five English vowels across speakers

Duration
The duration of English vowels for both male and female speakers is approximately 60 to 86 milliseconds.
The averages are given in Tables 3-4. The duration of English vowels, however, shows variation: it ranges from 58 to 95 milliseconds for males, as shown in Table 3, and from 55 to 85 milliseconds for females, as shown in Table 4. As illustrated in Table 1 and Table 3, all durational values of vowels are found to be similar. The durational values of the PaKE vowels /æ/, /ɛ/ and /ə/ are approximately the same, about 79 to 86 milliseconds, whereas the vowels /ɪ/ and /ɒ/ are considerably shorter, at approximately 60 to 67 milliseconds. This difference in length constitutes a major quantitative difference. The difference (shown in Tables 1-3) is clear for the vowels /æ/, /ɛ/, /ɪ/, /ɒ/ and /ə/. Another factor affecting vowel duration is speech rate. If speech is produced quickly, less time is available for the articulators to reach the targets needed to transmit the phonemes correctly. Thus, the durational values of vowels and consonants decrease, although the decrease is more evident for vowels (Pickett, 1999).

Formant frequencies (F1 and F2)
The PaKE vowel measurements for the five English vowels /æ/, /ɛ/, /ɪ/, /ɒ/ and /ə/ are given in Tables 3 and 4. It is clear from the tables that there are measurable differences among the vowels and between their F1 and F2 frequencies. Among the vowels /æ/, /ɛ/, /ɪ/, /ɒ/ and /ə/, F1 is lowest for /ɪ/ for all participants. On average, F1 is highest for /ə/, followed by /ɒ/ and /æ/, then /ɛ/, and lowest for /ɪ/ in most cases, as shown in Table 1 and Table 2. The average F2 is highest for /ɪ/, followed by /æ/ and /ɛ/, then /ə/, and lowest for /ɒ/. The vowel frequency values are given in Table 1 and Table 2 (see Appendix A for more detail).
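Ranking vowels by their average formant values amounts to a group-wise mean followed by a sort. The sketch below shows that computation with Python's standard library. The token values are invented placeholders chosen only to make the example run; they are not the values in Tables 3 and 4, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_by_vowel(tokens, field):
    """Average one measurement (e.g. 'f1' or 'f2') per vowel."""
    groups = defaultdict(list)
    for token in tokens:
        groups[token["vowel"]].append(token[field])
    return {vowel: mean(values) for vowel, values in groups.items()}


def rank_vowels(means):
    """Vowels ordered from highest to lowest mean value."""
    return sorted(means, key=means.get, reverse=True)


# Placeholder tokens: invented numbers, NOT data from the study.
TOKENS = [
    {"vowel": "ɪ", "f1": 400.0, "f2": 2100.0},
    {"vowel": "ɪ", "f1": 420.0, "f2": 2000.0},
    {"vowel": "ə", "f1": 640.0, "f2": 1500.0},
    {"vowel": "ə", "f1": 660.0, "f2": 1400.0},
    {"vowel": "ɒ", "f1": 600.0, "f2": 1000.0},
]

if __name__ == "__main__":
    print(rank_vowels(mean_by_vowel(TOKENS, "f1")))
    print(rank_vowels(mean_by_vowel(TOKENS, "f2")))
```

Running the same two functions separately on the male and female tokens would give per-group averages of the kind reported in the tables.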

Discussion
The majority of PaKE learners develop their own identity in the production of PaKE vowels. PaKE stands on its own as a non-native standard variety. The English vowel sounds produced by the 10 speakers in the sample were measured, and variations characteristic of Pakistani English speech were found.
The current study examined short vowel production by undergraduate students. It analyzed five English vowels, which are given in Table 2. Table 3 shows the average values for the male speakers and Table 4 shows the average values for the female speakers. The tables give the average duration of each vowel and the frequencies of formants F1 and F2 for male and female speakers (see Appendix A for more detail).
The present paper measures the variations in PaKE vowels produced by 10 university ESL learners in Pakistan.
The results are based on the evidence analyzed and compared. The study was conducted at Sindh Madressatul Islam University, Karachi, where voice samples were collected from undergraduate students and analyzed in Praat. The study found variation in the English vowel frequencies, i.e., F1 and F2, and in the durational values of vowel sounds produced by ESL learners. Additionally, the study found that the frequencies of the five English vowels were higher for female than for male speakers, as their pitch was higher. Overall, F1 is highest for /ə/, followed by /ɒ/ and /æ/, and lowest for /ɛ/ and /ɪ/; F2 is highest for /ɪ/, followed by /ɛ/ and /æ/, and lowest for /ə/ and /ɒ/; and duration is highest for /ɛ/, followed by /ɒ/ and /æ/, and lowest for /ə/ and /ɪ/. F1 is almost the same for /ɛ/ and /ɪ/, F2 is almost the same for /ɛ/ and /æ/, and duration is approximately the same for /ɛ/ and /ɒ/. In most of the words, the vowels were produced as shorter vocalic sounds than the corresponding five native English vowels. It has also been noted that the English vowel /æ/ sounds like 'ah', /ɛ/ like 'eh', /ɪ/ like 'ih', /ɒ/ like 'awe' and /ə/ like 'uh'. This study aims to make a significant contribution to the body of work on the production of five PaKE vowels by university students. The research suggests that teachers should develop awareness of PaKE sounds and do their best to help students improve their pronunciation so that they speak English naturally and fluently (see Appendix A for more detail).

Conclusion
The present study concludes that there is strong evidence of variation between PaKE and American English in three acoustic parameters, i.e., the F1 and F2 formant frequencies and the duration of five vowels. Speakers tend to move vowels higher and closer to the front position. Insofar as the teaching of English pronunciation to Pakistani ESL learners is concerned, English language teachers should focus on the correct pronunciation of their students; because English is the students' second language, they need to focus on English pronunciation in order to improve it. The study found different acoustic properties, i.e., formant frequencies and durations, across speakers of American and Pakistani English. Thus, this study concludes that there are explicit acoustic differences between American and PaKE vowel frequencies.

Table 2. Stimuli for the production of English words by ESL learners

Table 3. Average of F1, F2 and duration