Frequency of Using Najdi Arabic Words Among Saudi Male College Students

The study of dialects may be subsumed under the broad rubric of colloquialism, which sits at the informal end of the formality scale. We focus on the perception of the Najdi dialect, the central dialect of Saudi Arabia, among Saudi male college students. Through two experiments, a questionnaire and follow-up semi-structured interviews conducted with 137 male students, we generate user-based frequencies for the top 50 Najdi words. The second phase semantically categorizes these top content words so that conclusions can be drawn about the students' inclination to use Najdi words. Categorizing the 50 retrieved Najdi words by part of speech shows that the most favored Najdi Arabic words are verbs and adjectives. Synonyms can also be retrieved with this method of compilation. Nouns are the part of speech most resistant to morphological change.


Problem Statement
The poor resourcing of Arabic and its dialects, the unavailability of competent morphological analyzers and syntactic parsers, the specificity of informal discourse and user-generated content, users' inconsistent writing, and dynamic sociolinguistic shifts are all obstacles to any synchronic understanding of language dynamics and change. This is especially true of the Najdi dialect, compared with the resources and tools available for the standard variety of Arabic and with the plethora of advanced tools for Indo-European languages. This study therefore addresses the challenge by introducing a frequency-based dialectal lexicon of Najdi words with their English translations. To validate the proposed lexicon, we investigate the familiarity of 137 Saudi male college students with the collected set of words. The study aims to answer two questions. First, what are the most frequent Najdi content words according to Saudi male college students? Second, how accurate is the automatic detection of dialectal vocabulary and of its frequency?

Review of Literature
The psycholinguistic profiling of the Najdi dialect has scarcely been addressed. Ismail (2017) studied the challenges of translating Najdi proverbs into English. He concluded that translating the Najdi dialect requires understanding the external real world in which a dialectal word lives and is articulated, because dialect words and their senses derive their meanings from their cultural realities and historical context. Alatawi (2015) analyzed the structural, socio-pragmatic, and psycholinguistic aspects of code switching in Arabic TV programs, either monolingually among the major dialects (Najdi, Hejazi, Janubi, Shamali, and Shargiah) or bilingually between Arabic and English. Ingham (1994) introduced a bilingual Najdi Arabic-English lexicon. However, his study lacked rigorous inclusion criteria, so the specificity of the generated wordlist and lemmas was not satisfactory. This highlights the importance of studying the Najdi dialect as objectively as possible to identify the most frequent words and what they may reveal about the original culture. In a similar vein, lexicographic effort has recently been directed toward compiling frequency-based dictionaries of core Arabic vocabulary (Buckwalter & Parkinson, 2014) and even word sketches, collocates, and thematic lists of American English (Davies & Gardner, 2013).
Recently, StimulStat, a frequency-based psycholinguistic resource for Russian, was proposed; it includes approximately 1.7 million word forms lemmatized into 52,000 lemmas. It covers frequency, length, uniqueness point, orthographic and phonological representations (stress, syllabic structure, phonemic transcription), grammatical features, semantic relations (homonymy, polysemy, synonymy), typical age of acquisition, and imageability. StimulStat was built on three printed dictionaries, one of which was a frequency lexicon, and a national corpus (Alexeeva et al., 2018).

Theoretical Background
Besides Classical Arabic and Modern Standard Arabic (MSA), there are several dialect groups. The major ones are the Egyptian, Arabian Peninsula, Maghrebi, Sudanese, Mesopotamian, Levantine, and Andalusian groups. Dialect expressions are informal, temporal, and oral. They are also dynamic, novel, often colorful, and humorous, and they aim either to establish a social identity for the speaker or to make a strong impression on the hearer (Mattiello, 2009). A dialect is thus intelligible and recognizable to the people who share geographical borders and speak it, and it serves to reflect the speakers' identity. Although dialectal expressions are often ambiguous, figurative, and specific, their connotations are comprehensible to those who use them in everyday life. Exclusivity is therefore a major feature of dialectal expressions, and playfulness is another common one. In many cases, dialectal expressions are figurative, using metonymy and puns to communicate their intended messages. Dialectal expressions are creative and innovative, and they undergo continual reproduction and reconstruction. Spatial and socioeconomic realities imbue Arabic dialectal expressions with connotations quite different from those of Modern Standard Arabic (Holes, 2018). Culture-bound elements refer to items, tools, instruments, proper names, and food items related to a specific culture and a specific time. Translating such culture-bound elements is a considerable challenge (Mattiello, 2009).
The Najdi dialect is an oral form of speech transmitted across generations through imitation and repetition, including through folklore and oral forms of poetry such as Nabati poetry. In addition, the names of traditional industries, artifacts, crafts, and cultural heritage are realized in dialects. The formation of dialectal words and expressions is both time-bound and culture-bound; that is, dialect words change rapidly over time because they derive their meanings from the surrounding world (Holes, 2018). Since the Najd region hosts the Saudi capital, Riyadh, there has been a clear shift in the socioeconomic realities of Saudi society. Nomadic communities have been replaced by settled communities living in well-organized, modern cities; in other words, urbanization has prevailed across the Najd region. The Saudi government established a modern system of education, building modern schools and universities across the country. Many dialect words have also been borrowed from Pakistani, Indian, and Bengali sources and from English. Today, the Najd region is no longer an isolated desert land; on the contrary, it has become an international hub that attracts people from all over the world. These changes are clearly reflected in the status of Najdi as the central dialect of Saudi Arabia. Some dialect expressions are unclear or incomprehensible to outsiders and must be interpreted through MSA. Because the intralingual concept of translation no longer holds for dialects, as translators fail to find synonyms for dialectal terms in either Classical Arabic or MSA, lexicographic effort has turned toward computationally generating and compiling frequency-based, specialized language resources.

Participant Characteristics
The participants were volunteers recruited from Prince Sattam Bin Abdulaziz University in their sophomore year. All were male native speakers of Arabic studying language and translation. Participants were asked to report any tribal affiliation, if applicable. The mean age of the students was 19 years (range: 18 to 20). A link to an online survey hosted on SurveyMonkey was shared, and those who did not wish to fill in the electronic form were given printed versions of the questionnaire. After responding to the questionnaire, the volunteering students were interviewed. The interview questions were structured to measure the students' familiarity with the everyday contexts in which the Najdi Arabic words are used; put simply, participants were asked to suggest any sentence they usually hear that contains each investigated Najdi word. A total of 137 responses were collected.

Study Material
We extracted wordlists from two corpora. The first, TenTen15, is a ready-made, searchable web corpus that includes both Modern Standard Arabic and several dialects. The second corpus was generated from online tweets published by Saudi citizens, collected through API bootstrapping. Wordlists were computationally retrieved and sorted by frequency. The 50 most frequent Najdi words were exported into our questionnaire in the order of their corpus-based ranking. Both electronic and printed forms of the questionnaire were produced to match the respondents' preferences.
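The extraction step above can be sketched as a simple count-and-rank over tokenized text. This is a minimal illustration, not the pipeline the study actually used: the source does not say how Najdi words were separated from MSA, so the sketch assumes a hypothetical seed lexicon of known Najdi forms (`dialect_lexicon`) and a hypothetical, deliberately tiny set of function words to exclude. The tokenizer is crude: it keeps runs of basic Arabic letters (U+0621 to U+064A) and performs no lemmatization or diacritic handling.

```python
import re
from collections import Counter

# HYPOTHETICAL: a tiny illustrative set of function words; the study's actual
# exclusion list is not given in the source.
FUNCTION_WORDS = {"في", "من", "على", "و", "ما"}


def tokenize(text: str) -> list[str]:
    """Return runs of basic Arabic letters (U+0621..U+064A); punctuation is dropped."""
    return re.findall(r"[\u0621-\u064A]+", text)


def top_dialect_words(texts, dialect_lexicon, n=50):
    """Count tokens found in a (hypothetical) Najdi seed lexicon and return the n most frequent.

    Function words are skipped before the lexicon lookup, mirroring the study's
    exclusion of function words; ranking is by raw frequency.
    """
    counts = Counter()
    for text in texts:
        for token in tokenize(text):
            if token in FUNCTION_WORDS:
                continue
            if token in dialect_lexicon:
                counts[token] += 1
    return counts.most_common(n)
```

Called with `n=50` over a tweet corpus, a function like this would produce a ranked list of the kind that was exported into the questionnaire; a real pipeline would also need normalization of spelling variants and lemmatization, which the sketch omits.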

Study Procedures
This study conducted three experiments. First, we computationally compiled the most frequent Najdi content words from general and specialized contemporary corpora, retrieving the corpus frequency and concordance of each lemma. Second, a questionnaire consisting of a five-point Likert-scale item on the frequency of each Najdi word was designed and administered to 137 male students, all of them Saudi sophomore college students. Third, interviews were conducted to investigate the respondents' familiarity with the studied Najdi words. A t-test was calculated to compare the human-based evaluation with the automatic computational compilation.
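One way to put the two sources on a comparable footing before the t-test is sketched below. The source does not describe this step, so everything here is an assumption: the three middle Likert labels are hypothetical (only the endpoints, heard daily and rarely heard, are named), corpus counts are log-transformed and min-max rescaled onto the same 1 to 5 range, and the unpaired test is taken to be Student's two-sample t-test with pooled variance. The function returns only the t statistic and its degrees of freedom; a p-value would come from the t distribution, for example via `scipy.stats.ttest_ind`.

```python
import math
from statistics import mean, variance

# ASSUMPTION: only the endpoints ("daily", "rarely") reflect the study;
# the three middle labels are hypothetical.
LIKERT_SCORES = {"daily": 5, "often": 4, "sometimes": 3, "seldom": 2, "rarely": 1}


def likert_means(responses: dict[str, list[str]]) -> dict[str, float]:
    """Map each word's answer labels to 1-5 scores and average them per word."""
    return {word: mean(LIKERT_SCORES[label] for label in labels)
            for word, labels in responses.items()}


def rescale_log_freq(freqs: dict[str, int], lo: float = 1.0, hi: float = 5.0) -> dict[str, float]:
    """ASSUMPTION: min-max rescale log corpus counts onto the Likert range [lo, hi]."""
    logs = {word: math.log(count) for word, count in freqs.items()}
    low, high = min(logs.values()), max(logs.values())
    span = (high - low) or 1.0  # avoid division by zero when all counts are equal
    return {word: lo + (hi - lo) * (value - low) / span for word, value in logs.items()}


def unpaired_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Student's two-sample t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2
```

In use, one would compute `student = likert_means(responses)` and `corpus = rescale_log_freq(counts)` over the same words and pass the two lists of scores to `unpaired_t`. Whether a paired test would suit per-word comparisons better is a design question the source does not address.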

Results
In this section, we describe the results of the three experiments. For the first experiment, the list of the most frequent Najdi content words retrieved and their frequencies is provided; function words and non-words were excluded. Table 1 displays the frequency of usage of the top 50 Najdi Arabic words (NAWs) among the studied population. Each NAW was translated according to its concordance. The Likert scale on which respondents rated how often they use each NAW ran from the most frequent (heard daily) to the least frequent (rarely heard). Respondents demonstrated familiarity with all the studied NAWs. Categorizing the 50 studied Najdi words by part of speech shows that the most favored NAWs are verbs and adjectives. Synonyms can also be retrieved with this method of compilation (e.g., هبيلة and سبهه). Nouns are the least morphologically variable part of speech. However, an unpaired t-test and an ANOVA revealed a statistically significant difference between the frequencies retrieved computationally and the frequencies scored by the students. A post-hoc Tukey test showed that verbs and adjectives were more familiar to respondents than the corpus counts predicted.
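The one-way ANOVA reported above can be computed by hand as below. This is a sketch under stated assumptions: it presumes each word has already been reduced to a single score (for instance, a hypothetical difference between the students' mean Likert rating and a corpus-derived score on the same scale) and grouped by part of speech. It returns the F statistic and its degrees of freedom only; the post-hoc Tukey comparison is not reproduced here, and in practice `scipy.stats.f_oneway` and `scipy.stats.tukey_hsd` (SciPy 1.8 or later) cover both steps.

```python
from statistics import mean


def one_way_anova(groups: dict[str, list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA over the given groups.

    ASSUMPTION: groups maps a part-of-speech label to per-word scores; at least
    two groups and some within-group variation are required.
    """
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    group_means = {label: mean(values) for label, values in groups.items()}
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(values) * (group_means[label] - grand_mean) ** 2
                     for label, values in groups.items())
    # Within-group sum of squares: spread of each score around its own group mean.
    ss_within = sum((x - group_means[label]) ** 2
                    for label, values in groups.items() for x in values)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

As a sanity check on toy values, the groups [1, 2, 3] and [4, 5, 6] give F = 13.5 with (1, 4) degrees of freedom, which equals the square of the pooled two-sample t statistic for the same data, as expected when only two groups are compared.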

Discussion
Dialect expressions are often informal, temporal, and dynamic, and they serve to establish a social identity or to make a strong impression on the hearer (Mattiello, 2009). As an oral form of speech, the Najdi dialect is transmitted across generations through imitation, repetition, folklore, and oral poetry such as Nabati poetry, and it preserves the names of traditional industries, artifacts, crafts, and cultural heritage. Nevertheless, some dialect expressions are unclear or incomprehensible to outsiders and must be interpreted through MSA. This motivates psycholinguistic lexicographic work that computationally generates and compiles a frequency-based, specialized lexicon of the most frequent Najdi dialect expressions.
In our proposed lexicon, we excluded function words because they are used systematically in all languages; their frequency of use is therefore of less interest than that of content words. Thematically, most of the content words are motion verbs or words denoting size, emotion regulation, and culture-specific designations. The results of Hanson et al. (2001) are semantically consistent with ours: they reported a greater predilection for frequently using words that express activities, persons (patients and agents), and the products of human activities.
To the best of our knowledge, this is the first study that aims to compile a frequency-based Najdi Arabic-English lexicon. Regarding the accuracy of NAW detection, computational retrieval succeeded in retrieving the top NAWs. However, it was more accurate in estimating the frequency of nouns and adverbs than that of verbs and adjectives. Computational bootstrapping nonetheless seems more effective for generating customized searches, because the top-down approach could not retrieve as many words as required. We therefore recommend integrating both techniques when compiling dialectal lexica.

Conclusion
Frequency-based dialectal lexica are useful for building language resources. The specificity and sensitivity of the retrieved lexica depend greatly on the method of compilation adopted. We recommend combining automatic detection of seed words with psycholinguistic experimentation and validation of the generated vocabulary. We also recommend social media streams as a rich medium for generating colloquial dictionaries. Similar studies could contribute to machine translation and to building language resources. This lexicon can be integrated with lexica of other Arabic dialects to enrich this poorly resourced language.

Table 1.
Frequency of usage of the top 50 Najdi Arabic words (NAWs) among the studied population