Exploring Students’ Perspectives toward Clarity and Familiarity of Writing Scoring Rubrics: The Case of Saudi EFL Students

The main aim of this study is to investigate the clarity and familiarity of three scoring rubrics used in a Saudi university’s preparatory year program (PYP) for assessing students’ writing achievement in midterm and final exams. This exploration is important because it provides some evidence for the quality of the scoring rubrics. To that end, a 13-item online questionnaire was used to collect Saudi EFL PYP students’ perspectives toward two quality criteria concerning rubrics: (1) clarity of the information included in the PYP rubrics, and (2) familiarity of the rubrics to students. The subjects were 281 Arabic-speaking male and female Saudi EFL students enrolled in three different academic levels in a Saudi university’s PYP. The results suggest that the quality of the PYP rubrics is insufficient and that the criteria set for providing evidence for the rubric qualities were not met. Students tend to show mild agreement on the clarity of the PYP rubrics, whereas they show clear disagreement on their familiarity with the rubrics and with why and how the rubrics are used. The study implies that administrators and teachers need to carefully consider the clarity and familiarity of rubrics in order to justify the decisions made about students’ writing abilities. Rubrics that are unclear or unfamiliar can leave students confused and frustrated, as they cannot get a clear sense of their writing scores or of their strengths and weaknesses.


Introduction
For scoring rubrics to be valid and appropriate, two crucial quality criteria need to be met: clarity and familiarity. Clarity means that the scoring descriptors included in rubrics are clearly distinguishable and appropriate for identifying different levels of performance (Bachman & Palmer, 1996; Dornisch & McLoughlin, 2006). Familiarity, on the other hand, means that the criteria in a rubric relate to what students have been taught and that students are familiar with why and how the rubrics are used to score their writing. A number of scholars and researchers (e.g., Andrade, 2007; Skillings & Ferrell, 2000; Yoshina & Harada, 2007) have proposed that students' perspectives, and not just scorers', should be explored regarding the clarity of scoring rubrics to help inform students' own learning, as this information contributes to students' self-assessment and self-monitoring of achievement. Similarly, other researchers have suggested that it is important for students to be familiar with scoring rubrics prior to the assessment of their performance (Andersen & Puckett, 2003; Skillings & Ferrell, 2000). Becker (2011) maintains that "with this understanding of what is expected of them and how they will be assessed, students will likely be more accepting of their scores, as a result of the perceived transparency in the assessment process" (p. 45).

Research Questions
This study determined the extent to which the clarity and familiarity of three different writing scoring rubrics, used for assessing EFL students' writing at a Saudi university's preparatory year program (PYP), are sufficient through two main research questions. The first research question investigated the evidence for the clarity of the rubrics and included one sub-question: 1) Is there evidence that the PYP scoring rubrics are clear to students? 1.1) Is the information included in the PYP scoring rubrics clear to students?
The second research question investigated the evidence for the familiarity of the PYP rubrics to students and also contained one sub-question: 2) Is there evidence that students are familiar with the PYP rubrics and with why and how the rubrics are used? 2.1) Are students familiar with the PYP rubrics and with why and how the rubrics are used?

Research Objectives
The purpose of the study is to explore Saudi EFL students' perspectives toward the clarity and familiarity of scoring rubrics used for assessing their writing abilities in a PYP. Through this exploration, the study aims to provide one piece of evidence for the quality of rubrics in order to support the appropriateness and validity of the evaluation inference and the decisions made from writing test scores.

Significance of the Study
This study is important in four respects. First, it contributes to the current literature on argument-based approaches to validation by investigating the types of evidence that are necessary to support the different inferences that link observed performances to the decisions made about students. Second, to the best of the researcher's knowledge, it is the first empirical study to investigate and collect EFL students' perspectives on the quality of scoring rubrics used in EFL contexts. Third, the results of this study must be carefully considered if PYP administrators and teachers are to have confidence in their classroom writing assessments. Finally, it is hoped that the results will inspire stakeholders across other academic English and non-English departments in Saudi universities and colleges to investigate and collect evidence for the quality of their own scoring rubrics.

Writing Scoring Rubrics: A Literature Survey
Performance-based assessment has been widely used to measure L2 students' writing achievement. Because it aims to use tasks that learners are expected to encounter in real-life contexts, performance assessment has recently been accepted as the primary means for assessing ESL learners' writing abilities (Cumming, Grant, Mulcahy-Ernt, & Powers, 2005; McNamara, 1997; Weigle, 2002). This type of assessment is preferred because it provides evidence for the extent to which L2 learners are mastering the skills believed to be needed for writing in university courses (Becker, 2011).
Despite the popularity of performance assessment in L2 writing, validation remains a controversial issue in performance-based assessment. That is, although performance on writing tasks is typically judged against some scoring criteria, scoring rubrics are often neglected or forgotten in the process of collecting evidence for the appropriateness and validity of decisions made from test scores (Cumming et al., 2004; Leung & Lewkowicz, 2006; Sakyi, 2003). This neglect stems largely from the belief that evidence of consistency among multiple raters scoring the same writing performances is sufficient to justify the use of scoring rubrics (Becker, 2011). However, going beyond the consistency of scoring and collecting evidence that supports the evaluation inference is an essential part of validating the interpretation and use of performance-assessment scores (e.g., Kane, Crooks, & Cohen, 1999).
Hence, involving scoring rubrics in this validation process is crucial, because decisions will likely be less valid if the scoring rubrics used to measure written performance are not adequately constructed and appropriately used. Mainstream empirical research on rubrics, however, has focused largely on the type and focus of the performance task, the type of rubrics used, measures of reliability and validity, measures of impact on students' learning, and students' and teachers' attitudes towards using rubrics as an assessment tool (Jonsson & Svingby, 2007). Becker's (2011) study set out to fill this gap and was the first to investigate, among many other things, the clarity and familiarity of scoring rubrics to students by exploring students' perspectives. Using survey questionnaires, Becker surveyed a total of 200 ESL students enrolled in four intensive English programs (IEPs) in the USA. The participants included both males and females between the ages of 17 and 40 (M = 23.81 years; SD = 4.12), representing 14 L1 backgrounds. They represented different levels of English language proficiency, ranging from high-beginning to high-intermediate. The results revealed that students in programs B and D perceived their IEP rubrics as both clear and familiar. Program A's students perceived their IEP rubrics as clear but not familiar, whereas program C's students perceived their IEP rubric as familiar but not clear.
Motivated by Becker's (2011) results, this study aims to extend the current literature in this research area and to examine rubrics in the EFL context. Given the importance of rubrics in performance assessment, examining students' perspectives towards the clarity and familiarity of rubrics in different contexts seems to be a research topic worthy of further exploration.

Data Collection
Data were collected using an online survey questionnaire (see Appendix A) adapted from Becker (2011) to explore subjects' perspectives toward the clarity and familiarity of three scoring rubrics used for assessing students' writing abilities in midterm and final exams. In this study, Item 5 from Becker's original questionnaire was moved from the clarity section to the familiarity section; thus, the clarity subset decreased from six to five questions, while the familiarity subset increased from five to six questions. The survey was created in Google Forms and distributed to the students via email, and the participants responded voluntarily. It consisted of 13 items: two multiple-choice items and 11 Likert-scale items. The Likert-scale items ranged from 1 to 5 (i.e., Strongly Disagree = 1; Disagree = 2; Neutral = 3; Agree = 4; Strongly Agree = 5). As shown in Appendix A, two items (items 1 and 2) collected demographic information, five items (items 3-7) asked whether the information included in the PYP rubrics is clear to students, and six items (items 8-13) asked whether students are familiar with the PYP rubrics and how well they understand why and how the rubrics are used.
The population for this study comprised Saudi EFL learners enrolled in a university's PYP in the Kingdom of Saudi Arabia. The PYP is a mandatory one-year prerequisite for students looking to pursue an undergraduate degree in the university's various science and health colleges, including Medicine, Dentistry, Applied Medical Sciences, Pharmacy, Nursing, Engineering, Architecture and Design, and Computer Science. The objective of the PYP is to prepare students for English-medium instruction in those colleges, where subjects are taught in English. Besides English, PYP students are taught mathematics, science, and computing, also with English as the medium of instruction. This study was conducted in the middle of the spring semester. At that point in the semester, students had reasonable familiarity with the PYP, its instructional system, the writing course, the writing teaching methodologies, and the scoring rubrics. The PYP population comprised 2500 Saudi and non-Saudi students enrolled in three different PYP academic levels: level one, level two (health track), and level two (scientific track).
The sample for the study included 281 Arabic-speaking male (N = 201) and female (N = 80) Saudi EFL students whose ages ranged from 18 to 22 years (Mdn = 19). They were enrolled in the PYP's three levels as follows: level one (N = 104), level two (health track, N = 113), and level two (scientific track, N = 64). Their English language proficiency levels varied from beginner to high intermediate. They had been learning English for a minimum of seven years in the general education system, and a few had had the opportunity to learn English in English-speaking countries for a short period of time.

Materials
Three different scoring rubrics used for assessing PYP students' writing achievement in midterm and final summative exams were used to collect students' perspectives toward the clarity and familiarity of these rubrics. The rubrics were designed by the PYP in which the study was conducted. Specifically, these rubrics are designed to score the level-one final exam (Appendix B), the level-two midterm exam (Appendix C), and the level-two final exam (Appendix D).

Data Analysis
First, in order to answer the two main research questions, the data were analyzed following Becker's (2011) criteria for providing evidence for the clarity and familiarity of scoring rubrics. The first subset of 5 items (i.e., items 3-7) from the questionnaire was considered first. These 5 items targeted the students' perspectives about the clarity of the information included in the PYP rubrics. Each questionnaire item was scored from 1 to 5, for a total possible score of 25 points for the 5 items (i.e., 5 items × 5 possible points per item). In order to provide evidence for the clarity of the rubrics, at least 80% of the students in the sample needed to have a total score of at least 20 (4.0 × 5 items) for the subset of items (3-7). A score of 20 would indicate that, on average, students agreed that the criteria in the scoring rubric were appropriate, clear, and useful. The internal consistency reliability (Cronbach's α = .84) was considered good for the subset of items.
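The reliability coefficient reported for a subset of items can be illustrated with a short sketch. The block below computes Cronbach's alpha from a matrix of Likert responses using the standard formula: k/(k-1) multiplied by one minus the ratio of the summed item variances to the variance of the total scores. The function name `cronbach_alpha` and the response data are hypothetical and serve only as an illustration; they are not the study's data or its actual procedure.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-student rows of item scores.

    Hypothetical helper for illustration only.
    """
    k = len(responses[0])  # number of items in the subset
    # Sample variance of each item across students
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each student's total score
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Made-up responses from four students to five clarity items (1-5 scale)
sample = [
    [4, 4, 5, 4, 4],
    [2, 3, 2, 2, 3],
    [3, 3, 4, 3, 3],
    [5, 4, 5, 5, 4],
]
print(round(cronbach_alpha(sample), 2))
```

Values closer to 1 indicate that the items in a subset vary together, which is the sense in which coefficients in the .80s are described as good.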
Similarly, the second subset of 6 items (i.e., items 8-13), which targeted students' perspectives about their familiarity with the rubrics, was then analyzed. The total possible score for the 6 items was 30 points (i.e., 6 items × 5 possible points). Again, at least 80% of the students in the sample needed to have a total score of at least 24 (4.0 × 6 items). A score of 24 would indicate that, on average, the rubrics are familiar to students. The data were processed using the Statistical Package for the Social Sciences (SPSS). The internal consistency reliability (Cronbach's α = .82) was considered good.
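The 80% criterion applied to both subsets can be sketched as follows. The block sums each student's item scores, computes the share of students whose total reaches the cutoff (4.0 × number of items, i.e., 20 for the clarity subset and 24 for the familiarity subset), and compares that share with 80%. The function names and the response data are hypothetical, and the sketch assumes the cutoff is inclusive (a total of at least 20 or 24), as the results sections state it.

```python
def share_meeting_threshold(responses, min_total):
    """Fraction of students whose summed item scores reach min_total."""
    totals = [sum(row) for row in responses]
    return sum(total >= min_total for total in totals) / len(totals)


def criterion_met(responses, agree_score=4.0, required_share=0.80):
    """Check the 80% rule for one subset of items.

    The cutoff is agree_score x number of items (20 for 5 items, 24 for 6).
    Assumption: the cutoff is inclusive ("at least").
    """
    n_items = len(responses[0])
    min_total = agree_score * n_items
    return share_meeting_threshold(responses, min_total) >= required_share


# Made-up responses from four students to the five clarity items
clarity = [
    [4, 4, 4, 4, 4],  # total 20, meets the cutoff
    [3, 3, 3, 3, 3],  # total 15
    [5, 5, 5, 5, 5],  # total 25, meets the cutoff
    [4, 4, 4, 4, 3],  # total 19
]
print(share_meeting_threshold(clarity, 20))  # 0.5
print(criterion_met(clarity))                # False
```

In this made-up example only half of the students reach the cutoff, so the criterion is not met.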
Second, in order to answer the two sub-questions and to assess students' perspectives regarding the clarity and familiarity of scoring rubrics, descriptive statistics summarizing the survey items were computed and presented for individual items and for scale dimensions. These included means, standard deviations, frequencies, and response percentages. Mean scores between 1.0 and 2.9 were interpreted as disagreement, between 3.0 and 3.7 as neutral, and between 3.8 and 5.0 as agreement.
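The descriptive statistics and the interpretation bands described above can be sketched as follows. The function names are hypothetical, and the handling of means that fall between the stated bands (for example 2.95) is an assumption: the sketch places the cut points at 3.0 and 3.8. The two means passed to `interpret_mean` are the scale means reported in the results sections (3.13 for clarity, 2.83 for familiarity).

```python
from collections import Counter
from statistics import mean, stdev

LABELS = {
    1: "Strongly Disagree",
    2: "Disagree",
    3: "Neutral",
    4: "Agree",
    5: "Strongly Agree",
}


def summarize_item(scores):
    """Mean, standard deviation, frequencies, and percentages for one item."""
    counts = Counter(scores)
    n = len(scores)
    return {
        "mean": mean(scores),
        "sd": stdev(scores),
        "frequencies": {LABELS[v]: counts.get(v, 0) for v in LABELS},
        "percentages": {LABELS[v]: 100 * counts.get(v, 0) / n for v in LABELS},
    }


def interpret_mean(value):
    """Map a mean score to a band.

    Assumption: cut points at 3.0 and 3.8, so values between the
    stated bands (e.g., 2.95) fall into the lower band.
    """
    if value < 3.0:
        return "disagreement"
    if value < 3.8:
        return "neutral"
    return "agreement"


print(interpret_mean(3.13))  # neutral
print(interpret_mean(2.83))  # disagreement
```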

The Clarity of the PYP Rubrics
As mentioned earlier, to provide evidence for the clarity of the rubrics (question one), at least 80% of students needed to have a total score of at least 20 for the subset of items (3-7) in the questionnaire. However, the data from the questionnaire indicated that only 60/281 (21.3%) of students had a score of 20 or higher for this subset, which did not provide evidence for the clarity of the scoring rubrics. Looking at the data in more detail (see Table 1), the strongest agreement occurred for questionnaire item 4, in which approximately 55% of students confirmed that the rubrics help them to see what they are good at and what they need to do better. In contrast, the strongest disagreement occurred for questionnaire item 1, in which approximately 41% of students indicated that the information from the rubrics about their writing is not clear to them. Beyond the failure of the survey data to provide positive evidence for the clarity of the rubrics, the mean across all 5 items is 3.13 (SD = .96), indicating that students as a group do not strongly agree on the clarity of the rubrics (given that an answer of 4 on every item indicates agreement; sub-question one). Although the most frequently provided answer across all scale items was agree, amounting to 571 of the 1405 (40.64%) responses, as can be seen in Table 1, only about fifty percent of the participants agreed that the PYP scoring rubrics are clear and useful. Also, across the 5 items, agreement did not exceed 55.5% for any item.

Students' Familiarity with the PYP Rubrics
As discussed earlier, to provide evidence for the students' familiarity with the PYP rubrics (question two), at least 80% of students needed to have a total score of at least 24 for the subset of items (8-13) in the questionnaire. As with the clarity results, the data from the questionnaire indicated that only 30/281 (10.6%) of students had a score of at least 24, which did not provide evidence for the students' familiarity with the PYP rubrics. The strongest agreement occurred for questionnaire item 6, in which approximately 45% of students confirmed that the PYP rubrics contained information about what they learned in class. Meanwhile, the strongest disagreement occurred for questionnaire item 10, which targeted whether the PYP teachers showed students how to use the writing scoring rubrics; approximately 52% of the students disagreed. Beyond the failure of the survey data to provide positive evidence for familiarity with the PYP rubrics, the mean across all 6 items is 2.83 (SD = .91), indicating that students as a group clearly disagree that they are familiar with the PYP rubrics (sub-question two). Although the most frequently provided answer across all scale items was agree, amounting to 497 of the 1686 (29.47%) responses, as can be seen in Table 2, only 36.7% of the participants agreed that they are familiar with the PYP rubrics and understand how and why they are used, whereas approximately 41.6% of the participants disagreed (703 of the 1686 responses). Also, across the 6 items, agreement did not exceed 44.5% for any item. Moreover, both item 8 ("I know how the PYP writing scoring rubrics are used to score my writing") and item 10 ("My PYP teacher showed me how to use the writing scoring rubrics") drew disagreement from approximately 50% or more of respondents.

Discussion
Evaluated against Becker's (2011) framework, the overall quality of the writing scoring rubrics at the PYP appears to be insufficient, and the scoring procedures could be improved. Based on students' perspectives, the PYP rubrics appear to be unclear and unfamiliar to students, as no evidence supporting either quality criterion was observed. This raises concern that the PYP was relatively unsuccessful in designing its rubrics compared to the four IEPs in Becker's study, in which the rubrics in two programs were both clear and familiar and the rubrics in the other two programs were either clear or familiar.
The low quality of the PYP rubrics appears to stem from the fact that they include scoring criteria that are too general for evaluating writing. They do not provide a sufficient list of the scoring criteria considered important for writing; instead, they incorporate one-word scoring criteria (e.g., format, introduction, organization, and content).
In addition, instead of having scoring bands, they incorporate only the highest possible score for each criterion, which runs counter to the recommendation of theorists and researchers (e.g., Arter & McTighe, 2001; Hamp-Lyons, 2003; Myford, 2002; Stevens & Levi, 2005) that rubrics should consist of three to seven scoring bands to distinguish adequately between levels of performance. Thus, the PYP rubrics appear unable to distinguish students' different levels of writing performance or to place their writing along a continuum of performance. Therefore, the evaluation inference and the decisions made from test scores cannot be adequately justified.

Conclusion and Implications
As performance assessment is becoming the chief means for assessing L2 learners' writing abilities and scoring rubrics are becoming more popular in that type of assessment, their quality must be carefully considered in order to justify the decisions that are made based on their implementation. This study examined the quality of the rubrics used to score performance-based writing assessments in midterm and final exams in a Saudi university's PYP. Students' perspectives generally indicate that the quality of the PYP rubrics is questionable.
To complete the validation of the rubrics, however, PYP administrators and teachers need to be involved in the research in addition to students, as in Becker's (2011) work, since each group can contribute unique insights and provide useful information that informs innovation and change (Popham, 2003).
The findings emphasize the need for designing scoring rubrics that students can easily interpret, in order to have successful teaching and learning. Administrators and teachers need to carefully consider the clarity and familiarity of rubrics because unclear or unfamiliar rubrics are difficult to interpret and cause confusion and frustration among students. They also need to make sure that students have easy access to scoring rubrics and understand how raters use them to derive scores. Thus, it is necessary that students have some basic information about writing assessment and the scoring procedures associated with it, so that they have confidence in the decisions that are being made and gain some power over their own learning (Spolsky & Hult, 2008). Becker (2011) suggests "this can be accomplished by familiarizing students with scoring rubrics to evaluate their own, or their peers', in-class writing" (p. 270). Finally, one effective way to make students familiar with rubrics is to invite them to participate in co-creating a scoring rubric or to have them use a scoring rubric in assessing writing. Interestingly, studies that have done so (e.g., Andrade, Du, & Wang, 2008, 2010; Becker, 2016; Sundeen, 2014) have observed even greater improvement in writing performance for those students who worked more closely with scoring rubrics.

Table 1.
Response frequencies and descriptive statistics for the clarity of the PYP rubrics

Table 2.
Response frequencies and descriptive statistics for the students' familiarity with the PYP rubrics