Holistic versus Analytic Evaluation of EFL Writing: A Case Study

This paper investigates the performance of holistic and analytic scoring rubrics in the context of EFL writing. Specifically, the paper compares EFL students’ scores on a writing task using holistic and analytic scoring rubrics. The data for the study were collected from 30 participants attending an English undergraduate program in a Yemeni university. The authors used psychometric statistics (Inter-rater Agreement, Intra-Class Correlation, t-test and ANOVA) to compare how accurately the two rubrics diagnose students’ strengths and weaknesses and place them along a continuum of foreign language writing proficiency. The raters of the writing samples were three experienced instructors working in the same department. The results of correlating the holistic and analytic scores that the raters assigned to the students, and of examining the variations among the correlations, provide evidence for the reliability and validity of both rubrics. The analytic scoring rubric, however, placed the examinees along a more clearly defined scale of writing ability and is, therefore, a more reliable instrument than the holistic scoring rubric for evaluating EFL writing for achievement purposes.


Introduction
'Good writing' is a growing pedagogic demand. In educational settings, writing is the basis upon which a candidate's achievement, learning and intelligence are judged. Good writing skills are critical to academic and professional success: they can lead to good grades, admission into college, exit from college, a good job, and upward professional movement.
Consequently, and expectedly, testing for writing ability is becoming a very pressing demand. The purposes for which writing ability is tested include, but are not limited to, awarding grades, certifying proficiency, testing suitability for a particular profession, placing candidates in the appropriate component of a language program, and allowing candidates to exit programs. While the stakes are not high for some of these purposes, they are very high for others, with consequences that significantly affect the test taker's life.
To test for writing ability is to define 'good writing'. The measurement of writing ability is affected by four factors, namely, the student, the scoring method, the test administration, and the test itself (Mousavi, 2002). While the other three factors are equally significant, the most relevant to the concerns of the present paper is the scoring method, that is, the method selected by the rater to pass judgments about writing ability. There have been numerous attempts in the literature to introduce methods of scoring (e.g., Hamp-Lyons, 1991; Shohamy, 1995) and many other attempts to improve the accuracy and consistency of these methods (McNamara, 1996; Brown, 1996; Wiseman, 2012). The decisions about writing competence that are derived from one scoring method do not always, and do not necessarily, agree with decisions derived from another scoring method. These scoring systems are very important because they are used to classify test takers and, accordingly, to make high-stakes decisions that define the course of their lives.
Moving down to our research context, the success of undergraduate students of English at Taiz University in Yemen is also largely dependent on their ability to write.The program is eight-semester long, comprising 52 courses among which 45 are on English language and literature.Each of the 45 English courses involves a mid-term test and a final exam-a total of 90 achievement tests overall.Passing these achievement tests and Informal interviews with the teaching staff of the department and an examination of a random sample of mid-term and final exam answer-books suggest that a general-impression marking scheme is the norm.The criteria of evaluation, according to the teacher-raters, are content relevance, content coverage, and language.Test takers who address all the points adequately are rewarded and those who do not are penalized.There are no clear descriptors, however, for awarding marks to intermediate levels of writing proficiency, other than a general impression.The descriptors are not explicitly stated; they are neither clear to the teacher-rater, nor are they known to the test takers.The result is impressionistic judgments of writing proficiency that depend more upon the rater than upon text qualities, and that fail to make valid distinctions between test takers across a continuum of writing proficiency.
In light of these considerations, it becomes of paramount importance to improve consistency across evaluators' judgments about writing proficiency and to improve the reliability and validity of these judgments, in order to avoid bias and produce greater agreement between raters about test takers' achievement. An important move towards achieving this objective is the use of scoring rubrics. Different kinds of rubrics have been in use since at least the 1960s and have received much scholarly attention. This paper focuses on the major types of rubrics used to measure writing proficiency, considers the uses of each scoring rubric, and outlines the theoretical and practical advantages and disadvantages of each.

Literature Review
A rubric is a tool for evaluating the quality of student work on a continuum of performances from excellent to poor (Schafer, 2004). It contains a set of well-established criteria corresponding to a scale of possible points to be assigned in scoring a piece of work, spoken or written (Campbell, Melenyzer, Nettles, & Wyman, 2000). The best performance is assigned the highest point and the worst the lowest point on the scale. A scoring rubric provides descriptors for the different levels of proficiency on the scale. These descriptors are detailed enough to enable sufficiently fine judgments, and rich enough to enable reliable, unbiased and valid discrimination. Herman, Aschbacher, and Winters (1992) posited four characteristic features of a rubric: criteria, standards, scale, and examples. First, an effective rubric has a well-defined list of criteria so that test-takers know what is expected of them and raters are able to evaluate the responses properly. Second, an effective rubric contains standards of excellence for the different levels of performance. Third, an effective rubric has gradations of quality, or a scale, based on the degree to which the standard has been met. The gradations are constituted by detailed descriptions that represent what should earn which point on the scale. Last, but not least, an effective rubric contains model exemplars of expected performance at the different levels on the scale.
Another important characteristic of a rubric, one that is well attested in the literature though not mentioned in the previous list, is reliability. An effective rubric is one that generates similar judgments or scores when used by different raters on a given assessment task. Consistency across raters' judgments about the relative standing of performances is referred to as "inter-rater reliability", and the frequency with which two or more raters assign exactly the same rating to a particular performance is known as "inter-rater agreement". While both estimators are frequently employed in research contexts, inter-rater agreement is more relevant to the present research context, where decisions about passing exams, exiting programs, and even tenure are made on the basis of a score threshold. In Yemen, for example, a student receiving 47 marks fails the test whereas a student receiving 48 is pushed to the cut-off score and passes the test. Weigle (2002) argued that three types of rubrics are used in the evaluation of written proficiency: primary trait, holistic and analytic scoring rubrics. These three types differ in their impact, discriminatory power, inter-rater reliability, degree of bias, and cost-effectiveness in terms of time, effort and money (Kuo, 2007). The choice of one scoring rubric or the other is significant because it "represents, implicitly or explicitly, the theoretical basis upon which [a] test is founded" (Weigle, 2002, p. 109). Given their relevance to this paper, studies that have used holistic scoring rubrics, studies that have used analytic scoring rubrics, and studies that have compared both types of rubrics are reviewed below.
Holistic scoring is "a global approach" to scoring that is underscored by the idea that "writing is a single entity which is best captured by a single scale that integrates the inherent qualities of the writing" (Wiseman, 2012, p. 59). As such, holistic scoring considers the entire written response and assigns an overall score to the performance (White, 1985; Weigle, 2002; Hyland, 2002). The cost-effectiveness of holistic scoring makes it a suitable approach for large-scale assessment of written performance, especially for decisions concerning placement (Cumming, 1990; Hamp-Lyons, 1990; Reid, 1993).
Holistic scoring criteria consist of general guidelines that define good performance at each score point. This has prompted a number of researchers (e.g., White, 1985; Cohen, 1994) to argue that holistic scoring focuses on the strengths of the writing rather than on the deficiencies. The holistic rubric generates a composite score that "does not provide specific evidence of where and how much additional writing instruction is needed" (Becker, 2011, p. 116). Despite this shortcoming, if indeed it is one, Weigle (2002) argues that holistic scoring rubrics are very practical. They are short, do not include detailed criteria of evaluation, and make it possible to evaluate an essay by assigning one score to it after only one reading, thus serving the economic interests of university departments and employers. Holistic rubrics are therefore typical for evaluating written performance in large-scale assessment contexts. This has made holistic scoring the method of assessing written performance in the computer-based Test of English as a Foreign Language (TOEFL), Graduate Record Examination (GRE), and Graduate Management Admission Test (GMAT). Diederich (1964) was one of the earliest studies to make use of holistic scoring rubrics in such large-scale testing situations. Three hundred written performances were evaluated by fifty-three raters, and the study concluded that the variation in the ratings is mostly attributable to three criteria: ideas, language and organization. Twenty years later, Breland and Jones (1984) analyzed eight hundred written samples and also attributed the variations among raters to ideas, organization, and use of supporting materials. Subsequent studies have examined the validity of holistic scoring (Charney, 1984), inter-rater reliability (Stach, 1987; Erickson, 2001), the consistency of agreement among raters (Huot, 1990; Legg, 1998), the importance of rater training for achieving internal consistency and normative rating behavior (Kondo-Brown, 2002; Kim, 2010), the difference in the ratings of native and non-native English speaking raters in China (Shi, 2001), and alternative methods of evaluating writing performance (Reid, 1993).
As an alternative, analytic scoring, which involves "the separation of the various features of a composition into components for scoring purposes", has also received considerable scholarly attention (Wiseman, 2012, p. 60). An analytic scoring rubric typically includes writing components relating to the test taker's lexical, syntactic, discourse, and rhetorical competence. As such, an analytic scoring rubric offers more detailed information about a test taker's writing performance than does the single score of a holistic scoring rubric. An analytic rubric provides orderly and comprehensive feedback to teachers and helps them discriminate between the weak and strong aspects of students' writing performance (Hamp-Lyons, 1995; Crehan, 1997). In other words, an analytic rubric has higher discriminating power (Mendelsohn & Cumming, 1987).
The first analytic scoring rubric was the ESL Composition Profile (Jacobs, Zingraf, Wormuth, Hartfiel, & Hughey, 1981). It was used to measure the writing performance of ESL students at North American universities and consisted of five different rating dimensions of writing quality, each having a different weight: content (30 points), organization (20 points), vocabulary (20 points), language use (25 points), and mechanics (5 points). Other well-known examples of analytic scales are the Test in English for Educational Purposes (TEEP; Weir, 1990) and the Michigan Writing Assessment Scoring Guide (Hamp-Lyons, 1991). But of all the existing rating scales for examining written performance (see Shohamy, 1995), the present study adopts, indeed adapts, Bachman and Palmer's (1996) model of communicative language ability and the rubric based on the model. According to the model, the ability to write an essay requires knowledge schemata (knowledge of the topic), strategic competence (strategies for content development), rhetorical knowledge (strategies for producing cohesive supporting arguments), grammatical competence, and knowledge of vocabulary and register. This is the knowledge that defines L2 writing ability in Bachman and Palmer's approach and the knowledge that informs their analytic scoring rubric.
But which scoring rubric, holistic or analytic, is preferred by practitioners? A number of studies have compared the behavior of holistic and analytic rubrics, with interesting findings. Chi (2001) compares holistic and analytic scoring rubrics, using many-faceted Rasch measurement, in terms of the appropriateness of the scoring rubrics, the agreement of the student scores, and the consistency of rater severity. The study reports significant differences between raters using holistic scoring rubrics, but not between raters using analytic scoring rubrics. Other studies confirm this advantage of analytic scoring in terms of inter-rater and intra-rater reliability (Al-Fallay, 2000; Easy & Young, 2007; Knoch, 2009; Nakamura, 2004). Analytic scoring also provides an individualized profile of the test taker's written performance (Weigle, 2002) and direct, useful feedback to students and teachers (Brown & Hudson, 2002). For this reason, analytic scoring rubrics are often chosen for placement and diagnostic purposes (Jacobs et al., 1981; Perkins, 1983; Hamp-Lyons, 1991).
By contrast, holistic scoring rubrics offer the advantage of reduced cost in time and money (Wiseman, 2012). Bauer (1981) compared the cost-effectiveness of analytic and holistic scoring rubrics in scoring secondary school students' essays. The study reports that training raters to use the analytic rubrics took twice as long as training raters to use the holistic rubrics, and that grading the essays with the analytic rubrics took four times as long as grading them with the holistic rubrics. Other studies in other contexts have reported similar findings (Klein et al., 1998; Arter, 1993; Bainer & Porter, 1992). For this reason, holistic scoring is the preferred method of scoring in large-scale testing contexts that involve a large concentration of test takers taking the test at the same time (Becker, 2011).
The choice of one type of rubric or the other, therefore, depends mainly on the purpose of using the rubric and is driven by context-specific considerations. The present study is an extension of this tradition of examining the performance of holistic and analytic scoring rubrics. The study used different psychometric statistics (Inter-rater Agreement, Intra-Class Correlation, t-test and ANOVA) to compare the holistic and analytic scoring rubrics as reliable instruments for evaluating EFL writing for achievement purposes. The authors of this study tried to find answers to the following questions:
1) Is there a significant difference between holistic and analytical rubrics in enhancing the reliability of scoring?
2) Is there a correlation between each rater's assessment of the same essay using holistic and analytic rubrics?
3) Is there a correlation between different raters' assessment of the same essay using holistic rubrics?
4) Does the use of rubrics enhance the consistency of scoring?

Participants
The participants of the study were 30 male and female Yemeni undergraduate students of English at the Faculty of Arts, Taiz University. They were aged between 21 and 25, and were all non-native speakers of English attending the three-credit, 14-week senior-level course Advanced Writing Skills. The course is offered in the seventh semester of the eight-semester Bachelor Program in English Language and Literature. The researchers chose the participants on the basis of their overall GPA in the first three years of college: the participants were the top 30 students over the six semesters leading to the academic year 2014-2015. As senior students of the English department who took a class on advanced writing skills, they had reached a level of competence that should enable them to write an essay. The authors thus sought to include the most competent students in terms of merit.

Raters
The raters of the writing task were three experienced members of the teaching staff of the same department. They were selected based on their similarity in terms of qualifications, years of experience in teaching, and years of experience in scoring high-stakes tests. All three raters had a doctoral degree in English and at least five years of teaching experience in the department. They had also taught different writing courses in the department, and had marked at least three rounds of the annual large-scale English admission test administered by the department.
The raters were invited to a two-hour training session conducted by the researchers. The training, which ultimately aimed at improving rating accuracy and rater agreement, involved an explanation of the rating system, a discussion of common rating problems, and advice on avoiding bias.

Scripts and the Writing Task
The scripts consisted of essays written by the 30 participants in response to an independent, timed writing task. The task prompt for the essay was as follows: Reflecting on your own first day in college, write a descriptive essay of about 250 words in response to the following questions: What was your first day in college like? How did you feel as a new comer? And what did you do?

Rating Rubrics
The study employed two rubrics: a holistic rubric and an analytic rubric. The holistic rubric is a six-point scale that offers a general description at each point for typical writing performance at that point (see Appendix A). It emulates the rubric used by teachers of the department for assessing students' performance on written tasks. In fact, it was constructed by the researchers after informal interviews with the teachers about the criteria they use for evaluating written work. The suggested rubric, therefore, comprises two performance criteria: understanding of the topic and correctness of language.
The analytic rubric, on the other hand, is an adapted version of Bachman and Palmer's (1996) rubric. The researchers contributed a fifth sub-domain to Bachman and Palmer's criterion-referenced rating scale for the assessment of writing ability. This addition was driven by context-specific considerations. The end product is a five-point scale with five sub-domains of writing ability, viz., content, cohesion, syntactic structures, vocabulary, and mechanics of writing. Within each sub-domain, there are several well-defined standards of performance that each rater clearly understands (see Appendix B).

Rating Procedures
Each rater worked independently and in two separate sessions. In the first session, the raters were given the 30 (anonymous) writing samples and a copy of the holistic rubric. The raters were instructed to assign a single 'holistic' score from 0 to 5 to each essay. The scores were then converted to a mark out of 20, and the total score was written next to the number assigned to each participant. The scored writing samples and the rubrics were returned to the researchers in three days' time. The second session took place a month later to allow a gap long enough to ensure a more independent judgment. In this session, the raters were given the same 30 (also anonymous) writing samples and a copy of the analytic rubric. They were instructed to assign a score from 0 (zero knowledge) to 4 (complete knowledge) for each sub-domain of writing proficiency and then to add the scores and convert them to a total out of 20. The scored writing samples and the rating rubrics were returned to the researchers in a week's time.
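The conversion of raw rubric scores to marks out of 20 can be sketched as follows. The paper does not state the conversion rule, so the linear rescaling used here is an assumption, and the function names and sub-domain keys are hypothetical.

```python
from typing import Mapping

HOLISTIC_MAX = 5             # holistic scores run from 0 to 5
ANALYTIC_MAX_PER_DOMAIN = 4  # each analytic sub-domain is scored 0 to 4
TARGET_MAX = 20              # both rubrics are reported out of 20


def holistic_to_20(score: float) -> float:
    """Rescale a 0-5 holistic score to a 0-20 mark (assumed linear rule)."""
    if not 0 <= score <= HOLISTIC_MAX:
        raise ValueError("holistic score must lie between 0 and 5")
    return score * TARGET_MAX / HOLISTIC_MAX


def analytic_to_20(subscores: Mapping[str, float]) -> float:
    """Add the sub-domain scores and rescale the sum to a 0-20 mark."""
    for name, value in subscores.items():
        if not 0 <= value <= ANALYTIC_MAX_PER_DOMAIN:
            raise ValueError(f"{name} score must lie between 0 and 4")
    raw_max = ANALYTIC_MAX_PER_DOMAIN * len(subscores)
    return sum(subscores.values()) * TARGET_MAX / raw_max


# Hypothetical essay: holistic score 3, analytic profile over five sub-domains.
holistic_mark = holistic_to_20(3)
analytic_mark = analytic_to_20(
    {"content": 3, "cohesion": 2, "syntax": 4, "vocabulary": 3, "mechanics": 2}
)
```

With five sub-domains scored 0 to 4, the raw analytic maximum is already 20, so under this assumption the analytic total needs no further rescaling.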

Results
A number of statistical procedures were employed to answer the research questions of the study. First, the descriptive statistics of the students' scores using the holistic and analytic rubrics were calculated. This was followed by the descriptive statistics of each rater's assessment of the writing samples using both rubrics. A t-test was used to examine whether there was a significant difference between the means of the two scoring rubrics, and Analysis of Variance was conducted to examine whether there were any significant differences between the three raters' scoring decisions for each of the two scoring rubrics. In addition, to investigate the agreement among the three raters and measure the inter-rater reliability, an Intra-Class Correlation Coefficient test was implemented. The findings of this study are discussed below.
The results showed that the mean of the scores assigned using the holistic rubric was 14.67 with a standard deviation of 3.12. Using the analytical rubric to assess students' performance yielded a mean of 13.72 and a standard deviation of 2.82. Descriptive statistics for each of the three raters within each of the two rubrics are presented below. A t-test was performed to examine whether there was a significant difference between the means of the two scoring rubrics, holistic and analytic. The results showed that the difference between the two rubrics was significant, t(178) = 2.132, p < .05. Scoring with the analytical rubric proved to be more rigorous (M = 13.72, SD = 2.821) than scoring with the holistic rubric (M = 14.67, SD = 3.116).
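As a rough check on the reported comparison, the t statistic can be recomputed from the summary statistics alone. The sketch below assumes an independent-samples t-test with pooled variance, which is inferred from the reported degrees of freedom (178 = 90 + 90 - 2) and is not stated in the paper; the function name is hypothetical.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic with pooled variance, from summary stats."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Reported values: holistic M = 14.67, SD = 3.116; analytic M = 13.72, SD = 2.821;
# N = 90 scores per rubric (30 essays x 3 raters).
t_stat, df = pooled_t(14.67, 3.116, 90, 13.72, 2.821, 90)

# The two-tailed critical value at alpha = .05 for df = 178 is roughly 1.97,
# so a t statistic above that threshold is significant at the .05 level.
is_significant = abs(t_stat) > 1.97
```

Under these assumptions the statistic comes out near 2.14, close to the reported 2.132; the small gap is plausibly due to rounding in the reported means and standard deviations.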
Assessment should be independent of who does the scoring, and the results are supposed to be similar across raters. The more consistent the scores are across different raters, the more reliable the assessment is. Analysis of Variance was used to investigate whether there were any significant differences between the three raters for each of the rubrics. The findings showed that there was no significant difference, F(2, 87) = 0.373, p = 0.690, among the three raters when they used the analytical rubric to grade students' performance. However, the raters' scores did differ significantly, F(2, 87) = 4.833, p < .05, when they used the holistic rubric. Post hoc analysis was run to find where the differences lay. The results showed that the difference was between rater 2 and rater 3 at p < .05.
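The F ratio behind a one-way Analysis of Variance of this kind can be sketched as below. The function name and the scores are hypothetical; the paper's raw data are not available here. With three raters and 30 essays each, the same computation yields the reported degrees of freedom (2, 87).

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-groups sum of squares: spread of rater means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-groups sum of squares: spread of scores around each rater's mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical marks (out of 20) given by three raters to the same four essays.
rater_scores = [
    [14, 15, 12, 16],
    [13, 15, 11, 15],
    [16, 17, 14, 18],
]
f_ratio, df_b, df_w = one_way_anova_f(rater_scores)
```

The p-value would then be read from the F distribution with (df_between, df_within) degrees of freedom, which requires a statistics library or table and is left out of this sketch.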
It is also worth checking the correlation between the two scoring methods. A high correlation means that the two scoring methods may produce similar results. The results indicated a highly significant correlation, r = 0.80. Nevertheless, a correlation in this context should exceed 0.90.
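A correlation of this kind is assumed here to be a Pearson product-moment coefficient between the holistic and analytic marks for the same essays; the paper does not name the coefficient. A minimal sketch with hypothetical marks and a hypothetical function name:

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical marks (out of 20) for the same six essays under each rubric.
holistic_marks = [16, 12, 20, 8, 16, 12]
analytic_marks = [15, 12, 18, 9, 14, 13]
r = pearson_r(holistic_marks, analytic_marks)
```

Note that a high r only shows that the two rubrics rank the essays similarly; it does not show that they award the same marks.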
Studies in the literature indicate that rubrics seem to aid raters in achieving high internal consistency when scoring performance tasks. The Intra-Class Correlation Coefficient (ICC) was used to measure inter-rater reliability; the average-measures value equals the reliability across the raters. For the holistic rubric, the average-measures ICC was .797 with a 95% confidence interval from .567 to .904 (F(29, 58) = 6.627, p < .001). For the analytical rubric, the average-measures ICC was .958 with a 95% confidence interval from .921 to .979 (F(29, 58) = 25.364, p < .001). Overall, a high degree of reliability was found: the overall average-measures ICC was .879 with a 95% confidence interval from .788 to .930 (F(59, 118) = 10.104, p < .001).
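An average-measures ICC can be derived from a two-way decomposition of an essays-by-raters score table. The paper does not state which ICC model was used, so the sketch below computes both common two-way forms (consistency and absolute agreement) as an illustration only; the function name and the data are hypothetical.

```python
def average_measures_icc(scores):
    """Return (consistency ICC, absolute-agreement ICC), average measures.

    `scores` is a list of rows, one per essay, each holding one mark per rater.
    """
    n = len(scores)     # number of essays
    k = len(scores[0])  # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)               # between-essay mean square
    ms_cols = ss_cols / (k - 1)               # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square

    consistency = (ms_rows - ms_error) / ms_rows
    agreement = (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)
    return consistency, agreement


# Hypothetical marks: the raters rank the essays identically, but each rater
# is one mark more lenient than the previous one.
marks = [
    [1, 2, 3],
    [2, 3, 4],
    [3, 4, 5],
    [4, 5, 6],
]
icc_consistency, icc_agreement = average_measures_icc(marks)
```

In this toy table the consistency form is 1 while the absolute-agreement form is lower (5/6), because the latter penalizes systematic differences in rater severity. That distinction matters when marks are compared against a fixed cut-off score.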
Cohen's kappa was also used to estimate the degree of agreement among the raters. The results for each pair of the three raters and the overall results across the two scoring rubrics are presented in Table 3.
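Cohen's kappa for one pair of raters compares the observed proportion of exact agreement with the proportion expected by chance. A minimal sketch, using hypothetical ratings and a hypothetical function name, and treating each distinct score as a category:

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who rated the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: share of items given exactly the same rating.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal proportions per category.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(
        (counts_a[c] / n) * (counts_b[c] / n) for c in set(rater_a) | set(rater_b)
    )
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical holistic scores (0 to 5) given by two raters to eight essays.
rater_1 = [3, 4, 2, 5, 3, 4, 1, 3]
rater_2 = [3, 4, 3, 5, 3, 3, 1, 2]
kappa_12 = cohen_kappa(rater_1, rater_2)
```

With three raters, the same function would be applied to each of the three rater pairs in turn.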

Discussion
Assessment of students' performance has to be as accurate as possible because it may have consequences for the students being assessed (Black, 1998). There are several sources of variability in any assessment, one of which is raters' judgments of students' performance (Black, 1998). This was the focus of this study.
The difference between holistic and analytical rubrics in enhancing the reliability of scoring was investigated, and the results showed that there was a significant difference between the means of the two scoring rubrics. It was found that raters give lower scores when they use the analytical scoring rubric than when they use the holistic scoring rubric. Such findings make sense because analytical rubrics are more detailed and scoring with them is more rigorous. Studies in the literature indicate that, for this reason, analytical scoring rubrics are often used for diagnostic purposes (Jacobs, Zingraf, Wormuth, Hartfiel, & Hughey, 1981; Perkins, 1983; Hamp-Lyons, 1991).
The correlation between the two scoring methods was also computed. The results showed a highly significant correlation of 0.80, which is deemed acceptable (Stemler, 2004). However, a high correlation does not by itself mean that the raters agree with one another, so rater agreement was analyzed separately, as discussed below.
The students' scores are supposed to be similar regardless of who does the scoring: the more consistent the scores are across different raters, the more reliable the assessment is thought to be (Moskal & Leydens, 2000). The Analysis of Variance (ANOVA) showed that there were no significant differences among the three raters when they used the analytical rubric to grade students' performance. However, the raters' scores did differ significantly when they used the holistic rubric. These findings are consistent with Chi's (2001) findings about the significant differences between raters using holistic and analytical scoring rubrics. The findings of this study suggest that using an analytical rubric produces more consistent and reliable results.
Variations in raters' judgments can occur either across raters, which is a matter of inter-rater reliability, or within the ratings of a single rater, which is a matter of intra-rater reliability. Intra-Class Correlation was performed to measure inter-rater reliability and the consistency of the raters in measuring the students' performance. The overall Intra-Class Correlation Coefficient was above 0.80, indicating that the results are consistent. The majority of studies investigating intra-rater reliability reported alpha values above 0.70 which, according to Brown, Glasswell, and Harland (2004), is generally considered sufficient.
Inter-rater agreement refers to the extent to which independent raters provide the same rating of a particular person. Cohen's kappa was used to estimate the degree to which consensus agreement ratings vary from the rate expected by chance. The results of the study showed that the agreement between one pair of raters appeared to be high, whereas the agreement between another pair appeared to be low. Kappa values between 0.40 and 0.75 represent fair agreement beyond chance (Stoddart, Abrams, Gasper, & Canaday, 2000).

Conclusion
Rubrics are used by teachers to evaluate students' performance on specific tasks. A rubric is a scoring scale used to assess students' performance along a task-specific set of well-defined criteria. A number of benefits of using rubrics as a tool to evaluate students on performance tasks were discussed. The use of rubrics can 1) increase the consistency of judgment when assessing performance tasks, 2) provide a valid judgment of performance that cannot be achieved without a rubric, 3) have positive educational consequences, such as promoting learning and/or improving instruction, and 4) provide students with quality feedback (Jonsson & Svingby, 2007; Archbald & Newmann, 1988).
Having explored the differences between the widely used scoring systems for writing ability, and having underscored the importance of implementing rubrics for better diagnosis of writing problems and for more reliable scoring, the present study zoomed in on two kinds of rubrics, viz., holistic and analytic rubrics, and examined the performance of these two rubrics in assessing writing ability. Specifically, the study compared Yemeni EFL students' scores on a writing task using holistic and analytic scoring rubrics.
This study analyzed different psychometric statistics to compare the holistic and analytic scoring rubrics as reliable instruments for evaluating EFL writing for achievement purposes. The results showed that using rubrics yields more accurate scores than not using them; this is also clearly stated in the literature. However, it was concluded that analytical scoring provides even more consistent scores than holistic scoring. Analytical scoring seems to be very useful in the classroom because the results can help both teachers and learners identify students' strengths and weaknesses as well as their learning needs. As educators, we need to accept that the use of rubrics adds to the quality of the assessment (Perlman, 2003).
In summary, scoring with a rubric seems to be more reliable than scoring without one. Rubrics ought to be encouraged as a regulatory device for scoring. The results of this study showed that using a holistic rubric can give reliable scores and that using an analytic rubric gives even more reliable scores. The consistency of scoring can be enhanced by using analytic, topic-specific rubrics and by training raters.

Limitation
The main aim of this paper was to measure the consistency of scoring across raters' judgments by means of different correlation coefficients. The Many-Facets Rasch Model (MFRM) was also intended for this purpose; however, due to the small sample size, MFRM was not used in this study. MFRM is a multivariate extension of Rasch measurement models that can be used to provide a framework for calibrating both raters and writing tasks within the context of writing assessment. Another limitation is that the study was conducted in one institution and used a convenience sample. Therefore, we recommend using a larger, random sample of students from multiple institutions in future research.
1) Provides no or poor response to the prompt; demonstrates no understanding of the topic; exhibits no command of essay writing skills; presents argument in barely comprehensible English

3. Extensive Knowledge. Range: wide range of explicit text organizational devices on essay and paragraph levels. Accuracy: highly organized text.
3. Extensive Knowledge. Range: wide range of explicit cohesive devices including complex subordination. Accuracy: highly accurate with only occasional errors in cohesion; composition easily intelligible.
3. Extensive Knowledge. Range: wide range of basic structures with some uses of complex structures. Accuracy: good accuracy, few errors but these errors do not affect the meaning that is conveyed accurately.
3. Extensive Knowledge. Range: wide range of general and specific vocabulary. Accuracy: vocabulary items adequately cover the assigned task and are seldom used imprecisely.
3. Extensive Knowledge. Range: wide range of proper spelling, punctuation, capitalization and paragraphing techniques. Accuracy: good accuracy, few errors but these errors do not affect the meaning that is conveyed accurately.

Table 1. Descriptive statistics for each of the scoring rubrics (N = 90)

Table 2. Descriptive statistics for each of the three raters within each of the two rubrics (N = 30)

Table 3. Descriptive statistics for each pair of the three raters and the overall across the two scoring rubrics (N = 30)

range including a few basic structures. Accuracy: poor to moderate accuracy within range; if structures outside of the controlled range are attempted, accuracy may be poor.
1. Limited Knowledge. Range: small range lacking the formal and appropriate vocabulary required to produce good piece of writing. Accuracy: vocabulary items frequently used imprecisely (limited success in conveying meaning).
1. Limited Knowledge. Range: little evidence of deliberate use of correct spelling, punctuation, capitalization and paragraphing techniques. Accuracy: poor or moderate accuracy.
2. Moderate Knowledge. Range: moderate range of explicit text organizational devices. Accuracy: organization generally clear but could often be more explicitly marked.
2. Moderate Knowledge. Range: moderate range of explicit textual devices. Accuracy: relationships between sentences generally clear but could often be more explicitly marked and the composition could be more fluid and intelligible.
2. Moderate Knowledge. Range: medium range; uses basic structures and avoids complex structures. Accuracy: moderate to good accuracy within range; if structures outside of the controlled range are attempted, accuracy may be poor.
2. Moderate Knowledge. Range: moderate range, sufficient to produce a fairly comprehensible piece of writing. Accuracy: vocabulary items sometimes used imprecisely (some paraphrasing is used).
2. Moderate Knowledge. Range: moderate range of proper spelling, punctuation, capitalization and paragraphing techniques. Accuracy: moderate to good accuracy but could be more explicitly marked.