Identifying New Jersey Teachers' Assessment Literacy as a Precondition for Implementing Student Growth Objectives

Student Growth Objectives (SGOs) are assessments created locally or by commercial educational organizations. Students' scores on the SGOs are included in teachers' summative evaluations as one measure of teacher effectiveness. The breadth of the requirements in teacher evaluation raised the concern of whether New Jersey public school teachers were competent enough in assessment theory to effectively utilize the state-mandated tests. The purpose of this quantitative study was to identify New Jersey teachers' competence in student educational assessments. The researcher compared teachers' assessment literacy levels between groups based on subject taught, years of experience, school assignment, and educational degree attained. The data collection occurred via e-mail. Seven hundred ninety-eight teachers received the Assessment Literacy Inventory survey developed by Mertler and Campbell. Eighty-two teachers fully completed the survey (N = 82). The inferential analysis included independent-sample t tests, one-way analysis of variance (ANOVA) tests, post hoc Tukey tests, and Welch and Brown-Forsythe tests. The results of this study indicated an overall teacher score of 51% on the entire instrument. The highest overall score, 61%, was for Standard 1, Choosing Appropriate Assessment Methods. The lowest overall score, 39%, was for Standard 2, Developing Appropriate Assessment Methods. The conclusion of this study was that New Jersey teachers demonstrated a low level of competence in student educational assessments. In general, teacher assessment literacy has not improved during the last two decades.


Introduction
In an attempt to reinforce educators' accountability for student learning, federal and state educational agencies mandate school districts to include student achievement data in teacher evaluation. The number of school districts that in some way incorporate students' assessment outcomes in teacher evaluation is growing across the states. In 2011, the New Jersey Department of Education (NJDOE) adopted ACHIEVENJ, an educator evaluation and support system under the Teacher Effectiveness and Accountability for the Children of New Jersey (TEACHNJ) policy. The ACHIEVENJ model consists of three components: the Student Growth Percentile (SGP), Student Growth Objectives (SGO), and several classroom observations by the school administrator using one of the rubrics approved by the NJDOE (New Jersey Department of Education [NJDOE], 2015). The SGPs are student scores from the standardized tests and are available only for teachers teaching tested grades or subjects such as mathematics and English. The SGOs are student assessments designed locally, by educators, or by commercial organizations and implemented by teachers (Riordan, Lacireno-Paquet, Shakman, Bocala, & Chang, 2015). Students' outcomes from the SGP and SGO tests are included in teachers' summative evaluations. The inclusion of these components provides three measures of teacher effectiveness.
The breadth of the requirements in teacher evaluation raised the concern of whether New Jersey public school teachers were competent enough in assessment theory to effectively utilize the state-mandated tests, especially the SGO assessments. The rationale for this inquiry was that designing any type of assessment is a complex process. An educator undertaking this task needs to know the fundamental principles of assessment theory. Poorly developed and incorrectly implemented assessments produce unreliable results and inaccurate inferences about teaching and learning. To address this concern, the researcher posed the question: What is the level of competency in student educational assessments of New Jersey public school teachers? To advance the prior studies, the comparison of teachers' competence levels occurred among different groups of teachers. The following research questions facilitated the development of this study: What is the statistical comparison of assessment literacy level between teachers from high- and low-achieving public schools in the state of New Jersey? What is the statistical comparison of assessment literacy level between elementary, middle, and high public school teachers in the state of New Jersey? What is the statistical comparison of assessment literacy level between groups of teachers who taught 0-4 years, 5-10 years, 11-20 years, and more than 21 years? What is the statistical comparison of assessment literacy level between tested and nontested teachers? Does a statistically significant difference exist in the level of assessment literacy between groups of teachers based on level of education attained? The study followed a quantitative nonexperimental methodology. The data collection occurred via e-mail by utilizing the Assessment Literacy Inventory (ALI) developed by Mertler and Campbell (2005). The inferential analysis included an independent-sample t test and one-way analysis of variance (ANOVA) tests at the α = 0.05 level. Eighty-two (N = 82) teachers from demographically different schools participated in this study, forming a purposive sample. The applications of this study may lead to informed teacher professional development courses, improved administrative decisions pertaining to SGO tests, and a district-wide system of student assessments. On a larger scale, this study may influence the development of teacher evaluation policy in the state of New Jersey. The most important change may occur in teachers' assessment practices.

Review of the Literature
Preparing students for life, career, and college became a mission for public schools in the United States. The Smarter Balanced Assessment (SBA) consortium and the Partnership for Assessment of Readiness for College and Careers (PARCC) consortium assembled a common set of K-12 assessments in core academic subjects to create a pathway to college and career readiness for all students (Herman & Linn, 2013; Rentner & Kober, 2014). School participation in PARCC or SBA testing compelled classroom teachers to raise students' scores on standardized tests in accordance with federal and state policies. The assessments became catalysts for improving student achievement and for strengthening the bond between curriculum and instruction. Simultaneously, teacher assessment literacy became a precondition for effective instruction. As a result, federal and state educational agencies pressured school districts to adopt and implement evaluation systems that measure teacher effectiveness based on students' performance on standardized tests (Baker, Oluwole, Green, & Preston, 2013).

The Students Tests Scores in Teacher Evaluation
The latest school improvement initiative, Race to the Top (RTTT), encouraged school districts across the country to include students' achievements on standardized tests in teacher evaluation in exchange for RTTT federal monies (Mathis, 2011; Onosko, 2011). The USDOE allocated 350 million dollars to support the states in which 50% of a teacher's evaluation was based on student achievement scores (Onosko, 2011). To execute the federal requirements, statisticians developed Value-Added Models (VAMs) to measure teachers' effectiveness. In VAMs, students' previous test scores are used to predict future scores on the assumption that students perform approximately the same each year. The difference between predicted and current scores is considered the teacher's or school's contribution to students' learning (Gitomer, 2011; Groen, 2012; Marder, 2012).
The other statistical approach to estimating teacher effectiveness, the SGP, is similar to the VAMs. The SGP measures a student's academic growth from one year to the next compared to academic peers across the state with a similar performance history (NJDOE, 2015). The utilization of the SGP scores occurs in the following order: the statisticians assign a percentile rank to each student; teachers receive the percentile ranks of all students they taught that year; and each teacher receives a score for the year as the median value of the percentile ranks of her or his students (Gill, English, Furgeson, & McCullough, 2014). The SGP scores are available only for teachers who teach mathematics and English from grade three to eight. During these years, students take the state standardized tests in mathematics and English.
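The median-of-percentiles step described above can be sketched in a few lines of Python. The student ranks here are hypothetical; in practice, the percentile assignment is performed by state statisticians using statewide performance histories:

```python
from statistics import median

# Hypothetical SGP percentile ranks (1-99) assigned to the students a
# teacher taught this year, one rank per student.
student_sgp_ranks = [34, 52, 61, 47, 88, 70, 55, 41, 66]

# The teacher's SGP score for the year is the median of those ranks.
teacher_sgp_score = median(student_sgp_ranks)
print(teacher_sgp_score)  # 55
```

Using the median rather than the mean makes the teacher's score robust to one or two extreme student ranks.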
The inclusion of students' scores in teacher evaluation generated considerable debate in research and practice. Two polarized understandings of teacher evaluation exist in the educational community. Some stakeholders believe that students' achievements demonstrate how well teachers perform (Hanushek & Haycock, 2010). Others think that teachers should not be evaluated based on students' test scores because teachers do not have control over many factors affecting students' learning (Gitomer, 2011).
The allies of the VAM argued that focusing on students' achievement gains helps to eliminate ineffective teachers from low-performing schools (Hanushek & Haycock, 2010). According to Darling-Hammond (2014), the elimination of 5% to 10% of ineffective teachers every year would increase student academic achievement, and the United States would catch up with the high-performing nations. The opponents of the VAMs argued that firing teachers will not solve the problems in teacher evaluation and will not raise students' scores on standardized tests. Hendrickson (2012) observed that in Finland, which has one of the strongest education systems in the world, educators pay little attention to evaluating teachers. Instead, Finnish educators devote more resources to developing collaborative relationships and collective learning among colleagues to promote student learning (Hendrickson, 2012).
The substantial research on VAMs and the SGP raised questions about the validity of the statistical formulas used to estimate teacher effectiveness. One question was whether the VAM-like estimates accurately measure the contributions of special education or English Language Learner teachers (Baker et al., 2013). Gitomer (2011) argued that the VAM does not support the theory of teaching and learning because the changes in scores do not explain what a teacher did or did not do to improve student learning. Gitomer offered a good example of the controversy: a 10-point gain at the lower end of the scale is treated as equivalent to a 10-point gain at the higher end, so a student with a baseline score of 95% may show no increase, and the teacher of such a student would be estimated to be ineffective based on VAM formulas (Gitomer, 2011). Variables other than the teacher alone influenced the VAM estimates (Baker et al., 2013). According to Baker et al., the VAMs produced different ratings for teachers compared to other evaluation instruments. Student mobility and missing data compromised the VAMs' outcomes (Baker et al., 2013). These facts disrupted the link between the teacher and the student. Students' prior knowledge, enriched summer classes, private tutoring, family background, and socioeconomic status brought unfairness to the VAMs' outcomes (Baker et al., 2013). Finally, nonrandom student assignment to teachers, classroom composition, and school functioning added biases to the statistical formulas (Marder, 2012; Onosko, 2011).
The SGO assessment is an alternative way to measure teachers' contribution to student learning, especially for teachers teaching nontested grades or subjects (Riordan et al., 2015). The SGOs are "academic goals for different groups of students that are aligned to the state standards and can be tracked using objective measures" (NJDOE, 2015, p. 3). SGO development varies from district to district and state to state. The SGO tests may be developed by utilizing local resources or by using commercial educational organizations. The state or school districts decide what type of SGO to use for teacher evaluation. The SGO developed individually by the teacher was the most common type: according to Lacireno-Paquet, Morgan, and Mello's (2014) study, twenty-three states mandated teachers to individually develop the SGO tests. In general, SGO implementation begins with developing or selecting appropriate assessments, followed by preassessment or diagnostic tests (Gill et al., 2014). Based on the preassessment results, the teacher sets learning targets for the entire class, for groups of students, or individually for each student (Gill et al., 2014). Then the teacher chooses measures to evaluate each student's or group's proficiency level. At the end of the school year, teachers administer the post-SGO test to measure students' growth (Gill et al., 2014).

The SGO Assessments in Teacher Evaluation
The architects of the SGO asserted that this test supplies data on students' growth attributable to the teacher factor. The first onset of the SGO in the state of New Jersey began in 2011, yet the validity and reliability of this test, and the way schools utilize it, are not fully understood. Gill et al. (2014) believed that the locally developed SGOs compromise the comparability and reliability of the test. In an effort to validate the SGO test, Hu (2015) conducted a quantitative study in Charlotte-Mecklenburg, North Carolina. One hundred fifty-nine schools participated in Hu's study, yielding 18,800 teachers.
Of note, teacher evaluation in North Carolina is similar to ACHIEVENJ. Both evaluations include the SGO test and classroom observation by the school principal. The difference between the two evaluations is that ACHIEVENJ includes the SGP scores while the North Carolina evaluation includes the VAM scores. Hu's (2015) goal was to find a correlation between the quality of the SGO and the VAM scores. Hu hypothesized that the VAM and SGO scores should be similar in estimating teachers' effectiveness. Hu found a 67% positive correlation and a 33% negative correlation between the VAM and quality SGO across years and grades in mathematics, and a 73% positive correlation and a 27% negative correlation across the grades and years in reading. Student race and ethnicity were significant predictors in the models for both mathematics and reading across the grades and years (Hu, 2015). Class size as a factor varied across the grades (Hu, 2015). Hu (2015) stated that after controlling for extraneous variables such as class size, prior student achievement, and background characteristics, a higher quality SGO corresponded to higher teacher VAM scores in mathematics and reading. No statistically significant relationship between the VAM and SGO attainment status existed. Hu explained this by noting that some effective teachers did not have skills in designing a quality SGO, while some ineffective teachers were skilled in developing a quality SGO. Hu suggested that school districts should not use the VAM or SGO to make high-stakes decisions about teacher practices because many factors influence instruction and learning. Pollins (2014) explored the SGO as a process and its implications for teachers and school administrators. Pollins found a positive impact of the SGO on elementary and middle school teachers' practices in Rhode Island public schools. According to Pollins, teachers and administrators agreed that the SGO increased collaboration among colleagues and opened dialogue about students' learning and common assessments. Additionally, Pollins reported that teachers encountered obstacles during the SGO process. The Rhode Island teachers stated that they rarely used quality assessments for SGO purposes, and teachers needed direction on how to create or choose student assessments for the SGO (Pollins, 2014). Likewise, Pollins suggested not using the outcomes from the SGO for decisions related to teacher retention, pay, and evaluation.
The theory behind the SGO tests is effective teaching through quality assessments. The SGOs are assessments with multiple roles: to communicate learning goals and expectations to students, to provide students feedback, to help teachers monitor student learning, to adjust instruction, and to measure students' academic growth (Lacireno-Paquet et al., 2014; Gill et al., 2014; NJDOE, 2015). Instructional planning represents the SGO's main role. The inclusion of students' outcomes from the SGO assessments in teacher evaluation conflicts with this primary role.

Teacher Evaluation in the State of New Jersey
In the state of New Jersey, the ACHIEVENJ teacher evaluation consists of three measures: the SGP scores, the SGO scores, and classroom observation by the school principal using one of the teacher evaluation models approved by the NJDOE. According to the NJDOE (2015), the Danielson Model was the most popular in the state of New Jersey; 136 out of 571 school districts utilized it. The second most popular was the Stronge Teacher and Leader Effectiveness Performance System, used by 65 school districts (NJDOE, 2015). The third most popular was the Marzano Causal Teacher Evaluation Model, used by 53 school districts (NJDOE, 2015). The Danielson Model has a four-tiered rubric: highly effective, effective, partially effective, and ineffective. The scores of highly effective teachers range from 3.5 to 4; the scores of effective teachers range from 2.65 to 3.49; the scores of partially effective teachers range from 1.85 to 2.64; and the scores of ineffective teachers range from 1.0 to 1.84 (NJDOE, 2015). The SGO score has a four-tiered rubric with the same score ranges for the four levels of performance. Teachers without an SGP score receive a summative score that combines 20% of the SGO and 80% of the classroom observations. Teachers with SGP scores receive a summative score of 10% of the SGP median, 20% of the SGO, and 70% of the classroom observations. Callahan and Sadeghi (2015) conducted a statewide study to investigate three phenomena: New Jersey teachers' perceptions of ACHIEVENJ, the level of communication between teachers and administrators, and the availability, frequency, and effectiveness of professional development opportunities. Callahan and Sadeghi reported that teachers perceived the ACHIEVENJ model as unfair and believed it did not accurately evaluate their teaching abilities. According to Callahan and Sadeghi, teachers reported that the number of classroom observations increased, resulting in increased professional dialogue about instruction, student assessments, and student learning. Furthermore, teachers stated that the quality of the observations decreased because administrators spent more time entering evidence and information into laptops than actually observing teachers (Callahan & Sadeghi, 2015). In 2014, 56% of teachers wanted more professional development related to their areas of need, and only 5% of teachers reported that they were satisfied with the training they received (Callahan & Sadeghi, 2015). According to New Jersey teachers, the implementation of ACHIEVENJ did not address poor practice, excellent teachers were not recognized, novice teachers did not receive support, and professional learning was not tailored to teachers' needs (Callahan & Sadeghi, 2015). In support of the SGO method, the NJDOE advocates that thoughtfully developed and collaboratively implemented SGOs improve the quality of discussion about student growth, learning, and instruction. The SGO method increases teacher engagement in assessment practices, enriches teacher knowledge of curriculum standards, and fosters teacher leadership (Gill et al., 2014; NJDOE, 2015).
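The ACHIEVENJ weighting rules and rating bands described above can be sketched as a short computation. This is an illustrative sketch based on the weights and score ranges reported by NJDOE (2015), not an official implementation:

```python
def summative_score(observation, sgo, sgp=None):
    """Combine ACHIEVENJ component scores (each on the 1.0-4.0 rubric scale)
    using the weights described by NJDOE (2015): without an SGP score,
    20% SGO + 80% observation; with one, 10% SGP + 20% SGO + 70% observation."""
    if sgp is None:
        return 0.20 * sgo + 0.80 * observation
    return 0.10 * sgp + 0.20 * sgo + 0.70 * observation

def rating(score):
    """Map a summative score to the four-tier rating bands listed above."""
    if score >= 3.5:
        return "Highly effective"
    if score >= 2.65:
        return "Effective"
    if score >= 1.85:
        return "Partially effective"
    return "Ineffective"

# A hypothetical teacher of a nontested subject (no SGP score):
s = summative_score(observation=3.2, sgo=3.0)
print(round(s, 2), rating(s))  # 3.16 Effective
```

The example makes the leverage of each component concrete: for a nontested teacher, the classroom observation dominates the summative score four to one.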
The induction of a new measure in teacher evaluation intensified the role of educational assessments in the process of improving learning outcomes. Teacher assessment literacy became a policy consideration. Popham (2011) underlined two reasons for teachers to become assessment literate: to understand how the accountability assessments determine an educator's professional quality, and to understand how assessments improve student learning and instruction. Modern schools, according to Popham, need educators who are competent in the theory of assessments and who effectively use assessments to make instructional and administrative decisions.

Teacher Assessment Literacy
Teacher assessment literacy continues to command public interest. The stakeholders in education want to know how teachers utilize assessments to evaluate students' learning. Educational specialists defined an assessment-literate teacher as an expert who is able to select or develop and administer different types of assessments, use the data from the assessments to inform instructional decisions, and communicate the assessment results to students and their parents (Mertler & Campbell, 2005; Popham, 2011; Stiggins, 2002). Measurement of teacher assessment literacy began shortly after professional organizations, including the American Federation of Teachers, published standards for teacher competence in student assessment. In 2013, Davidheiser (2013) measured 180 high school teachers' assessment literacy in the state of Pennsylvania, reporting that teachers answered fewer than 25 items out of 35 correctly. Simultaneously, Perry (2013) measured 14 teachers' and 32 principals' assessment literacy in the state of Montana. Perry reported that, on average, teachers answered 22 items out of 35 correctly while principals answered 21 items out of 35 correctly on the same instrument. Barone (2012) investigated Albany Middle and Albany High school teachers' assessment literacy before and after professional development, as measured by the ALI. Barone concluded that as teachers became more knowledgeable in assessments, their correct scores on the ALI, on average, increased from 73% to 77%. Barone reported a strong correlation of 0.97 between the two administrations of the test (r = 0.97, p < 0.05). Barone attributed the increase in teachers' scores to the effect of the professional development.
The research in the field of education produced substantial knowledge related to preservice teachers' assessment literacy. Siegel and Wissehr (2011) explored the assessment literacy of 11 preservice teachers enrolled in the secondary science methods course of their teacher preparation program. Siegel and Wissehr found that during the methods course, preservice teachers identified 19 assessment tools, described their advantages and disadvantages, and demonstrated how to align instructional goals to specific assessments. Siegel and Wissehr concluded that the knowledge of assessment tools attained by teachers does not guarantee its realization in their future classrooms. The working conditions and reality of the classroom may impede application of the acquired knowledge. Wallace and White (2015) investigated how secondary mathematics preservice teachers' perspectives on assessment practice evolved during one year of a reform-based preparation program. Six preservice teachers in the state of California participated in the study. The findings of Wallace and White's study showed that the preservice teachers' perception of assessment practices evolved through three stages: as the program unfolded, the teachers' viewpoints progressed from test oriented to task oriented to tool oriented (Wallace & White, 2015). According to Wallace and White, during the test-oriented stage, teachers used a limited number of assessments for only one purpose: to evaluate and grade students' work. During this stage, teachers utilized criterion-referenced tests that included strictly procedural items with no connections between the concepts (Wallace & White, 2015). During the task-oriented stage, teachers began to recognize that the test could be utilized for different purposes. The test became criterion referenced and student referenced, with some connections between the concepts (Wallace & White, 2015). During the tool-oriented stage, teachers realized that the main purpose of assessment is to improve teaching and learning. The assessments significantly transformed from the traditional, with the single purpose of measuring, to reform-based, for learning purposes. The format of the tool-oriented tests included items that require students to apply exploration, reasoning, and analysis to solve problems (Wallace & White, 2015). Odo (2015) believed that developing teachers' assessment literacy requires significant attention during teacher preparation programs. According to Odo, teachers' familiarity with the assortment of assessments available for their use improves instruction in diverse classrooms. Teachers' understanding of fundamental characteristics of assessments such as validity, reliability, and bias will help teachers interpret the standardized tests proliferating in public schools (Odo, 2015). The results of this research suggested that teacher education related to student assessments should continue after teachers enter classrooms and should be given thoughtful attention throughout a teacher's entire career.

Theoretical Foundation
The contemporary teacher evaluation models are based on the Professional Teaching Standards developed by the National Board for Professional Teaching Standards (NBPTS) organization in 1987 (Darling-Hammond, 1999). The NBPTS's policy statement, What Teachers Should Know and Be Able to Do, clearly communicates professional standards for teachers to stakeholders. The Professional Teaching Standards provide educators a common language to discuss teaching and learning. The authors of the NBPTS formulated five core propositions as follows:

• Teachers are committed to students and their learning.
• Teachers know the subjects they teach and how to teach those subjects to students.
• Teachers are responsible for managing and monitoring student learning.
• Teachers think systematically about their practice and learn from experience.
• Teachers are members of learning communities.

The Standards for Teacher Competence in Educational Assessment of Students define seven standards of teachers' assessment skills:

• Standard 1: Teachers should be skilled in choosing assessment methods appropriate for instructional decisions.
• Standard 2: Teachers should be skilled in developing assessment methods appropriate for instructional decisions.
• Standard 3: Teachers should be skilled in administering, scoring, and interpreting the results of both externally produced and teacher-produced assessment methods.
• Standard 4: Teachers should be skilled in using assessment results when making decisions about individual students, planning teaching, developing curriculum, and school improvement.
• Standard 5: Teachers should be skilled in developing valid pupil grading procedures that use pupil assessments.
• Standard 6: Teachers should be skilled in communicating assessment results to students, parents, other lay audiences, and other educators.
• Standard 7: Teachers should be skilled in recognizing unethical, illegal, and otherwise inappropriate assessment methods and uses of assessment information.
The Standards for the Teaching Profession and the Standards for Teacher Competence in Educational Assessment of Students served as the theoretical framework for this study. The student educational assessments and teacher assessment literacy were discussed within the scope of standardized teacher evaluation. The Danielson Model for Teaching and Learning also served as a theoretical foundation for this study because almost every domain of teacher practice in the Danielson Model includes assessment as an element of effective instruction. All domains in the Danielson Model are rooted in the Professional Teaching Standards and reflect the Standards for Teacher Competence in Educational Assessment of Students. According to Danielson (2013), teachers need to know how to design quality assessments that can provide a wide range of evidence of student learning.
The Danielson Model consists of four domains and 72 elements of teacher practice: Domain 1, Planning and Preparation; Domain 2, Classroom Environment; Domain 3, Instruction; and Domain 4, Professional Responsibilities (Danielson, 2013). The model incorporates a four-tier rubric to rank teacher performance in each domain: Unsatisfactory, Basic, Proficient, and Distinguished (Danielson, 2013).

Methodology
This study applied a non-experimental quantitative methodology and followed a causal-comparative design. The research questions, data collection, sample size, instrumentation, and variables determined the method and design. Each research question had the following hypotheses:
RQ1: What is the statistical comparison of assessment literacy level between teachers from high- and low-achieving public schools in the state of New Jersey?
Ho: No significant statistical difference exists in assessment literacy level between teachers from high- and low-achieving public schools in the state of New Jersey.
Ha: A significant statistical difference exists in assessment literacy level between teachers from high- and low-achieving public schools in the state of New Jersey.
RQ2: What is the statistical comparison of assessment literacy level between elementary, middle, and high public school teachers in the state of New Jersey?
Ho: No significant statistical difference exists in assessment literacy level between elementary, middle, and high public school teachers in the state of New Jersey.
Ha: A significant statistical difference exists in assessment literacy level between elementary, middle, and high public school teachers in the state of New Jersey.
RQ3: What is the statistical comparison of assessment literacy level between groups of teachers who taught 0-4 years, 5-10 years, 11-20 years, and more than 21 years?
Ho: No significant statistical difference exists in assessment literacy level between groups of teachers who taught 0-4 years, 5-10 years, 11-20 years, and more than 21 years.
Ha: A significant statistical difference exists in assessment literacy level between groups of teachers who taught 0-4 years, 5-10 years, 11-20 years, and more than 21 years.
RQ4: What is the statistical comparison of assessment literacy level between tested and nontested teachers?
Ho: No significant statistical difference exists in assessment literacy level between tested and nontested teachers.
Ha: A significant statistical difference exists in assessment literacy level between teachers of tested and nontested subjects.
RQ5: Does a statistically significant difference exist in the level of assessment literacy between groups of teachers based on level of education attained?
Ho: No significant statistical difference exists in assessment literacy level between groups of teachers based on level of education attained.
Ha: A significant statistical difference exists in assessment literacy level between groups of teachers based on level of education attained.
The inferences about the group differences stated in the research questions required statistical analysis based on the theory of probability. The hypothesis testing approach guided the analysis of each question (Bock, Velleman, & DeVeaux, 2010). The null hypothesis for each question stated that there is no difference in assessment literacy level between the group means (μ1 = μ2), while the alternative hypothesis stated that there is a difference between the means (μ1 ≠ μ2, two-tailed). The null hypothesis was rejected when the probability of the observed value under the null hypothesis was less than 5% (at α = 0.05), leading to the conclusion that there was evidence of statistically significant differences in group means in the population.
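As a concrete illustration of this decision rule, the following sketch runs a two-tailed independent-sample t test on hypothetical ALI composite scores for two groups of teachers. SciPy is assumed, and the scores are invented for illustration only:

```python
from scipy import stats

# Hypothetical ALI composite scores (percent correct) for two teacher groups,
# e.g., teachers from high- versus low-achieving schools.
group_a = [55, 48, 62, 51, 47, 58, 60, 53]
group_b = [49, 44, 52, 46, 50, 43, 55, 48]

# Two-tailed independent-sample t test: H0: mu1 = mu2 vs. Ha: mu1 != mu2.
t_stat, p_value = stats.ttest_ind(group_a, group_b)

alpha = 0.05
if p_value < alpha:
    print("Reject H0: evidence of a difference in group means.")
else:
    print("Fail to reject H0: no significant difference detected.")
```

The same α = 0.05 threshold applies to the ANOVA F tests used for the research questions with more than two groups.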
The other reason for the quantitative approach was the data collection method. In quantitative studies the data are gathered by structured instruments (Johnson & Christensen, 2014). The data collection for this study occurred by utilizing the ALI instrument, developed by Mertler and Campbell (2005) specifically for quantitative analysis. Furthermore, the researcher collected data from a large population of 798 school teachers, which is another trait of the quantitative approach. The sample size of 82 is considered large enough to compute sample statistics that accurately reflect the population parameters (Bock et al., 2010). Another trait of the quantitative method was the nature of the variables: the dependent variables were quantitative, namely teachers' composite scores on the Standards. Finally, this study may be replicated by future researchers, which is considered an important characteristic of a quantitative methodology (Bock et al., 2010; Johnson & Christensen, 2014).

Population and Sample
The general population of this study was 139,699 certified school teachers employed in 694 operating school districts in the state of New Jersey (NJDOE, 2015). The target population combined school teachers from three school districts and one high school. The school districts and high school received the pseudonyms SD#1, SD#2, SD#3, and HS#4 to maintain confidentiality regarding participation.
Until 2015, in the state of New Jersey, students' outcomes from two standardized tests, the High School Proficiency Assessment (HSPA) and, in elementary and middle schools, the New Jersey Assessment of Skills and Knowledge (NJASK), determined school district performance. At the time of the study, based on HSPA and NJASK results, SD#1 and SD#2 were suburban high-achieving school districts, and SD#3 and HS#4 were low-achieving urban schools. SD#1 employed 201 teachers, SD#2 employed 335 teachers, SD#3 employed 175 teachers, and HS#4 employed 69 teachers. The target population was 798 participants. In total, 169 teachers, 21%, responded to the survey. Only 82 fully completed responses (N = 82) defined the purposive sample size. Sullivan and Feinn (2012) suggested calculating the effect size to estimate a reasonable sample size before a study is carried out. The Cohen's d value for the independent-sample t tests and ANOVA tests aided the analysis of the effect size. According to Sullivan and Feinn, Cohen classified effect sizes as small (d = 0.2), medium (d = 0.5), and large (d > 0.8). To estimate an effect size, Sullivan and Feinn suggested piloting the study, using estimates from similar work published by other researchers, or using the minimum difference that experts consider important.
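Cohen's d can be computed directly from group means and standard deviations. The sketch below uses the Standard 4 summary statistics reported later in this study (elementary M = 0.67, SD = 0.23; high school M = 0.45, SD = 0.22) and assumes equal group sizes so that the pooled SD reduces to the root mean square of the two group SDs, which is an approximation, since the actual group sizes differed:

```python
# Cohen's d from summary statistics, with Cohen's conventional benchmarks
# (Sullivan & Feinn, 2012). Equal group sizes are assumed for simplicity.
from math import sqrt

def cohens_d(m1: float, s1: float, m2: float, s2: float) -> float:
    pooled_sd = sqrt((s1 ** 2 + s2 ** 2) / 2)  # equal-n approximation
    return (m1 - m2) / pooled_sd

def label(d: float) -> str:
    # small >= 0.2, medium >= 0.5, large >= 0.8
    d = abs(d)
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    return "small"

# Standard 4 means and SDs reported in this study (elementary vs. high school).
d = cohens_d(0.67, 0.23, 0.45, 0.22)
print(round(d, 2), label(d))  # prints: 0.98 large
```

By Cohen's benchmarks, the elementary versus high school difference on Standard 4 corresponds to a large effect.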
The utilization of G*Power 3.1 allowed the researcher to calculate the minimum sample size required to produce statistically significant results. Based on an anticipated effect size of 0.45 and a confidence level of 95% (α = 0.05), the power for the F tests was 0.954 for a total sample size of 81 participants. The Statistical Package for the Social Sciences, IBM SPSS Statistics Base 21 (SPSS), facilitated the analysis of the collected data. Descriptive statistics aided the description of the study sample, and inferential statistics expedited the answers to the study questions.
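G*Power computes power analytically; as a rough cross-check, power can also be approximated by simulation. The sketch below is illustrative only and all of its parameters are my assumptions, not the study's data: four equal groups of 20 (N = 80, close to the study's 81), a true effect size of f = 0.45 in G*Power's F-test parameterization (the ratio of the SD of group means to the within-group SD), and α = 0.05:

```python
# Monte Carlo approximation of one-way ANOVA power. Hypothetical setup:
# four groups of 20, within-group SD = 1, group means chosen so that
# f = SD(means) / sigma = 0.45, alpha = 0.05.
import random
from statistics import mean

def f_stat(*groups):
    grand = mean(x for g in groups for x in g)
    ss_b = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_w = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_b = len(groups) - 1
    df_w = sum(len(g) for g in groups) - len(groups)
    return (ss_b / df_b) / (ss_w / df_w)

random.seed(1)
shift = 0.45 * 2 ** 0.5        # gives SD of the four means = 0.45
true_means = [-shift, 0.0, 0.0, shift]
F_CRIT = 2.73                  # approximate F(3, 76) critical value, alpha = 0.05

trials = 1000
rejections = sum(
    f_stat(*[[random.gauss(m, 1) for _ in range(20)] for m in true_means]) > F_CRIT
    for _ in range(trials)
)
power = rejections / trials
print(power)  # close to the analytic power of roughly 0.95
```

Under these assumed parameters the simulated power lands near the 0.954 the study reports, but the exact figure depends on the number of groups and the effect-size convention used.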

Data Collection Method
During the winter and spring of 2016, 165 superintendents of public school districts received invitations to participate in this study. The researcher made phone calls to school superintendents' offices, following the alphabetical order in which the districts were listed on the NJDOE public website. Three superintendents and one high school principal agreed to participate in the study. In May 2016, 798 public school teachers from the participating schools received the online ALI survey (Mertler & Campbell, 2005) via school e-mail. Two weeks later the researcher e-mailed teachers a reminder to complete the survey. Access to the survey remained open for five weeks.

Instrumentation
The members of the American Educational Research Association (AERA) reviewed the ALI instrument in Montreal in 2005 and concluded that the instrument is a reliable tool to measure teacher assessment literacy (Mertler & Campbell, 2005). After the AERA review, educational researchers widely used the ALI instrument. Hamilton (2014) conducted a quantitative study to investigate to what extent teacher assessment literacy, as measured by the ALI, was associated with teacher knowledge of Curriculum Based Measurement (CBM). The CBM is another research-based instrument measuring educators' skills in assessing students' knowledge of the curriculum taught in the classroom. The study took place in elementary schools in Rhode Island. Hamilton found a significant positive correlation between the two instruments (r = 0.505, p < 0.01): teachers with high scores on the ALI tended to have high scores on the CBM, and vice versa. Hamilton concluded that the two instruments measured the same construct.

For Standard 2, the group mean for middle school teachers was 0.49 (M = 0.49, SD = 0.25) and the group mean for high school teachers was 0.31 (M = 0.31, SD = 0.22). For results see Table 8 in Appendix G.
Again, Standard 2 refers to teachers' knowledge and skills to develop quality assessments to improve teaching and learning (Danielson, 2013). Elementary and middle school teachers may be more skilled in developing assessments and using assessment outcomes for instructional decisions. Perhaps the pressure of standardized testing in middle and elementary schools encourages teachers to assess students more frequently to better prepare them for the rigorous tests.
Based on the inferential analysis, statistically significant differences related to Standard 4 existed between the middle and high school teachers and between the elementary and high school teachers. The p value for the ANOVA test at the α = 0.05 level was 0.001 (p = 0.001, p < 0.05). For the elementary versus high school comparison, the p value for the post hoc Tukey HSD test was 0.001 (p = 0.001, p < 0.05); the elementary school teachers performed higher than the high school teachers on Standard 4, with a group mean of 67% (M = 0.67, SD = 0.23) versus 45% (M = 0.45, SD = 0.22). The middle school teachers also performed higher than the high school teachers on Standard 4; the p value for the Tukey HSD test was 0.008 (p = 0.008, p < 0.05), with a group mean of 63% (M = 0.63, SD = 0.24) for the middle school teachers versus 45% (M = 0.45, SD = 0.22) for the high school teachers. No evidence emerged that statistically significant differences exist between the middle and elementary school teachers. For results see Table 8 in Appendix G.
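The one-way ANOVA decision reported above can be sketched by hand. The groups and scores below are hypothetical (the study's per-teacher scores are not published here); they merely stand in for the elementary, middle, and high school groups to show how the F statistic is formed from between-group and within-group variation:

```python
# One-way ANOVA F statistic computed from scratch on hypothetical data.
from statistics import mean

def f_oneway(*groups):
    grand = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = sum(len(g) for g in groups) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

elementary = [0.6, 0.7, 0.8]   # hypothetical composite scores
middle = [0.6, 0.65, 0.7]
high = [0.4, 0.45, 0.5]

f_value = f_oneway(elementary, middle, high)
F_CRIT = 5.14  # F(2, 6) critical value at alpha = 0.05
print(round(f_value, 2), f_value > F_CRIT)  # prints: 10.5 True
```

A significant F only indicates that some group means differ; a post hoc procedure such as Tukey's HSD, as used in the study, is then needed to identify which specific pairs of groups differ.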
Standard 4 requires teachers to use assessment information for instructional planning and to manage student educational development (AFT, NCME, & NEA, 1990). Danielson (2013) defined a distinguished teacher as an expert in using various types of assessment to direct instructional improvements. It is important to note that Protheroe (2009) argued that students demonstrated an increase in scores on standardized tests in schools with staff trained to use assessment data effectively.
Based on the inferential analysis, statistically significant differences related to Standard 6 existed between the middle and high school teachers. The p value for the ANOVA test at the α = 0.05 level was 0.028 (p = 0.028, p < 0.05), and the p value for the post hoc Tukey HSD test was 0.024 (p = 0.024, p < 0.05). The middle school teachers performed higher compared to their peers from high schools: the group mean for middle school teachers was 63% (M = 0.63, SD = 0.26) and the group mean for high school teachers was 45% (M = 0.45, SD = 0.23). For results see Table 8 in Appendix G.
Standard 6 refers to teachers' ability to communicate assessment results. A distinguished teacher, according to Danielson (2013), needs to know how to interpret test results and communicate information about students' learning to students, parents, and other educators. An assessment-literate teacher selects or develops and administers different types of assessments, uses the data from the assessments to inform instructional decisions, and communicates assessment results to students and their parents (Mertler & Campbell, 2005; Popham, 2011; Stiggins, 2002).

Research Question 3
What is the statistical comparison of assessment literacy level between groups of teachers who taught 0-4 years, 5-10 years, 11-20 years, and more than 21 years?
Years of experience was not a significant factor in teacher performance. On average, teachers with more than 10 years of experience performed slightly higher compared to the other groups. Figure 3 displays New Jersey teachers' composite scores on the standards based on years of experience.

Limitations
The first limitation was the response rate: 169 returned surveys served as the foundation for statistical analysis, and a greater number of participants would have led to more precise statistical analysis. However, based on an effect size of d = 0.45 and a confidence level of 95% (α = 0.05), the power for the F tests was 0.954 for a total sample size of 81 participants, so the sample of N = 82 was sufficient for the statistical analysis.
Another limitation occurred during data analysis. When analyzing the data for question 5, only two groups, instead of four, emerged: 23 teachers with a bachelor's degree and 57 teachers with a master's degree completed the survey. The comparison of teachers' assessment literacy therefore occurred only between the groups of teachers with bachelor's and master's degrees.
The conceptual framework for this study was that assessment-literate teachers are effective (Danielson, 2013; Popham, 2011; Stiggins, 2002). This concept presumed the next limitation: minimal research exists describing the relationship between the level of teacher assessment literacy and students' academic achievements. In addition, the school district's vision, mission, and professional development plan, the preservice teacher preparation program, and a teacher's individual involvement in professional learning all influence the teacher's knowledge of assessment theory.

Conclusions
On average, teachers from high-achieving schools performed better compared to teachers from low-achieving schools. Statistically significant differences occurred between the middle, elementary, and high school teachers. The middle and elementary school teachers demonstrated better results for Standard 2 (Developing Assessment Methods Appropriate for Instructional Decisions), Standard 4 (Using Assessment Results When Making Decisions About Individual Students, Planning Teaching, Developing Curriculum, and School Improvement), and Standard 6 (Communicating Assessment Results to Students, Parents, Other Lay Audiences, and Other Educators). Assessment literacy barely improved as teachers' years of experience increased. The tested and nontested teachers' scores were almost the same, except for Standard 7 (Recognizing Unethical, Illegal, and Otherwise Inappropriate Assessment Methods and Uses of Assessment Information); statistically significant differences occurred between the tested and nontested teachers only for Standard 7. Teachers with a master's degree performed higher on every standard compared to teachers with a bachelor's degree.
New Jersey teachers' average score of 51% in assessment literacy confirmed an alarming statement made by Popham (2011) and Gareis and Grant (2015) that only a few teachers are conversant with how accountability tests work. For instance, only 4% of teachers answered item #28 correctly; this item refers to teachers' skills in implementing standardized tests without breaking standardization. Only 6% of teachers answered item #27 correctly, and 22% of teachers answered item #10 correctly; these items refer to teachers' ability to interpret results from standardized tests. Eighteen items answered incorrectly by most of the teachers validated Stiggins's (2002) statement that educators were unable to differentiate the information obtained from assessments.
In conclusion, the patterns in teacher assessment literacy found in this study indicated that teachers' competence in assessments did not improve during the last two decades. As Gareis and Grant (2015) stated, "the evidence that teachers continue to be ill-prepared in the domain of assessment persists to the present day" (p. 7). The results of this study supported the findings of previous researchers stating that teachers were concerned about the quality of the SGO assessments (Hu, 2015; Pollins, 2014). The data from unreliable SGOs provide misleading inferences about teaching and learning and subsequently have no impact on teacher practices. Educational policymakers should acknowledge the consequences of utilizing data from low-quality assessments in teacher evaluation. Before holding teachers accountable for students' achievements on high-stakes tests, district leaders should re-evaluate the school culture related to assessments. To maximize the intended purposes and potential benefits of educational assessments and the SGO, school leaders should promote teacher assessment literacy. The SGO test should serve as a tool for instructional improvement, not as a measure of teacher effectiveness, and the outcomes from a properly completed SGO process should reflect students' true academic growth.

Recommendation for Future Research
This study was a quick snapshot of New Jersey public school teachers' competence in student assessments. The initial recommendation was to replicate this study as statewide research to generalize the conclusions to the entire population of school teachers in the state of New Jersey. A larger sample size would secure a sufficient number of participants in different demographic groups to conduct more precise inferential analysis (Bock et al., 2010; Johnson & Christensen, 2014). The results of the statewide research could help improve the policy related to teacher evaluation in the state of New Jersey.
Another recommendation was to develop a rubric or a system to evaluate a teacher's proficiency in educational assessments, since the developers of the existing instruments did not specify a minimum requirement for teachers in assessment literacy. The next step should be investigating the relationship between teacher proficiency in assessments and students' academic achievements. The question for future inquiry is: Do teachers with higher scores on the ALI have more proficient students, as measured by standardized tests or school district benchmarks?
The third recommendation involves exploring the relationship between SGO attainment status and teacher assessment literacy level. The question for the future researcher is: Do teachers with higher scores on the ALI have more students achieving their goals as measured by the SGO? The final recommendation for future research was to conduct a qualitative study exploring how teachers perceive their assessment practices in relation to school culture and district vision and mission, and how teachers use the SGO data for instructional purposes between the pre-SGO test and the post-SGO test.

Recommendations for Future Practice
The first recommendation to school leaders was to provide teachers with support related to all seven standards, with priority given to the standards with the lowest scores. Specifically, remediation should address how to develop assessments for different purposes, because the lowest score was for Standard 2, Developing Assessment Methods Appropriate for Instructional Decisions. The next attention should be given to mastering skills in creating rubrics and grading systems to evaluate student knowledge, because the second-lowest score was on Standard 5.
Based on the study results, on average, teachers from high-achieving schools outperformed their peers from low-achieving schools. The school administration of low-achieving schools should look at the assessment practices and teacher professional development plans of high-achieving schools. On average, the middle and elementary school teachers demonstrated higher results compared to high school teachers. School administrators can explore the differences in practice related to assessment methods between the middle, elementary, and high schools.
Further, on average, teachers with 21 or more years of experience performed slightly better compared to the other groups. The school administration should establish or reinforce a teacher peer-mentoring program and assign a tenured, highly effective teacher to each novice teacher. However, when giving additional responsibility to a teacher, school leaders should anticipate potential obstacles and consider providing the teacher-mentor with some form of reward or compensation.
On average, teachers with a master's degree performed better on every standard compared to teachers with a bachelor's degree. School district leaders should encourage teachers to continue their education and professional learning. One district option may be providing teachers with incentives and recognition for completing additional courses related to their subject area.
It is important to note that the researcher of this study did not intend to diminish the practical individual knowledge of teachers. The researcher believes that classroom teachers have a fundamental understanding of how the school culture and the district's vision influence assessment practices in the classrooms. The goal of this research was to measure and compare assessment literacy among different groups of teachers by using a quantitative instrument.

Appendix A Teachers' Composite Scores Based on School Assignment
The American Federation of Teachers (AFT), National Council on Measurement in Education (NCME), and National Education Association (NEA) developed the Standards for Teacher Competence in Educational Assessment of Students in 1990 (AFT, NCME, & NEA, 1990). Plake et al. (2005) used the Standards to develop the Teacher Assessment Literacy Questionnaire (TALQ) instrument. Five hundred fifty-five teachers from 98 school districts in 45 states completed the TALQ survey. Plake et al. reported a teachers' score of 66% on the overall instrument; according to Plake et al., teachers answered 23 items out of 35 correctly. In 2003, Mertler and Campbell (2005) replicated Plake et al.'s study using the TALQ. Two hundred twenty undergraduate preservice teachers participated in the study. Mertler and Campbell reported that teachers answered 21 items out of 35 correctly, yielding an overall score of 60%. Mertler and Campbell believed that the TALQ's questions were difficult and lengthy to read, so they modified the TALQ into the Classroom Assessment Literacy Inventory (CALI) and later into the Assessment Literacy Inventory (ALI).

Figure 3. New Jersey teachers' composite scores on the standards based on years of experience.

Table 2. Teachers' composite scores based on school assignment

Table 3. Teachers' composite scores based on grade level taught

Table 4. Teachers' composite scores based on years of experience

Table 5. Teachers' composite scores based on tested and nontested subjects

Table 6. Teachers' composite scores based on educational degree