Assessment for Learning Instrumentation in Higher Education

This study examines assessment for learning instrumentation in higher education. The respondents were 100 lecturers of the Muhammadiyah Makassar University, Indonesia. A total of 50 items from six constructs were analyzed to determine the reliability and validity of the questionnaire. The person reliability of the instrument for the 100 respondents was 0.91, which is excellent, and the item reliability was 0.96, which can also be categorized as excellent. The findings suggest that the goal of assessment should be to encourage universities to be administered in a way that provides the most appropriate practice for developing the teaching and learning process.


Introduction
In an effort to implement Article 31 of the 1945 Constitution, the Indonesian government continues to develop education through the development of a national education system. The national education system comprises all educational components, interlinked in an integrated manner to achieve national education goals. The development of the national education system can never be separated from the social, political, economic, and cultural context that surrounds it. From this perspective, we can distinguish the national education systems of the Old Order, the New Order, and the Reform Order.
Various problems are encountered in improving the quality of education in Indonesia, from primary education to higher education. Ramly (2005) lists several critical issues in Indonesian education, among others: teacher strikes; a commercialized higher education accreditation system; an unaccommodating evaluation system; the influx of foreign investment in education; irregularities in the delegation of education to local authorities; teachers' weak mastery of teaching materials; educational institutions becoming contributors to educated unemployment; sectoral egoism and materialism among scientists; education becoming a cheap business arena; teaching materials that do not adequately address behavior and moral development; and the absence of taxes for education.
The teaching and learning process concerns not only the process itself but also its results. To know the outcome of that process, teachers or lecturers use tests as a tool to measure students' ability or performance and to decide whether students pass. Lecturers therefore focus not only on teaching but also on how they measure the outcomes of their students or apprentices. Reynolds et al. (2010) stated that assessment is a systematic process of gathering information that can be used to draw conclusions about objects or processes. Ghafar (2011) explained that assessment is a systematic procedure involving the collection, analysis, and interpretation of evidence of how far the student has achieved the teaching objectives. A number of authors have reported a negative impact of assessment on learning and teaching (Frederiksen, 1984; Ridgway & Schoenfeld, 1994; Dochy & McDowell, 1997). This demonstrates that assessment has a significant impact on teaching and learning.
Ghafar (1999, 2011) explains that reliability refers to the consistency of test results. If a person with a certain skill level is able to demonstrate the same level when retested, the measure of that skill level is reliable. Reliability can be determined by the test-retest, split-half, equivalent or parallel forms, Kuder-Richardson, inter-examiner, and inter-observer methods (Ghafar, 1999, 2011; Creswell, 2012; Fraenkel & Wallen, 2009). Reliability is an important issue in the use of any instrument, whether the instrument has been used in other research or is built for the purpose of the research at hand.
Validity is the most important consideration when preparing or selecting an instrument, because researchers intend to obtain information using that instrument. Validity applies to all types of measures and measurement procedures, including formal tests, observation techniques, interview protocols, questionnaires, self-report affective measures, projective devices, and so on (Ghafar, 1999, 2011; Goodwin, 2002). The term validity includes two aspects: what is to be measured and how consistently it is measured (Ebel & Frisbie, 1991).
Historically, the term "assessment for learning" grew out of the term formative assessment, as observed by Black and Wiliam (2006) and Newton (2007). Scriven (1967) first distinguished between the formative and summative purposes of assessment, followed by the work of Bloom, Hasting and Madaus (1971) and of Sadler (1989), which highlighted the importance of setting formative criteria to inform students about their learning.
The focus of assessment for learning is on increasing students' achievement (Reeves, 2001) and on students' learning rather than on teaching (Harris, 2007). Assessment for learning also includes feedback designed to provide immediate, relevant, and useful information to students; formative feedback is information communicated to students to support the modification of thought or behavior in order to improve learning (Shute, 2008).
Assessment for learning relates to practices such as sharing criteria with students, developing classroom talk and questioning, providing appropriate feedback, and allowing peer and self-assessment (Black and Wiliam 1998a), all of which require the active involvement of students. Learning is seen as a process rather than a product (Sadler, 2007). Teachers need to provide opportunities for students to learn to understand and to engage in thoughtful discussion. Students are not passive recipients of knowledge; through self-assessment and peer assessment they take control of their own learning. Carless (2005) presented two cases of the implementation of assessment for learning in Hong Kong. The first case shows how an English teacher in a primary school shared the assessment criteria with the students, who took part in assessing their peers using a checklist.
The second case reported how an English teacher incorporated peer evaluation in the classroom in order to improve students' grammar. Several studies (James & Pedder, 2006; Keppell & Carless, 2006; Marshall & Drummond, 2006; Munns & Woodward, 2006) indicate that implementing assessment for learning as a pedagogical practice is far more complex. Bernstein (cited in Munns & Woodward, 2006) provides a lens for understanding this complexity, showing how interpretation is influenced by educator and student beliefs, personality, and manipulation. Moreover, the social context matters, and the strategies of assessment for learning are never as linear and closed as the series of practices described above. Practice can thus inform the theory of assessment for learning, and the realities that teachers and students debate can help researchers explain and understand the dynamics of the assessment relationship. Furthermore, Stiggins (2004, 2006) argued that students learn best when they know what is expected and required for success and understand how to close the gap between their own work and the standard for success. One strategy for providing students with this knowledge is the use of scoring guides. Accessible instructional scoring guides or rubrics can provide students with important information that can lead them to become self-regulated learners (Saddler and Andrade, 2004).

Research Objective
This research was carried out with the objective of investigating assessment for learning instrumentation in higher education.

Methodology
The research used a descriptive survey design involving only a one-time response to the questionnaire. Fraenkel and Wallen (2009) explained that survey research is intended to obtain data to determine specific characteristics of a group. Rasch model analysis was used to determine the reliability of the instrument. The questionnaire consisted of 50 Likert-scale items. The items were formulated based on six constructs of Assessment for Learning.
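For reference, the Rasch formulation usually applied to Likert-type items is the Andrich rating scale model. The paper does not state which Rasch variant was fitted, so the form below is an assumption, and the symbols B_n, D_i, and F_k are introduced here for illustration only:

```latex
\ln\!\left(\frac{P_{nik}}{P_{ni(k-1)}}\right) = B_n - D_i - F_k
```

Here P_nik is the probability that person n responds in category k of item i, B_n is the person measure, D_i is the item difficulty, and F_k is the threshold between categories k-1 and k, all expressed in logits.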

Design of Instrument
The items of the questionnaire were grouped into six constructs: Sharing Learning Objectives (SLO), Helping Pupils (HP), Peer and Self-Assessment (PSA), Providing Feedback (PF), Promoting Confidence (PC), and Involving in Reviewing and Reflecting (IRR).

Population
The population of this research was all lecturers of Muhammadiyah universities in Indonesia that have an education faculty. The sample was selected by purposive sampling and consisted of 100 lecturers at the faculty of education of University Muhammadiyah of Makassar, South Sulawesi, Indonesia.

Validation of Instrument
The instrument validation involved four steps: (i) metadata analysis, (ii) expert validation, (iii) pilot test, and (iv) data analysis using the Rasch Measurement Model with Winsteps software. After the metadata analysis was completed, the instrument was validated for construct and content validity by an expert in Measurement and Evaluation, Faculty of Education UTM, and for face validity by an expert in language education of the Makassar Muhammadiyah University. After the instrument was corrected as suggested, the pilot study was conducted. Finally, the data were analyzed to measure validity and reliability using the Rasch Measurement Model.

Data Analysis
A total of 50 items from six constructs were analyzed to determine the reliability and validity of the questionnaire. Responses were coded as numerical values on the Likert scale rather than as words or phrases. All data were verified by hand, coded numerically, and entered into SPSS version 20. The analysis for the validation process was then carried out using the Rasch model with Winsteps software.
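The numerical coding step can be sketched as follows. This is a minimal illustration, not the authors' procedure: the mapping assumes 1 = Strongly Disagree through 5 = Strongly Agree, which is consistent with the paper's references to scale 2 (Disagree) and scale 5 (Strongly Agree), and the names LIKERT_CODES and code_responses are hypothetical.

```python
# Assumed coding scheme (hypothetical): the paper only confirms that
# category 2 is Disagree and category 5 is Strongly Agree.
LIKERT_CODES = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Uncertain": 3,
    "Agree": 4,
    "Strongly Agree": 5,
}


def code_responses(raw_responses):
    """Convert one respondent's worded answers to numeric codes.

    Blank or unrecognized answers become None so they can be treated
    as missing data instead of being silently miscoded.
    """
    coded = []
    for answer in raw_responses:
        if not answer:
            coded.append(None)
            continue
        # Normalize whitespace and capitalization before the lookup.
        coded.append(LIKERT_CODES.get(answer.strip().title()))
    return coded
```

Keeping unrecognized answers as None makes the hand-verification step easier, because every missing value points to a questionnaire entry that needs checking.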

Findings
The first step was to analyze the questionnaire to determine whether any items needed to be deleted or modified. The reliability and validity of the questionnaire were measured using person reliability, item reliability, item dimensionality, and the difficulty level of the scales. In the person misfit table, the columns that needed to be observed were Pt-Measure Corr., outfit MNSQ and ZSTD, and infit MNSQ and ZSTD (Azrilah, 1996). If the outfit MNSQ and ZSTD values are large but the infit MNSQ and ZSTD values are within range, the misfit is still acceptable because it reflects sloppy responses (Azrilah, 1996).
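The difference between the two mean-square statistics can be illustrated with a short sketch. It uses the standard definitions (outfit as the unweighted mean of squared standardized residuals, infit as the information-weighted version) and assumes that the expected scores and model variances have already been produced by a fitted Rasch model. The function name and the example values are hypothetical and are not taken from the study's data.

```python
def fit_mean_squares(observed, expected, variance):
    """Return (infit MNSQ, outfit MNSQ) for one person or one item.

    observed: the responses actually given
    expected: model-expected scores for the same responses
    variance: model variances of those responses
    """
    squared_residuals = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Outfit: plain average of squared standardized residuals, so one
    # surprising response on a far-off item can dominate the value.
    outfit = sum(r / w for r, w in zip(squared_residuals, variance)) / len(observed)
    # Infit: residuals weighted by their variance, so well-targeted
    # responses carry most of the weight.
    infit = sum(squared_residuals) / sum(variance)
    return infit, outfit


# Hypothetical dichotomous example: three well-targeted responses plus
# one unexpected success on an item with a 5% expected success rate.
infit, outfit = fit_mean_squares(
    observed=[1, 1, 0, 1],
    expected=[0.5, 0.5, 0.5, 0.05],
    variance=[0.25, 0.25, 0.25, 0.0475],
)
```

In this example the single unexpected response inflates outfit much more than infit, which is the pattern described above as acceptable misfit.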

Person Reliability
The person reliability of the instrument for 100 people was 0.91, which is excellent (Fisher, 2007). After 26 respondents were deleted, the Rasch analysis was conducted again for the remaining 74 respondents. Person reliability increased from 0.91 to 0.94, which indicates that the reliability of the instrument remained within the excellent category (Fisher, 2007), as shown in Table 1.
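As a rough guide to what this index measures, Rasch person reliability is commonly computed as the share of observed variance in person measures that is not measurement error. The sketch below follows that general definition. It is not a reproduction of the software's exact computation, and the function name and the example measures and standard errors are hypothetical.

```python
from statistics import pvariance


def separation_reliability(measures, standard_errors):
    """Estimate reliability as true variance / observed variance.

    measures: person (or item) measures in logits
    standard_errors: the standard error of each measure
    """
    observed_var = pvariance(measures)
    if observed_var == 0:
        raise ValueError("measures have no variance; reliability is undefined")
    # Error variance is the mean of the squared standard errors.
    error_var = sum(se ** 2 for se in standard_errors) / len(standard_errors)
    # True variance cannot be negative, so clamp it at zero.
    true_var = max(observed_var - error_var, 0.0)
    return true_var / observed_var


# Hypothetical values: five persons spread over 4 logits, each with SE 0.4.
example = separation_reliability([-2, -1, 0, 1, 2], [0.4] * 5)
```

Removing respondents with erratic response patterns tends to lower the average standard error relative to the spread of measures, which is one way the index can rise after misfitting persons are deleted.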

Item Validity
Table 4 indicates the scale of 40 persons. There are five (5) scale categories: Strongly Agree (SA), Agree (A), Uncertain (U), Disagree (D), and Strongly Disagree (SD). In the Rasch measurement model, the differences between adjacent rating categories are taken into account. Each difference s must be in the range 1.5 < s < 5.0 (Azrilah, 1996). In the Rasch Measurement Model, whether the response categories are evenly spaced can be assessed using scale calibration. Scale calibration is designed to identify the level of difficulty of the questionnaire's rating scale; it provides information on respondents' ability to distinguish between the rating categories. It was found that the scale differences were more than 1.5 and less than 5 except at scale 2 (Disagree) and scale 5 (Strongly Agree). This indicated that the respondents found it difficult to distinguish scale 2 (Disagree) and scale 5 (Strongly Agree).
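The range check on scale differences can be sketched as below. The bounds 1.5 and 5.0 are the ones cited in the text. The function name and the threshold values are hypothetical and are not the calibration values from Table 4.

```python
def check_threshold_gaps(thresholds, low=1.5, high=5.0):
    """Check the spacing between consecutive category thresholds.

    thresholds: calibration values in logits, ordered from the lowest
    category boundary to the highest.
    Returns a list of (gap number, gap size, within range) tuples.
    """
    gaps = [upper - lower for lower, upper in zip(thresholds, thresholds[1:])]
    return [(i + 1, gap, low < gap < high) for i, gap in enumerate(gaps)]


# Hypothetical thresholds for a five-category scale (four boundaries).
report = check_threshold_gaps([-3.0, -2.0, 0.5, 3.0])
for number, gap, ok in report:
    status = "within range" if ok else "outside range"
    print(f"gap {number}: {gap:.2f} logits, {status}")
```

A gap below the lower bound suggests that respondents did not clearly separate the two neighboring categories, while a gap above the upper bound suggests a stretch of the scale that no category covers well.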

Conclusion
This study showed that the person reliability was categorized as fair but the item reliability as excellent, and that the respondents found it difficult to distinguish scale 2 (Disagree) and scale 5 (Strongly Agree). It shows the importance of considering symmetry measures, given the gap between person reliability, item reliability, and the difficulty level of the scales.

Table 1. Person reliability for 100 respondents

Table 3. Item reliability for 74 respondents

Table 4. Scale calibration of 74 persons
Summary of category structure, Model = "R". The observed average is the mean of measures in the category; it is not a parameter estimate.