Developing an Achievement Test: A Study for the Assessment of 5th Grade Biology Subjects

The purpose of this study is to develop a multiple-choice achievement test with high reliability and validity for the “Let’s Solve the Puzzle of Our Body” unit. For this purpose, a 46-item multiple-choice achievement test was administered to a total of 178 fifth grade students. Through the test and item analyses performed during the test development process, the difficulty, distinctiveness and item-total correlation coefficients of the items were calculated. For the validity study, a table of specifications was prepared, and the Content Validity Index (CVI), based on expert opinion, was found to be 0.95. As a result of the analyses, 8 items were removed from the test, and the KR-20 reliability coefficient of the final 38-item test was 0.87. Item difficulty indices ranged from 0.30 to 0.74, and item distinctiveness indices ranged from 0.31 to 0.71. The average difficulty of the test was moderate (0.56) and its average distinctiveness was very good (0.49).


Introduction
Research on 5th grade biology subjects, which cover the transition from primary school to secondary school, reveals that students have insufficient knowledge and alternative conceptions about "Nutrients and their characteristics", "Digestion of nutrients" and "Excretory System in our body", the subtopics of the "Let's Solve the Puzzle of Our Body" unit (Banet & Nünez, 1997; Carvalho, Silva, & Clément, 2003; Güngör, 2009; Güngör & Özgür, 2009; Nünez & Banet, 1997; Patrick & Tunnicliffe, 2010). Students' insufficient knowledge of these subjects from an early age has led researchers to study how to eliminate their knowledge gaps and alternative conceptions.
In educational studies, many measurement tools, such as interviews, open-ended questions, concept maps and tests, can be used to determine students' level of understanding of knowledge and concepts. Qualitative approaches work with fewer participants but allow more in-depth investigation, whereas quantitative approaches can reach a much larger number of participants (Griffard, 2001). Multiple-choice tests are well suited to determining the knowledge level of many students on different subjects and at different academic levels (Burton, Sudweeks, Merrill, & Wood, 1991). Because students' inaccurate ideas can be included among the options, multiple-choice tests also make it possible to identify the misconceptions students hold (Treagust, 1988).

Sample
The pilot application of the study was carried out with a total of 178 fifth grade students (89 female and 89 male) at two central schools in the city of Samsun, Turkey. The schools were selected randomly by lot from among the schools of the Turkish Ministry of National Education, without considering their academic success. The distribution of the sample by school and sex is shown in Table 1.

Data Collecting Instrument
In this research, the Let's Solve the Puzzle of Our Body Unit Achievement Test was used as the data collection tool. A multiple-choice format was chosen for the achievement test because it can measure many sub-concepts of the unit taught in the research, is easy to score, and shows how much has been learned (Marx et al., 2004).
The achievement test was prepared on the basis of the objectives of the "Let's Solve the Puzzle of Our Body" unit in the 5th grade Ministry of National Education (MoNE) Science Curriculum. Taking into account the readiness level of 5th grade students, 46 multiple-choice items with four options each were written. The researcher created the items by examining the 5th grade Science course books prepared by the Ministry of National Education (Erten, 2015; Karaca, 2014), practice worksheets ("leaf tests") on the "Let's Solve the Puzzle of Our Body" unit, and the nationwide exams conducted by the Ministry of National Education for all secondary school students.

General Information about Unit
The Let's Solve the Puzzle of Our Body unit has three subtopics: nutrients and their characteristics, digestion of nutrients, and the excretory system in our body. The Ministry of National Education Science Curriculum specifies a total of 13 objectives for the unit, to be covered in 36 lesson hours. The subtopics and contents of the unit are shown in Table 2.

Data Analysis
In analysing the data obtained during test development, the following were calculated: the arithmetic mean and standard deviation of each item, the item distinctiveness and item difficulty indices, the Kolmogorov-Smirnov test for normality, biserial correlation coefficients for the item-total score correlations, and the KR-20 coefficient for reliability.

The Study of Validity
While preparing the achievement test for the "Let's Solve the Puzzle of Our Body" unit, at least three items were written for each objective. An expert was consulted while the items were being written in order to ensure validity.
For the content validity of the achievement test, a total of 10 people were consulted: two faculty members from the Science Teaching Department of Ondokuz Mayıs University, four doctoral students and four science teachers. The experts were given a graded "Expert Evaluation Form" covering each item in the pilot achievement test, on which each item could be rated as appropriate, must be corrected, or must be excluded. Based on these ratings, a Content Validity Ratio (CVR) was calculated for each item (Formula 1): the number of experts who rated the item "appropriate" is divided by half of the total number of experts, and one is subtracted from the result (Yurdagül, 2005).
$$CVR = \frac{N_A}{N/2} - 1 \qquad \text{(Formula 1)}$$

where $N_A$ is the number of experts who rate the item as appropriate, $N$ is the total number of experts who give an opinion on the item, and $CVR$ is the Content Validity Ratio.
Table 3 gives the minimum CVR values at the α = .05 significance level for different numbers of experts (Veneziano & Hooper, 1997). Since 10 experts rated the items in this study, the minimum value for 10 experts, 0.62, was used as the Content Validity Criterion (CVC) for statistical significance. No item in the 46-item achievement test had a CVR below this criterion, so all items were retained in the application form. The CVRs of the items were then averaged to obtain the Content Validity Index (CVI) of the scale, which was 0.95. Since CVI ≥ CVC, the content validity of the scale was found to be statistically significant (Yurdagül, 2005).
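The CVR and CVI steps can be sketched in Python as follows. This is a minimal illustration, assuming that the CVI is the mean CVR of the retained items and that the criterion is the 0.62 value for 10 experts cited above; the function names and the expert counts in the example are hypothetical, not the study's data.

```python
def content_validity_ratio(n_appropriate: int, n_experts: int) -> float:
    """CVR = N_A / (N / 2) - 1 (Formula 1)."""
    if n_experts <= 0:
        raise ValueError("n_experts must be positive")
    return n_appropriate / (n_experts / 2) - 1


def content_validity_index(appropriate_counts, n_experts, cvc=0.62):
    """Return each item's CVR, the indices of retained items, and the CVI.

    Items whose CVR is below the criterion (CVC) are dropped; the CVI is
    taken here as the mean CVR of the retained items (an assumption).
    """
    cvrs = [content_validity_ratio(na, n_experts) for na in appropriate_counts]
    kept = [i for i, c in enumerate(cvrs) if c >= cvc]
    if not kept:
        raise ValueError("no item reaches the content validity criterion")
    cvi = sum(cvrs[i] for i in kept) / len(kept)
    return cvrs, kept, cvi


# Hypothetical data: how many of 10 experts rated each item "appropriate".
counts = [10, 9, 10, 9, 10, 7]
cvrs, kept, cvi = content_validity_index(counts, n_experts=10)
print([round(c, 2) for c in cvrs])  # CVR per item
print(kept)                         # indices of items with CVR >= 0.62
print(round(cvi, 2))                # CVI of the retained items
```

In this made-up example the last item (7 of 10 experts, CVR = 0.4) falls below 0.62 and is excluded before the CVI is averaged.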
For the face validity of the achievement test, a faculty member from the Department of Science Education, a science teacher and a language expert were consulted, and the necessary corrections were made in line with their feedback. According to this feedback, some distractors were difficult for students to understand, some questions were too long, and a question contained two negations. As a result of the expert review, no item was eliminated, and the pilot form was prepared by making the suggested corrections. To support the content validity of the test, a table of specifications (indicator chart) based on the unit objectives was prepared, with at least three items for each objective. This table is given in Table 4.
The pilot form, a 46-item multiple-choice test with 22 items on "Nutrients and their Characteristics", 13 items on "Digestion of Nutrients" and 11 items on "Excretory System in Our Body" of the Let's Solve the Puzzle of Our Body unit, was then administered (Table 4).

Normality Test
The Kolmogorov-Smirnov test, one of the normality tests, was applied to check whether the achievement test scores follow a normal distribution. A p value higher than .05 is interpreted as indicating that the scores do not deviate significantly from the normal distribution at this significance level (Büyüköztürk, 2010). Accordingly, the Kolmogorov-Smirnov test showed that the students' achievement test scores did not differ significantly from the normal distribution (D(178) = .047; p = .200; p > .05).
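The D statistic reported above can be illustrated with a short Python sketch under stated assumptions: it compares the empirical distribution of the total scores with a normal distribution whose mean and standard deviation are estimated from the same scores. Because the parameters are estimated, a p value should come from Lilliefors-corrected critical values rather than the classical Kolmogorov-Smirnov table, so this sketch computes only D. The function names and the scores are illustrative.

```python
import math
import statistics


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of a normal distribution with mean mu and SD sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_d_statistic(scores):
    """Largest absolute gap between the empirical CDF of the scores and a
    normal CDF whose mean and SD are estimated from the same scores."""
    xs = sorted(scores)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two scores")
    mu = statistics.mean(xs)
    sigma = statistics.stdev(xs)  # sample SD (n - 1 denominator)
    if sigma == 0:
        raise ValueError("scores have zero variance")
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mu, sigma)
        # The empirical CDF rises from i/n to (i + 1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# Hypothetical total scores, not the study's data.
scores = [12, 15, 18, 20, 21, 22, 23, 25, 26, 28, 30, 33]
print(round(ks_d_statistic(scores), 3))
```

If statsmodels is available, `statsmodels.stats.diagnostic.lilliefors` returns both the statistic and a Lilliefors p value. SPSS also reports p = .200 for this test as a lower bound of the true significance, which is likely why that exact value appears above.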

The Item Difficulty and Distinctiveness
In scoring the achievement test, each student's total score was calculated by giving 1 point for each correct answer and 0 points for each wrong answer, unanswered question, or question with more than one marked answer. The scores were then ranked from highest to lowest. The top 27% of the ranking (N = 48) formed the upper group and the bottom 27% formed the lower group, and item analysis was performed on the students' answers to each item using Microsoft Excel and SPSS.
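A minimal Python sketch of this scoring rule and the 27% grouping follows. It assumes that responses are stored as one list of marked options per student, with None for a blank answer and a multi-letter string such as "AB" for a question with more than one marked answer. The function names are hypothetical, and ties at the group boundaries are broken by input order, which the text does not specify.

```python
def score_matrix(responses, key):
    """1 point if the marked option equals the key, otherwise 0.

    Blank answers (None) and multiple marks (e.g. "AB") never equal a
    single-letter key, so they score 0, as in the scoring rule above.
    """
    return [[1 if ans == k else 0 for ans, k in zip(row, key)] for row in responses]


def upper_lower_groups(scores, fraction=0.27):
    """Rank students by total score and return the top and bottom groups."""
    totals = [sum(row) for row in scores]
    # sorted() is stable even with reverse=True, so tied students keep input order.
    order = sorted(range(len(scores)), key=lambda i: totals[i], reverse=True)
    n_group = max(1, round(fraction * len(scores)))
    upper = [scores[i] for i in order[:n_group]]
    lower = [scores[i] for i in order[-n_group:]]
    return upper, lower


# Hypothetical answers of four students to three items.
key = ["A", "B", "C"]
responses = [["A", "B", "C"], ["A", "C", None], ["AB", "B", "C"], ["B", "B", "D"]]
scores = score_matrix(responses, key)
upper, lower = upper_lower_groups(scores, fraction=0.5)
print(scores)  # [[1, 1, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]]
print(round(0.27 * 178))  # 48
```

With 178 students, `round(0.27 * 178)` gives 48, matching the group size reported above.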
Regarding item difficulty, an item is considered very difficult if its difficulty index (pj) is between 0.00 and 0.19, difficult if it is between 0.20 and 0.34, of medium difficulty if it is between 0.35 and 0.64, easy if it is between 0.65 and 0.79, and very easy if it is between 0.80 and 1.00 (Sözbilir, 2010). In the item analysis of the achievement test, the item difficulty indices ranged from 0.30 to 0.74.
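The difficulty index and the bands above can be expressed as follows. This sketch assumes the common upper-lower group formula pj = (U + L) / (nU + nL), where U and L are the numbers of correct answers in the two groups. The text does not state whether pj was computed from these groups or from the whole sample, so the formula is an assumption. Values are also assumed to be reported to two decimals when mapped onto the bands. Function names and data are illustrative.

```python
def difficulty_index(upper, lower, item):
    """p_j = (U + L) / (n_upper + n_lower): share of both groups answering item j correctly."""
    u_correct = sum(row[item] for row in upper)
    l_correct = sum(row[item] for row in lower)
    return (u_correct + l_correct) / (len(upper) + len(lower))


def classify_difficulty(p):
    """Difficulty bands from the text (Sözbilir, 2010)."""
    if p < 0.20:
        return "very difficult"
    if p < 0.35:
        return "difficult"
    if p < 0.65:
        return "medium"
    if p < 0.80:
        return "easy"
    return "very easy"


# Hypothetical 0/1 scores of three upper-group and three lower-group students on two items.
upper = [[1, 1], [1, 0], [1, 1]]
lower = [[0, 1], [1, 0], [0, 0]]
for j in range(2):
    p = difficulty_index(upper, lower, j)
    print(j, round(p, 2), classify_difficulty(p))
```

Note that a higher pj means an easier item, which is why the reported test mean of 0.56 falls in the medium band.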
jel.ccsenet.org Journal of Education and Learning Vol. 6, No. 2;2017
Item distinctiveness compares the mean item scores of extreme groups, such as the upper and lower groups formed by ranking respondents from highest to lowest according to their total scores on the scale (Tavşancıl, 2006). To decide which items would remain in the test, the item distinctiveness index (rjx) was interpreted as follows: an item is unacceptable if rjx ≤ 0.19, must be revised if rjx is between 0.20 and 0.29, is good/acceptable if rjx is between 0.30 and 0.39, and is very good/acceptable if rjx ≥ 0.40 (Özçelik, 2010). For the 46-item achievement test, the arithmetic mean, standard deviation, variance, reliability, item distinctiveness, item difficulty and item correlations were calculated. Items with a distinctiveness index below 0.30 (the 9th, 10th, 11th, 19th, 25th, 36th and 41st items) were excluded from the test. The 7th item, however, was retained although its distinctiveness index was 0.27, because four items in total (the 7th, 8th, 9th and 10th) had been written for the objective it measures: the 9th and 10th items were eliminated because of their low distinctiveness, and eliminating the 7th item as well would have left only one item (the 8th) for that objective. The distinctiveness of the 6th item was 0.31; however, six items (the 1st to 6th) had been written for the objective it measures, so it was deemed appropriate to eliminate the 6th item, which had the lowest distinctiveness among them (Table 5).
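The distinctiveness index and the retention decision described above can be sketched as follows, assuming the upper-lower group formula rjx = (U - L) / n for groups of equal size n. The selection function encodes one reading of the rule applied to the 7th item: drop items below 0.30, lowest first, but never leave an objective with fewer than two items. The separate judgement that removed the 6th item is not modelled, and the discrimination values in the example are hypothetical, not the study's data.

```python
def discrimination_index(upper, lower, item):
    """r_jx = (U - L) / n for equal-sized upper and lower groups of size n."""
    n = len(upper)
    if n == 0 or len(lower) != n:
        raise ValueError("groups must be non-empty and of equal size")
    u_correct = sum(row[item] for row in upper)
    l_correct = sum(row[item] for row in lower)
    return (u_correct - l_correct) / n


def classify_discrimination(r):
    """Distinctiveness bands from the text (Özçelik, 2010)."""
    if r < 0.20:
        return "unacceptable"
    if r < 0.30:
        return "must be revised"
    if r < 0.40:
        return "good/acceptable"
    return "very good/acceptable"


def select_items_to_drop(r_by_item, objective_of, threshold=0.30, min_per_objective=2):
    """Drop items below the threshold, lowest first, unless the drop would
    leave their objective with fewer than min_per_objective items."""
    remaining = {}
    for item, objective in objective_of.items():
        remaining.setdefault(objective, set()).add(item)
    dropped = []
    for item in sorted(r_by_item, key=r_by_item.get):
        if r_by_item[item] >= threshold:
            break  # items are sorted by r, so every later item also passes
        group = remaining[objective_of[item]]
        if len(group) > min_per_objective:
            group.remove(item)
            dropped.append(item)
    return sorted(dropped)


# Hypothetical values echoing the 7th-10th item case.
r = {7: 0.27, 8: 0.45, 9: 0.12, 10: 0.18}
objectives = {7: "O2", 8: "O2", 9: "O2", 10: "O2"}
print(select_items_to_drop(r, objectives))  # [9, 10]
print(classify_discrimination(0.27))        # must be revised
```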

The Item Correlation
The biserial correlation coefficient was used for the item-total correlation. The biserial correlation coefficient measures the relationship between a continuous variable and a variable that is actually continuous but has been artificially dichotomized into two categories (Büyüköztürk, Çokluk, & Köklü, 2010). Here, the relationship is between the total score on the achievement test (the continuous variable) and the score on each item. The biserial correlation coefficient was calculated for each item by giving 1 point for correct answers and 0 points for wrong and unanswered questions.
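For reference, here is a Python sketch of the biserial coefficient in its usual textbook form, r_bis = (M_p - M_q) / S_t * p * q / y, where p is the proportion answering the item correctly, q = 1 - p, M_p and M_q are the mean total scores of students who answered correctly and incorrectly, S_t is the standard deviation of the total scores, and y is the standard normal ordinate at the point dividing p and q. The point-biserial coefficient is included for comparison. The data and function names are hypothetical, and the total score includes the item itself, as the text describes.

```python
import math
import statistics
from statistics import NormalDist


def _split(item, total):
    """Totals of correct and incorrect answerers, p, q and the SD of totals."""
    right = [t for i, t in zip(item, total) if i == 1]
    wrong = [t for i, t in zip(item, total) if i == 0]
    if not right or not wrong:
        raise ValueError("the item needs both correct and incorrect answers")
    s_t = statistics.pstdev(total)
    if s_t == 0:
        raise ValueError("total scores have zero variance")
    p = len(right) / len(item)
    return right, wrong, p, 1 - p, s_t


def biserial(item, total):
    """r_bis = (M_p - M_q) / S_t * p * q / y."""
    right, wrong, p, q, s_t = _split(item, total)
    # Ordinate of the standard normal curve at the z value splitting p and q.
    y = NormalDist().pdf(NormalDist().inv_cdf(q))
    return (statistics.mean(right) - statistics.mean(wrong)) / s_t * p * q / y


def point_biserial(item, total):
    """r_pb = (M_p - M_q) / S_t * sqrt(p * q), shown for comparison."""
    right, wrong, p, q, s_t = _split(item, total)
    return (statistics.mean(right) - statistics.mean(wrong)) / s_t * math.sqrt(p * q)


# Hypothetical item scores (1 = correct) and test totals for eight students.
item = [1, 1, 0, 0, 1, 1, 0, 1]
total = [30, 20, 25, 12, 15, 22, 18, 27]
print(round(biserial(item, total), 2), round(point_biserial(item, total), 2))
```

The biserial value is always larger in magnitude than the point-biserial value for the same data, and with small or non-normal samples it can even exceed 1.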
The item-total correlation describes the relationship between the total score that respondents receive from the assessment instrument and the score they receive on each item. A positive and high item-total correlation indicates that the items measure similar behaviour and that the internal consistency of the test is high (Büyüköztürk, 2010). A low correlation between an item and the total score indicates that the item measures a different feature from the other items. The item-total correlation should not be negative and should be at least 0.20. When the biserial correlation coefficient of each item in the achievement test was calculated, the coefficients of the 9th, 10th, 11th, 19th and 41st items were found to be below 0.30. These items had already been eliminated in the item difficulty and distinctiveness analysis because of their low values. High correlations between items indicate that the items are homogeneous and therefore highly reliable (Tavşancıl, 2006). After item elimination, the distribution of the test items by the subjects and objectives of the unit is given in Table 6.

Table 6 (excerpt)
It searches and provides information on which nutrients have the most vitamins. 6, 7
It deduces that water and minerals are present in all nutrients. 8, 9
It searches and presents the effects of balanced nutrition on human health. 10, 11, 12
It discusses the importance of freshness and naturalness of nutrients for a healthy life based on the research data. 13, 14
It discusses the damage of smoking and alcohol to the body based on the research data.

A total of 38 multiple-choice items, 17 on "Nutrients and their characteristics", 13 on "Digestion of nutrients" and 9 on "Excretory System in Our Body", are included in the final achievement test (Table 6).
The arithmetic means and standard deviations of the items of the final test, obtained from the item analysis of the pilot application, are given in Table 7. As a result of the item analysis, the 6th, 9th, 10th, 11th, 19th, 25th, 36th and 41st items were excluded, and the difficulty and distinctiveness values of each of the remaining 38 items are given in Table 7. As a result of the item analysis, the distinctiveness of all questions was calculated to be above 0.30. The arithmetic mean, standard deviation, variance, difficulty and reliability calculations were repeated for the 38 items of the final achievement test (Table 8). The average difficulty of both the pilot and the final achievement tests was found to be moderate. The average difficulty of an achievement test should be close to 0.50 for the test to measure the intended feature and to be highly reliable (Kan, 2012).

The Study of Reliability
The KR-20 reliability coefficient was calculated for the achievement test. KR-20 is suitable for determining the reliability of tests whose items are parallel to each other, have the same mean and variance, and are scored by giving one point for each correct answer and no points for wrong or unanswered questions (Baykul, 2010; Tekin, 2000). The Kuder-Richardson 20 (KR-20) coefficient of the pilot achievement test was 0.86, and after the eight items were eliminated as a result of the item analysis, it was 0.87 (Table 8). An assessment instrument with a KR-20 coefficient of 0.70 or higher is accepted as reliable (Fraenkel & Wallen, 2006; Özçelik, 2010; Saipanish, Hiranyatheb, & Lotrakul, 2015). Therefore, this achievement test is considered reliable. As a result of the item analysis, the 38-item multiple-choice achievement test was finalized and prepared for use in the research.
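The KR-20 coefficient can be computed from a 0/1 score matrix with the standard formula KR-20 = k / (k - 1) * (1 - Σ pj qj / σ²X), where k is the number of items, pj is the proportion answering item j correctly, qj = 1 - pj, and σ²X is the variance of the total scores. The Python sketch below is illustrative only; the small matrix is a made-up example, not the study's data.

```python
import statistics


def kr20(scores):
    """KR-20 = k / (k - 1) * (1 - sum(p_j * q_j) / var(X)) for a 0/1 score
    matrix with one row per student and one column per item."""
    n = len(scores)
    k = len(scores[0])
    if k < 2:
        raise ValueError("KR-20 needs at least two items")
    totals = [sum(row) for row in scores]
    var_total = statistics.pvariance(totals)  # population variance, like p_j * q_j
    if var_total == 0:
        raise ValueError("total scores have zero variance")
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / n
        sum_pq += p * (1 - p)
    return k / (k - 1) * (1 - sum_pq / var_total)


# A made-up 5-student x 4-item matrix.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(scores), 2))  # 0.8
```

The total-score variance uses the population (n) denominator, consistent with pj qj; with that choice the result equals Cronbach's alpha computed on the same 0/1 data.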

Conclusion and Suggestions
The assessment and evaluation process is important for determining the effectiveness of science teaching. Multiple-choice achievement tests are among the assessment instruments most frequently used in assessment and evaluation studies. Today, multiple-choice tests are one of the most widely used assessment instruments, because they allow achievement to be assessed comprehensively by presenting many questions in a short time and are easy for the practitioner to score (Burton, Sudweeks, Merrill, & Wood, 1991; Bağcan Büyükturan & Çıkrıkçı Demirtaşlı, 2012; Treagust, 1988). Because it consists of multiple-choice items, such a test is easy to administer and score. For this reason, the aim of this study was to develop a reliable and valid assessment instrument to assess students' achievement in the fifth grade science course unit "Let's Solve the Puzzle of Our Body".
In developing the test, the pilot form was first administered, and test and item analyses were performed. As a result of the item analysis of the 46-item achievement test, 8 items were eliminated and the 38-item final test was created. For content validity, a table of specifications was prepared showing the relationship between the test items and the objectives in the Ministry of National Education Science Curriculum. In addition, the Content Validity Index (CVI) of the test, based on expert opinions on each item, was calculated as 0.95. In the item analyses carried out during test development, item difficulties ranged from 0.30 to 0.74, item distinctiveness indices from 0.31 to 0.71, and item-total biserial correlation coefficients from 0.30 to 0.66. The KR-20 reliability coefficient of the final test was 0.87, its average difficulty was moderate, and its average distinctiveness was very good. The results show that the achievement test is reliable and valid for evaluating the academic achievement of fifth grade students in the "Let's Solve the Puzzle of Our Body" unit.
A review of the literature on the digestive system, the excretory system, nutrients and nutrient types shows that studies have been carried out with 6th and 7th grade secondary school students (Alkan Dilbaz, 2013; Güçlüer, 2012; Güngör & Özgür, 2009; Patrick & Tunnicliffe, 2010; Prokop & Fančovičová, 2006; Yıldırım, 2012). This test can help identify knowledge deficiencies in biology during the period when 5th grade students are, in Piaget's terms, moving from the concrete operational stage to the abstract (formal) operational stage.
It is believed that this assessment instrument can help determine the readiness level of 5th grade students and their lack of knowledge in the subtopics, and that it can support researchers conducting experimental studies. Based on the results of this research, the following suggestions are made:
-With the developed achievement test, students' knowledge level and knowledge gaps in the "Nutrients and Their Characteristics", "Digestion of Nutrients" and "Excretory System in Our Body" subtopics of the "Let's Solve the Puzzle of Our Body" unit can be determined during their transition to secondary school.
-By identifying students' readiness and deficiencies in 5th grade biology subjects, the developed achievement test can help students organize their learning activities according to those deficiencies.
-With the developed achievement test, students' misconceptions can be identified by examining their level of knowledge and deficiencies as well as the questions they answered incorrectly, because the distractors of each item were prepared according to the misconceptions students have about the topic.
-The developed achievement test can be used as a data collection tool in other research in the field of science education.

Appendix A. The Academic Achievement Test of the Let's Solve the Puzzle of Our Body Unit
Question 1)
Which of the nutrients below is the best source of fuel (energy) compared with the others?
Question 2) "Selin had cheese, bread and honey for breakfast, pasta for lunch, and French fries and rice for dinner." Considering what Selin ate during the day, which nutrient does she take into her body in excess?
A. Carbohydrate

Question 3)
Which nutrient group is the most important regulator in our body?
A. Protein

Question 4)
From which nutrient do we primarily get the energy our body needs to think, talk, walk, play sports and so on?
A. Protein

Question 5)
Think about the nutrient group found abundantly in animal foods such as meat, milk, eggs, fish and cheese.
Which of the following is not one of the primary functions of the nutrient group you are thinking of? A

Question 10)
The doctor asks Betül, who is sick, what she eats.

Doctor:
-"If you keep eating like that, you are going to get skin disorders and feel fatigued."
What is the doctor basically trying to tell Betül?
A. She should have a vitamin-based diet instead of a carbohydrate-based one B. She should have a protein-based diet C. She consumes too many vitamins D. Eating only one type of food is unhealthy.

Question 11)
Which of the following does not apply to a person who eats properly?
Which of the statements below is not one of the harms of alcohol?
A. It weakens willpower by negatively affecting the nervous system B. It adversely affects the brain, muscles and blood vessels C. It makes you sleepy and regulates your sleeping pattern D. It makes it hard to control behaviour and the senses

Question 17)
Which of the following organs or structures does alcohol affect most negatively?

Question 18)
Which of the following is the route of food through the digestive system after the stomach?

Question 19)
Which of the following are the organs of the human digestive system that physically break down food and that expel the waste remaining after digestion?

(Figure: organs labelled I, II, III and IV)
Which of the organs shown above are responsible for digestion?
A. I-II

Question 21)
In an adult, which type of tooth, used for crushing and grinding food, is the most numerous?
Which of the following is the part of the body where digested food, water, vitamins and minerals are absorbed into the circulatory system?
A. Kidney

Question 29)
After food has been digested and absorbed, how are the useful substances carried throughout the body?
A. They pass to the stomach to be reabsorbed B. They spread throughout the body through the large intestine.
C. They are transported to the body through the liver D. They spread to the whole body through blood circulation

Question 30)
Which of the following gives, respectively, the system that allows nutrients to mix with the blood and the organ where water and minerals are absorbed?
A. Circulatory - Small intestine B. Urinary - Large intestine C. Urinary - Kidney

Question 31)
Which of the following is the ureter's duty?
A. The short tube through which urine is expelled B. The place where blood is filtered C. The place where urine is collected D. The duct carrying urine from the kidneys to the urinary bladder
Question 32) Which of the following does not play a major role in removing waste and residual substances from the body?
A. Sweating

Question 33)
Which of the following is not an organ that helps to remove waste from your body?
A. Stomach

Question 34) "Urea -Oxygen -Sweat -Carbon Dioxide -Urine"
How many of the above are waste materials that are formed in the human body?

Question 35)
Which of the following should not be done for the health of the excretory system?