Cross-Cultural Adaptation of Developmental Criteria for Young Children: A Preliminary Psychometric Study

The authentic assessment approach uses naturalistic observation to gather and analyse socio-culturally appropriate data on children's development in order to plan for individual teaching and learning needs. This article discusses the process of adapting an authentic developmental instrument for children aged 3-6 years. The instrument consists of 217 developmental criteria for children aged 36-72 months, grouped under six domains: fine motor, gross motor, adaptive, cognitive, socio-communication, and social. It is a criterion-referenced measurement tool that was developed for the American context, and it needed to be adapted to the Malay socio-cultural context before it could be applied in the local setting. The adaptation process involved directly translating the items; investigating the score format of the items/criteria; having a panel of experts examine the items; observing in a real setting to investigate score patterns; and calculating an observer agreement index. Participants were 103 children from the Malay ethnic group aged 36-72 months, six field experts, and twelve observers. The researcher and an editor translated all the developmental criteria; novice observers carried out a pilot study to test the suitability of the score format; six children's specialists examined the translated criteria; and lastly, the researcher observed activities in the preschool setting to score the criteria in a naturalistic manner. The translated criteria, checklists, and developmental scores were analysed through visual inspection and descriptive statistics. Content analyses showed that most of the developmental criteria were suitable for the research context. However, a few criteria were considered inappropriate, and the scores indicated low agreement between observers in how they interpreted the criteria.


Problem Statement
This project is part of a doctoral thesis investigating the implementation of authentic assessment in a university-based early childhood centre. It is partly funded by a government grant to study the adaptation of authentic assessment for children aged 0-6 years. At the time of the research, the centre where the study was carried out was undergoing a transformation after being handed over to the Education Faculty. One of the vital changes was a shift in practice from an academic orientation towards a more developmentally appropriate practice. Implementing authentic assessment at the centre was an action research initiative: a collaborative effort among researchers, field experts, teachers/caregivers, and administrators. To implement the authentic assessment procedure, a localized instrument was sought, but to no avail. Because no authentic assessment instrument was available in the local context, a highly reliable instrument had to be adapted from another socio-cultural context before the authentic assessment procedure could be implemented in the early childhood setting. The instrument chosen was an authentic, curriculum-based assessment (Bricker et al., 2002) originally developed for children in the United States of America. The International Test Commission (ITC, 2010) recommends that a comprehensive adaptation process be carried out before any test is used in a new socio-cultural context. This is vital because test results or their interpretation could have adverse effects on a child if an item/criterion is not appropriate for the child's natural development in given living conditions or traditions. In developmental psychology (Cole, 2005), it is agreed that culture plays an important role in children's development and that universal developmental milestones may not apply to all children. Therefore, an appropriate approach to adapting the criteria of development, and to reporting the psychometric properties of the instrument, needed to be studied.

Importance of Study
Authentic assessment is highly recommended by early childhood experts and researchers around the world. It is also recommended in Malaysia's National Preschool Curriculum (Ministry of Education, Malaysia 2002, 2009, 2010a, 2010b). Authentic assessment can be used to assess young children's developmental milestones and can therefore assist in identifying developmental delays and/or disabilities. Early identification can help reduce the risk of a child having learning disabilities later, at school age (Bricker et al., 2002; Grisham-Brown & Pretti-Frontczak, 2011; Grisham-Brown et al., 2006). However, no comprehensive, validated instrument developed specifically for the local context was available for use in early childhood centres. This research therefore studied the appropriateness of applying an authentic assessment tool from another socio-cultural context and explored some of the psychometric properties of the tool's developmental criteria. The tool's authors report that it has been validated and that psychometric studies have shown high reliability indices (Bricker et al., 2002).

Authentic Assessment
Authentic assessment refers to assessment carried out in naturalistic settings. It also means that observation is carried out systematically by an observer who is familiar to the child, without any interference in the child's activities and routines. This ensures that the data collected are diverse and 'true', so that teachers can plan appropriate teaching-learning sessions (Bagnato et al., 2010; Neisworth & Bagnato, 2004; Nutbrown, 2006). Furthermore, the authentic assessment procedure must be contextual and socio-culturally appropriate to ensure fairness (Morrison, 2006, 2011). Collaboration is an element of authentic assessment that is practised during the collection of child development data, diagnosis, planning for teaching or intervention, and whole-program evaluation (Grisham-Brown & Pretti-Frontczak, 2011). This is important because child development and learning always involve family, teachers, experts, and other community members (Bricker et al., 2002).

Development and Culture
Modernism in the 18th century affected the education system by turning it into a uniform and stable institution, accepted by the majority, in which content knowledge was closely guarded by the authorities so that it could be transferred from the older generation to the younger one (Dahlberg et al., 1999; Riley, 2007). However, with the emergence of the post-modern era, those practices were shaken, because post-modernists rejected rigidity and held that knowledge should be developed together, socially and equally, by the members of society. Cole (2005), Gonzalez-Mena (2005), Robinson and Diaz (2006), and Papatheodorou and Moyles (2012) agree that diversity in human life should be given priority in building a curriculum, in line with the UNCRC's statements (UNICEF 2001). In his theory of the cultural element, Cole (2005) sets out the theoretical perspectives for considering cultural influence on child development: biology-maturity, environment-learning, interaction, and culture-context. Cole also suggests that culture is behaviour that is followed or acquired from one's ancestors. However, the mechanism by which culture affects development in a particular group of people is a very complex issue, especially given today's globalized world.

Curriculum-Based Assessment
AEPS® is a curriculum-based assessment instrument classified as a criterion-referenced measure. It continuously links assessment, planning, teaching/intervention, and program evaluation. AEPS is an acronym for Assessment, Evaluation, and Programming System for Infants and Children (Bricker et al., 2002). AEPS®:3-6 has six domains: fine motor, gross motor, adaptive, cognitive, socio-communication, and social. Each domain consists of several strands, which are divided into specific behaviours or skills. Each strand contains several goals, which are in turn broken down into objectives. The criteria for goals and objectives are explained in detail, along with examples of children's activities. For observation purposes, only the goals and objectives are scored, using the figures 0, 1, and 2. There are 21 strands, 54 goals, and 163 objectives. The 217 goals and objectives are listed in Table 1, and a simplified explanation of them is given in the Appendix. Research on AEPS®:3-6 began in the 1980s and continues to this day (Bricker et al., 2002). In 1986, Slentz (as discussed in Bricker et al., 2002) investigated inter-rater agreement, concurrent validity, and the relationship between domain scores and the overall score. Others, such as Hsia (as discussed in Bricker et al., 2002), also investigated inter-rater agreement, reliability indices, and sensitivity. Bricker and Pretti-Frontczak began researching treatment validity in 1997.
Pretti-Frontczak & Bricker (2000) found that teachers' written goals and objectives improved after an AEPS®:3-6 training session. This finding is significant for teachers, because being able to plan, teach, and assess continuously could save them a great deal of time. This is possible because assessment links directly to the curriculum, which is the main feature of curriculum-based assessment. Furthermore, studies have shown that the instrument can be used as an eligibility tool for young children who need intervention services in the United States (Bricker et al., 2002; Macy et al., 2007).
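The hierarchy described above (domain, strand, goal, objective) and the 0, 1, 2 scoring can be sketched as a small data structure. This is a minimal illustration under stated assumptions: the class names, field names, and labels are hypothetical and do not reproduce the official AEPS®:3-6 forms or any published software. It encodes only two facts given here: goals and objectives are the scored units, and each takes one of three score figures.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

SCORE_FIGURES = (0, 1, 2)  # each goal and objective receives one of these


@dataclass
class Item:
    """A scorable criterion (hypothetical model): a goal or one of its objectives."""
    label: str
    score: Optional[int] = None  # None until the criterion has been observed

    def record(self, score: int) -> None:
        if score not in SCORE_FIGURES:
            raise ValueError(f"score must be one of {SCORE_FIGURES}, got {score!r}")
        self.score = score


@dataclass
class Goal(Item):
    """A goal is itself scored and also groups several objectives."""
    objectives: List[Item] = field(default_factory=list)


@dataclass
class Strand:
    name: str
    goals: List[Goal] = field(default_factory=list)


@dataclass
class Domain:
    name: str
    strands: List[Strand] = field(default_factory=list)

    def items(self) -> List[Item]:
        """Every scorable criterion in the domain: each goal followed by its objectives."""
        found: List[Item] = []
        for strand in self.strands:
            for goal in strand.goals:
                found.append(goal)
                found.extend(goal.objectives)
        return found

    def summary(self) -> Dict[str, int]:
        items = self.items()
        scored = [i.score for i in items if i.score is not None]
        return {
            "goals": sum(len(s.goals) for s in self.strands),
            "criteria": len(items),
            "scored": len(scored),
            "total": sum(scored),
            "max_total": 2 * len(items),  # highest possible: 2 on every criterion
        }
```

Because only goals and objectives are scored, the 54 goals and 163 objectives give 217 criteria; under this sketch, a child scored on every criterion could obtain at most 2 × 217 = 434 points. Storing unobserved criteria as None rather than 0 keeps "not yet observed" distinct from a score of 0.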

Cross-Cultural Adaptation Procedure
The general guidelines for cross-cultural adaptation recommended by the International Test Commission (ITC, 2010; Hambleton, 2005) fall into four main categories: i) context, ii) test development and adaptation, iii) administration, and iv) documentation and score interpretation. In short, translators need to remain unbiased; evidence of the suitability of the language, linguistic features, culture, and statistical analyses must be recorded; equivalence between the original and adapted tests must be established; test administrators must ensure the right setting; and all details of the changes in the newly adapted version must be documented.

Purpose and Questions
The purpose of this study was to investigate the appropriateness of applying an authentic assessment instrument, originally developed for another culture, in the Malay socio-cultural context. The study was divided into four phases, and the research questions for each phase are listed separately below.

Phase 1-Research Questions
Is AEPS®:3-6 appropriate in terms of language and socio-culture for application in the setting?
1) Which developmental domain criteria can be translated directly and remain unchanged?
2) Which developmental domain criteria have to be changed, modified, or replaced?
3) Which developmental domain criteria have to be eliminated and/or redeveloped?

Phase 2-Research Questions
The question in this phase was based on observers' observations of the suitability of the score format:
1) Which criteria could not be scored because the observers were unable to interpret the score figures?

Phase 3-Research Questions
The following are the questions that the expert panel had to answer when they reviewed the criteria in the instrument.
1) Are the translated criteria difficult to understand or confusing?
2) Are the translated criteria familiar to the Malay socio-cultural context?
3) Do the translated criteria refer to skills appropriate for Malay children aged 3-6 years?
4) Are the translated criteria arranged in the hierarchical order of skills?
5) Do the translated criteria retain the meaning of the original instrument?

Phase 4-Research Questions
The main question in this phase concerns the reliability index, specifically the index of inter-rater agreement.
1) What is the inter-rater agreement of observers?

Conceptual Framework
At the beginning of this research, no literature could be found on adapting criterion-referenced measurement or authentic assessment. The author therefore developed the framework for the study based on knowledge of criterion-referenced measurement, the cross-cultural adaptation guidelines recommended by the International Test Commission (ITC, 2010), and the authentic assessment procedure itself. Expert panel review and inter-rater agreement are the priorities in this procedure. Figure 1 shows the conceptual framework.
Furthermore, almost no literature could be found on the nature of Malay children's development. The researcher discussed this issue with the panel of experts at the Medical Centre, Universiti Kebangsaan Malaysia, and the experts were not aware of any such instrument or study. During the discussions, the researchers came to understand that child experts at hospitals and clinics in Malaysia have been trained to apply instruments that were developed in, and for use in, developed nations. Consequently, specialists, experts, and medical personnel know only that kind of instrument, and it has never crossed their minds (nor do they have the time and expertise) that those instruments should be adapted before being applied in the local context. In addition, they had never been introduced to a highly reliable and valid instrument specifically designed to reflect Malay culture. Thus, the search for literature supporting Malay views of development was not fruitful. A further explanation for this situation is that, as is well known, most child development textbooks used in tertiary education in Malaysia are purchased from developed nations with different cultures; this phenomenon is addressed in the study.

Method
This adaptation research had four phases: i) translation, ii) suitability of score format, iii) expert review, and iv) observer agreement. Each phase is explained in the following paragraphs.

Phase 1-Translation of Criteria
The translation was carried out by the author, and the translated criteria were then edited by a Master's degree holder in Malay Language. The criteria were first typed into a Word® document and then translated domain by domain. After the literal translation, the criteria were examined for socio-cultural appropriateness through discussions with several colleagues and through the researcher's observations in early years settings.

Phase 2-Suitability of Scoring Format
Once the translation was done, three observers used the instrument to study the suitability of the score format. These observers, undergraduate students taking special education courses in the Education Faculty, observed three children aged 3-6 years and attempted to score the instrument. Their scoring patterns were expected to indicate whether the 0, 1, and 2 format was suitable for use by local observers.
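The Phase 2 check can be expressed as a simple tally. The sketch below assumes a hypothetical layout in which each observer's sheet maps criterion labels to a score of 0, 1, or 2, or to None when the observer could not assign a score. It counts how often each score figure was used and lists the criteria left unscored, which is the information the Phase 2 research question asks for. It is an illustration, not the procedure used in the study.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical layout: observer -> {criterion label -> 0, 1, 2, or None if left blank}
Scores = Dict[str, Dict[str, Optional[int]]]


def score_pattern(scores: Scores) -> Counter:
    """Frequency of each score figure (and of blanks) across all observers."""
    tally: Counter = Counter()
    for sheet in scores.values():
        for value in sheet.values():
            tally["blank" if value is None else value] += 1
    return tally


def unscored_criteria(scores: Scores) -> List[str]:
    """Criteria that at least one observer left unscored, sorted by label."""
    return sorted({
        label
        for sheet in scores.values()
        for label, value in sheet.items()
        if value is None
    })
```

A pattern dominated by blanks on particular criteria would point to items whose score figures observers could not interpret; an even spread across 0, 1, and 2 with few blanks is consistent with the format being usable.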

Phase 3-Expert Review
Each expert reviewed the criteria to ensure their appropriateness from the perspective of child/developmental experts. Each was given the AEPS®[M]:3-6 checklist, a dual-language Yes/No checklist, and identified any criterion that they did not consider suitable for the local context.
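The Yes/No checklists lend themselves to a per-criterion summary. A minimal sketch, assuming each criterion's responses are stored as one True (Yes) or False (No) per expert: it computes the proportion of experts who judged a criterion suitable and flags criteria below a cut-off. The data layout and the threshold are assumptions chosen for illustration; the study does not state a cut-off.

```python
from typing import Dict, List

# Hypothetical layout: criterion label -> one answer per expert (True = Yes, False = No)
Checklist = Dict[str, List[bool]]


def yes_proportion(responses: Checklist) -> Dict[str, float]:
    """Share of experts who marked each criterion as suitable (Yes)."""
    result: Dict[str, float] = {}
    for label, votes in responses.items():
        if not votes:
            raise ValueError(f"no expert responses recorded for {label!r}")
        result[label] = sum(votes) / len(votes)
    return result


def flag_for_revision(responses: Checklist, threshold: float) -> List[str]:
    """Criteria whose Yes-share falls below the cut-off, sorted by label.

    The threshold is the analyst's choice; it is not a value taken from the study.
    """
    shares = yes_proportion(responses)
    return sorted(label for label, share in shares.items() if share < threshold)
```

With six experts, each additional No lowers a criterion's share by 1/6 (about 0.17), so the cut-off determines how many dissenting experts a criterion can tolerate before it is flagged. The same functions could be applied separately to each of the five Phase 3 questions.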

Phase 4-Observation and Scoring
After the expert panel review, three observers went into the field, an early childhood setting, to observe the children and score the developmental criteria in a naturalistic environment.

Instrument
The main instrument used in this project is AEPS®:3-6, a curriculum-based assessment that is also a criterion-referenced measure.

Participants
The participants in this study fall into five categories: translator, editor, observers, expert panel, and children. The translator is bilingual and speaks and writes both Malay and English. The editor holds a Master's degree specialising in Malay Language and is a graduate of a local university. Observers for the scoring format study were undergraduate students trained in special needs education. Three children took part in the scoring format study; they were looked after either by caregivers in a childcare centre or by parents at home. The expert panel consisted of medical and child experts from a local university-based hospital who were invited to participate. Most participants speak and write both Malay and English. Observers for the naturalistic observation in the setting were selected from among Master's degree holders of the Social Sciences Faculty at a local university. The children involved in the last phase of this study were mainly children of university staff. A total of 100 children were involved, and they were observed in their natural daily activities and routines. Table 2 lists the participants and their type of participation. Parents/families were given a form to provide demographic background information and to indicate whether they allowed their child to be involved in the study. The researcher also asked for permission to take photos and videos for observation and assessment/scoring purposes.

Data Collection Procedure
Naturalistic observation is the main idea behind the authentic assessment procedure and was therefore applied throughout almost the entire process. Data gathered from observations were either scored directly on the AEPS®:3-6 Observation Data Recording Form or transferred into checklists.

Data Analysis Procedure
Data were analysed separately in each phase, and various kinds of analyses were used, as shown in Table 3. The data collected in the first phase (translated and edited text), the second phase (score patterns), the third phase (checklists), and the fourth phase (scores of 0, 1, and 2) were analysed using descriptive statistics, and an observer agreement index was also calculated in the fourth phase.

Results
Results are categorized into phases and are discussed in the next paragraphs.

Phase 1
The criteria in the domains could be classified into two categories: ethnic-bound or universal. Ethnic-bound criteria were found mostly in the adaptive and cognitive domains of AEPS®:3-6; the remaining criteria were universal. Adaptive domain criteria such as preparing the table for a meal, using eating utensils, and using paper tissue during toileting were judged to be practised somewhat differently in the local context. The hot and humid weather also affects the way children put on and take off clothes. Some criteria were also found to be linguistically unsuitable and need to be redeveloped. Universal criteria were translated directly with few changes, whereas ethnic-bound criteria were modified or changed into more suitable criteria. All criteria in Strand B of the socio-communication domain were eliminated and need to be developed in a future study. Table 4 summarizes the categorized criteria.

Phase 2
Observers did not find it difficult to interpret the score format, and all of them could administer the instrument quite easily. Although we cannot assume that the scores were accurate, we can still conclude that the 0, 1, and 2 figures did not pose any difficulty for the observers when scoring. Table 5 summarizes the scores.

Phase 3
The analyses showed that the expert panel mostly agreed that the criteria were appropriate for the local context, with minor modifications to sentence structure. They also agreed that the criteria needed no major changes to the hierarchy or developmental aspects. The data were categorized according to the research questions and are displayed in Table 6.
Table 6. Phase 3 analyses summary: data analysis for expert review

Phase 4

Scores of All Domains
Overall scores for the domains ranged from 5,000 to 6,300 (see Table 7). Scores for 4-year-olds were the highest of the three groups. Because scores on a developmental instrument would be expected to rise with age, this suggests that the observers might not have been well equipped with the knowledge and training needed to observe children in a naturalistic setting, so their interpretations may have varied greatly. In Figure 2, the scores appear visually almost the same for all groups. Inter-rater agreement indicates whether the observers interpreted the criteria in the instrument in the same way when observing, as discussed below.

Inter-rater reliability is formulated as:

From the calculation, the index was found to be approximately 0.17 (0.171521). This value is too low to be considered reliable. We can therefore conclude that the observers did not agree among themselves, which may be due to many factors, discussed in the next section.
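The formula used for the index is not reproduced in this text, so the 0.17 value cannot be re-derived here. As an illustration only, the sketch below computes two indices commonly used when several observers score the same criteria on a categorical scale: mean pairwise percent agreement, and Fleiss' kappa, which corrects for agreement expected by chance. Neither is claimed to be the calculation used in the study; the data layout (one row per criterion, one score of 0, 1, or 2 per observer) is an assumption.

```python
from itertools import combinations
from typing import Sequence

CATEGORIES = (0, 1, 2)  # the score figures used on each criterion


def pairwise_agreement(ratings: Sequence[Sequence[int]]) -> float:
    """Share of observer pairs giving identical scores, pooled over all criteria.

    ratings[i] holds every observer's score for criterion i.
    """
    agree = total = 0
    for row in ratings:
        for a, b in combinations(row, 2):
            agree += a == b  # True counts as 1
            total += 1
    return agree / total


def fleiss_kappa(ratings: Sequence[Sequence[int]],
                 categories: Sequence[int] = CATEGORIES) -> float:
    """Fleiss' kappa for a fixed number of observers scoring every criterion."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2:
        raise ValueError("kappa needs at least two observers")
    # counts[i][j] = number of observers who gave criterion i the score categories[j]
    counts = [[list(row).count(c) for c in categories] for row in ratings]
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("each criterion needs one valid score from every observer")
    # Observed agreement: proportion of agreeing observer pairs per criterion, averaged
    p_bar = sum(
        (sum(x * x for x in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items
    # Chance agreement from the overall share of each score category
    shares = [sum(row[j] for row in counts) / (n_items * n_raters)
              for j in range(len(categories))]
    p_e = sum(s * s for s in shares)
    if p_e == 1:
        raise ValueError("kappa is undefined when only one score category is ever used")
    return (p_bar - p_e) / (1 - p_e)
```

For example, three observers who agree completely on three criteria (scoring them 2, 1, and 0) and give three different scores on a fourth have a pairwise agreement of 0.75 but a kappa of 0.625, because part of the raw agreement would be expected by chance.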

Discussion
The author believes that the near absence of highly reliable and valid developmental instruments developed specifically for the local context is the main reason the expert panel made few comments. During informal discussions with the experts at the hospital, it emerged that they had been trained to apply developmental norms from Western, more developed countries. This research therefore calls for more studies of local children's developmental milestones and for experts to reduce their dependence on Western culture. If this is not done in the near future, many children will be misdiagnosed or will not receive necessary interventions. Alternatively, it could simply mean that teachers are not practising in a way that appropriately serves local people.
The inter-rater agreement index was very low. This could be because the observers were not trained to observe children naturalistically, or because they had little practical training in the developmental milestones of young children. From these findings, the author concluded that intensive training is vital if the scores and their interpretation are to be deemed reliable. Teacher training in creating an effective learning environment could be another vital factor to improve: since young children's developmental milestones must be observed naturally, the physical and social learning environment must also be natural.
The current research focused on content validity, which mainly involved translation, expert review, and one-off observation. Future studies must therefore use more complex data collection and statistical analyses for the instrument to be established as reliable and valid for use in the local context. Lastly, collaboration is an element of implementing authentic assessment procedures, so it is suggested that future studies involve families and other professionals throughout.

Figure 2. Score distribution for 0-6 years for overall domains

Table 1. List of items or criteria in the curriculum-based instrument

Table 2. List of participants

Table 3. Type of data and analysis involved in the four phases