A Psychometric Analysis of Reliability and Validity of the Index of Learning Styles (ILS)

Prior literature shows that the Felder and Silverman learning styles model (FSLSM) has been widely adopted to cater to the individual styles of learners in both traditional and Technology Enhanced Learning (TEL) settings. The Index of Learning Styles (ILS) instrument was proposed to infer learning styles according to this model. This research analyses the soundness of the instrument in an Arabic sample. Data were integrated from different courses and years. A total of 259 engineering students participated voluntarily in the study. Reliability was analysed using internal consistency, inter-scale correlation, and total item correlation. Construct validity was examined using factor analysis. Overall, the results indicated that the reliability and validity of the perception and input dimensions were moderately supported, whereas the processing and understanding dimensions showed low internal consistency and their items loaded weakly on the associated constructs. In general, the instrument needs further work to improve its soundness. However, given the consistency of the results for engineering students irrespective of cross-cultural differences, it can be adopted to diagnose learning styles.


Introduction
Psychologists have proposed several learning style models to meet the individual needs of learners, connect teaching and learning styles, and avoid a "one-size-fits-all" teaching approach (Felder & Silverman, 1988; Kolb, 1984; Riding & Cheema, 1991). Learning styles were defined as "characteristic strengths and preferences in the ways they 'learner' take in and process information" (Felder, 1996). The main assumption of learning styles theory is that ignoring individual styles may lead to course dropout, learner dissatisfaction, and low achievement (Felder & Brent, 2005). On the other hand, empirical studies have not produced conclusive evidence either to confirm or to refute the value of learning styles (Al-Azawei & Lundqvist, 2015; Mayer, 2011). Furthermore, learning styles research is limited by the absence of a valid and reliable measurement to identify this psychological trait (Coffield, Moseley, Hall, & Ecclestone, 2004).
• The understanding (sequential/global) dimension represents how information is understood. The first pole (sequential) describes a learner who uses a step-by-step learning method and attends to all details. The other pole (global), in contrast, describes a learner who tends to make leaps in studying in order to grasp the general picture before looking at details.
In order to infer learning styles in accordance with this model, Felder and Soloman (n.d.) proposed the Index of Learning Styles (ILS). However, there are contradictory findings regarding the reliability and validity of this instrument. Additionally, our systematic review found no study establishing the appropriateness of the ILS in an Arabic engineering population. Hence, this research pursues two aims. Firstly, it contributes to the debate on the soundness of the ILS. Secondly, it addresses the scarcity of evidence about the appropriateness of the instrument for inferring learning styles in an Arabic engineering sample. As such, we hypothesised that:
1) The ILS is a reliable and valid instrument to infer learning styles in Arabic engineering students.
2) The reliability and validity of some dimensions of the ILS will not be supported to infer learning styles in Arabic engineering students.
The research is structured as follows. Section 2 highlights the findings of related work. Subsequently, Section 3 introduces the methodology of this research. Section 4 illustrates the core results of the study and discusses the findings. Finally, Section 5 concludes the research and identifies the potential future work.

Literature Review
Recent studies suggest deducing learning styles implicitly in order to avoid the criticism of the reliability and validity of psychometric instruments and the direct intervention of users (Graf, 2007; Latham, Crockett, & McLean, 2014). However, evaluating the accuracy of the results obtained in this way, or initialising student models, still requires the use of an instrument. Hence, the reliability and validity of learning style measurements is an important issue that has to be addressed first. Reliability refers to the extent to which a scale is free from random error. Internal consistency is widely used to measure reliability: all items that make up the scale should measure "the same underlying attribute". Cronbach's coefficient alpha is the most commonly applied indicator of internal consistency. The acceptable level of reliability depends on the nature and purpose of the scale. Generally, a Cronbach's alpha of 0.7 or above is recommended as a minimum (Pallant, 2013). However, other researchers suggest that this level was recommended for achievement tests, whereas a level above 0.5 is acceptable for attitude tests (Tuckman, 1999, as cited in Zywno, 2003). This research adopted Tuckman's recommendation because it analyses a psychological instrument. Validity, on the other hand, expresses how well an instrument measures the attributes it is designed to measure (Pallant, 2013).
Van Zwanenberg, Wilkinson and Anderson (2000) compared the ILS and Honey and Mumford's Learning Styles Questionnaire (LSQ). A total of 284 participants (139 engineers and 145 managers) took part in the study. A standard minimum alpha of 0.80 was adopted. Accordingly, the reliability of the ILS was not supported in the study. However, if an alpha of 0.5 is considered acceptable for attitude tests, the internal consistency of the ILS can be supported except for the sequential/global dimension, where the alpha was 0.41. In addition, a significant correlation was found between the sensing/intuitive and sequential/global dimensions.
In Zywno (2003), a total of 557 undergraduate students at Ryerson University, Toronto, Canada participated in a validation of the ILS. To examine the reliability of the instrument, test-retest, Cronbach's alpha and factor analysis tests were used. Results showed an acceptable level of reliability when a minimum alpha of 0.50 was adopted. Furthermore, because data were integrated from several academic years (2000 to 2002), ANOVA tests were applied, revealing no statistically significant difference in the scale means across consecutive years. A comparison with the results of other studies indicated that the most dominant styles of the engineering population in different countries are active, sensing, visual and sequential. This supports the convergent validity of the ILS. However, more investigation was recommended.
Felder and Spurlin (2005) surveyed many studies that dealt with the reliability and validity of the ILS. According to their conclusion, the questionnaire had been validated for engineering samples at ten universities in four countries, all drawn from English-speaking populations. The test-retest results ranged from 0.70 to 0.90 in all investigated studies and the Cronbach's alphas were acceptable based on the criterion value of 0.50.
Cook and Smith (2006) compared four learning styles instruments in terms of reliability and validity. One of these instruments was the ILS. The participants were 89 residents and medical students. Overall, the active/reflective and sensing/intuitive dimensions were validated. Furthermore, the test-retest reliability showed that all dimensions were at either a good or an acceptable level of reliability. Participants were also asked to indicate their level of agreement with the identified styles. The responses revealed their satisfaction with the inferred styles as well as the ease of using the questionnaire. Thus, more support for the instrument was provided. However, a significant correlation was found between the sensing/intuitive and sequential/global dimensions, which suggests the need for further research.
www.ccsenet.org/ijps International Journal of Psychological Studies Vol. 7, No. 3;
Platsidou and Metallidou (2009) investigated the reliability and validity of two learning styles instruments, the ILS and Kolb's Learning Styles Inventory (LSI). The instruments were distributed to a total of 340 undergraduate students from different disciplines at a Greek university. The researchers concluded that the internal consistency of most of the learning styles dimensions was poor or moderate.
To examine the suitability of the ILS for undergraduate medical students, a total of 358 participants filled in the ILS from 2002 to 2007 (Hosford & Siders, 2010). Generally, the results supported the internal consistency of the measurement, and the test-retest analyses showed moderate to high stability across administrations two and four years apart. As with other studies, a significant correlation between the perception and understanding dimensions was shown. Table 1 summarises the results of the studies discussed above.
To sum up, most of the related works recommend more investigation of the reliability and validity of this instrument. Further, to the best of the authors' knowledge, it has not been validated with an Arabic engineering population. Only the study of Cook and Smith (2006) included two Arabic students, who represented approximately 2% of the total sample. Hence, this study contributes evidence about the validity and reliability of this instrument for such a population.

Methodology
Ethics approval was sought, and obtained, using the procedures laid down by the Ethics Committee at the University of Reading.

Participants
The ILS questionnaires were deployed at the University of Babylon in Iraq and in an online course. Participation was voluntary, and some lecturers encouraged their students to participate by offering extra marks. Data were collected from different courses at the College of Information Technology during the 2013-2014 and 2014-2015 academic years and from learners who participated in a voluntary online course. All subjects gave consent to participate in the study. There were 136 females (52.5%) and 123 males (47.5%). The majority of participants were aged between 18 and 22.

Instrument
The ILS is a free questionnaire that measures learning styles in accordance with the Felder and Silverman model (Felder & Silverman, 1988). The instrument was proposed specifically for engineering populations. It consists of 44 forced-choice questions in which students have to choose either (a) or (b), for example: 1) I understand something better after I (a) try it out.
In this research, the authors provided a general explanation of the aim of the study and guaranteed that all personal information would be handled confidentially and that the data would be used for research purposes only. The instrument consists of the following parts:
• Demographic information: the first part collects demographic information about participants.
• The ILS: the second part includes the ILS items. All questions were translated into Arabic to make them easier to understand. The translation was confirmed by two experts.
• A closed-ended question: a single question on a 7-point Likert scale ranging from 1 (strongly disagree) to 7 (strongly agree) was used to assess the ease of the questionnaire and the clarity of its questions. The question is "I found that it is easy to understand questions and answer them".
The instrument was administered online to facilitate data collection. To avoid missing values and invalid questionnaires, all items were marked as required, so students could not submit their responses until every item had been answered. The ILS scores showed that the most common styles were active (N = 173, 66.8%), sensing (N = 198, 76.44%), visual (N = 206, 79.53%), and sequential (N = 156, 60.23%).

Procedure
The questionnaire was administered via the announcement page of Moodle and by email during different academic years. The administrations took place in February 2014 for first-year students in a Fundamentals of Programming Language (FPL) course, in December 2014 for students in an online Web Design course, and finally in March 2015 for all students in the College of Information Technology, including some students who had participated in the FPL and online courses, in order to assess test-retest reliability. However, only 19 students participated twice, so the test-retest analysis was excluded from the study.
Van Zwanenberg et al. (2000) stated that one issue with the dichotomous nature of the scales is the difficulty of using standard statistical tests. Hence, they suggested assigning a value of 1 to (a) responses and 0 to (b) responses. This binary approach was adopted in our study. The mean of the opposite pole can be computed as the complement to 11; for example, if the mean value of the visual pole is 7.5, the mean value of the verbal pole is 3.5.
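The binary coding and the complement rule described above can be illustrated with a minimal Python sketch. It assumes that each of the four dimensions comprises 11 of the 44 items, as implied by the complement to 11; the function names are hypothetical and are not part of the original analysis, which was carried out in SPSS.

```python
from typing import Dict, List

ITEMS_PER_SCALE = 11  # assumption: 44 items spread evenly over 4 dimensions


def code_response(choice: str) -> int:
    """Binary coding: option (a) is scored 1 and option (b) is scored 0."""
    choice = choice.strip().lower()
    if choice not in ("a", "b"):
        raise ValueError("each ILS item must be answered with 'a' or 'b'")
    return 1 if choice == "a" else 0


def pole_scores(choices: List[str]) -> Dict[str, int]:
    """Score one dimension: the (b) pole is the complement of the (a) pole."""
    if len(choices) != ITEMS_PER_SCALE:
        raise ValueError("a dimension needs exactly 11 answers")
    a_score = sum(code_response(c) for c in choices)
    return {"a_pole": a_score, "b_pole": ITEMS_PER_SCALE - a_score}


def complement_mean(mean_a_pole: float) -> float:
    """Mean of the opposite pole, computed as the complement to 11."""
    return ITEMS_PER_SCALE - mean_a_pole
```

For instance, calling complement_mean with the visual mean of 7.5 reproduces the verbal mean of 3.5 given in the text.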

Analysis Techniques
Statistical analysis was conducted using SPSS (Statistical Package for the Social Sciences) version 22 for Windows 7. Several descriptive and inferential analyses were applied. These comprise means (M), standard deviations (SD), frequencies, Pearson's correlation coefficient, inter-scale correlation, total item correlation, MANOVA, Cronbach's alpha and factor analysis.

Results and Discussion
Between January 2014 and March 2015, 259 engineering students consented to participate in this study. The analysis showed no statistically significant association between gender and three of the learning style dimensions, as presented in Table 2. A significant association was found only in the perception dimension: female students were more likely to prefer the sensing style to the intuitive style (81.61%).
Further analysis examined how easy the 44 items were to understand by computing the mean of the closed-ended question. The mean score was 5.15, indicating a good level of overall understanding and ease of answering the questions. Clarity is a very important factor in the successful use of such instruments because students are unwilling to continue answering unclear or ambiguous questions.

Internal Consistency
The internal consistency reliability was computed for all "a" items. Two dimensions (sensing/intuitive and visual/verbal) met the minimum acceptable level of Cronbach's alpha of 0.50. This result is in agreement with the findings of the literature, for example, Platsidou and Metallidou (2009), Van Zwanenberg et al. (2000) and Zywno (2003). However, the internal consistency of the active/reflective and sequential/global dimensions was not supported. This cannot be attributed to the translation of the instrument, because it was carefully translated to keep the same meaning as the English version and was approved by two experts. This interpretation is supported by comparison with prior literature, which shows a similar overall pattern. In the study of Platsidou and Metallidou (2009), the alpha values of these dichotomies were 0.45, which is consistent with our study. The literature showed, as in this research, that the sensing/intuitive and visual/verbal dimensions achieved the highest internal consistency.
The alpha-if-item-deleted values were also investigated to determine whether internal consistency could be improved by deleting some items. Item 17 was the only item whose deletion affected alpha, raising the value for the active/reflective dimension from 0.41 to 0.45. However, the ILS was designed to force learners into one of the two poles, and if both options seem to apply to their preferences, they have to choose "the one that applies more frequently". This design does not allow a zero preference. As such, deleting an item cannot be recommended. However, the item can be revised to improve the psychometric properties of the instrument. Table 3 presents the Cronbach's alpha values and compares them with related works.
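As an illustration of the two statistics discussed in this subsection, the following Python sketch computes Cronbach's alpha and the alpha-if-item-deleted values for one scale. It uses the standard variance-based formula for alpha on binary-coded responses (one row per respondent, one column per item); the function names are hypothetical, and the sketch is not the SPSS procedure used in the study.

```python
from statistics import variance
from typing import Dict, List


def cronbach_alpha(rows: List[List[int]]) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    if len(rows) < 2 or len(rows[0]) < 2:
        raise ValueError("need at least two respondents and two items")
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each respondent's total scale score.
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def alpha_if_item_deleted(rows: List[List[int]]) -> Dict[int, float]:
    """Recompute alpha with each item (0-based column index) left out in turn."""
    k = len(rows[0])
    return {
        j: cronbach_alpha([row[:j] + row[j + 1:] for row in rows])
        for j in range(k)
    }
```

An item whose removal raises alpha, as reported for item 17 here, would show up as an entry in the returned dictionary that exceeds the alpha of the full scale.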

Inter-Scale Correlation
For further investigation, Pearson's correlation coefficient was used to test the inter-scale correlation (Table 4). Although the results showed significant correlations among dimensions, they were mild; the strongest, a moderate correlation, was between sensing/intuitive and sequential/global (r = 0.414). This result is consistent with the findings of Van Zwanenberg et al. (2000) and Cook and Smith (2006). Such a correlation would seem plausible because the "sensing" and "sequential" ends of those spectra both rely on small discrete steps and quantised data or knowledge.
* Correlation is significant at the 0.05 level (2-tailed).
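The inter-scale correlation can be sketched in Python as follows. The example computes Pearson's r for every pair of dimension scores; the dictionary layout and function names are illustrative assumptions, and significance testing is left out.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson's correlation coefficient between two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a constant list has no defined correlation")
    return covariance / (spread_x * spread_y)


def inter_scale_correlations(
    scores: Dict[str, List[float]]
) -> Dict[Tuple[str, str], float]:
    """Correlate every pair of scales; keys are (scale_a, scale_b) tuples."""
    return {
        (a, b): pearson_r(scores[a], scores[b])
        for a, b in combinations(scores, 2)
    }
```

With four dimensions this yields six pairwise coefficients, which correspond to the off-diagonal entries of a table such as Table 4.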

Construct Validity
Factor analysis was conducted to test the construct validity of the instrument. According to Pallant (2013), two criteria have to be considered to ensure that data are suitable for factor analysis:
• Sample size: 150 cases is identified as the smallest sample size for conducting this test. However, the required number of cases depends on the number of items, so at least 5 cases are recommended for each item (variable).
• The strength of the relationship among items: the correlation matrix should show at least some correlations of 0.3 or above. Bartlett's test of sphericity and the Kaiser-Meyer-Olkin (KMO) measure are two statistics for assessing the factorability of the data. Bartlett's test should be significant at p ≤ 0.05. The KMO index ranges from 0 to 1 and should be at least 0.6.
Our data met both criteria because more than 5 cases were available for each item (44 items and 259 cases). Furthermore, the KMO was 0.615 and Bartlett's test of sphericity was significant (p < 0.001). Thus, the factorability of the correlation matrix was supported.
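To make the two factorability statistics concrete, the sketch below computes Bartlett's chi-square statistic and the overall KMO index from a correlation matrix using their textbook formulas. It is written in plain Python with a small Gauss-Jordan routine so that it is self-contained; the helper names are hypothetical, the p value of Bartlett's test (from a chi-square distribution with the returned degrees of freedom) is omitted, and SPSS output may differ in rounding.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def invert_and_det(matrix: Matrix) -> Tuple[Matrix, float]:
    """Gauss-Jordan elimination with partial pivoting: (inverse, determinant)."""
    n = len(matrix)
    # Augment the matrix with the identity matrix.
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        if pivot != col:
            aug[col], aug[pivot] = aug[pivot], aug[col]
            det = -det  # a row swap flips the sign of the determinant
        pivot_value = aug[col][col]
        det *= pivot_value
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug], det


def bartlett_sphericity(corr: Matrix, n_cases: int) -> Tuple[float, int]:
    """Chi-square statistic and degrees of freedom of Bartlett's test."""
    p = len(corr)
    _, det = invert_and_det(corr)
    chi_square = -(n_cases - 1 - (2 * p + 5) / 6) * math.log(det)
    return chi_square, p * (p - 1) // 2


def kmo(corr: Matrix) -> float:
    """Overall KMO: squared correlations relative to squared partial ones."""
    p = len(corr)
    inverse, _ = invert_and_det(corr)
    sum_r2 = 0.0
    sum_partial2 = 0.0
    for i in range(p):
        for j in range(p):
            if i != j:
                partial = -inverse[i][j] / math.sqrt(inverse[i][i] * inverse[j][j])
                sum_r2 += corr[i][j] ** 2
                sum_partial2 += partial ** 2
    return sum_r2 / (sum_r2 + sum_partial2)
```

A KMO value close to 1 indicates that partial correlations are small relative to the observed correlations, which is why values of at least 0.6 are taken as a minimum for factor analysis.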
Although the Principal Components Analysis (PCA) showed the presence of 17 factors explaining 61.91% of the variance, 8-factor, 5-factor, and 4-factor models were examined. The 8-factor and 5-factor models explained 37.69% and 27% of the total variance respectively, but they did not fit adequately. Factors 1, 2, 3 and 4 loaded most of the items of the four scales, whereas the other factors showed an overlap between dimensions and had a weak effect on the model. The scree plot of eigenvalues depicted in Figure 1 was also used. It clearly shows a smooth decrease in eigenvalues after factor 4.
The 4-factor model showed that eight items of the sensing/intuitive dimension and nine items of the visual/verbal dimension loaded greater than 0.3 on factors 1 and 2 respectively. This confirms the validity of these two dimensions and supports the literature (Hosford & Siders, 2010; Litzinger, Lee, Wise, & Felder, 2005; Platsidou & Metallidou, 2009; Zywno, 2003). Even though seven items of the active/reflective dimension and ten items of the sequential/global dimension loaded on the third and fourth factors respectively, only five and three of these items, respectively, loaded greater than 0.3. Factor 4 loaded one active/reflective item more highly than factor 3 did, whereas six sequential/global items loaded more highly on factors 1 and 3 than on factor 4. This finding is not in agreement with some of the literature. For instance, in Hosford and Siders (2010), factors 3 and 4 loaded nine items greater than 0.3 of the understanding and processing dimensions respectively. In Zywno (2003), factors 3 and 5 loaded seven and six items of the processing and understanding dimensions greater than 0.3. This means that the convergent and discriminant validity of the processing and understanding dimensions was not supported, because some items loaded less on their associated factors than on other constructs.
The model presented in Table 5 suggests that two dimensions of the FSLSM were moderately well structured. However, the 4-factor model explained only 23.18% of the total variance, and some items loaded weakly or exhibited a misfit.
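The relationship between eigenvalues, explained variance and the 0.3 loading threshold can be sketched as below. The sketch assumes that the PCA was run on the correlation matrix of standardised items, in which case the total variance equals the number of items and each factor's share is its eigenvalue divided by that number; it also assumes the threshold applies to the absolute loading. Function names and any input values are hypothetical and are not taken from Table 5.

```python
from typing import Dict, List


def variance_explained(eigenvalues: List[float], n_items: int) -> List[float]:
    """Percentage of total variance per factor (correlation-matrix PCA)."""
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    return [100.0 * eigenvalue / n_items for eigenvalue in eigenvalues]


def cumulative_variance(eigenvalues: List[float], n_items: int) -> float:
    """Total percentage of variance explained by the retained factors."""
    return sum(variance_explained(eigenvalues, n_items))


def salient_items(loadings: Dict[int, float], threshold: float = 0.3) -> List[int]:
    """Item numbers whose absolute loading on a factor exceeds the threshold."""
    return sorted(
        item for item, loading in loadings.items() if abs(loading) > threshold
    )
```

Under this assumption, a 4-factor solution explaining 23.18% of the variance of 44 items corresponds to retained eigenvalues that sum to roughly 10.2.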
In the next step, further investigation was conducted to explore the consistency of the results of the integrated groups and compare them with other studies. Table 6 shows that there are no statistically significant differences between the mean scores of the three groups (FPL, online course, and several courses), except for the visual pole, where the p value equals 0.046. Furthermore, the most recurring styles for engineering students were active, sensing, visual, and sequential. As discussed by Felder and Silverman (1988), engineering students are more likely to be active, sensing, and visual. It was also stated that the most creative learners are global. The study corroborates the literature, as presented in Table 7. This can support the ILS because it shows similar styles for engineering learners irrespective of cultural differences and other characteristics of the investigated populations. It can be noticed that even with a small sample, such as in Franzoni and Assar (2008), the dominant preferences of engineering students are similar.
Additionally, convergent validity is established if composite reliability (CR) exceeds 0.7. Table 8 shows that the input and perception dimensions achieved an acceptable score to support convergent validity, whereas the other dimensions did not.
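For illustration, composite reliability can be computed from standardised factor loadings with the commonly used formula CR = (Σλ)² / ((Σλ)² + Σ(1 − λ²)). The Python sketch below applies this formula and also derives the average variance extracted (AVE), the quantity usually compared with squared inter-construct correlations in the Fornell and Larcker (1981) criterion. That the study used exactly these formulas is an assumption; the function names and any loadings passed in are hypothetical, not values from Table 8.

```python
from typing import Dict, List, Tuple


def composite_reliability(loadings: List[float]) -> float:
    """CR from standardised loadings; error variance of an item is 1 - loading^2."""
    if not loadings:
        raise ValueError("at least one loading is required")
    squared_sum = sum(loadings) ** 2
    error_variance = sum(1 - loading ** 2 for loading in loadings)
    return squared_sum / (squared_sum + error_variance)


def average_variance_extracted(loadings: List[float]) -> float:
    """AVE: mean of the squared standardised loadings of one construct."""
    if not loadings:
        raise ValueError("at least one loading is required")
    return sum(loading ** 2 for loading in loadings) / len(loadings)


def fornell_larcker(
    ave: Dict[str, float], correlations: Dict[Tuple[str, str], float]
) -> Dict[str, bool]:
    """True for a construct whose AVE exceeds every squared correlation with others."""
    supported = {name: True for name in ave}
    for (a, b), r in correlations.items():
        shared_variance = r ** 2
        if ave[a] <= shared_variance:
            supported[a] = False
        if ave[b] <= shared_variance:
            supported[b] = False
    return supported
```

A construct would then be taken to show convergent validity when composite_reliability exceeds 0.7, and discriminant validity when fornell_larcker returns True for it.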
With regard to discriminant validity, Fornell and Larcker (1981) stated that it can be established if the variance a factor shares with any other variable is less than the variance it shares with its own variables. As shown in Table 8, the discriminant validity of the perception and input constructs was also supported. Furthermore, Platsidou and Metallidou (2009) carried out MANOVA tests to examine differences in the learning styles of students from four disciplines (in-service teachers, education students, psychology students and polytechnic students). The effect of discipline was significant in the active/reflective and visual/verbal dimensions, with p values of 0.039 and 0.003 respectively. Based on this analysis, they concluded that the discriminant validity of the instrument received some support.
* The percentages were not explicitly provided in these studies; however, the authors indicated that the most recurring styles were active, sensing, visual, and sequential.
Based on the overall analysis, the first hypothesis was rejected because the reliability and validity results showed that the internal consistency of two dimensions (processing and understanding) was below the minimum acceptable level of alpha for attitude tests and their items did not load adequately in the factor analysis. Though the correlations between dimensions were trivial or mild, an overlap was found, specifically between the perception and understanding dichotomies. On the other hand, the reliability and validity of the other dimensions (perception and input) were moderately supported, indicating that the instrument is to some extent suitable for inferring learning styles according to this model. However, a revision of the unsupported dimensions could enhance their internal consistency reliability and construct validity.
To sum up, the most reliable and valid dimensions in the instrument are perception and input because both met the acceptable level of reliability and validity. These dimensions also achieved better results than the others in prior studies, which is consistent with our conclusion. Thus, the second hypothesis was retained.
The contribution of this study is twofold. First, it shows that the ILS produces consistent results regardless of cross-cultural differences. As presented in the comparison of engineering students' styles, all samples share particular preferences, specifically active, sensing, visual, and sequential. Second, it helps to fill the gap regarding the appropriateness of the instrument for inferring the learning styles of Arabic engineering learners. Given the consistency of the learning style results for engineering students regardless of cross-cultural differences, the instrument can be adopted for this purpose.
Some limitations have to be highlighted. First, the sample was homogeneous. Although it was sufficient to represent the population, its members shared very similar characteristics. Second, a larger sample and a test-retest examination could provide more reliable results that can be generalised. Hence, further research is recommended to overcome the limitations of this study.

Conclusion
Learning styles theory has been integrated into different learning modes in order to improve the experience of learners in terms of satisfaction and performance. However, it is widely criticised. One issue that has led to such debate is the reliability and validity of the psychometric instruments proposed to infer learning styles. Accordingly, this research was motivated specifically by the scarcity of studies investigating the appropriateness of the ILS in an Arabic engineering population. Another reason is the contradictory findings of related research in other populations. Overall, the results indicated that the ILS produces similar results regardless of the cultural differences of engineering samples. This is the core contribution of the study, together with its support for the existing evidence on the psychometric properties of the instrument. The investigation of reliability included internal consistency, inter-scale correlation, and total item correlation. The internal consistency of two dimensions was moderately acceptable, with alpha values greater than 0.50. Furthermore, a weak overlap was revealed between the dichotomies of the FSLSM. The strongest correlation was found between the sensing/intuitive and sequential/global dimensions, in line with the literature. The results of the factor analysis and of the convergent and discriminant validity analyses were consistent with those for internal consistency, showing that the perception and input dimensions were moderately well defined. Although the other dimensions did not show an acceptable level of reliability and validity, the consistency of the findings with the literature can moderately support the use of the ILS.
The ILS has been the dominant instrument in recent years for adapting learning environments in accordance with learning styles. However, the concern regarding the soundness of the psychological instruments proposed to diagnose learning styles could not be refuted even with such a common instrument. Two dimensions of the instrument were supported. Considering the results of the study and comparing them with the reported literature, we can moderately support the overall properties of the ILS. In addition, participants were satisfied with the clarity of the items and the ease of completing the questionnaire. Further work on the instrument could enhance its psychometric properties. In the future, a heterogeneous sample and data from different disciplines and universities will be integrated to obtain more robust results.