Comparison of Cognitive Diagnosis Models under Changing Conditions: DINA, RDINA, HODINA and HORDINA

The application of CDMs to fraction subtraction data has revealed problems with the classification of examinees, latent class sizes, and the use of higher-order models. Additionally, selecting the most appropriate model becomes critically important when several candidate models are available for the data. In the present study, the DINA–RDINA and HODINA–HORDINA models were compared under changing conditions (i.e., number of attributes, g and s item parameter values, and number of items) using simulated and real data. The results show that when the g–s parameter values and the number of attributes were low (0.1 and 3, respectively), the reparameterized models produced values virtually identical to those obtained from the DINA models. However, when the g–s parameter values and the number of attributes were increased (0.5 and 5, respectively), the item parameter estimates, latent class estimates, AIC, and BIC obtained from the models differed.

Like most CDMs, the DINA model requires a Q-matrix (Tatsuoka, 1983), a j × k array of dichotomous entries indexed by item j and attribute k. The entry q_jk specifies whether attribute k is required to answer item j correctly. In the DINA model, the latent response η_ij, which is fully determined by the attribute vector α_i, indicates whether examinee i possesses all the attributes required for item j.

\[ \eta_{ij} = \prod_{k=1}^{K} \alpha_{ik}^{q_{jk}} \tag{1} \]

Here K is the number of attributes. If an examinee possesses all the required attributes for item j, η_ij = 1; if at least one of the attributes required for the item is missing, η_ij = 0 (de la Torre, 2009; de la Torre, Hong, & Deng, 2010; de la Torre & Lee, 2010). The probability of a correct response to item j is governed by the g_j (guess) and s_j (slip) parameters.
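The latent response can be illustrated with a small numerical sketch. The Q-matrix and attribute patterns below are made up for illustration, and the function name `latent_response` is hypothetical; the code only evaluates the conjunctive rule described above.

```python
import numpy as np

def latent_response(alpha, Q):
    """Return eta[i, j] = prod_k alpha[i, k] ** Q[j, k] (1 = all required attributes present)."""
    alpha = np.asarray(alpha, dtype=int)   # shape (I, K): examinee attribute patterns
    Q = np.asarray(Q, dtype=int)           # shape (J, K): item-by-attribute requirements
    # Broadcast to (I, J, K); attributes with Q[j, k] == 0 contribute 0**0 == 1 or 1**0 == 1.
    return np.prod(alpha[:, None, :] ** Q[None, :, :], axis=2)

# Illustrative (made-up) Q-matrix: 3 items, 3 attributes.
Q = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 1, 1]])
# Illustrative attribute patterns for 3 examinees.
alpha = np.array([[1, 1, 0],
                  [0, 1, 1],
                  [1, 1, 1]])

eta = latent_response(alpha, Q)
print(eta)
```

The first examinee lacks the third attribute, so η is 0 only for the third item; the third examinee has every attribute, so η is 1 for all items.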
The g_j parameter is the probability that an examinee who does not possess all the required attributes for item j responds correctly to the item. The s_j parameter is the probability that an examinee who possesses all the required attributes for item j responds incorrectly (de la Torre & Douglas, 2004, 2008; Huebner & Wang, 2011). With X_ij denoting the scored (0/1) response of examinee i to item j, the parameters s_j and g_j are defined as follows.

\[ s_j = P(X_{ij} = 0 \mid \eta_{ij} = 1), \qquad g_j = P(X_{ij} = 1 \mid \eta_{ij} = 0) \]

The item response function for item j can be written as

\[ P(X_{ij} = 1 \mid \alpha_i) = g_j^{\,1 - \eta_{ij}} (1 - s_j)^{\eta_{ij}} \]

The joint likelihood function of the DINA model can be expressed as follows.

\[ L(X; s, g, \alpha) = \prod_{i=1}^{I} \prod_{j=1}^{J} P(X_{ij} = 1 \mid \alpha_i)^{X_{ij}} \left[ 1 - P(X_{ij} = 1 \mid \alpha_i) \right]^{1 - X_{ij}} \]

By adding an IRT model for the joint distribution of the attributes,

\[ P(\alpha_i \mid \theta_i) = \prod_{k=1}^{K} P(\alpha_{ik} \mid \theta_i) \]

de la Torre and Douglas (2004) obtained the HODINA model, which assumes that the cognitive attributes depend on one or more latent traits.
Examinees with a higher θ are more likely to possess the latent attributes than examinees with a lower θ (DeCarlo, 2011; de la Torre & Douglas, 2004). Thus, the HODINA model can be used both to classify examinees on specific attributes and to estimate their latent trait (Li, 2008).
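As a rough illustration of how the guess and slip parameters enter the DINA response probability and likelihood, the following sketch evaluates the probability of a correct response for every attribute pattern and a marginal log-likelihood over latent classes. The function names and the example values are hypothetical, and marginalizing over class sizes is one common way to handle the unknown attribute patterns, not necessarily the estimation routine used in the study.

```python
import itertools
import numpy as np

def all_patterns(K):
    """All 2**K attribute patterns, ordered 00...0 to 11...1."""
    return np.array(list(itertools.product([0, 1], repeat=K)))

def dina_prob(alpha, Q, g, s):
    """P(X_ij = 1 | alpha_i) = g_j ** (1 - eta_ij) * (1 - s_j) ** eta_ij."""
    # eta is 1 when every attribute required by the item is present.
    eta = np.all(alpha[:, None, :] >= Q[None, :, :], axis=2).astype(float)
    return g ** (1 - eta) * (1 - s) ** eta

def marginal_loglik(X, Q, g, s, class_sizes):
    """Log-likelihood of responses X, summing over latent classes."""
    P = dina_prob(all_patterns(Q.shape[1]), Q, g, s)          # (classes, items)
    # log P(x_i | class c) under local independence; g and s must lie in (0, 1).
    logp = X @ np.log(P).T + (1 - X) @ np.log(1 - P).T        # (examinees, classes)
    return float(np.sum(np.log(np.exp(logp) @ class_sizes)))

# Made-up example: 3 items, 2 attributes, equal class sizes.
Q = np.array([[1, 0], [0, 1], [1, 1]])
g = np.array([0.2, 0.2, 0.2])
s = np.array([0.1, 0.1, 0.1])
X = np.array([[1, 1, 1], [0, 0, 0], [1, 0, 0]])
print(marginal_loglik(X, Q, g, s, np.full(4, 0.25)))
```

With low g and s the response patterns separate the latent classes sharply; as g + s approaches 1, the two branches of the response function converge and the classes become hard to distinguish.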

The RDINA and HORDINA Models
DeCarlo (2011) obtained the following RDINA model by reparameterizing the DINA model as a latent class logistic regression model.

\[ \operatorname{logit} P(X_{ij} = 1 \mid \eta_{ij}) = f_j + d_j \eta_{ij} \]

In this equation, the items are used to determine the attribute sets. The f_j parameter gives the log odds of a false alarm, that is, of a correct response to item j by an examinee who does not have the required attributes, and d_j is a detection parameter that reflects how well the item discriminates between the presence and absence of the required attributes. In addition, DeCarlo (2011) obtained the HORDINA model by including the continuous latent variable θ in the model for situations where the probability that an examinee possesses the attributes is determined by the examinee's latent trait θ.
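The relation between the two parameterizations can be sketched numerically. The code assumes the logistic form logit P(correct | η) = f + d·η, with f the log odds of a false alarm and d a detection term (the symbol d and the function names are assumptions made for this illustration). Under that form, g and s follow from f and d by the inverse logit, and the mapping is one-to-one.

```python
import math

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    """Log odds of a probability in (0, 1)."""
    return math.log(p / (1.0 - p))

def rdina_to_dina(f, d):
    """Map (f, d) to guess and slip: g = expit(f), s = 1 - expit(f + d)."""
    return expit(f), 1.0 - expit(f + d)

def dina_to_rdina(g, s):
    """Map guess and slip back to (f, d)."""
    f = logit(g)
    d = logit(1.0 - s) - f
    return f, d

# Round trip for one of the simulated conditions (g = 0.1, s = 0.2).
f, d = dina_to_rdina(0.1, 0.2)
g_back, s_back = rdina_to_dina(f, d)
print(f, d, g_back, s_back)
```

Because the mapping is exact, any difference between DINA and RDINA estimates in practice has to come from estimation (software, algorithm, starting values, boundary handling) rather than from the models themselves.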
Despite their potential benefits, there are some limitations to the use of CDMs in education. These limitations include the complexity of CDMs and the risk of choosing the wrong model (de la Torre & Douglas, 2004; de la Torre et al., 2010). Additionally, the application of CDMs to the fraction subtraction data of Tatsuoka (1990) reveals problems with the classification of examinees, latent class estimates, and the use of higher-order models (DeCarlo, 2011). The majority of these problems involve misspecification of the Q-matrix and of other parts of the model, and examinees who answer all of the items incorrectly are classified by some CDMs as possessing most of the skills. Although higher-order models have been shown to ameliorate the classification problem to a limited extent, it cannot be determined precisely whether the problem is due to misspecification of the Q-matrix (DeCarlo, 2011).

For this purpose, within the scope of the research, the DINA-RDINA and HODINA-HORDINA models were compared using simulated and real data under changing conditions (number of attributes, g and s parameter values, and number of items) in order to support more reliable and valid inferences when evaluating the parameters and to help select the most appropriate model. The comparison aims to provide a better understanding of the extent to which the theoretical features of the models are realized in practice, what their shortcomings are, and how the parameters are affected by changes in these characteristics. Although the DINA and RDINA models have the same likelihood function, it is of interest to investigate whether the models provide the same estimates in practice.

Simulation Study
The simulation study investigates whether the g and s parameter estimates, the AIC and BIC, and the latent class estimates obtained from the DINA-RDINA and HODINA-HORDINA models differ when the number of attributes (3, 4, 5), the g-s parameter values (0.1-0.5), and the number of items (20, 30) are varied. For this purpose, five values from 0.1 to 0.5, in increments of 0.1, were set for each of the g and s parameters, which yielded 25 different g-s combinations. Thus, 3 × 25 × 2 = 150 conditions (3 attribute numbers × 25 g-s combinations × 2 item sets) were tested. The sample size was set to 2000 to obtain accurate parameter estimates from the models (de la Torre et al., 2010). The simulated data were generated with Ox (Doornik, 2002) based on the DINA model, under the assumption that the attributes were independent. The g-s parameter estimates, latent class estimates, and AIC and BIC values of the DINA and HODINA models were also obtained with Ox. LatentGold (Vermunt & Magidson, 2005) was used to analyze the RDINA and HORDINA models. However, LatentGold does not report latent class estimates directly. They were therefore obtained with a macro written in Excel, which uses the posterior classification matrix from the LatentGold output. If the posterior mean of an attribute for an examinee (α_ik) is equal to or greater than 0.5, it is assumed that examinee i possesses attribute k; if α_ik is lower than 0.5, it is assumed that examinee i does not possess attribute k (DeCarlo, 2011; de la Torre & Douglas, 2004; de la Torre et al., 2010). Thus, the percentages of examinees assigned to the latent classes by the models for the same data set were compared.

To evaluate the model-data fit, the two information criteria most frequently used in the statistical literature (Hagenaars & McCutcheon, 2002), AIC proposed by Akaike (1973) and BIC proposed by Schwarz (1978), were used. DeCarlo (2011) stated that the RDINA model provides item parameter values virtually identical to those obtained using the DINA model (de la Torre & Douglas, 2004). Therefore, the differences between the item parameter (g, s) estimates obtained from the DINA-RDINA and HODINA-HORDINA models are expected to be zero, and these differences are shown graphically.
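The simulation design can be sketched in Python (the study itself used Ox). The code enumerates the 150 conditions and generates one DINA data set. The random Q-matrix, the 0.5 mastery probability for each independent attribute, the seed, and the function names are assumptions for this illustration, since the text does not report those details.

```python
import numpy as np

levels = [0.1, 0.2, 0.3, 0.4, 0.5]
# 3 attribute numbers x 25 g-s combinations x 2 item sets = 150 conditions.
conditions = [(K, g, s, J)
              for K in (3, 4, 5)
              for g in levels
              for s in levels
              for J in (20, 30)]

def random_q_matrix(J, K, rng):
    """Hypothetical Q-matrix: random 0/1 entries, every item requires at least one attribute."""
    Q = rng.integers(0, 2, size=(J, K))
    empty_rows = np.flatnonzero(Q.sum(axis=1) == 0)
    Q[empty_rows, rng.integers(0, K, size=empty_rows.size)] = 1
    return Q

def simulate_dina(N, Q, g, s, rng):
    """Generate attribute patterns and 0/1 responses under the DINA model."""
    J, K = Q.shape
    # Independent attributes; the 0.5 mastery probability is an assumption.
    alpha = rng.integers(0, 2, size=(N, K))
    eta = np.all(alpha[:, None, :] >= Q[None, :, :], axis=2)
    p_correct = np.where(eta, 1.0 - s, g)
    X = (rng.random((N, J)) < p_correct).astype(int)
    return alpha, X

rng = np.random.default_rng(0)
Q = random_q_matrix(20, 3, rng)
alpha, X = simulate_dina(2000, Q, 0.1, 0.1, rng)
print(len(conditions), X.shape)
```

Looping over `conditions` and calling `simulate_dina` once per tuple would reproduce the structure of the design, one data set of 2000 examinees per condition.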

Real Data
The real data set was obtained from an English grammar test administered to 565 examinees at Ege University, School of Foreign Languages.

Findings

[Figure 1: g and s parameter estimates obtained from the models.]

AIC and BIC
The AIC results show that the RDINA (15,343) and HODINA (15,220) models provide better fit values than the DINA (15,631) and HORDINA (15,251) models, respectively. The BIC results also show that the RDINA (15,573) and HODINA (15,454) models provide better fit values than the DINA (15,973) and HORDINA (15,502) models, respectively.
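For reference, the two criteria and the pairwise comparisons reported above can be written out as a short sketch. The AIC and BIC formulas are the standard definitions; the dictionary holds only the real-data values quoted in the text, and the helper names are hypothetical.

```python
import math

def aic(loglik, n_params):
    """Akaike information criterion: -2 log L + 2p."""
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: -2 log L + p ln N."""
    return -2.0 * loglik + n_params * math.log(n_obs)

# Real-data values reported in the text, as (AIC, BIC).
reported = {
    "DINA": (15631, 15973),
    "RDINA": (15343, 15573),
    "HODINA": (15220, 15454),
    "HORDINA": (15251, 15502),
}

def preferred(model_a, model_b, criterion):
    """Return the model with the lower (better) value of the chosen criterion."""
    idx = 0 if criterion == "AIC" else 1
    return model_a if reported[model_a][idx] < reported[model_b][idx] else model_b

for pair in (("DINA", "RDINA"), ("HODINA", "HORDINA")):
    print(pair, preferred(*pair, "AIC"), preferred(*pair, "BIC"))
```

Lower values indicate better fit after the penalty for model size. BIC penalizes each parameter by ln N rather than 2, so with 565 examinees it favors smaller models more strongly than AIC does.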

g and s Parameter Estimates
While the joint distribution of skills is based on a multinomial distribution in the DINA model, the HODINA model is based on a higher-order latent proficiency. In all analyses performed using the simulated data, the item parameter estimates obtained from the models were virtually identical when the number of attributes and the g-s parameter values were low. The findings for the low g-s parameter values were consistent with those of de la Torre et al. (2010) and Huebner and Wang (2011). However, in the conditions (g4s5, g5s4, g5s5) where the number of attributes and the item parameter values were increased (particularly where the sum of the g and s parameters was equal to or higher than 0.8), the g and s parameter estimates obtained from the models differed. Furthermore, in the real data analysis, the differences between the item parameter estimates obtained from the DINA-RDINA and HODINA-HORDINA models were also remarkably high. Similarly, de la Torre and Lee (2010) stated that although the invariance property of the DINA model parameters held in the simulated data, there were inconsistencies in the item parameter estimates in the real data; therefore, this property did not fully hold. However, this should not be seen as a reason to downplay the practical usefulness of the DINA model (de la Torre & Lee, 2010).

Latent Class Estimates
The number of attributes is 3, 4, or 5 in the simulated data and 5 in the real data set. In the simulated data, 50 data sets were analyzed for each number of attributes. The results show that the latent class estimates obtained from the DINA-RDINA and HODINA-HORDINA models are virtually identical when the sum of the g and s parameter values is 0.5 or lower, while significant differences are found when the sum is 0.8 or higher. There are also significant differences in the latent class estimates obtained from the real data set. Similar classification problems were reported by DeCarlo (2011). In addition, de la Torre et al. (2010) and Huebner and Wang (2011) stated that classification accuracy increases with low g and s parameters. Furthermore, many latent classes were estimated to be zero by the reparameterized models when the number of attributes and the g-s parameter values increased. Consequently, for all conditions in the simulated data, the DINA models provided more consistent latent class estimates than the reparameterized models. Many latent classes were also estimated to be zero by the reparameterized models in the analysis of the real data set.
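The step from a posterior classification matrix to latent class estimates, which the study carried out with an Excel macro, can be sketched as follows. The code assumes that each row of the matrix holds one examinee's posterior probabilities over the 2^K attribute patterns, ordered from 00...0 to 11...1; that ordering, the function names, and the example values are assumptions for this illustration.

```python
import itertools
import numpy as np

def attribute_posteriors(class_posteriors, K):
    """Posterior mean of each attribute: sum of class posteriors over patterns containing it."""
    patterns = np.array(list(itertools.product([0, 1], repeat=K)))   # (2**K, K)
    return class_posteriors @ patterns                               # (N, K)

def classify(class_posteriors, K, threshold=0.5):
    """Examinee i is taken to possess attribute k if its posterior mean is >= threshold."""
    return (attribute_posteriors(class_posteriors, K) >= threshold).astype(int)

def class_proportions(alpha_hat):
    """Share of examinees assigned to each of the 2**K attribute patterns."""
    K = alpha_hat.shape[1]
    codes = alpha_hat @ (2 ** np.arange(K - 1, -1, -1))   # pattern -> integer index
    counts = np.bincount(codes, minlength=2 ** K)
    return counts / counts.sum()

# Made-up posterior matrix for 2 examinees and K = 2 (classes 00, 01, 10, 11).
post = np.array([[0.1, 0.2, 0.3, 0.4],
                 [0.7, 0.1, 0.1, 0.1]])
alpha_hat = classify(post, K=2)
print(alpha_hat)
print(class_proportions(alpha_hat))
```

Classes to which no examinee is assigned get a proportion of zero. This is a property of the classification rule and is distinct from a model estimating a latent class size of zero.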

AIC and BIC
AIC and BIC fit statistics were used to evaluate the model-data fit. de la Torre and Douglas (2004) stated that the choice of an appropriate model becomes a critical issue as the number of attributes increases. When the number of attributes in the simulation studies is 3, the DINA model and the reparameterized models provide fairly close AIC and BIC values, but the values diverge when the number of attributes is increased (even with very low g and s parameter values).
When all the conditions are considered for the simulated data, the AIC results show that the RDINA model provided better fit values than the DINA model in 147 of the 150 conditions. The BIC results show that the RDINA model provided better fit values than the DINA model in all 150 conditions. The AIC and BIC in the real data analysis also show that the RDINA model provided better fit values than the DINA model. For the higher-order models, the AIC results show that the HODINA model provided better fit values than the HORDINA model in 125 of the 150 conditions, and the BIC results show that it did so in 149 of the 150 conditions. The AIC and BIC in the real data analysis also show that the HODINA model provided better fit values than the HORDINA model. Furthermore, when the models are grouped according to their structure, the HODINA model provided better AIC and BIC values than the DINA model in all simulated conditions. de la Torre and Douglas (2004) also stated that the higher-order model provides better fit values than the basic model. Among the reparameterized models, the BIC results show that the RDINA model provided better fit values than the HORDINA model in all conditions, and the AIC results show that it did so in 144 of the 150 conditions. Similar results have been reported by DeCarlo (2011). When all conditions of the research are considered, the BIC provides more consistent results than the AIC. Although the literature contains many studies reporting the superiority of one criterion over the other, the findings of the present research are consistent with those reported by Jedidi et al. (1997), Kuha (2004), Li et al. (2009), McQuarrie and Tsai (1998), Nylund et al. (2007), Tofighi and Enders (2007), and Yang (2006).
The AIC and BIC results identified the most parsimonious model. Nevertheless, decisions about the Q-matrix should not be based on the information criteria alone; fit statistics, validity studies, and other evidence are also needed (DeCarlo, 2011). High g and s parameter estimates obtained from CDMs can be considered empirical evidence of misspecification of the Q-matrix (Rupp & Templin, 2008). In addition, the structure of the Q-matrix can influence item parameter estimates and misclassification (de la Torre et al., 2010). de la Torre and Douglas (2004) stated that choosing the appropriate model provides greater consistency in the correct classification rates of attributes and thus better estimates (e.g., higher correlation and lower RMSE). Consequently, attention should be given to the selection of the appropriate model and the correct specification of the Q-matrix when item parameter and latent class estimates are obtained from CDMs.
The findings of the present research revealed that the model-data fit is closely related to the item and latent class estimates. In addition, a decrease in model-data fit leads to divergence in all estimates, regardless of the model chosen. This emphasizes the importance of a priori analysis (design of the Q-matrix, correct specification of the relation between items and attributes) for CDM studies (de la Torre et al., 2010; Huebner & Wang, 2011; DeCarlo, 2011, 2012). The most important determinants of model-data fit are the validity of the Q-matrix and item quality. Model comparison studies should therefore take the compatibility of the Q-matrix into account. Studies carried out under conditions where these requirements are met will give more realistic results for model comparisons.
However, it is difficult to achieve a perfect match between the Q matrix properties and the latent class structure (de la Torre et al., 2010).

Item and Attribute Numbers
The number of items used in the simulation studies is 20 and 30, with 75 different conditions for each item set. Increasing the number of items leads to an increase in the AIC and BIC values. This result is expected because of the increased number of estimated parameters. In terms of AIC, increasing the number of items changed the outcome of the DINA-RDINA comparison in 3 of the 75 conditions and that of the HODINA-HORDINA comparison in 4 conditions. In terms of BIC, increasing the number of items did not change the outcome of the DINA-RDINA comparison in any of the 75 conditions, whereas the outcome of the HODINA-HORDINA comparison differed in only one condition. Consequently, there was no evidence that the increase in the number of items considered in the study had a significant effect on the AIC-based and BIC-based model comparisons.
Similarly, when the number of items is fixed (at 20 or 30), the AIC and BIC show no significant effect of the increase in the number of attributes (3, 4, 5) on the comparison of the models. However, the increase in the number of attributes led to larger differences in the g and s parameter estimates obtained from the models (DINA-RDINA, HODINA-HORDINA) and to divergence in the latent class estimates. Similar results have been reported by Chiu (2008) and de la Torre et al. (2010).

Conclusion
Within the scope of the research, the DINA-RDINA and HODINA-HORDINA models were compared under varying conditions, such as the number of attributes in the Q-matrix, the g and s parameter values, and the number of items. In all analyses performed with the simulated data, the item parameter and latent class estimates obtained from the models were virtually identical in the conditions where the number of attributes and the g-s parameter values were low. However, the g-s parameter and latent class estimates obtained from the models diverged as the number of attributes and the item parameter values increased. In addition, many latent classes were estimated to be zero by the reparameterized models as the number of attributes and the item parameter values increased. In all conditions of the simulated data and in the real data set, the DINA models provided more consistent latent class estimates than the reparameterized models. The AIC and BIC obtained from the simulated and the real data show that the RDINA and HODINA models provided better fit values than the DINA and HORDINA models, respectively. When all conditions in the research are considered, the BIC provides more consistent results than the AIC. When the models are compared on the basis of the AIC and BIC, no significant effect of the increase in the number of items or attributes is observed. However, the increase in the number of attributes leads to larger differences in the g and s parameter estimates and to divergence in latent class sizes.
Consequently, for the simulated data with a low number of attributes and low g-s parameter values, the reparameterized models (RDINA, HORDINA) proposed by DeCarlo (2011) and the DINA models (DINA, HODINA) yielded virtually identical g and s parameter estimates, latent class estimates, AIC, and BIC, whereas all of these values diverged as the number of attributes and the item parameter values increased. In the real data set, all of the values obtained were different.
[Figures 2 to 5: differences between the models in g and s parameter estimates and in latent class estimates, for the simulated and the real data.]
(de la Torre & Douglas, 2004; de la Torre, 2009). The DINA model estimates are based on the mode (maximum), whereas the HODINA model estimates are based on the mean (expected value). While the HODINA model uses the Markov Chain Monte Carlo (MCMC) algorithm for item parameter estimation, the DINA model uses the Expectation-Maximization (EM) algorithm. With MCMC, parameter estimates and standard errors are obtained by calculating posterior means and standard deviations. Although the algorithms might be expected to give different results because of these important differences, the DINA and HODINA models provide considerably similar estimates, which indicates that both the EM and MCMC algorithms can be used to obtain accurate parameter estimates (de la Torre & Douglas, 2004; de la Torre, 2009).