Approaches for Structural Investigations of Binary Data Using Confirmatory Factor Models

An investigation of the suitability of threshold-based and threshold-free approaches for structural investigations of binary data is reported. Both approaches implicitly establish a relationship between binary data following the binomial distribution on one hand and continuous random variables assuming a normal distribution on the other hand. In two simulation studies we investigated: whether the fit results confirm the establishment of such a relationship, whether the differences between correct and incorrect models are retained and to what degree the sample size influences the results. Both approaches proved to establish the relationship. Using the threshold-free approach it was achieved by customary ML estimation whereas robust ML estimation was necessary in the threshold-based approach. Discrimination between correct and incorrect models was observed for both approaches. Larger CFI differences were found for the threshold-free approach than for the threshold-based approach. Dependency on sample size characterized the threshold-based approach but not the threshold-free approach. The threshold-based approach tended to perform better in large sample sizes, while the threshold-free approach performed better in smaller sample sizes.


Introduction
The paper presents an evaluation of two approaches that are recommended for investigating the psychometric quality of psychological scales with items yielding binary data, i.e., items that only allow for correct or incorrect responses. The two approaches comprise methods for overcoming the following problem: on one hand, the items of psychometric scales yield binary data; on the other hand, the available statistical procedures have been designed for investigating continuous and normally distributed data, so that investigating binary data by means of these procedures violates their assumptions. An established way of overcoming this problem is the use of a link transformation as part of a generalized linear model (McCullagh & Nelder, 1985; Nelder & Wedderburn, 1972). However, this way requires a link function from the so-called exponential family that is not suitable for the statistical procedure normally employed for investigating the psychometric quality of psychological scales: confirmatory factor analysis (CFA). The two approaches in the focus of this study propose ways of overcoming the problem in the framework of CFA when the data are binary.
The two approaches are the threshold-based and threshold-free approaches. The threshold is defined as the point that separates the two response options on the underlying ability or trait dimension. A major characteristic of the threshold-based approach is the estimation of thresholds. In this case either tetrachoric correlations are computed and used as input to CFA (Muthén, 1984) or item factor analysis (Bock, Gibbons, & Muraki, 1988) is conducted including the estimation of thresholds. In contrast, the threshold-free approach uses a link transformation that is conducted as part of CFA in order to overcome the difference between data and statistical model. The evaluation of these approaches is conducted by means of two simulation studies. These studies are designed to provide empirical evidence with respect to the following three research questions: (1) Do both approaches completely perform the transformation from binary and binomially distributed data to continuous and normally distributed data, or is an additional adjustment necessary? (2) How does the sample size influence the outcome? (3) Are these approaches appropriate for distinguishing between correct and incorrect models?
[...] in order to assure that the latent variable represents a specific source of responding. In the past many models of measurement with fixed parameters have been considered but only those allowing errors to vary found acceptance (Millsap, 2001; Millsap & Everson, 1991). Such models are the tau-equivalent model, the early growth curve model and the fixed-links model. Constrained factor loadings are not a disadvantage since models with free and fixed factor loadings lead to the same fit results if the model is correct (Schweizer, Ren, Wang, & Zeller, 2015; Schweizer, Troche, & Reiß, 2018). In contrast, free factor loadings perform better than constrained factor loadings in the case of an incorrect model because of their better adaptability to minor additional effects.
Furthermore, factor loadings are considered as analogous to discriminability parameters of models in the framework of the item-response theory (IRT) (Lucke, 2005), where the fixation of parameters is quite common. Examples are the Rasch (1960) model, the corresponding one-parameter model (Birnbaum, 1968), and the Rasch model-based linear logistic test model (Scheiblechner, 1972).

The Characteristics of the Threshold-Based Approach
The concept of threshold values for overcoming the difference between distributions was developed for estimating the tetrachoric correlation (Pearson, 1900). While different methods are available for computing this correlation, the most popular method includes the estimation of latent thresholds by means of maximum likelihood estimation (Tallis, 1962). To illustrate the transformation, assume a binary random variable X, with zero and one as possible values, that represents the response to an item. Then the probability of X being equal to one [i.e., Pr(X = 1)] is set equal to the probability that the continuous and normally distributed variable U is larger than the threshold τ that characterizes the item:
$$\Pr(X = 1) = \Pr(U > \tau) \tag{1}$$
Furthermore, it is assumed that the observable variable X, which is binary and shows a binomial distribution, is the result of dichotomizing the non-observable variable X*, which is continuous and normally distributed (Muthén, 1984). This means that there is a function f_Dichotomization_using_τ such that
$$X = f_{\text{Dichotomization\_using\_}\tau}(X^{*}) \tag{2}$$
After the thresholds for two random variables X_i and X_j (i, j = 1, …, p) have been obtained, they are used for computing the correlation between X_i and X_j. Because of the switch from binomial to normal, the tetrachoric correlation is treated as a correlation between two continuous, normally distributed random variables (i.e., a latent correlation).
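The relation between the probability of a correct response and the threshold can be illustrated with a minimal sketch. It assumes that U follows the standard normal distribution and simply inverts the normal distribution function; it is not the maximum likelihood procedure used for tetrachoric correlations, and the function names are hypothetical.

```python
from statistics import NormalDist


def threshold_from_proportion(p_correct: float) -> float:
    """Return tau such that Pr(U > tau) = p_correct for U ~ N(0, 1) (assumption)."""
    if not 0.0 < p_correct < 1.0:
        raise ValueError("p_correct must lie strictly between 0 and 1")
    # Pr(U > tau) = p  is equivalent to  Pr(U <= tau) = 1 - p
    return NormalDist().inv_cdf(1.0 - p_correct)


def dichotomize(x_star: float, tau: float) -> int:
    """Map the latent continuous value x_star to the binary response X."""
    return 1 if x_star > tau else 0


# An easy item (90 % correct) has a low threshold, a medium item a threshold of zero.
tau_easy = threshold_from_proportion(0.9)
tau_medium = threshold_from_proportion(0.5)
print(round(tau_easy, 3), round(tau_medium, 3))
```

Under this sketch a latent value above the threshold yields a correct response, so items with a high proportion of correct responses receive a threshold far to the left of the latent distribution.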
The tetrachoric correlations are integrated into the p × p matrix of correlations among all variables, R_tetrachoric. The transformation preceding the computation of the correlations is expected to assure that this matrix represents relations among underlying variables at the latent level, thus satisfying the requirements of CFA (Muthén, 1984, 1993). Although this approach presents a convincing theoretical basis, the results obtained in empirical research are not always in line with the expectations. Thus, there have been many recent attempts to improve the efficiency of computing tetrachoric correlations (e.g., Forero, Maydeu-Olivares, & Gallardo-Pujol, 2009; Maydeu-Olivares, 2006; Savalei, Bonett, & Bentler, 2013; Savalei & Rhemtulla, 2013).

The Characteristics of the Threshold-Free Approach
Within the threshold-free approach, the difference between the data and the model of measurement is overcome in three steps; however, only two of them are of importance for structural investigations concentrating on model fit (Schweizer, 2013). These are the step from the binary to the continuous scale and the step from a binomial to a normal distribution. The third step, which is not of importance for model fit, is the disattenuation of factor loadings.
The step from the binary to the continuous scale is accomplished by computing probability-based covariances. The covariance cov(X_i, X_j) of two binary random variables X_i and X_j (i, j = 1, …, p), with zero and one as values representing incorrect and correct responses, is defined as
$$\operatorname{cov}(X_i, X_j) = \Pr(X_i = 1 \wedge X_j = 1) - \Pr(X_i = 1)\,\Pr(X_j = 1) \tag{3}$$
where Pr(X_i = 1) and Pr(X_j = 1) are the probabilities that X_i and X_j, respectively, are equal to one (i.e., a correct response) and Pr(X_i = 1 ∧ X_j = 1) is the probability that both X_i and X_j are equal to one. This covariance does not require knowledge of the individual entries of a data matrix and can be computed on the basis of frequencies, that is, on the basis of binary information; however, the outcomes are values on an interval scale. The probability-based covariance defined this way corresponds to a pre-stage of computing the Phi coefficient (see McDonald & Ahlawat, 1974; Schweizer, 2013).
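The probability-based covariance can be sketched in a few lines. The example below estimates the probabilities by relative frequencies and also shows the step to the Phi coefficient; the response vectors and function names are made up for illustration.

```python
from math import sqrt


def probability_based_covariance(x_i, x_j):
    """cov(X_i, X_j) = Pr(X_i=1 and X_j=1) - Pr(X_i=1) * Pr(X_j=1), from frequencies."""
    n = len(x_i)
    if n == 0 or n != len(x_j):
        raise ValueError("both response vectors need the same positive length")
    p_i = sum(x_i) / n                                # Pr(X_i = 1)
    p_j = sum(x_j) / n                                # Pr(X_j = 1)
    p_ij = sum(a * b for a, b in zip(x_i, x_j)) / n   # Pr(X_i = 1 and X_j = 1)
    return p_ij - p_i * p_j


def phi_coefficient(x_i, x_j):
    """Standardize the probability-based covariance by the two binary variances."""
    var_i = probability_based_covariance(x_i, x_i)
    var_j = probability_based_covariance(x_j, x_j)
    return probability_based_covariance(x_i, x_j) / sqrt(var_i * var_j)


# Hypothetical responses of eight persons to two items (1 = correct, 0 = incorrect).
item_a = [1, 1, 0, 1, 0, 0, 1, 1]
item_b = [1, 0, 0, 1, 0, 1, 1, 1]
cov_ab = probability_based_covariance(item_a, item_b)
var_a = probability_based_covariance(item_a, item_a)
phi_ab = phi_coefficient(item_a, item_b)
print(cov_ab, var_a, round(phi_ab, 4))
```

Applying the function to an item and itself returns the binary variance Pr(X_i = 1)[1 - Pr(X_i = 1)], which is why the same routine can fill the diagonal of the covariance matrix.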
The step from the binomial distribution to a normal distribution is conducted by concentrating on the variances and covariances, which ultimately serve as input to CFA. This means that the step of estimating and using thresholds when computing tetrachoric correlations is simply omitted. The variance of the continuous and normally distributed random variable X_i of the model, var(X_i) [= cov(X_i, X_i)], is transformed by the link function g_link to correspond to the observed variance s_i:
$$s_i = g_{\text{link}}[\operatorname{var}(X_i)] \tag{4}$$
The link function of the threshold-free approach is realized as weight w_i (i = 1, …, p). In CFA using fixed factor loadings, w_i is defined as a function of Pr(X_i = 1), the probability of a correct response (Equation 5). This weight serves as a multiplier to the corresponding factor loadings when a covariance model is used (Jöreskog, 1970):
$$s_i = (w_i \lambda_i)\,\phi\,(w_i \lambda_i) + \theta_i \tag{6}$$
Here, λ_i is the fixed factor loading, φ the variance of the latent variable and θ_i the error variance. As is shown in Equation 6, the link transformation extends only to the systematic variance but not to the error variance.

On the Evaluation of the Performance of Models With Constraints
Two strategies are employed for evaluating the outcomes of investigations of the structure of binary data using methods according to the two approaches. The first strategy focuses on model fit. The model of measurement plays a major role in this strategy. All models of measurement (see Graham, 2006) show the same basic structure that is adapted from the congeneric model (Jöreskog, 1971). The congeneric model is defined as
$$\mathbf{y} = \boldsymbol{\Lambda}\boldsymbol{\eta} + \boldsymbol{\varepsilon} \tag{7}$$
where y is the p × 1 vector of centered observations, Λ the p × q matrix of factor loadings, η the q × 1 vector of latent variables (= latent factors) and ε the p × 1 vector of error components.
Constraining factor loadings means the assignment of numbers to the elements of Λ. Each column λ_j (j = 1, …, q) of Λ refers to a specific latent variable. In a simulation study, the numbers included in λ_j can be expected to serve well in CFA if they correspond to those used in constructing the relational pattern for generating simulated data, because CFA implicitly accomplishes the reproduction of the covariance or correlation matrix:
$$\boldsymbol{\Sigma} = \boldsymbol{\Lambda}\boldsymbol{\Phi}\boldsymbol{\Lambda}' + \boldsymbol{\Theta} \tag{8}$$
where Σ is the p × p model-implied matrix of variances and covariances, Λ is the p × q matrix of factor loadings, Φ the q × q matrix of the variances and covariances of the latent variables and Θ the p × p diagonal matrix of error variances (Jöreskog, 1970). If there is only one latent variable this model reduces to
$$\boldsymbol{\Sigma} = \boldsymbol{\lambda}\,\phi\,\boldsymbol{\lambda}' + \boldsymbol{\Theta} \tag{9}$$
Equation 9 can also be perceived as a scheme that is implicitly used in the preparation of the relational pattern for data generation. Therefore, it is possible to create a situation in which Λ_data_generation corresponds to Λ_data_analysis:
$$\boldsymbol{\Lambda}_{\text{data\_generation}} = \boldsymbol{\Lambda}_{\text{data\_analysis}} \tag{10}$$
If such a situation is given in a simulation study, CFA can be expected to lead to a good model fit, provided that all other essentials, for example a sufficient sample size, are also given.
The second strategy focuses on the comparison of the fit results obtained in investigating two different types of data: data requiring the steps from binary and binomial to continuous and normal, on one hand, and data only requiring the step from binary to continuous, on the other hand. Binary data obtained by simulating items showing a broad range of difficulties (i.e., a broad range of degrees of skewness) require a change of scale and a link transformation. These data are referred to as biased data in this paper. In contrast, binary data requiring only the step from binary to continuous already show a symmetric distribution, so that a transformation for eliminating skewness is not necessary. Such data are referred to as unbiased data in this paper. In unbiased data the probability of a correct response is .5.
Both types of data are assumed to originate from continuous data following the standard normal distribution [N(0,1)]. The p × p matrix C_continuous_and_normal (computed from these data) is assumed to include the variances and covariances obtained from the columns of the generated matrices. Unbiased binary data are obtained by rescaling continuous data into zeros and ones in such a way that the probability of one is .5 and the variance is .25. The variances and covariances of the unbiased binary data are included in the p × p matrix C_binary_and_symmetric. Since C_continuous_and_normal and C_binary_and_symmetric originate from the same raw data [N(0,1)] and the transformation from continuous to binary affects the variances of all columns of the data matrix in exactly the same way, there is a special relationship. This relationship refers to the expected matrices and can be described by the linear function g_linear (which is not a link function):
$$E(\mathbf{C}_{\text{binary\_and\_symmetric}}) = g_{\text{linear}}[E(\mathbf{C}_{\text{continuous\_and\_normal}})] \tag{11}$$
while individual pairs of matrices may show a small degree of deviation because of random influences. As a consequence, virtually the same degree of model fit can be expected for E(C_binary_and_symmetric) and for E(C_continuous_and_normal) if the model is specified correctly.
In contrast, biased data originate from splitting up continuous and normally distributed data according to a variety of different proportions so that the variances var(X_i) (i = 1, …, p) vary between 0 and 0.25. Therefore, in this case it is not possible to describe the relationship by the linear function g_linear. Instead
$$E(\mathbf{C}_{\text{binary\_and\_asymmetric}}) \neq g_{\text{linear}}[E(\mathbf{C}_{\text{continuous\_and\_normal}})] \tag{12}$$
where C_binary_and_asymmetric is a p × p covariance matrix of binary data showing a binomial (and mostly asymmetric) distribution. The inequality suggests that in investigations of model fit the investigation of C_binary_and_asymmetric should lead to worse results than the investigation of C_continuous_and_normal or C_binary_and_symmetric if there is no efficient link transformation or correct threshold estimation. Consequently, it depends on the link transformation or threshold estimation whether the investigation of biased data (C_binary_and_asymmetric) leads to good model fit or model misfit.
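The statement about the variances can be checked with a small sketch: the variance of a binary variable equals Pr(X = 1)[1 - Pr(X = 1)], which reaches .25 only for a probability of .5 and is smaller otherwise. The probabilities below are chosen for illustration only.

```python
def binary_variance(p_one: float) -> float:
    """Variance of a 0/1 variable with Pr(X = 1) = p_one."""
    return p_one * (1.0 - p_one)


# Illustrative probabilities of a correct response (not taken from the studies).
probabilities = [0.1, 0.3, 0.5, 0.7, 0.9]
variances = [binary_variance(p) for p in probabilities]

for p, v in zip(probabilities, variances):
    print(f"Pr(X=1) = {p:.1f}  ->  variance = {v:.2f}")
```

In unbiased data every column has the same variance of .25, so a single rescaling applies to all columns; in biased data each column is affected differently, which is why a common linear function no longer describes the relationship.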

The Objective of the Empirical Investigation
The main aim of the empirical part of the research was to investigate whether the threshold-based and the threshold-free approaches assured that the steps from binary and binomial to continuous and normal were accomplished correctly. Using confirmatory factor models that corresponded to the models used for data generation, it was expected that the successful accomplishment of the steps would find its expression in good model fit.
Another aim was to investigate whether the two approaches were appropriate for distinguishing between correct and incorrect models. This aim is important to determine whether the transformations retained the discriminability between correct and incorrect models.
A third aim was the investigation of the influence of the sample size on model fit. Sample size has long been an important topic in factor-analytic investigations, since a sufficiently large sample has been found to be an important precondition for achieving valid results.

Method

First Study
Two 12 × 12 relational patterns of coefficients served for generating the simulated data. Since there were already studies using patterns obtained by means of equally sized factor loadings (e.g., Schweizer, Ren, Wang, & Zeller, 2015), more complex patterns were employed that still enabled a one-dimensional representation and could be expected to reveal differences between the threshold-based and the threshold-free approaches. In the first relational pattern, addressed as Pattern A, the coefficients showed a systematic increase from 0.15 to 0.35, and the coefficients of the diagonal were 1.00. The upper half of Table 1 shows the lower triangle of this pattern.
The second relational pattern was assumed to originate from two subsets of manifest variables. The coefficients of this pattern varied between .15 and .35. The lower half of Table 1 includes this Pattern B. Two hundred 1000 × 12 matrices were generated according to each pattern. To obtain relationships according to these patterns, the unrelated columns of matrices including random data were combined using weights obtained by means of a procedure proposed by Jöreskog and Sörbom (2001). The matrices of simulated data obtained this way provided the starting point for the transformation into binary data.
In the next step, the numbers of the columns of the matrices were dichotomized. Two types of splits were realized for obtaining biased data and unbiased data. First, the continuous data were split so that a broad range of probabilities of binary events was achieved. The following set of splits was realized: 0.100, 0.172, 0.225, 0.317, 0.390, 0.462, 0.535, 0.608, 0.681, 0.754, 0.827, and 0.900. For example, the 0.100 split in the first column required that the 10 percent of small random numbers in this column of each matrix were replaced by zeros and the remaining numbers by ones. In the next column, zeros substituted the 17.2 percent smaller numbers and the other numbers were substituted by ones. The remaining columns were treated in an analogous way. Second, dichotomization for achieving distributional equivalence was established by the 0.5 split of each column of each matrix. Afterwards, tetrachoric correlations and probability-based covariances were computed. The tetrachoric correlations were obtained by PRELIS (Jöreskog & Sörbom, 1994). PASCAL programs served the dichotomization of the continuous data and the estimation of the probability-based covariances. Correct and incorrect confirmatory models were specified. Each model consisted of one latent variable and 12 manifest variables loading on the latent variable. In the incorrect model, factor loadings were constrained to equal sizes in combination with setting the variance of the latent variable free for estimation. Correct specification with respect to data simulated according to Pattern A was achieved by using the following set of numbers as factor loadings: 0.387, 0.41, 0.434, 0.454, 0.473, 0.492, 0.510, 0.527, 0.544, 0.560, 0.576, and 0.592. In the case of a model according to the threshold-free approach weights according to Equation 5 additionally served as multipliers to the factor loadings. 
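The dichotomization described above can be sketched as follows. This is not the PASCAL program used in the study; it is a minimal illustration that replaces the smallest proportion of values in a column by zeros and the remaining values by ones. The function name and the random column are assumptions.

```python
import random


def dichotomize_column(values, split):
    """Replace the smallest `split` proportion of values by 0 and the rest by 1."""
    n_zero = round(split * len(values))
    # Indices of the values ordered from smallest to largest.
    order = sorted(range(len(values)), key=values.__getitem__)
    binary = [1] * len(values)
    for idx in order[:n_zero]:
        binary[idx] = 0
    return binary


rng = random.Random(1)
column = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # one simulated N(0, 1) column

biased = dichotomize_column(column, 0.100)    # first split of the biased set
unbiased = dichotomize_column(column, 0.5)    # split used for unbiased data
print(biased.count(0), unbiased.count(0))
```

With 1000 rows, the 0.100 split turns the 100 smallest numbers into zeros, so the probability of a one is .9, whereas the 0.5 split yields equal numbers of zeros and ones.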
Using these numbers as constraints and Pattern A as input to CFA yielded good model fit, χ²(65) = 42.4, RMSEA = .000, SRMR = .023, GFI = .99, CFI = 1.00, NNFI = 1.01.
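The link between the fixed factor loadings and Pattern A can be illustrated by computing the off-diagonal part of the one-factor model λφλ'. The sketch assumes a latent variance of φ = 1 and, for the degrees of freedom, that the 12 error variances and the latent variance are the only free parameters; both are assumptions made for illustration.

```python
# Factor loadings reported for the correct model of Pattern A.
loadings = [0.387, 0.41, 0.434, 0.454, 0.473, 0.492,
            0.510, 0.527, 0.544, 0.560, 0.576, 0.592]


def implied_systematic_part(lam, phi=1.0):
    """Return lambda * phi * lambda' (error variances are not included)."""
    p = len(lam)
    return [[lam[i] * phi * lam[j] for j in range(p)] for i in range(p)]


implied = implied_systematic_part(loadings)  # phi = 1 is an assumption
p = len(loadings)
off_diagonal = [implied[i][j] for i in range(p) for j in range(i)]
smallest, largest = min(off_diagonal), max(off_diagonal)
print(round(smallest, 3), round(largest, 3))

# Degrees of freedom under the assumed parameterization.
unique_elements = p * (p + 1) // 2       # 78 variances and covariances
free_parameters = p + 1                  # 12 error variances + 1 latent variance
degrees_of_freedom = unique_elements - free_parameters
print(degrees_of_freedom)
```

Under these assumptions the implied coefficients range from about .16 to about .34, close to the range of 0.15 to 0.35 described for Pattern A, and the count of free parameters reproduces the 65 degrees of freedom reported above.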
For all investigations the software LISREL (Jöreskog & Sörbom, 2006) was used with ML (also referred to as customary ML) and MLR as estimation methods. ML was selected because it proved to discriminate well between models in binary data (Zeller, Krampen, Reiß, & Schweizer, 2017). The evaluation of the outcomes concentrated on model fit. The following fit indexes were considered in preparing the report of the results: chi-square, degrees of freedom, normed chi-square, RMSEA, SRMR, CFI, TLI and GFI. Cut-offs provided by Hu and Bentler (1999) and Kline (2005) served for the evaluation of the results (RMSEA ≤ .06, SRMR ≤ .08, CFI ≥ .95, TLI ≥ .95, GFI ≥ .95) (see also DiStefano, 2016). Furthermore, normed chi-squares below 2 (N = 200), 3 (N = 400) and 5 (N = 1000 and larger) were regarded as indications of a good model-data fit. The CFI difference served for the comparison of models. Following Cheung and Rensvold (2002), a CFI difference of .01 was considered substantial.
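The evaluation rules can be summarized in a short sketch that applies the cut-offs listed above to a set of fit statistics. The function names are hypothetical, and the sample size of 1000 used for the example (with the fit values reported for Pattern A, taking NNFI as TLI) is an assumption.

```python
# Cut-offs as listed in the text: "max" means smaller is better, "min" larger is better.
CUTOFFS = {
    "RMSEA": ("max", 0.06),
    "SRMR": ("max", 0.08),
    "CFI": ("min", 0.95),
    "TLI": ("min", 0.95),
    "GFI": ("min", 0.95),
}


def normed_chi_square_limit(n: int) -> float:
    """Upper limit for the normed chi-square at the sample sizes named in the text."""
    if n == 200:
        return 2.0
    if n == 400:
        return 3.0
    if n >= 1000:
        return 5.0
    raise ValueError("no limit is stated for this sample size")


def evaluate_fit(chi_square, df, n, indices):
    """Return, per statistic, whether it indicates good model-data fit."""
    verdict = {"normed_chi_square": chi_square / df < normed_chi_square_limit(n)}
    for name, value in indices.items():
        direction, cutoff = CUTOFFS[name]
        verdict[name] = value <= cutoff if direction == "max" else value >= cutoff
    return verdict


def substantial_cfi_difference(cfi_a, cfi_b, criterion=0.01):
    """CFI difference of .01 or larger; rounding guards against float noise."""
    return round(abs(cfi_a - cfi_b), 10) >= criterion


example = evaluate_fit(42.4, 65, 1000,  # N = 1000 is an assumption
                       {"RMSEA": 0.000, "SRMR": 0.023, "GFI": 0.99,
                        "CFI": 1.00, "TLI": 1.01})
print(example)
print(substantial_cfi_difference(0.97, 0.96))
```

Keeping the cut-offs in one table makes it easy to see which statistics are judged against an upper bound and which against a lower bound.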

Second Study
Two uniform 9 × 9 pattern matrices were selected for the generation of simulated data. The off-diagonal coefficients of the first and second patterns were 0.25 and 0.30, respectively, and the diagonal coefficients were 1.00 in both cases. Besides the two different uniform patterns, four sample sizes were considered: 200, 400, 1000 and 2000. Sizes of 200 and 400 were selected because the sample sizes of empirical studies frequently range between 200 and 400. The other sample sizes were added because the use of tetrachoric correlations might necessitate larger sample sizes to obtain good model-data fit. Sets of 100 matrices were generated for each combination of a specific sample size and a specific size of off-diagonal coefficients of the relational pattern.
The structure according to the two uniform patterns was induced into the matrices of random data in the same way as in the first study. Furthermore, in dichotomizing the continuous data the following set of splits was used: 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80 and 0.90. The data were dichotomized analogously to the procedure described for Study 1. The confirmatory factor model for investigating the matrices comprised one latent variable and nine manifest variables. The factor loadings were constrained to equal sizes in combination with setting the variance parameter free for estimation. Since the variance of the latent variable could be expected to compensate for lower or larger sizes of the number for fixing the factor loadings, these loadings were simply set equal to one. The threshold-free approach additionally required weights as multipliers to the factor loadings.

Results

First Study
The focus of this study was on model fit for the threshold-based and the threshold-free approaches, the difference in model fit between the applications to biased and unbiased data and the discrimination between correctly and incorrectly specified models.

Results for the Threshold-Based Approach
The fit results observed when tetrachoric correlations based on Pattern A served as input to confirmatory factor analysis are reported in Table 2. Each row of this table reports means and corresponding standard deviations. The fit of the correct model with input originating from biased data using customary ML (first row) was good only with regard to SRMR and GFI. The comparison of the CFI results for input originating from biased and unbiased data (rows 1 and 3) signified a substantial CFI difference (.08). Robust estimation yielded fit results similar to the results for unbiased data (rows 3 and 5). Each comparison between the CFI results for the correct and incorrect models signified a substantial difference (rows 1 and 2: .02, rows 3 and 4: .01, and rows 5 and 6: .01). In sum, the threshold-based approach did not yield results to suggest that the step from binary and binomial to continuous and normal was performed adequately. Robust estimation was additionally necessary for achieving overall good model fit.
CFA with tetrachoric correlations based on Pattern B as input yielded the fit results of Table 3. Each row of this table again reports means and corresponding standard deviations. The first row of Table 3 contains the fit statistics for the correct model and biased data using customary ML. Only the SRMR statistic indicated a good model fit. The comparison of the CFIs in investigating biased and unbiased data (rows 1 and 3) signified a substantial difference (.08). Conducting robust estimation led to an improvement even over customary ML applied to unbiased data (rows 3 and 5) according to the CFI difference (.01). All comparisons between the CFI results for the correct and incorrect models signified a substantial difference (rows 1 and 2: .02, rows 3 and 4: .01 and rows 5 and 6: .01). In sum, the results for the threshold-based approach without robust ML suggested that the step from binary and binomial to continuous and normal was not performed well. Robust estimation that accounts for distributional discrepancy was additionally necessary.

Results for the Threshold-Free Approach
For Pattern A (Table 4), the correct model applied to biased data using customary ML (first row) showed a good fit according to all fit indices. The comparison of the CFI results for the biased and unbiased data as input (rows 1 and 3) signified a substantial CFI difference (.02). Each comparison between the CFI results for the correct and incorrect models signified a substantial difference (rows 1 and 2: .03, and rows 3 and 4: .03). Thus, with the threshold-free approach the step from binary and binomially distributed data to continuous and normally distributed data was performed to some degree, as is obvious in the good model fit observed when investigating biased data. However, the comparison of these results with those observed for unbiased data as input signified that the step was not performed perfectly in this type of data.
For Pattern B, CFA led to the fit results reported in Table 5. Again, each row of this table includes both means and corresponding standard deviations. The results for the correct model with biased data as input and using customary ML (first row) indicated a good model fit according to all fit indices. The CFI results for the biased and unbiased data (rows 1 and 3) as input closely corresponded. The correspondence indicated that the steps between binary and binomial, on one hand, and continuous and normal, on the other hand, were nearly perfect. Each comparison between CFI results for the correct and incorrect models signified a substantial difference (rows 1 and 2: .05, and rows 3 and 4: .03). In sum, with the threshold-free approach the steps from binary and binomially distributed data to continuous and normally distributed data were performed well. Furthermore, there was no longer a difference between using biased or unbiased data as input to CFA.
Overall, the results suggested that the threshold-free approach accomplished the steps from binary and binomially distributed data to continuous and normally distributed data better than the threshold-based approach. As a consequence, the threshold-based approach needed to be combined with robust ML estimation whereas the threshold-free approach could be conducted using customary ML estimation. Both approaches enabled the discrimination between correct and incorrect models. The mean CFI difference between correct and incorrect models was 0.013 for the threshold-based approach and 0.035 for the threshold-free approach, indicating that it was more likely to discriminate between the two types of models with the threshold-free approach than with the threshold-based approach.
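The two mean CFI differences can be retraced from the individual differences between correct and incorrect models reported in the text above (six comparisons for the threshold-based approach, four for the threshold-free approach). The following sketch only repeats this arithmetic; the variable names are chosen for illustration.

```python
from statistics import mean

# CFI differences between correct and incorrect models as reported in the text.
threshold_based = [0.02, 0.01, 0.01,   # Pattern A
                   0.02, 0.01, 0.01]   # Pattern B
threshold_free = [0.03, 0.03,          # Pattern A
                  0.05, 0.03]          # Pattern B

mean_threshold_based = mean(threshold_based)
mean_threshold_free = mean(threshold_free)
print(round(mean_threshold_based, 3), round(mean_threshold_free, 3))
```

Rounded to three decimals, the means agree with the values of 0.013 and 0.035 stated above.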

Second Study
This study investigated the effect of the sample size on model fit. To facilitate the reading of the tables reporting fit results, the superscript "M" was added to the mean of a fit statistic if the mean was on the favourable side of the corresponding cut-off. If the entire 95 % confidence interval was on the favourable side of the cut-off (above or below it, depending on the fit statistic), the superscript "CI" was used instead.
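The labelling rule can be expressed as a small sketch. The source does not state how the 95 % confidence intervals were computed, so the function below takes the interval bounds as given; the function name and the example numbers are hypothetical.

```python
def fit_label(mean_value, ci_low, ci_high, cutoff, direction):
    """Return "CI", "M" or "" for one fit statistic.

    direction = "min": values at or above the cut-off are favourable (e.g., CFI).
    direction = "max": values at or below the cut-off are favourable (e.g., RMSEA).
    """
    if direction == "min":
        mean_ok = mean_value >= cutoff
        interval_ok = ci_low >= cutoff       # whole interval above the cut-off
    elif direction == "max":
        mean_ok = mean_value <= cutoff
        interval_ok = ci_high <= cutoff      # whole interval below the cut-off
    else:
        raise ValueError("direction must be 'min' or 'max'")
    if interval_ok:
        return "CI"
    return "M" if mean_ok else ""


# Hypothetical values, not taken from Tables 6 or 7.
print(fit_label(0.96, 0.955, 0.965, 0.95, "min"))   # whole interval favourable
print(fit_label(0.96, 0.940, 0.980, 0.95, "min"))   # only the mean favourable
print(fit_label(0.07, 0.065, 0.075, 0.06, "max"))   # neither favourable
```

The "CI" label is the stricter criterion, because it requires the least favourable bound of the interval to satisfy the cut-off.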
Means and standard deviations of the fit results of investigating the data matrices constructed according to the uniform pattern including off-diagonal coefficients of .25 are reported in Table 6. The results reported in the first to fourth rows were obtained by MLR estimation in combination with tetrachoric correlations as input to CFA. In the sample size of 200 only the mean normed χ² and RMSEA indicated good model fit. In the sample size of 400 the mean CFI and TLI were also good. All fit statistics indicated good model fit within the 95 % confidence interval when the sample size was 1000 or larger.
The last four rows of Table 6 include the results of the investigations using the threshold-free approach. In the sample sizes of 200 and 400, normed χ², RMSEA, SRMR and GFI signified good model fit within the 95 % confidence interval while the mean CFI and TLI were also good. After the increase of the sample size to 1000 the CFI results were also good within the 95 % confidence interval, and the further increase to 2000 led all fit statistics to comply with the 95 % confidence interval. The data constructed according to the uniform pattern including off-diagonal coefficients of 0.30 were investigated in the same way as the data constructed according to the other pattern. The fit results are reported in Table 7.
Note. CI: the 95 % confidence interval is in the favourable area. M: the mean is in the favourable area.
The outcomes of the evaluation of these results corresponded almost perfectly to what was reported for Table 6. For sample sizes of 200 and 400, the investigations using the threshold-based approach led to model misfit according to the majority of fit indices, whereas for sample sizes of 1000 and 2000 all fit statistics indicated a good model fit. In contrast, when the investigations were conducted according to the threshold-free approach, all fit statistics were at least good according to the mean with sample sizes of 200 and 400. The estimates improved when the sample size was increased. For the sample size of 2000, all fit statistics indicated good model fit within the 95 % confidence interval.
All in all, for sample sizes of 200 and 400 the threshold-free approach led to better fit results than the threshold-based approach. In contrast, for the sample sizes of 1000 and 2000, there was a small numeric advantage for the threshold-based approach.

Discussion
The investigation of binary data by CFA presupposes that a relationship is established between the properties characterizing the data on one hand and the properties characterizing the model integrated into the statistical method used in the investigation of the data on the other hand. The data are binary and binomially distributed whereas continuous and normally distributed variables characterize the model. The two considered approaches suggest different routes to the establishment of such a relationship. The results of the simulation studies indicate that both approaches accomplish the relationship successfully. However, the threshold-based approach requires robust maximum likelihood estimation that accomplishes an additional transformation regarding the distribution of data whereas the threshold-free approach does not make such an additional transformation necessary. Furthermore, the two approaches seem to differ according to their sensitivity for the sample size. The achievement of good model-data fit appears to depend on the sample size when using the threshold-based approach; a very large sample size seems to guarantee very good model-data fit whereas model misfit is likely in a small sample size. In contrast, the threshold-free approach appears to show almost no dependency on sample size.
Future progress in science using CFA requires efficient discrimination between correct and incorrect models. In the present study the model-data fit is in many cases good for both the correctly and the incorrectly specified models. Therefore, the discrimination between models becomes an important aspect of an investigation. Bentler's (1990) CFI is frequently used for this purpose since a CFI difference of .01 or larger means a substantial difference (Cheung & Rensvold, 2002). With the threshold-based approach, the mean CFI difference was .01 in four out of six comparisons. Although these mean differences just reached the criterion, many individual comparisons did not. In contrast, using the threshold-free approach, all mean CFI differences were .03 or larger. Therefore, it seems to be more likely to discriminate between correct and incorrect models when using the threshold-free rather than the threshold-based approach.
The results demonstrate that using fixed factor loadings in CFA does not prevent the achievement of a good model-data fit. This observation runs counter to concerns based on the observation that models with free factor loadings, especially Jöreskog's (1971) congeneric model of measurement, dominate applied research. The presented results demonstrate that fixed factor loadings can substantially contribute to science and not just serve as provisions for amending estimation problems (Millsap, 2001). This important conclusion holds for both the threshold-based and the threshold-free approaches.
The use of models with fixed factor loadings may increase in future research on binary data if the trend to employ models considering more than one latent source of responding continues. As long as it was assumed that scales composed of binary items represent one systematic source of responding only, one-factor models with free factor loadings (Graham, 2006) could be expected to serve perfectly well for investigating the structure of binary data. However, this assumption is no longer tenable. The multitrait-multimethod approach introduced by Campbell and Fiske (1959) has for more than 60 years produced evidence of method effects. Data usually show a considerable proportion of method variance besides trait variance. The consideration of sources of method variance as part of the model does not only control for impurity in measurement but also improves the replicability of results (Schweizer, Troche, & Rammsayer, 2011). For example, there are a number of studies demonstrating that completing a homogeneous set of achievement items stimulates the item-position effect (e.g., Birney, Beckmann, Beckmann, & Double, 2017; Debeer & Janssen, 2013; Debeer, Buchholz, Hartig, & Janssen, 2014; Embretson, 1991; Hartig & Buchholz, 2012; Kubinger, 2008; Lozano, 2015; Ren et al., 2014; Schweizer, Schreiner, & Gold, 2009; Troche et al., 2016; Verguts & De Boeck, 2000). A special characteristic of this effect is that it leads to a systematic increase of the proportion of systematic variance from the first to the last items (Knowles, 1988). Another example of a method effect impairing the quality of measurement is the effect originating from the interaction of a time limit in testing and processing speed when the time limit prevents some or all participants of a sample from completing all items properly (Lu & Sireci, 2007; Oshima, 1994). The contribution of processing speed becomes apparent as a lack of sufficient processing speed. This can mean either an omission or a random response.
Since virtually all scales are applied with a time limit in testing, most performance data include (lack of) processing speed as another systematic source of responding to a larger or lesser degree. It is a source that usually influences the responses to the last few items of a scale and can be captured by a confirmatory factor model that takes the distribution of processing speed into consideration (Schweizer & Ren, 2013;Schweizer, Troche, & Reiß, 2018).
Overall the results demonstrate that confirmatory factor models with fixed factor loadings can serve well in investigating binary data independently of which approach is selected. The sample size seems to be a characteristic that distinguishes between the approaches and, therefore, needs to be considered in the selection of the approach for an investigation. Finally, it needs to be added that the reported results may be characteristic for the relational patterns used in data generation, and it is an open question to what degree they can be generalized to other relational patterns.