Can Variances of Latent Variables Be Scaled in Such a Way That They Correspond to Eigenvalues?

The paper reports an investigation of whether sums of squared factor loadings obtained in confirmatory factor analysis correspond to eigenvalues of exploratory factor analysis. The sum of squared factor loadings reflects the variance of the corresponding latent variable if the variance parameter of the confirmatory factor model is set equal to one. Hence, computing this sum implies a specific type of scaling of the variance. The theoretical foundations suggested the expected correspondence between sums of squared factor loadings and eigenvalues. However, the application requires procedural choices, such as the estimation method, and these turned out to influence the outcome. A simulation study demonstrated that exact correspondence is possible if the same estimation method is applied. However, in the majority of realized specifications the estimates showed similar sizes but no correspondence.


Introduction
In both confirmatory and exploratory factor analysis (CFA and EFA), the sum of squared factor loadings can be computed. In EFA, this sum is considered an estimate of the eigenvalue, which reflects the amount of variability explained by the factor (Vogt, 2005). The present essay investigates whether the sum of squared factor loadings of CFA corresponds to the sum of squared factor loadings of EFA. We explore whether the sum of squared factor loadings can be used as a measure of the variance of the latent variable (= factor) in CFA. This would be analogous to the eigenvalue in EFA, which is also known as the characteristic root or latent root (Marcus & Minc, 1988, p. 144). The reported investigation uses simulated data constructed to represent different underlying structures.

The Role of the Variance in Confirmatory Factor Analysis and its Scaling
As a descriptive statistic, the variance characterizes distributions of random variables. It can be computed in different ways depending on the scale of the random variable. If the random variable is a latent variable (i.e., a factor), the variance additionally depends on the model from which the latent variable originates. The characteristics of this model and the method employed for estimating its parameters influence the variance. The model of the covariance matrix (Jöreskog, 1970) includes observed variances, variances of the latent variables, and error variances. The estimation method determines how the observed variance is subdivided into latent variance (also referred to as true variance) and error variance.
The variance of a latent variable reflects the impact of the source underlying the latent variable: a stronger impact leads to a larger variance. However, the variance of the latent variable has not played a major role in CFA, although it is a kind of effect-size indicator similar to the effect size in experimentation (Cohen, 1988; Rosenthal, 1994; Rosenthal & Rubin, 2003). In experimentation, the focus was for a long time on significance testing. Depending on the sample size, however, very small effects can yield statistical significance even if they are trivial and negligible for practical purposes. As a consequence, experimental results are nowadays required to report effect sizes in addition to the significance level.
When the variances of different variables are compared with each other, differences in the underlying scales have to be taken into account. For example, to evaluate the quality of a confirmatory factor model, the amount of variance explained by the latent variables is compared to the overall variance in the manifest variables (DiStefano, 2016).
The scaling of the variances of latent variables is important for interpreting variances as "small" or "large" and for comparing the variances of different latent variables. Textbooks list several methods for this purpose (e.g., Brown, 2006). The differences between these methods originate from the representation of true variances and covariances by means of different parameters. In the model of the covariance matrix, the variances of the latent variables show multiplicative relationships with the factor loadings of the observed variables, as is obvious from the following equation:

$$\Sigma = \Lambda\Phi\Lambda' + \Theta \qquad (1)$$

Here the p × p model-implied covariance matrix Σ considers p manifest variables and is set equal to the sum of ΛΦΛ' and Θ (Jöreskog, 1970). The product is composed of the p × q matrix of factor loadings Λ (and its transpose Λ') and the q × q matrix Φ of the variances and covariances of q latent variables. Θ is the p × p diagonal matrix of error components.
The multiplicative relationship of Λ and Φ makes it possible to change the numeric size of the variance of a latent variable without changing ΛΦΛ'. If the variances and covariances in Φ are divided by c² (c > 0) and all corresponding factor loadings are multiplied by c, the product ΛΦΛ' remains constant.
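This invariance can be checked numerically. The following Python sketch assumes NumPy is available. Its loading values are hypothetical and only loosely modeled on the two-source pattern described later. The sketch builds the model-implied matrix Σ = ΛΦΛ' + Θ. It then confirms that multiplying all loadings by c and dividing Φ by c² leaves Σ unchanged, even though the latent variances change.

```python
import numpy as np

# Hypothetical loadings: eight manifest variables, two uncorrelated factors.
lam = np.column_stack([np.full(8, 0.4),
                       np.r_[0.0, 0.0, np.linspace(0.1, 0.5, 6)]])
phi = np.eye(2)                                      # latent variances and covariances
theta = np.diag(1.0 - np.diag(lam @ phi @ lam.T))    # error variances giving a unit diagonal


def implied_cov(lam, phi, theta):
    """Model of the covariance matrix: Sigma = Lambda Phi Lambda' + Theta."""
    return lam @ phi @ lam.T + theta


sigma = implied_cov(lam, phi, theta)

# Divide the latent (co)variances by c**2 and multiply every loading by c.
c = 2.5
sigma_rescaled = implied_cov(c * lam, phi / c**2, theta)

print(np.allclose(sigma, sigma_rescaled))  # True: Sigma is unchanged
print(np.diag(phi / c**2))                 # latent variances are now .16 instead of 1
```

Because any c > 0 works, the data alone cannot fix the numeric size of a latent variance. A scaling convention has to supply it.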
The importance of scaling latent variables has been emphasized in growth curve modeling and invariance analysis (McArdle, 1988; McArdle & Cattell, 1994). Growth curve modeling focuses on changes (or constancy) of latent variables over time. Invariance analysis compares the variances of latent variables between two or more groups of individuals. Little, Slegers, and Card (2006; see also Little, 2013) distinguish the following methods for scaling the variances of latent variables in longitudinal studies: the marker-variable method, the reference-group method, and the effect-coding method.
With the marker-variable method, one factor loading is set equal to one while the other factor loadings and the variance of the latent variable are freely estimated. In this case, the estimated variance of the latent variable is expressed on the scale of the manifest variable associated with the fixed factor loading. Fixing a different factor loading therefore leads to a different estimate of the variance. With the reference-group method, the variance of the latent variable is set equal to one. The advantage of this method is that all variances share the same value. The drawback is that latent variables cannot be compared by means of their variances if all variances are set equal to one. However, if this variance is used as a reference for other variances that are estimated, it can provide a useful yardstick. Finally, the effect-coding method requires assigning numbers to the factor loadings such that the sum of the assigned numbers equals the number of manifest variables. After the factor loadings are fixed, the variance parameter is estimated. As outlined by Little et al. (2006, p. 63), the estimate of the latent variance corresponds to the average of the indicator variables' variances when the number one is assigned to each factor loading. In addition to these methods, Schweizer (2011) proposed a scaling method that highlights the above-mentioned constancy of the explained variance, reflected by the product ΛΦΛ' in the model of the covariance matrix.
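The sketch below makes the differences between these scaling methods concrete. It takes a hypothetical one-factor solution under reference-group scaling and re-expresses it under marker-variable and effect-coding scaling. It assumes that, in a single-group model, the three methods are alternative parameterizations of the same fitted solution. The conversion is therefore a rescaling with some c > 0 rather than a re-estimation. The loading values and the helper name `rescale` are illustrative only.

```python
import numpy as np

# Hypothetical one-factor solution under reference-group scaling (variance fixed to 1).
lam_ref = np.array([0.4, 0.5, 0.6, 0.7])
phi_ref = 1.0


def rescale(lam, phi, c):
    """Equivalent parameterization: loadings times c, latent variance divided by c**2."""
    return lam * c, phi / c**2


# Marker-variable method: the first loading becomes exactly one.
lam_mv, phi_mv = rescale(lam_ref, phi_ref, 1.0 / lam_ref[0])
# Effect-coding method: the loadings sum to the number of indicators (mean of one).
lam_ec, phi_ec = rescale(lam_ref, phi_ref, 1.0 / lam_ref.mean())

for name, lam, phi in [("reference-group", lam_ref, phi_ref),
                       ("marker-variable", lam_mv, phi_mv),
                       ("effect-coding", lam_ec, phi_ec)]:
    common = phi * np.outer(lam, lam)            # lambda * phi * lambda'
    print(f"{name:<16} variance = {phi:.4f}, "
          f"common part unchanged: {np.allclose(common, np.outer(lam_ref, lam_ref))}")
```

With these values, the latent variance is 1 under reference-group scaling. It is .16 under marker-variable scaling (first loading .4) and .3025 under effect coding (mean loading .55). The common-factor part λφλ' is identical in all three cases.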
Research on the scaling of variances of latent variables has mainly concentrated on invariance studies and longitudinal studies. However, the possible applications of scaling are not restricted to these areas of research. Other measurement models that include several latent variables are multitrait-multimethod models (Byrne, 2016), bifactor models (Gavinez, 2016), and models for the identification of processing strategies (Schweizer, Altmeyer, Ren, & Schreiner, 2015).
A multitrait-multimethod model, for example, usually includes several trait and method factors. Comparing the variance explained by the trait factors with the variance explained by the method factors helps to evaluate whether the individuals' responses are mainly influenced by the trait or by the observational method.
The present paper originates from the search for a measure that adequately reflects the impact of the underlying psychological source on the latent variable that represents this source. In this framework, we examine whether the sum of squared factor loadings yields values similar to the eigenvalues of EFA. After the theoretical foundations are explored in more detail, simulated data are used as input to both CFA and EFA. We then investigate whether the sums of squared factor loadings from CFA and EFA correspond to each other. Different underlying structures are used in data generation to explore the preconditions for such a correspondence.

The Theoretical Foundation of Scaling
Scaling of latent variables rests on the following assumption. Besides the p × q matrix of factor loadings Λ and the q × q matrix Φ of the variances and covariances of latent variables, there is at least one other pair leading to the same product. This pair consists of a p × q matrix of factor loadings Λ* and a q × q matrix Φ* of the variances and covariances of latent variables, such that

$$\Lambda\Phi\Lambda' = \Lambda^{*}\Phi^{*}\Lambda^{*\prime}. \qquad (2)$$

For example, the pair consisting of Λ* = cΛ (c > 0) and Φ* = (1/c²)Φ has this property.
CFA is mostly conducted according to the congeneric model of measurement (Jöreskog, 1971), which assumes only one latent source. It is therefore especially important to consider ΛΦΛ' for a single factor ξ. In this case, Λ reduces to the p × 1 vector λ and Φ to the variance $\phi_\xi$ of ξ. Equation 2 then simplifies to

$$\lambda\phi_\xi\lambda' = \lambda^{*}\phi_\xi^{*}\lambda^{*\prime}, \qquad (3)$$

where each side is a p × p matrix.
If the analyzed matrix is a correlation matrix, the trace of λφ_ξλ' includes an expression that suggests a close relationship between the variance of the latent variable of CFA and the eigenvalue of EFA:

$$\operatorname{tr}(\lambda\phi_\xi\lambda') = \phi_\xi\sum_{i=1}^{p}\lambda_i^{2} \qquad (4)$$

The reason is that the sum of squared factor loadings enters the computation of the eigenvalue in EFA. According to Equation 2, it is possible to select c > 0 (namely $c = \sqrt{\phi_\xi}$) such that the rescaled variance $\phi_\xi^{*}$ equals one. In this case, the right-hand part of Equation 4 becomes a sum of squared factor loadings, which provides a direct link to EFA:

$$\phi_\xi\sum_{i=1}^{p}\lambda_i^{2} = \sum_{i=1}^{p}\lambda_i^{*2} \quad \text{with } \phi_\xi^{*} = 1. \qquad (5)$$

The eigenvalue can be obtained in different ways (see Hammarling, 1970). One frequently used way of computing it in EFA is to sum the squared factor loadings, which yields the sum $s_\xi$:

$$s_\xi = \sum_{i=1}^{p}\lambda_i^{2}, \qquad (6)$$

Here $\lambda_i$ denotes the loading of the i-th manifest variable on the EFA factor. Furthermore, the model used in EFA does not usually represent the variances of the factors, since the observed matrix is reproduced by the factor loadings and error variances only. For oblique factors, a matrix of the correlations between the factors is also considered. The main diagonal of this matrix contains ones, which may be regarded as variances of the factors scaled according to the reference-group method. For orthogonal factors there is usually no such matrix. However, the identity matrix, which implies no correlations between the factors, can serve this purpose without any change. In both cases, the value of the analogue to $\phi_\xi$ is one and can therefore be omitted:

$$\operatorname{tr}(\lambda\lambda') = \sum_{i=1}^{p}\lambda_i^{2} = s_\xi. \qquad (7)$$

The major difference between Equations 4 and 7 is the presence versus absence of $\phi_\xi$. This is no real difference, however, because $\phi_\xi$ equals one under the scaling described above. Given corresponding sizes of the estimated factor loadings in CFA and EFA, the computations outlined above should result in corresponding sums.
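A few lines of Python (NumPy assumed) trace the step from Equation 4 to the sum of squared factor loadings under unit variance. The loading vector and the variance of .25 are hypothetical, and the sketch only checks the algebra. The trace of λφλ' equals φ times the sum of squared loadings. Rescaling with c = √φ according to Equation 2 sets the variance to one, and the sum of squared rescaled loadings then carries the same amount.

```python
import numpy as np

# Hypothetical single-factor solution with an arbitrary latent variance phi_xi.
lam = np.array([0.8, 0.6, 0.9, 0.5, 0.7])
phi_xi = 0.25

# Equation 4: tr(lambda phi lambda') = phi * sum of squared loadings.
trace_common = np.trace(phi_xi * np.outer(lam, lam))
print(np.isclose(trace_common, phi_xi * np.sum(lam**2)))  # True

# Equation 2 with c = sqrt(phi_xi): loadings times c, variance divided by c**2.
c = np.sqrt(phi_xi)
lam_star, phi_star = c * lam, phi_xi / c**2

# With the variance scaled to one, the plain sum of squared (rescaled) loadings,
# the CFA counterpart of s_xi, equals the trace from Equation 4.
s_xi = np.sum(lam_star**2)
print(phi_star, np.isclose(s_xi, trace_common))  # 1.0 True
```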
The eigenvalue is considered an established measure of the variation for which the factor accounts in EFA (Vogt, 2005), and a similar statistic is achievable in CFA. Using this statistic to represent the variance of the latent variable is therefore an obvious choice. To sum up, one sets the variance parameter of the CFA model of the covariance matrix equal to one and treats the sum of squared factor loadings as the outcome. This yields scaled variances that should be similar to the eigenvalues of EFA.
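An alternative way leading to the same result starts with the traces of the matrices of Equation 3:

$$\operatorname{tr}(\lambda\phi_\xi\lambda') = \operatorname{tr}(\lambda^{*}\phi_\xi^{*}\lambda^{*\prime}) \qquad (8)$$

The traces can be simplified by transformations according to Equation 4 such that

$$\operatorname{tr}(\lambda\phi_\xi\lambda') = \phi_\xi\sum_{i=1}^{p}\lambda_i^{2}, \qquad \operatorname{tr}(\lambda^{*}\phi_\xi^{*}\lambda^{*\prime}) = \phi_\xi^{*}\sum_{i=1}^{p}\lambda_i^{*2} \qquad (9)$$

and, therefore,

$$\phi_\xi\sum_{i=1}^{p}\lambda_i^{2} = \phi_\xi^{*}\sum_{i=1}^{p}\lambda_i^{*2}. \qquad (10)$$

Next, the left-hand and right-hand parts of this equation are modified separately. Each one is transformed according to Equation 2 in such a way that its value stays constant. The left-hand part is modified by means of $c = \sqrt{\phi_\xi} > 0$ and the right-hand part by means of $c^{*} = \sqrt{\phi_\xi^{*}} > 0$, so that both variances become one. These modifications reduce Equation 10 to

$$\sum_{i=1}^{p}(c\lambda_i)^{2} = \sum_{i=1}^{p}(c^{*}\lambda_i^{*})^{2}, \qquad (11)$$

which can be interpreted as

$$\operatorname{var}(\xi) = \sum_{i=1}^{p}(c\lambda_i)^{2}. \qquad (12)$$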

A Simulation Study
The theoretical foundations suggest that CFA combined with scaling according to the reference-group method leads to variance estimates that correspond to the eigenvalues of EFA. It still needs to be investigated whether corresponding estimates are actually achieved when the methods are applied to empirical data. Both CFA and EFA are complex methods with a number of possible sources of discrepant estimates, even though the estimates are expected to correspond on the theoretical grounds outlined above. Furthermore, corresponding estimates might be observed only under specific conditions. Therefore, a simulation study addressed the question of whether correspondence was possible when specific conditions were realized.
To ensure that corresponding estimates could be obtained at all, differences between the methods had to be taken into consideration. First, different estimation methods might lead to different results. Maximum likelihood estimation is still the preferred estimation method of CFA for continuous, normally distributed data, so this estimation method was also selected for EFA. Second, correlations are almost exclusively used as input to EFA. CFA is frequently based on covariances, although there are also a number of applications to correlations. Therefore, correlations served as input to both EFA and CFA in the present study. Third, in EFA it is quite common to assume that the data are continuous and follow the normal distribution. In CFA, the estimation method is selected according to the scale and distribution of the data. In the case of a mismatch, additional transformations are applied by means of link functions (Schweizer, 2013). To comply with the assumption of EFA, continuous and normally distributed random data with a specific underlying structure were generated and investigated.
The simulation study had the following design: (1) the data had either no underlying structure or an underlying structure; (2) an underlying structure was established by either one or two sources. Thus, the generated datasets consisted of random data only, of random data with a one-dimensional underlying structure, or of random data with an underlying structure due to two sources. For each of these three conditions, 100 datasets were generated. The CFA output for each dataset provided the basis for computing the variances of the latent variables as sums of squared factor loadings. Estimates of the eigenvalues obtained from the EFA output were used for the comparisons.

Data Generation
The data generation followed a procedure proposed by Jöreskog and Sörbom (2001, p. 159). It started with the preparation of 8 × 8 relational patterns. The identity matrix served as the relational pattern for generating datasets consisting of random data only. The relational pattern for random data including a one-dimensional structure was obtained by assuming factor loadings of .4. Accordingly, the off-diagonal elements of this pattern were set to .16 (= .4 × .4) and the diagonal elements to 1.0. The relational pattern for random data including an underlying structure due to two sources required factor loadings on two factors. All factor loadings on the first factor were .4. The first and second factor loadings on the second factor were zero, as is typical of the bifactor model. The other factor loadings on the second factor increased from .1 to .5. These factor loadings led to off-diagonal elements of the relational pattern that varied between .16 and .37. The diagonal elements of this relational pattern were 1.0.
These patterns provided the basis for the estimation of weights according to the procedure by Jöreskog and Sörbom (2001).Thus, for each of the three relational patterns a characteristic set of weights was obtained.
The next step was the generation of continuous random data following the standard normal distribution, X ~ N(0, 1). The sets of random data were arranged as 100 × 8 matrices (100 rows and 8 columns). The continuous random data were then recombined by means of the weights computed for the three relational patterns to obtain the datasets for the simulation study.
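A minimal Python sketch of this data generation follows (NumPy assumed). Two details are assumptions rather than reproductions of the original procedure. First, the six nonzero loadings on the second factor are taken as equally spaced from .1 to .5, which reproduces the reported range of .16 to .37 for the off-diagonal elements. Second, the weights are taken as the transposed Cholesky factor of each relational pattern. These weights may differ from those produced by the Jöreskog and Sörbom (2001) procedure, although they yield the same population correlation structure. The seed is arbitrary.

```python
import numpy as np

P = 8            # number of manifest variables
N = 100          # cases per dataset
N_DATASETS = 100


def pattern_from_loadings(lam):
    """Relational pattern with off-diagonal elements from the loadings and a unit diagonal."""
    common = lam @ lam.T
    return common + np.diag(1.0 - np.diag(common))


one_source = np.full((P, 1), 0.4)
# Assumption: the six nonzero second-factor loadings are equally spaced from .1 to .5.
two_sources = np.column_stack([np.full(P, 0.4),
                               np.r_[0.0, 0.0, np.linspace(0.1, 0.5, P - 2)]])

patterns = {
    "random": np.eye(P),
    "one_source": pattern_from_loadings(one_source),
    "two_sources": pattern_from_loadings(two_sources),
}


def weights_for(pattern):
    """Assumed weights: transposed Cholesky factor W, so that W' W equals the pattern."""
    return np.linalg.cholesky(pattern).T


def generate(pattern, rng, n=N):
    z = rng.standard_normal((n, pattern.shape[0]))   # X ~ N(0, 1), n x 8
    return z @ weights_for(pattern)                   # recombined data


rng = np.random.default_rng(12345)  # arbitrary seed
datasets = {name: [generate(pat, rng) for _ in range(N_DATASETS)]
            for name, pat in patterns.items()}
```

The sample correlations of each generated matrix, e.g. `np.corrcoef(data, rowvar=False)`, scatter around the relational pattern because of sampling error with n = 100.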

Statistical Analysis
CFA and EFA were conducted separately. CFA was performed with LISREL (Jöreskog & Sörbom, 2006) using maximum likelihood estimation. A one-factor confirmatory factor model served to investigate the datasets consisting of random data only and of random data including a one-dimensional structure. This model was specified to include one latent variable and eight manifest variables. The variance of the latent variable was set equal to one, and all factor loadings were estimated. The data including an underlying structure due to two sources were investigated by means of a two-factor confirmatory factor model. The variances of the two latent variables were set equal to one. All factor loadings on the first latent variable and the third to eighth factor loadings on the second latent variable were estimated. The first and second factor loadings on the second latent variable were fixed to zero. The correlation between the two latent variables was set to zero. Correlation matrices served as input to CFA.
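For readers without access to LISREL, the one-factor CFA with the latent variance fixed to one can be sketched with SciPy. The sketch minimizes the usual maximum likelihood discrepancy function F = log|Σ| + tr(SΣ⁻¹) − log|S| − p. It is an illustrative re-implementation, not LISREL's algorithm. Error variances are parameterized on the log scale to keep them positive, and the function names are hypothetical. A one-factor ML EFA fits the same model (one factor with unit variance, all loadings free). This is consistent with the identical CFA and EFA estimates reported below for the one-source condition.

```python
import numpy as np
from scipy.optimize import minimize


def ml_discrepancy(sigma, s):
    """Maximum likelihood discrepancy F = log|Sigma| + tr(S Sigma^-1) - log|S| - p."""
    p = s.shape[0]
    _, logdet_sigma = np.linalg.slogdet(sigma)
    _, logdet_s = np.linalg.slogdet(s)
    return logdet_sigma + np.trace(s @ np.linalg.inv(sigma)) - logdet_s - p


def fit_one_factor(s):
    """One-factor CFA with var(xi) fixed to 1; loadings and error variances are free."""
    p = s.shape[0]

    def objective(params):
        lam, log_theta = params[:p], params[p:]
        # Model-implied matrix with phi = 1: Sigma = lambda lambda' + Theta.
        sigma = np.outer(lam, lam) + np.diag(np.exp(log_theta))
        return ml_discrepancy(sigma, s)

    start = np.concatenate([np.full(p, 0.5), np.log(np.full(p, 0.5))])
    result = minimize(objective, start, method="L-BFGS-B")
    return result.x[:p], np.exp(result.x[p:])


# Usage: population pattern of the one-source condition (loadings .4, unit diagonal).
r = np.full((8, 8), 0.16)
np.fill_diagonal(r, 1.0)
lam_hat, theta_hat = fit_one_factor(r)
ssl = np.sum(lam_hat**2)   # sum of squared loadings = estimate of var(xi)
print(round(ssl, 3))
```

Applied to this error-free pattern, the estimated loadings come out close to .4 in absolute value, and their sum of squares is close to 8 × .16 = 1.28. With pure random data, some error variances can drift toward zero. This produces one large loading next to very small ones, which is one way the inappropriately large sums reported below can arise.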
EFA was conducted by means of the statistical software package SPSS.The EFA procedure was specified to perform maximum likelihood estimation.For investigating datasets consisting of random data only and of random data including a one-dimensional structure the number of factors was fixed to one.In the case of datasets including an underlying structure due to two sources, the number of factors to be extracted was set to two, and the factors were not rotated.SPSS required the raw data as input but prepared a correlation matrix in an intermediary step before conducting EFA.
The statistics of interest were extracted from the outputs. In the case of CFA, the non-standardized factor loadings were read from the output and transformed into the sum of squared factor loadings. This sum was taken as the estimate of var(ξ). In the case of EFA, the output provides initial eigenvalues and sums of squared factor loadings from extraction (SSFE). Since SSFE is the statistic obtained by maximum likelihood estimation, it is reported and used for the comparisons.
To evaluate correspondence, the means and standard deviations of the estimates of var(ξ) and SSFE were computed. The correlation between the estimates of var(ξ) and SSFE across the individual datasets was also computed. Exact correspondence and a correlation of 1.00 were expected.
Regarding similarity, the means were compared with each other. There is no established similarity criterion, and variances are always positive. A threshold for the ratio of the smaller to the larger variance therefore had to be selected to distinguish similarity from dissimilarity. The criterion chosen was that the smaller estimate had to be larger than half of the larger estimate.
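The evaluation criteria can be expressed as small Python helpers (NumPy assumed). The helper names are hypothetical. Treating two estimates as identical when they agree to three decimals is an assumption about output precision, not a rule stated in the study.

```python
import numpy as np


def sums_of_squared_loadings(loadings):
    """Column-wise sums of squared loadings, one value per factor (var(xi) or SSFE)."""
    return np.sum(np.asarray(loadings, dtype=float) ** 2, axis=0)


def similar(a, b):
    """Similarity criterion: the smaller estimate is larger than half of the larger one."""
    small, large = sorted((a, b))
    return small > 0.5 * large


def compare(cfa_values, efa_values, decimals=3):
    """Means, SDs, Pearson r, and count of identical pairs across datasets."""
    cfa = np.asarray(cfa_values, dtype=float)
    efa = np.asarray(efa_values, dtype=float)
    return {
        "cfa_mean": cfa.mean(), "cfa_sd": cfa.std(ddof=1),
        "efa_mean": efa.mean(), "efa_sd": efa.std(ddof=1),
        "pearson_r": np.corrcoef(cfa, efa)[0, 1],
        # Assumption: "identical" means equal after rounding to the printed precision.
        "n_identical": int(np.sum(np.round(cfa, decimals) == np.round(efa, decimals))),
        "means_similar": similar(cfa.mean(), efa.mean()),
    }


# Usage with hypothetical estimates from three datasets.
summary = compare([1.302, 1.251, 1.410], [1.302, 1.251, 1.410])
print(summary["n_identical"], summary["means_similar"])  # 3 True
```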

The Results
Table 1 provides the means and standard deviations of the sums of squared factor loadings obtained by CFA (first, second, and third rows) and of the MLE-based eigenvalues of EFA (SSFE; fourth and fifth rows). For some matrices, CFA yielded an excessively high value. This was due either to estimation problems or to the combination of one large factor loading with very small other factor loadings. Therefore, means and standard deviations were computed a second time without these obviously inappropriate results. These additional means and standard deviations, based on 68 instead of 100 datasets, are provided in the second row. The remaining 68 pairs of CFA and EFA results consisted of pairs of exactly identical estimates (60 of 100) and pairs of nonidentical estimates (8 of 100). For the nonidentical pairs, it was not obvious whether the CFA result, the EFA result, or both were incorrect. Means and standard deviations for identical CFA and EFA results are reported in the third and fifth rows.
The first column reports the results for the one-factor model applied to random data. It showed a large difference between the statistics of CFA and EFA when all results were taken into consideration. Sums of squared factor loadings of up to 102 were observed in CFA. Eliminating the pairs that included an inappropriate CFA result reduced the mean sum of squares from 16.034 to 0.378. Thus, when all 100 datasets were analyzed, the results did not indicate correspondence between EFA and CFA for the one-factor model applied to random data. The comparison of the results for the individual datasets revealed that CFA and EFA produced exactly the same value in 60 percent of the investigations. After all individual datasets with computational problems in CFA or EFA were eliminated, CFA and EFA yielded identical variance estimates (see the third and fifth rows of Table 1).
The second column of Table 1 comprises the results for the random data with a one-dimensional underlying structure. As Table 1 shows, CFA and EFA produced exactly the same results. The underlying structure apparently prevented the computational problems that had caused deviations between the two methods when random data without an underlying structure were analyzed.
The third and fourth columns of Table 1 present the results for the two-factor model. Variance estimates from EFA and CFA differed from each other. Regarding the first factor, no apparently inappropriate results were observed. For the second factor, however, some of the CFA results were inappropriate. These were excessively high values, due either to estimation problems or to the combination of one large factor loading with very small other factor loadings. Even after the inappropriate results were eliminated, the mean variance estimates from CFA and EFA did not correspond. Furthermore, the comparison of the results for the individual datasets did not reveal a single case of identity between the variance estimates from CFA and EFA.
Since there was no correspondence of the variance estimates from CFA and EFA when the two-factor model was used, similarity between the variance estimates according to the above explained similarity criterion was investigated.The mean for the first CFA factor was smaller than the mean for the first EFA factor and larger than half the mean of the first EFA factor.In contrast, before eliminating inappropriate results, the mean of the second CFA factor was more than nine times larger than the mean of the second EFA factor.However, after the elimination of inappropriate results, the mean variance estimate from EFA was larger than half the mean variance estimate from CFA as required by the criterion for similarity.
Furthermore, correlations between the variance estimates from CFA and EFA across the 100 datasets were computed. The Pearson correlations for the three kinds of datasets are reported in Table 2. The table also includes correlations computed after the elimination of inappropriate CFA and EFA solutions. There was a perfect correlation between the CFA and EFA results for the random data with an underlying one-dimensional structure. After the elimination of inappropriate results, there was also a perfect correlation between the results for the random data without an underlying structure. When a two-factor model was applied to the two-dimensional structure, however, the correlations were only small or moderate.
In sum, correspondence was always observed if the one-factor model was applied to random data with a one-dimensional underlying structure.In random data without an underlying structure it was necessary to eliminate inappropriate results to achieve correspondence.If the data were generated to show a two-dimensional underlying structure, there was similarity but no correspondence.

Conclusions
The variance of the latent variable has the potential to be an important statistic in CFA, since it reflects the strength of the effect of a latent source on the observed manifest variables. It may be regarded as analogous to the effect size in experimentation, which is expected to reflect the impact of the experimental manipulation (Cohen, 1988; Rosenthal, 1994). One precondition is that the scaling of variances yields values that reflect the actual sizes of the variances. Another is that the variances of different latent variables can be compared with each other.
The search for a method for scaling the variances of latent variables in CFA stimulated the research presented in this paper. We investigated whether the sum of squared factor loadings obtained in CFA would provide an estimate of the variance of the latent variable. This estimate should correspond to the sum of squared factor loadings that EFA provides as the eigenvalue of the respective factor.
Although the theoretical foundations suggest equivalence, the simulation study revealed a number of constraints. In particular, the use of maximum likelihood estimation may have led to a deviation from the eigenvalue of the correlation matrix as characteristic root. The sum of squared factor loadings obtained by maximum likelihood estimation is usually somewhat smaller than the original characteristic root of a correlation matrix.
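The direction of this difference can be illustrated with the error-free one-source pattern of the simulation (eight variables, loadings of .4). The NumPy sketch below computes the largest characteristic root of this correlation matrix, which principal-component loadings reproduce exactly. It compares this root with the sum of squared generating loadings. A common-factor solution such as maximum likelihood recovers these generating loadings from this error-free matrix.

```python
import numpy as np

# Error-free one-source pattern: eight variables, loadings of .4, unit diagonal.
p, loading = 8, 0.4
r = np.full((p, p), loading**2)
np.fill_diagonal(r, 1.0)

# Largest characteristic root of the correlation matrix.
vals, vecs = np.linalg.eigh(r)        # eigenvalues in ascending order
largest_root = vals[-1]

# Principal-component loadings sqrt(root) * eigenvector reproduce the root exactly.
pc_loadings = np.sqrt(largest_root) * vecs[:, -1]

# Sum of squared generating loadings, i.e. what a common-factor solution recovers here.
ssl_factor = p * loading**2

print(largest_root, np.sum(pc_loadings**2), ssl_factor)
```

Here the characteristic root is 2.12 (= .84 + 8 × .16), whereas the sum of squared factor loadings is 1.28.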
However, if maximum likelihood estimation is conducted in both CFA and EFA and, in both cases, the sum of squared factor loadings is computed, correspondence is achievable.The results show that there is exact correspondence if the investigated data include one latent source of structure only.In contrast, in the absence of a latent source or the presence of several latent sources there is no equivalence.Since CFA and EFA are complex procedures, there may be procedural differences that do not result in different variance estimates as long as the data provide ideal conditions.However, these procedural differences may lead to differing results if the conditions are not ideal.
The results for the two-factor model are worse than those for the one-factor model. However, this finding is not a disadvantage when the variances of different latent variables are to be compared. The variances that are compared normally characterize latent variables of the same model. Examples are the two latent variables of a bifactor model or the latent variables of a model representing specific and general processes (Schweizer, 2006). In such cases, the variances are computed by the same procedure. Procedural differences are likely to influence comparisons only if results obtained by different procedures are compared.
Overall, using the sum of squared factor loadings to scale the variances of latent variables in CFA can be expected to yield values that are similar to eigenvalues. Under special circumstances, there may even be equivalence.

Table 1.
Means and Standard Deviations (in Parentheses) of the Variance Estimates Observed in CFA and EFA

Table 2.
Correlations of the Variance Estimates Observed in CFA and EFA