Structural Equation Modeling in Psychology: The History, Development and Current Challenges

Structural Equation Modeling (SEM) represents a series of cause-effect relationships between variables combined into composite testable models (Shipley, 2000). It is used extensively by researchers in different disciplines and is a common technique in psychology. SEM has attracted attention primarily because it lends itself to effectively studying problems or models that are hard to assess using other procedures. This paper traces the history of SEM in the discipline of psychology, and discusses current developments, the use and misuse of SEM techniques, and practical recommendations to direct future research in the discipline.


Introduction to Structural Equation Modeling
Structural equation modeling (SEM) is a major research tool that is rapidly growing in popularity. SEM is defined by some scholars (Pearl, 2000; Wright, 1921) as a statistical technique for testing causal relations, using a combination of statistical data and qualitative causal assumptions. SEM techniques are based on multivariate statistical procedures, which are widely used by researchers in different disciplines. SEM extends conventional multivariate statistical analysis by accounting for measurement error and by more thoroughly examining goodness-of-fit. The SEM technique has grown out of path analysis and factor analysis.
It is important to understand the history and developments of SEM because this is the lens through which we see the present and the future development of SEM. This lens is restrictive in that it constrains the way we approach research in this area, but it is also beneficial in that it helps us to solve the problems we face today. Understanding the solutions of the past suggests new methods and helps us to avoid repeating past mistakes. As explained by Bollen (1998), in the past people working in this area tended to work independently, often in different disciplines, which meant that progress was not as rapid as it could have been. Even today there are few articles that track the historical progress in areas such as SEM. This paper is therefore designed to contribute to our understanding in this area.
In this study, the history of SEM in psychology will be presented and summarised using a new Time Map presented in Figure 1. This diagram indicates early developments in blue, later developments in yellow, and recent developments in red. Where possible, developments in the Big Five Taxonomy will be used to provide examples for each stage of the SEM development. The early roots of SEM will be explored from the work of Pearson (1901) on orthogonal least squares, through to the growth of factor analysis spearheaded by Spearman (1904) and Thurstone (1935). It will be explained how SEM started with path analysis in various disciplines such as psychometrics, sociology, econometrics and biometrics. The interdisciplinary conference of economists, sociologists, psychologists, and statisticians in 1970 greatly influenced the integration of SEM in these disciplines. The work of Bentler (1986) and, more notably, the development of the EQS software during the 1970s and 1980s were significant in the application of SEM in psychology. This study will also explore current developments in SEM such as mixed models, meta-analysis and partial least squares (PLS). Some of the more controversial debates relating to measurement model misspecification (formative vs. reflective) and the use of PLS-SEM (vs. CB-SEM) will also be discussed. Figure 1 can be utilized as a training tool for both statistics and psychology students to better understand the early roots and later developments of SEM. It is hoped that this will help to inspire future developments. It is believed that understanding the fundamentals and philosophy of a topic creates a longer-lasting, less biased, in-depth learning experience.

Early Roots of Structural Equation Modeling (shown in blue in Figure 1)
Exploratory factor analysis (EFA) has made an important contribution in the social sciences by addressing the needs and interests of different disciplines. The primary roots of SEM in psychology belong to Pearson's (1901) theory on orthogonal least squares. Pearson's theory was not fully appreciated at the time, but later it became a foundation for principal component analysis and correlation matrix analysis (Hotelling, 1933). Spearman (1904), an English psychologist, also contributed substantially. He is commonly regarded as the pioneer of factor analysis given his work on finding relationships between multiple correlated measures of cognitive performance.
Using factor analytic data, Spearman postulated his original two-factor models of ability and intelligence testing, highlighting the theory-testing nature of the method. Other scholars gradually adopted this theory-testing trend using factor analysis (e.g., Anderson & Rubin, 1956; Guttman, 1952; Lawley, 1940; Mosier, 1939).
Spearman's two-factor theory was criticised widely (e.g., Thomson, 1916, 1935; Wilson, 1928, 1929). Between 1940 and 1951, factor analytic literature became increasingly atheoretical and the focus of criticism shifted to technical refinement. For these scholars, the Spearman two-factor methods were not appropriate for practical situations that involved the frequently encountered group factors. In 1931, Thurstone considered this to be one of the serious limitations of Spearman's method, mainly because psychological problems usually involve group factors (Thurstone, 1935). This limitation led to an interest in multiple factor analysis to supplement Spearman's model, whereby group factors were identified after extracting a general factor (e.g., Holzinger, 1941).
Consequently, some significant progress in statistical analysis occurred during this period. One of the most important advances was perhaps the popularisation of multiple factor analysis.
It was during this period that Cattell (1945) derived the first personality model consisting of 12 factors using oblique factor analyses. The 35 items comprising these factors eventually became part of his 16PF Questionnaire (Cattell, Eber, & Tatsuoka, 1970). However, later work suggested a clerical error in this analysis (Tupes & Christal, 1961) and there was some disagreement about the number and nature of the factors, although the second-order factors of the 16PF show some correspondence with the subsequent Big Five dimensions (Digman & Tatsuoka, 1970).
In practice, the centroid method of factor analysis of Thurstone (1947) operated successfully and part of its theoretical background was recognised in the 1960s (Bentler, 1968; McDonald, 1970). However, from the early 1980s the explicit optimisation functions used in factor analysis (such as least squares, maximum likelihood (ML), and minimum chi-square) became more popular.
The problem of rotating factors was avoided when confirmatory factor analysis (CFA) was introduced. In CFA, the number and pattern of factors and their loadings are specified at the start, transforming the problem into one of identifying a model's parameters from observed moments (Matsueda, 2012).
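The identification idea described above can be illustrated with a minimal sketch. It assumes the standard CFA covariance structure, Sigma = Lambda Phi Lambda' + Theta, with factor variances fixed to 1; the loading values and the factor correlation are hypothetical and are not taken from any study cited here. The sketch counts the observed moments against the free parameters, which is a necessary but not sufficient condition for identification.

```python
import numpy as np

# Hypothetical loading pattern: indicators 1-3 load on factor 1, indicators 4-6 on factor 2.
# The zeros are fixed in advance; the nonzero entries are the free loadings.
Lambda = np.array([
    [0.8, 0.0],
    [0.7, 0.0],
    [0.6, 0.0],
    [0.0, 0.9],
    [0.0, 0.7],
    [0.0, 0.5],
])

# Factor covariance matrix with variances fixed to 1 (one free correlation).
Phi = np.array([[1.0, 0.3],
                [0.3, 1.0]])

# Common part of the covariance, and unique variances chosen so that
# every indicator has unit variance (an illustrative convention).
common = Lambda @ Phi @ Lambda.T
Theta = np.diag(1.0 - np.diag(common))

# Model-implied covariance matrix of the observed indicators.
Sigma = common + Theta

# Counting rule: distinct observed moments versus free parameters.
p = Lambda.shape[0]
n_moments = p * (p + 1) // 2                  # distinct variances and covariances
n_free = np.count_nonzero(Lambda) + 1 + p     # loadings + factor correlation + unique variances
df = n_moments - n_free

print("Implied covariance matrix:\n", np.round(Sigma, 3))
print("Observed moments:", n_moments, "Free parameters:", n_free, "df:", df)
```

Because the loading pattern is fixed beforehand, no rotation step is needed: the remaining question is only whether the free parameters can be recovered from the observed moments.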
CFA was introduced originally by Tucker (1955), with Bechtold undertaking one of the early studies in 1961. Further development of CFA occurred with the introduction of an ML approach to factor analysis (Anderson & Rubin, 1956; Lawley, 1940). It was Jöreskog (1969) who made the method practical and developed computer software programs for CFA estimates using ML. This method was used by many authors to further refine and develop a model of personality.
In addition to the early SEM work on measurement, there was also some very early work on path analysis. Sewall Wright was the first to use path analysis in medical science when he started using it in his studies in the 1920s. Path analysis was one of the primary methods used to determine a causal structure. Wright used observed variables to develop a correlation matrix, and drew path diagrams indicating direct and indirect effects, such as that shown in Figure 2.
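As an illustration of direct and indirect effects in a path diagram, the following minimal sketch applies the tracing rules to a three-variable model in which X affects Y both directly and through M. The variable names and the standardized path coefficients are hypothetical and do not come from Wright's guinea-pig data.

```python
import numpy as np

# Hypothetical standardized path coefficients (illustrative values only).
a = 0.5  # path X -> M
b = 0.4  # path M -> Y
c = 0.3  # path X -> Y (direct effect)

# Decomposition of the effect of X on Y.
direct_effect = c
indirect_effect = a * b              # product of the paths along X -> M -> Y
total_effect = direct_effect + indirect_effect

# Correlations implied by the tracing rules for this diagram.
r_xm = a
r_xy = c + a * b
r_my = b + a * c

# Going the other way: recover the paths into Y from the correlations
# by solving the standardized regression of Y on X and M.
R_predictors = np.array([[1.0, r_xm],
                         [r_xm, 1.0]])
r_with_y = np.array([r_xy, r_my])
recovered_c, recovered_b = np.linalg.solve(R_predictors, r_with_y)

print("direct:", direct_effect, "indirect:", indirect_effect, "total:", total_effect)
print("recovered paths into Y:", recovered_c, recovered_b)
```

The second half of the sketch mirrors the workflow described above, in which a correlation matrix of observed variables is the starting point and the path coefficients are derived from it.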

Simultaneous Equation Models in Economics
The development of SEM in econometrics can perhaps be attributed to Frisch and Waugh (1933), Haavelmo (1943) and Koopmans (1945). Frisch (1934), the founder of the Econometric Society and the journal Econometrica, invented the term "econometrics" and developed many of the principles of identification for SEM. Haavelmo (1943) was another economist who made significant contributions. The advances made by Haavelmo, Mann, and others led to work on SEM at the Cowles Commission (1952). This resulted in Haavelmo solving the major problems of identification, estimation, and testing for SEM.
According to Bentler (1986), one of the pioneer researchers was Goldberger, who introduced the integration of ideas related to SEM in different disciplines (e.g., Goldberger, 1971). This integration of ideas was one of the turning points in the evolution of SEM in the 1970s.

FASEM (Factor Analysis SEM)
FASEM is a generic acronym for factor analysis (FA) and structural equation modeling (SEM), which saw major development in the 1970s and 1980s. It was first used by Bentler (1986) to refer to conceptual approaches for modeling with continuous variables in SEM.
The Conference on Structural Equation Models in 1970 contributed greatly to the integration of SEM disciplines. The conference was an interdisciplinary forum of economists, sociologists, psychologists, statisticians, biometricians and political scientists, and the academic papers were published by Goldberger and Duncan in 1973 in the volume Structural Equation Models in the Social Sciences. The factor analysis of structural equation modeling (FASEM) and linear structural relations (LISREL) were the main outcomes of this integration. At the time, simultaneous equation and path analysis methods were the main contributors to FASEM and LISREL.
According to Bentler (1986), the major achievements in the 1970s can be categorised into three sections: structural concepts, statistical theory and practical development. The two key papers published in this period were written by Hauser and Goldberger (1971) and Jöreskog (1973). Hauser and Goldberger's (1971) examination of unobservable variables is an exemplar of cross-disciplinary integration, drawing on path analysis and moment estimators developed by Wright in the 1920s and by various sociologists. It also incorporates factor-analytic models from psychometrics, and efficient estimation and Neyman-Pearson hypothesis testing from statistics and econometrics. Hauser and Goldberger used limited information estimation to gain a better understanding of structural equations estimated by ML. Jöreskog (1973) presented an ML framework for estimating SEMs, developed a computer program for empirical applications, and showed how the general model could be applied to a myriad of important substantive models.
Goldberg (1981) has been credited with the naming of the "Big Five" after numerous researchers (e.g., Norman, 1963; Borgatta, 1964; Digman & Takemoto-Chock, 1981) were able to replicate the five-factor structure foreshadowed in Cattell's work using lists of items derived from Cattell's original 35 items. However, it was acknowledged that these five dimensions represented personality only at the broadest level of abstraction. Cross-language research by Hofstee et al. (1997) suggests that although the Big Five can be replicated in Germanic languages, evidence in non-Germanic languages is less convincing (John et al., 1999).

Linear Bentler-Weeks and Nonlinear SEMs
There is evidence that the turning point in the application of SEM in psychology dates back to the 1970s and 1980s, primarily through the work of Bentler and, more particularly, the development of the EQS structural equation modeling software (Matsueda, 2012). Using such analytical software to evaluate their models allows researchers to make better use of their data and to study the empirical applications of several new methods proposed in the literature (Bentler, 1986). During the 1980s some researchers paid attention to nonlinear SEMs, which helped to extend the overall scope of SEM. Some important developments in nonlinear latent variable SEM, particularly those for categorical data, appeared in the 1980s, mainly in the works of Bock and Aitkin (1981), Mislevy (1984) and Muthén (1984).

Formative Models
The first appearance of formative measures possibly goes back to the Berkson error model for radiation epidemiology studies in the 1950s. In Classical Test Theory (CTT), the observed score is considered equal to the true score plus measurement error, while in the Berkson error model the true score is equal to the observed score plus measurement error (Carroll, Ruppert, Stefanski, & Crainiceanu, 2006). This concept has become known as Berkson measurement error and is the cornerstone of what are today known as formative models. Although the concept of formative measures was introduced by Berkson in 1950, it did not attract enough attention until the late 1960s. While many scholars (e.g., Blalock, 1971; Bollen, 1989; Diamantopoulos & Winklhofer, 2001; Jarvis et al., 2003; Petter, Straub, & Rai, 2007) have alerted researchers to the relevance of formative models in specific situations, these models have been underemphasized in the literature.
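The contrast between the two error models described above can be written compactly. The symbols are introduced here for illustration only: X denotes the observed score, T the true score, and E the measurement error.

```latex
\begin{align}
  \text{Classical test theory:} \quad X &= T + E, \\
  \text{Berkson error model:}   \quad T &= X + E.
\end{align}
```

In the classical model the error is assumed to be uncorrelated with the true score T, whereas in the Berkson model it is assumed to be unrelated to the observed score X. The direction of the relationship is therefore reversed, which is what links the Berkson model to formative measurement.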
Although the dimensions of personality cannot be viewed as formative, because they reflect personality rather than cause it, there are many other formative indicators in psychology. An example quoted by Hoyle (2011) concerns measures of life stress. The occurrence of each stressful event increases life stress, and failure to include all such events will result in an incomplete measure of life stress.

Multiple-Indicators Multiple-Causes Model (MIMIC)
One of the models estimated by Wright in the 1920s using path analysis is similar to what is now known as a MIMIC model (Matsueda, 2012). The main advancement in MIMIC was achieved through the works of Jöreskog and Goldberger, and Hauser and Goldberger, in the 1970s. They introduced maximum likelihood as the estimation method for over-identified MIMIC models.
These MIMIC models were used by John et al. (1999) to test the convergent and discriminant validity of three personality instruments: Goldberg's (1992) TDA, Costa and McCrae's (1992) NEO questionnaire and the John et al. (1991) BFI. They found that the Big Five are "fairly independent dimensions that can be measured with convergent and discriminant validity". However, their CFA of this model confirmed that "five latent, modestly correlated personality factors capture the major sources of variance" and that three smaller method factors represented the "trait-specific variance" for the three instruments.
The release of the LISREL statistical software by Jöreskog in the 1970s produced the biggest advancement in estimating MIMIC models. LISREL is still popular among scholars because of its ability to incorporate factor analysis, path analysis, and SEMs into a general covariance structure model (Jöreskog & Sörbom, 2001; Matsueda, 2012). Using the MIMIC model, identification and estimation of formative models has become feasible.

Current Developments in SEM
Some of the most recent developments in SEM include multilevel and mixture models, generalized linear latent and mixed modelling (GLLAMM), partial least squares (PLS) and SEM-based meta-analysis.

Multilevel and Mixture Models
In multilevel SEM, separate models are specified for the within-group and between-group covariances. Further, by using a multiple group analysis, the parameters can be estimated simultaneously for both levels (Muthén, 1994). Although this estimation method can be implemented in almost any SEM software, it is generally applicable only to a few specific models.
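The separation of within-group and between-group covariances can be illustrated with a minimal sketch. It assumes balanced groups and simulated data; the group counts, variable counts and random seed are arbitrary choices for illustration. It computes a pooled within-group covariance matrix and a scaled between-group covariance matrix, the two sample matrices to which separate models would then be fitted, and checks that the within and between sums of squares and cross-products add up to the total.

```python
import numpy as np

rng = np.random.default_rng(0)
n_groups, group_size, n_vars = 20, 10, 3   # hypothetical balanced design

# Simulated data: a group-level component plus an individual-level component.
group_effects = rng.normal(size=(n_groups, 1, n_vars))
data = group_effects + rng.normal(size=(n_groups, group_size, n_vars))

grand_mean = data.mean(axis=(0, 1))        # shape (n_vars,)
group_means = data.mean(axis=1)            # shape (n_groups, n_vars)

# Within-group deviations: each observation minus its own group mean.
within_dev = (data - group_means[:, None, :]).reshape(-1, n_vars)
sscp_within = within_dev.T @ within_dev

# Between-group deviations: group means minus the grand mean, weighted by group size.
between_dev = group_means - grand_mean
sscp_between = group_size * (between_dev.T @ between_dev)

# Total deviations: each observation minus the grand mean.
total_dev = (data - grand_mean).reshape(-1, n_vars)
sscp_total = total_dev.T @ total_dev

n_total = n_groups * group_size
S_pooled_within = sscp_within / (n_total - n_groups)
S_between = sscp_between / (n_groups - 1)

print("Pooled within-group covariance:\n", np.round(S_pooled_within, 3))
print("Scaled between-group covariance:\n", np.round(S_between, 3))
print("Decomposition holds:", np.allclose(sscp_total, sscp_within + sscp_between))
```

With unbalanced groups the between-group matrix requires additional scaling, which is one reason the approach is convenient only for a limited set of models.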
Many personality meta-analyses and multilevel analyses have been conducted. For instance, Linden et al. (2010) conducted a meta-analysis of a general factor of personality (GFP), providing robust support for a GFP in an SEM analysis involving 212 studies, while Roesch et al. (2010) used a multilevel SEM (MSEM) to show the effect of coping strategies on positive and negative affect using daily diary data.

GLLAMM
As mentioned above, multilevel models are limited to specific cases and cannot be applied to all models. In response to this limitation, a more advanced and general estimation method, GLLAMM, was introduced by Rabe-Hesketh, Skrondal and Pickles (2004), and further developed by Skrondal and Rabe-Hesketh (2004). GLLAMM has three main components: a generalized linear model, a structural equation model for latent variables, and distributional assumptions for latent variables (Matsueda, 2012). The generalized linear model is capable of analysing all types of data: continuous, ordinal, dichotomous and discrete. The GLLAMM program is now part of the Stata program. Many of the GLLAMM models can also be analysed with MPlus, another powerful software package, developed by Muthén and Muthén (2004).

Partial Least Squares (PLS)
The roots of PLS, as well as graphical models, can be traced to Herman Wold in 1977 (Geladi, 1988). PLS modeling was then extended into the SEM area, drawing on principal component analysis.
Originally, PLS was developed to solve the problem of multicollinearity in multiple regression analysis. According to Wold (1979), PLS regression was an appropriate estimation method for complex models with undeveloped theoretical backgrounds. The original application of PLS was more for predictive models (Barclay, Higgins, & Thompson, 1995). Later, as an alternative to Jöreskog's covariance-based SEM (CB-SEM) approach, Wold introduced SEM based on PLS. Because PLS-based SEM imposes fewer restrictions, such as normally distributed data and a large sample size, it came to be known as "soft modelling". Despite its less restrictive nature, PLS-based SEM is still not as popular as covariance-based SEM. The main reason for this was previously a lack of software for model estimation, but this problem is now being addressed.
Since 1984, and especially from the early 2000s, more user-friendly software has been introduced for the estimation of PLS-based SEM, adding to the popularity of the method. Software such as LISREL (Jöreskog, 1977), MPlus (Muthén & Muthén, 2004), PLS-GUI (Li, 2005), Visual PLS (Fu, 2006a), PLS-Graph (Chin, 2004), SmartPLS (Ringle et al., 2005), SPAD-PLS (Test&Go, 2006) and XLSTAT (Addinsoft, 2008) are some of the recent developments in this area (Morales, 2011). There have been many debates among scholars on the application of PLS and the lack of an overall goodness-of-fit test; the implications of this are briefly discussed later in this paper.

SEM-Based Meta-Analysis
The concept of SEM-based meta-analysis was introduced by Cheung (2008). Cheung developed an SEM framework for integrating SEM results from different studies. Based on this approach, studies in a meta-analysis can be considered as subjects in SEM. Although the proposed approach added a new and important methodological development in SEM, it is not yet fully incorporated into the current popular SEM software, limiting its further application in practice.

Discussion
SEM is rapidly growing in popularity as a major research tool in psychology. The early foundation of SEM can be traced back to factor analysis, principal component analysis, regression and path analysis. It started in various disciplines such as psychometrics, sociology, econometrics and biometric path analysis. The interdisciplinary conference in 1970 greatly influenced the integration of SEM disciplines. The work of Bentler and, especially, the development of the structural equation modeling software (EQS) during the 1970s and 1980s was another turning point for the application of SEM in psychology. Since then, SEM has rapidly developed through different approaches such as linear Bentler-Weeks, MIMIC, FASEM, and formative models. Other recent developments such as PLS, GLLAMM, and multilevel and mixture models have extended the application of SEM techniques to a higher level.
Although this area is progressing rapidly, there is a danger that the technique will be misused because of its complexity or because psychological researchers have insufficient knowledge of it. Some of the most controversial debates relate to model misspecification (formative vs. reflective) and the use of PLS-SEM (vs. CB-SEM). These two issues are described in more detail in the following sections to highlight their importance.

Formative vs. Reflective Models
One of the recent debates in SEM concerns measurement model misspecification. The existing evidence of SEM model misspecification reveals that the distinction between formative and reflective indicators needs to be clarified. When indicators are affected by a latent variable, reflective models are appropriate. However, in many settings where indicators are the cause of a latent variable, formative models are deemed to be more accurate and appropriate.
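The distinction described above can be stated in equation form. The notation is introduced here for illustration: η is the latent variable, x_i are its q indicators, λ_i and γ_i are coefficients, and ε_i and ζ are error terms.

```latex
\begin{align}
  \text{Reflective:} \quad x_i  &= \lambda_i \eta + \varepsilon_i, \qquad i = 1, \dots, q, \\
  \text{Formative:}  \quad \eta &= \sum_{i=1}^{q} \gamma_i x_i + \zeta.
\end{align}
```

In the reflective specification the latent variable appears on the right-hand side of every indicator equation, so the indicators are expected to covary. In the formative specification the indicators appear on the right-hand side of a single equation for the latent variable, so omitting an indicator changes what the construct means.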
By default, most researchers assume that models are reflective. Many scholars (e.g., Blalock, 1971; Bollen, 1989; Diamantopoulos & Winklhofer, 2001; Jarvis et al., 2003; Petter et al., 2007) have alerted researchers to the relevance of formative models in some specific situations. However, this message is often lost in the literature.
According to Fornell and Bookstein (1982), in reflective models the items are indicators of a latent factor. These models provide the trigger for reliability evaluation and common/confirmatory factor analysis (Bollen, 1989; Long, 1983; Nunnally, 1978).
Conversely, depending on the nature of the measure, the indicators might cause the construct (Bollen & Lennox, 1991). When the construct is moulded by its measures, a formative model is suggested (Fornell & Bookstein, 1982). Based on what has been discussed in the preceding section, it is crucial to provide a clear, well-defined decision-making framework for assessing reflective and formative models.
Because of misspecification, some of the findings in the literature might be misleading (Jarvis et al., 2003; MacKenzie et al., 2005; Petter, Straub, & Rai, 2007). Although there are strong guidelines for fitting reflective models, less is known about procedures for fitting formative models. In recent years, a few scholars have paid attention to formative indicators and have suggested specific guidelines for the appropriate use of these models (Diamantopoulos & Winklhofer, 2001; Jarvis et al., 2003; Petter et al., 2007). More attention to measurement model specification is needed for future studies using SEM.

PLS-SEM vs. CB-SEM
Debate over the use of covariance-based SEM (CB-SEM) versus partial least squares SEM (PLS-SEM) has existed since the early years of development of these procedures. In particular, some scholars have questioned the practicality and generalisability of the PLS method for factor estimation.
In spite of the wide criticism of PLS in the literature, PLS has specific strengths in certain situations which have been misunderstood or ignored by CB-SEM proponents.A comparison of some of the main features of both approaches, along with some of the existing criticisms, will be presented below.
Prediction validity. The literature shows that PLS has great capability as a prediction tool, a fact that has not been fully appreciated. PLS is considered to be a good inferential tool, a correct method for formative constructs and for developing measurements with new theoretical or empirical backgrounds (Rigdon, 2012). Pro-PLS scholars believe that research data can help in building an empirical background and unobservable conceptual variables (Rigdon, 2012). On the other hand, CB-SEM followers believe that one should specify a conceptual structure and seek evidence regarding whether this structure is consistent with empirical evidence, so that results can challenge, support, or modify those conceptualizations.
Fit assessment test. CB-SEM assesses the overall fit of the model using the covariance among the items, assuming that all measures are reflective, with less interest in the individual effects of constructs or path coefficients. In contrast, PLS does not rely on item covariance and overall goodness-of-fit; instead, the focus is on the variances of predicted variables or construct variances (Chin, 2010). Thus, in practice, in the presence of formative constructs, PLS might be a better choice than CB-SEM.
Theoretical background. Due to the holistic and confirmatory approach of CB-SEM, it is more useful when there is solid theoretical and background knowledge for the model. In contrast, a PLS approach, with its exploratory nature and focus on the significance and strengths of individual paths and constructs, seems to be an appropriate procedure for new models, and particularly useful in behavioural and social sciences when there is limited background knowledge of the expected model (Chin & Newsted, 1999; Chin, 2010; Roldán & Sánchez-Franco, 2012).
Normality assumption. CB-SEM commonly uses ML estimation assuming a normal distribution for the data, while, for PLS, there is no underlying assumption for the data distribution. This means that, for non-normal data, the use of variance-based PLS is justified when sample sizes are too small to allow asymptotically distribution-free CB-SEM or bootstrap analyses.
Sample size. One of the requirements of using CB-SEM is having a relatively large sample size, while PLS can be conducted with small sample sizes. In PLS, the estimators are inconsistent and biased, in that standard errors do not decline with increasing sample size and expected parameter estimates do not converge to their true values. In CB-SEM models, if the underlying assumptions are met, consistency is ensured.
PLS-SEM and CB-SEM are two different approaches for estimating SEM models. Each approach is suitable for a specific context. Researchers need to appreciate the differences between the methods in order to use the more appropriate approach (Hair, Black, Babin, & Anderson, 2010; Hair, Ringle, & Sarstedt, 2011; Hair, Hult, Ringle, & Sarstedt, 2014). As acknowledged by Hair et al. (2011), neither model is superior to the other and "depending on the specific empirical context and objectives of a SEM study, PLS-SEM's distinctive methodological features make it a valuable and potentially better-suited alternative to the more popular CB-SEM approach" (p. 149).

Figure 1. Pseudo path diagram of some developments in SEM model structures. Acknowledgment: Special thanks to Professor Peter Bentler (personal communication, 2012) for his inspiration and input into developing the diagram.

Figure 2. One of Wright's first path diagrams for genetic modelling. Source: Wright, Sewall (1920). The relative importance of heredity and environment in determining the piebald pattern of guinea-pigs. Proceedings of the National Academy of Sciences, 6, 320-332.