Invariance Test: Detecting Difference Between Latent Variables Structure in Partial Least Squares Path Modeling

In the context of heterogeneity, almost all partial least squares path modeling (PLS-PM) approaches focus on differences in the causal relationships between the latent variables. The principal goal is to detect segments that have different path coefficients in the structural model, yet inadequate attention is generally given to the measurement model. Thus, whenever we define specific sub-models for different groups of individuals, we may wonder whether the latent variables are the same in all detected sub-models. This raises the problem of invariance: if the estimates of the latent variables are specific to each sub-model, there is reasonable doubt as to whether we can compare the behavior of individuals who belong to different segments. In this paper, we present an invariance test as a possible solution, whose goal is to verify whether the measurement models of the sub-models can be assumed to be equal.


Introduction
Measurement invariance is a statistical property of measurement that is sometimes referred to as measurement equivalence. It indicates whether or not we are measuring the same constructs in the various specified groups. As argued by Vandenberg and Lance (2000), the establishment of measurement invariance across groups is a logical prerequisite to conducting substantive cross-group comparisons. For example, we may want to establish whether respondents interpret a given measure in a conceptually similar way, which may not always be the case if they are of different genders or come from different cultures. If measurement invariance is not established, it may not be possible to interpret the measurement of latent constructs unambiguously. In partial least squares path modeling (PLS-PM), the invariance problem greatly increases in importance when we analyze data by considering potential sources of heterogeneity and fit more than one model according to socioeconomic or psychographic variables (i.e., segmentation variables). In this situation, it becomes difficult to guarantee that each construct in each sub-model measures the same latent concept, and it is then not reliable to compare the latent variables of individuals belonging to different segments.
We start this paper by providing a short introduction to the partial least squares path modeling methodology (section 1), presenting the invariance problem (section 2) and the different approaches that can be followed to deal with it (section 3). In section 4, we discuss our contribution: the invariance test. In section 5, we present a simulation study with artificial data in order to evaluate the sensitivity of the test. In section 6, we present the results of applying the test to real data. The paper closes with conclusions on the suitability of the proposed approach (sections 7 and 8).

Brief Introduction to PLS-PM
Partial least squares path modeling (PLS-PM) is one of the methods from the broad family of PLS techniques. It was originally developed by Herman Wold and his research group during the 1970s and the early 1980s. PLS-PM is based on three fundamental concepts: it is a multivariate method for analyzing multiple blocks of variables; each block of variables plays the role of a latent variable; and it is assumed that there is a system of linear relationships between blocks. In other words, PLS-PM provides a framework for analyzing multiple relationships between a set of blocks of variables (or data tables). It is supposed that each block of variables is represented by a latent construct or theoretical concept; the relationships among the blocks are established taking into account previous knowledge (theory) of the phenomenon under analysis. There are plenty of references on PLS-PM, but we mention only one by Wold and two more recent ones (Esposito Vinzi, et al., 2010; Tenenhaus, et al., 2005; Wold, H., 1982).

The Outer Measurement Model in PLS-PM
In PLS-PM, the latent constructs are always measured as linear composites of their indicator variables, $\sum_{k} \omega_k x_k$. We assume that this linear composite is in fact a proxy of the latent construct $\xi$, which we try to measure: $\xi = \sum_{k} \omega_k x_k + \delta$, where $\xi$ represents a specific construct of the model (measured by a block of manifest variables $x_k$, where $k = 1, \dots, p_k$) and $\delta$ represents the deviation of the actual linear composite with respect to the latent variable, which we take as a random perturbation. Hence, a construct $\xi$ is formalized via the empirical meaning of the observed variables, including their random perturbation as well. Supposing we have a PLS-PM model measured in $S$ subgroups, it can be written as $\eta_s = B_s \Xi_s + \zeta_s$, where $s$ stands for a specific segment, $\eta_s$ stands for the vector of all endogenous latent variables, $B_s$ stands for the matrix of all coefficients, $\Xi_s$ stands for the vector of the latent variables with an exogenous role, and $\zeta_s$ stands for the vector of random perturbations of the endogenous constructs.
Let us suppose the following model: This model can be written in matrix notation as follows: where $p$ and $q$ are, respectively, the numbers of endogenous and exogenous latent variables. PLS-PM segmentation usually considers differences in the inner model that lead to different models for each subgroup. This approach is adequate, since we are mainly interested in analyzing differences in the relationships between constructs among different subgroups. However, it is incomplete, since we do not assess whether the same construct is measured in every group.
We refer to the situation in which every construct is measured in the same way in all segments as the invariance case. Hence, if invariance holds, the weights $\omega_k$ of each construct must be the same across the segments.
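Stated compactly, and writing $\omega_k^{(s)}$ for the weight vector of construct $k$ estimated in segment $s$ (the superscript notation is introduced here only for illustration and is an assumption, not notation from the model above), the invariance case can be summarized as:

```latex
\omega_k^{(1)} = \omega_k^{(2)} = \cdots = \omega_k^{(S)}, \qquad k = 1, \dots, K
```

Here $S$ is the number of segments and $K$ the number of constructs (blocks) in the model.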

Invariance Problem
To the best of our knowledge, in the context of heterogeneity in PLS-PM, the analyst can follow two different approaches to ensure invariance: 1) imposing a restriction on the latent constructs (the restriction approach); 2) comparing the constructs obtained in all sub-models (the comparison approach).
The restriction approach involves maintaining the same latent variables across all sub-models by fixing the weights. This means that we do not recalculate the latent variables when two or more sub-models are considered. This solution is implemented in the SEM tree procedure proposed by Brandmaier, et al. (2013). It consists of fitting one global model to all the data and calculating the weights of the latent variables; then, when heterogeneity is considered and the data are split according to some segmentation variables, the weights of the global model are used to recalculate the latent constructs in each segment. This is a straightforward solution; however, it goes against the philosophy of "soft modeling", which is a major feature of the PLS-PM methodology, and it is not followed by the principal approaches that take heterogeneity into account in PLS-PM (Chin, 2000, 2003; Henseler, 2007; Vinzi, et al., 2008). An alternative is the comparison approach. In this case, we do not impose any restriction on the latent variables: the constructs are recalculated for all segments. Then, to verify invariance, we can compare the weights of the sub-models. The problem then consists of defining a criterion that allows us to compare the obtained weights.
The Invariance test presented in this paper follows the comparison approach.
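The contrast between the two approaches can be illustrated with a small sketch. It is not a PLS-PM implementation: as a loudly stated assumption, the first principal component of a standardized block is used as a simple stand-in for the PLS outer weights, and the data, loadings and function names (`first_pc_weights`, `simulate_block`) are hypothetical.

```python
import numpy as np


def first_pc_weights(block):
    """Hypothetical stand-in for PLS outer weights: first principal
    component of the standardized indicator block (unit-norm vector)."""
    z = (block - block.mean(axis=0)) / block.std(axis=0)
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    w = vt[0]
    # Resolve the sign indeterminacy so that fits are comparable.
    return w if w.sum() >= 0 else -w


def simulate_block(n, loadings, rng, noise_sd=0.3):
    """Toy reflective indicators: x_k = loading_k * xi + noise."""
    xi = rng.normal(size=n)
    noise = rng.normal(scale=noise_sd, size=(n, len(loadings)))
    return np.outer(xi, loadings) + noise


rng = np.random.default_rng(1)
seg_a = simulate_block(300, [0.9, 0.9, 0.0], rng)  # segment A
seg_b = simulate_block(300, [0.0, 0.9, 0.9], rng)  # segment B
pooled = np.vstack([seg_a, seg_b])

# Restriction approach: one global set of weights reused in every segment.
w_global = first_pc_weights(pooled)
restricted = {"A": w_global, "B": w_global}

# Comparison approach: weights re-estimated per segment, then compared.
w_a = first_pc_weights(seg_a)
w_b = first_pc_weights(seg_b)
max_gap = float(np.abs(w_a - w_b).max())

print("global weights:", np.round(w_global, 2))
print("segment A:", np.round(w_a, 2), "segment B:", np.round(w_b, 2))
print("largest weight difference:", round(max_gap, 2))
```

Under the restriction approach the weights are identical by construction, so nothing remains to be tested; under the comparison approach the per-segment weights can differ, and a criterion is needed to decide whether the difference is larger than sampling noise.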

Invariance Test
To test the invariance of the measurement model across different PLS-PM sub-models, we follow the same model comparison approach, first stated in (Chow, 1960; Lebart, et al., 1979; Lamberti, et al., 2016). We propose to perform a comparison-of-models test applied to the outer model. The current model is formed by the juxtaposition of all outer models of the identified segments, each with its own specific weights defining the constructs, whereas the model under the null hypothesis assumes the same weights for every construct in all segments. We can then test the invariance of the weights across the sub-models. Non-significance of the statistic indicates that we can assume a unique set of weights for every construct in all sub-models, that is, that there is factor invariance; otherwise, we accept that not only the structural models but also the measurement models differ among the detected segments.

Testing the Equality of the Sub-model Weights
Let us take a PLS-PM model with $K$ latent variables. Without loss of generality, we use the example in the figure, where we have two segments ($S = 2$) called segment A and segment B; let $X$, $Y_1$ and $Y_2$ be, respectively, the blocks associated with $\xi$, $\eta_1$ and $\eta_2$, with $p_1$, $p_2$ and $p_3$ indicators (in general, $p_k$ denotes the number of indicators of block $k$ and $K$ the total number of blocks). Let $n$ be the total number of individuals ($n = n_A + n_B$). We want to investigate whether we can assume the existence of common weights for both sub-models. The null hypothesis $H_0$ states that there are common weights $\omega_{\xi}$, $\omega_{\eta_1}$, $\omega_{\eta_2}$ for all constructs, whereas the alternative hypothesis $H_1$ specifies that every segment has its own specific weights. Then, we can write the measurement model as a concatenation of all constructs in all segments and define the two hypotheses $H_0$ and $H_1$, each of which can be written compactly as a linear model, with design matrices $X_0$ and $X$, respectively. Then, assuming $y \sim N(X\beta, \sigma^2 I_n)$, we know that (Lebart, et al., 1979):
1. The quadratic form $\epsilon' Q \epsilon / \sigma^2$, where $Q = I_n - X(X'X)^{-1}X'$ is a symmetric and idempotent matrix, follows a $\chi^2$ distribution with $v$ degrees of freedom (where $v$ is the rank of $Q$).
2. Given two matrices $X$ and $X_0$, where $X_0$ is defined as $X_0 = XA$ (for any matrix $A$), it can be shown that $\epsilon'(Q_0 - Q)\epsilon / \sigma^2$ follows a $\chi^2$ distribution with $(v_0 - v)$ degrees of freedom, where $Q_0 = I_n - X_0(X_0'X_0)^{-1}X_0'$ and $v_0$ is the rank of $Q_0$.
Thus, it is easy to see that the design matrices of the preceding hypotheses can be written as $X_0 = XA$, taking $A$ as a block matrix built from identity blocks $I_{p_k}$, where $I_{p_k}$ is the identity matrix of order $p_k$.
Assuming that every construct in every segment is normally distributed with equal variance, the difference between the residual sums of squares of the two hypotheses follows a scaled $\chi^2$ distribution with $(S - 1)\sum_{k=1}^{K} p_k$ degrees of freedom.
Thus, we have the following result:
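As a hedged illustration of the comparison-of-models idea used in this section (a sketch, not the authors' stated result), the code below compares the restricted model $X_0 = XA$ with the unrestricted model $X$ for a single construct and two segments. It assumes the usual F ratio for nested linear models in the spirit of Chow (1960); the function name `invariance_f_test`, the toy weights and the way the latent scores `y` are simulated are all assumptions made for the example.

```python
import numpy as np
from scipy import stats


def invariance_f_test(y, X, X0):
    """Compare y = X0 b0 + e (H0, common weights) against
    y = X b + e (H1, segment-specific weights) with an F ratio."""
    n = y.shape[0]

    def rss_and_rank(design):
        coef, _, rank, _ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        return float(resid @ resid), int(rank)

    rss0, rank0 = rss_and_rank(X0)
    rss1, rank1 = rss_and_rank(X)
    df_num = rank1 - rank0          # (S - 1) * p for one construct
    df_den = n - rank1
    f_stat = ((rss0 - rss1) / df_num) / (rss1 / df_den)
    p_value = float(stats.f.sf(f_stat, df_num, df_den))
    return f_stat, df_num, df_den, p_value


rng = np.random.default_rng(0)
p, n_a, n_b = 3, 100, 100
X_a = rng.normal(size=(n_a, p))     # indicators, segment A (toy data)
X_b = rng.normal(size=(n_b, p))     # indicators, segment B (toy data)

# H1 design: block-diagonal, one weight vector per segment.
X = np.block([[X_a, np.zeros((n_a, p))],
              [np.zeros((n_b, p)), X_b]])
# H0 design: X0 = X A, with A stacking two identity blocks of order p.
A = np.vstack([np.eye(p), np.eye(p)])
X0 = X @ A

w = np.array([0.5, 0.3, 0.2])       # hypothetical weights


def latent_scores(w_a, w_b, noise_sd=0.2):
    noise = rng.normal(scale=noise_sd, size=n_a + n_b)
    return np.concatenate([X_a @ w_a, X_b @ w_b]) + noise


result_same = invariance_f_test(latent_scores(w, w), X, X0)
result_diff = invariance_f_test(latent_scores(w, w + 0.6), X, X0)
print("equal weights     p-value:", result_same[3])
print("different weights p-value:", result_diff[3])
```

With several constructs, the same construction would be repeated block by block, which is where the $(S - 1)\sum_{k} p_k$ degrees of freedom come from.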

Simulation with Artificial Data
We have conducted a simulation study in order to obtain some insight into the proposed test criterion under different but realistic experimental conditions. The goal was to investigate whether the $\chi^2$ test is able to discriminate between the case in which the measurement model can be considered the same for all identified sub-models and the case in which it is specific to each of them. Hence, we have run a series of simulations by generating models under different experimental conditions. Our simulations are based mainly on (Cassel, et al., 1999) and (Westlund, et al., 2001). We carried out a Monte Carlo simulation study with a simple path model using one exogenous ($\xi$) and two endogenous ($\eta_1$ and $\eta_2$) latent variables. We have varied the data-generating conditions in order to reproduce conditions similar to those encountered in real-life application studies.

Simulated Models
We have considered two different models, which we call Model 1 and Model 2. The first one is entirely reflective, whereas the second one contains a formative block for the exogenous construct.
The data in Model 1 have been generated according to the structural model shown in the figure, following a two-step procedure (Reinartz, et al., 2002): first, we generated the latent variable data, following the relationships specified in the structural model, and then we generated the manifest variable data from the latent variables.
The PLS model consists of one exogenous ($\xi$) and two endogenous ($\eta_1$ and $\eta_2$) latent variables. In the inner structure, the $\beta$ terms are regression coefficients and the $\zeta$ terms are the error terms associated with the endogenous latent variables. The manifest variables are denoted by $x$ for $\xi$ and by $y$ for $\eta$. The measurement models for $\xi$, $\eta_1$ and $\eta_2$ are reflective.
Figure: causal paths between the exogenous ($\xi$) and the two endogenous ($\eta_1$ and $\eta_2$) latent variables of PLS Model 1.
The $\lambda$ terms are coefficients, and the $\epsilon$ terms are random errors. The noise terms $\epsilon$ and $\zeta$ are realizations from a Normal distribution $N(\mu = 0, \sigma^2)$. We have used a Beta distribution, $\mathrm{Beta}(6, 3)$, for the exogenous latent variable $\xi$, a shape that is usually found in survey studies.
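The two-step generation procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the inner structure ($\eta_1$ depending on $\xi$, and $\eta_2$ depending on $\xi$ and $\eta_1$), the number of indicators per block, all coefficient values, the use of one common $\sigma$ for $\epsilon$ and $\zeta$, and the name `simulate_model1` are hypothetical choices, not values taken from the study.

```python
import numpy as np


def simulate_model1(n, sigma=0.2, betas=(0.6, 0.4, 0.3),
                    loadings=(0.8, 0.7, 0.6), seed=0):
    """Two-step generation for a fully reflective model.

    All numeric defaults and the inner structure are hypothetical.
    """
    rng = np.random.default_rng(seed)
    lam = np.asarray(loadings)

    # Step 1: latent variables. xi ~ Beta(6, 3); zeta ~ N(0, sigma^2).
    xi = rng.beta(6, 3, size=n)
    b1, b2, b3 = betas
    eta1 = b1 * xi + rng.normal(0.0, sigma, size=n)
    eta2 = b2 * xi + b3 * eta1 + rng.normal(0.0, sigma, size=n)

    # Step 2: reflective manifest variables, x_k = lambda_k * LV + eps_k.
    def reflect(latent):
        eps = rng.normal(0.0, sigma, size=(n, lam.size))
        return np.outer(latent, lam) + eps

    return {
        "xi": xi, "eta1": eta1, "eta2": eta2,
        "X": reflect(xi), "Y1": reflect(eta1), "Y2": reflect(eta2),
    }


data = simulate_model1(n=500)
print({name: values.shape for name, values in data.items()})
```

A second segment could be generated by calling the same function with shifted `betas` or `loadings`, which is how differences between segments would be introduced in a sketch of this kind.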
Model 2 (see the figure) differs from Model 1 in the way in which the manifest variables are related to the exogenous latent variable; in this case, the measurement model of $\xi$ is formative, and it is defined as $\xi = \sum_{k=1}^{3} \pi_k x_k + \delta_{\xi}$. The $\pi$ terms are coefficients, and the $\delta_{\xi}$ term is the random error, which is introduced in the model under the hypothesis that some manifest variables that explain the construct are not considered. The $x$ values have been generated from a multivariate normal distribution with mean vector $\mu = (0, 0, 0)$ and a covariance matrix chosen to simulate a bidimensional construct $\xi$. The noise terms $\delta$ are realizations from the Normal distribution $N(\mu = 0, \sigma^2)$. We have used the same process adopted for Model 1 to generate the endogenous latent variables and the corresponding manifest variables.

The Experimental Conditions
We have evaluated the performance of the invariance test under different experimental conditions. The factors of the experimental design are the following: sample size, distinct levels of standard deviation of the measurement errors, distinct levels of difference between the coefficients of the structural model, and distinct levels of difference between the coefficients of the measurement model.
1. Size. We consider three sample sizes as the total number of cases: 100, 400 and 1000.
2. Standard deviation of measurement errors. We assume that the error terms $\epsilon$ follow a Normal distribution with zero expectation and three levels of standard deviation: low noise ($\sigma = 0.05$), moderate noise ($\sigma = 0.2$) and high noise ($\sigma = 1$).
3. Difference between path coefficients. The model has been estimated in two segments, A and B, varying the level of the difference between path coefficients. More specifically, they can be equal in both segments, or the difference can be small, medium or large, meaning that we have added +0, +0.2, +0.4 or +0.6, respectively, to the corresponding path coefficients of segment A.
4. Difference between coefficients of the measurement model. As in the case of the difference between the path coefficients, we have considered various levels of difference between the measurement model coefficients. In other words, they can be equal in both segments, or the difference can be small, medium or large, meaning that we have added +0, +0.2, +0.4 or +0.6, respectively, to the corresponding measurement model coefficients (loadings or weights) of segment A.
Figure: causal paths between the exogenous ($\xi$) and the two endogenous ($\eta_1$ and $\eta_2$) latent variables of PLS Model 2.
In total, we have 3 × 3 × 4 × 4 = 144 scenarios, which is the number of possible combinations of sample sizes, noise levels, differences between path coefficients, and differences between coefficients of the measurement model. We ran 50 repetitions for each experimental condition and report the mean over the 50 repetitions as an aggregate result.
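The full factorial design can be enumerated in a few lines, which also confirms the scenario count. The factor levels are those listed above; the variable and key names are illustrative.

```python
from itertools import product

sample_sizes = (100, 400, 1000)
noise_sd = (0.05, 0.2, 1.0)
path_shift = (0.0, 0.2, 0.4, 0.6)         # added to path coefficients
measurement_shift = (0.0, 0.2, 0.4, 0.6)  # added to loadings or weights

scenarios = [
    {"n": n, "sigma": s, "path_shift": dp, "measurement_shift": dm}
    for n, s, dp, dm in product(sample_sizes, noise_sd,
                                path_shift, measurement_shift)
]

n_repetitions = 50
total_runs = len(scenarios) * n_repetitions
print(len(scenarios), "scenarios,", total_runs, "simulated data sets")
```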

Simulation Results
To assess the influence of each experimental condition on the test criterion, we focus on the evolution of the p-value computed from the $\chi^2$ statistic, and we do so by examining how it is affected by the different levels of each experimental factor. For ease of interpretation, we have included in each plot the LOWESS regression line of the p-value with respect to the evaluated experimental condition. We provide the results for both Models 1 and 2 in order to verify the effectiveness of the invariance measurement test both when all constructs are reflective and when the model includes a formative construct. These results are graphically illustrated in the corresponding figures. There are four plots, one for each of the experimental factors. In the first plot, it is possible to observe how the p-values decrease (i.e., they become more significant) as the sample size increases. In the second plot, the level of measurement error in the manifest variables does not affect the sensitivity of the invariance test. In the third plot, we can see that, as expected, increasing the difference between the path coefficients of the segments in the structural model does not affect the invariance test. In the last plot, by contrast, we can clearly appreciate the effect of increasing the difference in the coefficients of the outer model (loadings, in that case).
Looking at the simulation of Model 2, we obtain the same results as for Model 1, even in the presence of a formative exogenous construct. (Fornell, 1992).
Having no previous knowledge about the significant sub-groups, we have decided to apply a Pathmox analysis (Lamberti, et al., 2016). This technique, based on recursive partitioning, produces a segmentation tree with a distinct path model in each node. At each node, Pathmox searches among all splits based on the segmentation variables and chooses the one resulting in the maximal difference between the PLS-PM models in the children nodes.
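The split search described above can be pictured with a generic sketch. It is not the Pathmox implementation: the function names (`binary_partitions`, `best_split`, `mean_gap`) are hypothetical, and the model-difference measure is replaced by a toy score (the absolute gap between child means of a single variable) so that the example runs on its own.

```python
from itertools import combinations


def binary_partitions(levels):
    """Yield every split of the levels into two non-empty groups once."""
    levels = sorted(levels)
    first, rest = levels[0], levels[1:]
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            left = {first, *extra}
            yield left, set(levels) - left


def best_split(rows, segmentation_vars, difference):
    """Return (score, variable, left levels, right levels) of the
    candidate split with the largest difference between children."""
    best = None
    for var in segmentation_vars:
        levels = {row[var] for row in rows}
        for left, right in binary_partitions(levels):
            child_l = [row for row in rows if row[var] in left]
            child_r = [row for row in rows if row[var] in right]
            score = difference(child_l, child_r)
            if best is None or score > best[0]:
                best = (score, var, left, right)
    return best


def mean_gap(left_rows, right_rows):
    """Toy stand-in for a model-difference statistic."""
    def mean(rows):
        return sum(row["y"] for row in rows) / len(rows)
    return abs(mean(left_rows) - mean(right_rows))


# Toy data: the outcome depends on the carrier only, not on gender.
rows = [
    {"carrier": c, "gender": g, "y": 1.0 if c == "A" else 0.0}
    for c in ("A", "B", "C") for g in ("F", "M")
]
best = best_split(rows, ["carrier", "gender"], mean_gap)
print(best)
```

In an actual analysis, the `difference` argument would be a statistic comparing the PLS-PM models fitted in the two children, and the search would be repeated recursively in each child node.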
For the final results of the split process (see the figure), we obtain three distinct models according to two segmentation variables: carrier and gender. The corresponding models are: node 2, the model of carrier A customers; node 6, the model of carriers B-C male customers; and node 7, the model of carriers B-C female customers.
The table shows the path coefficients of the detected segments. Focusing on the Customer Satisfaction and Loyalty model, we can see that, for customers of operator A, the important driver of Customer Satisfaction is the Image of the operator, whereas for customers of B and C, the important driver of Customer Satisfaction is the Perceived Value obtained from the mobile phone. Concerning the role of Perceived Quality, it has a positive effect for customers of operator A and for customers of operators B and C who have a basic education, whereas it is irrelevant for customers of operators B and C with higher education. Looking at the path coefficients of the Loyalty construct in the three segments, Image is again the important driver for customers of operator A, whereas Customer Satisfaction is the important driver for customers of B and C.

Conclusions
In the present article, we introduce the factor invariance problem in PLS-PM when more than one model is considered due to the presence of heterogeneity. We propose an invariance test as a possible solution for verifying whether we can directly compare the latent variables between different sub-models. We show by means of a simulation study that the test performs well under different experimental conditions. As expected, we found in the simulation results a clear effect of sample size and a marked effect of the difference between the coefficients of the measurement model in the two segments, while there was no effect of noise level or of the difference between the path coefficients. These encouraging results open a line of research for assessing the invariance problem in PLS-PM; however, they have to be further assessed in non-normal contexts and with heterogeneous variances between segments. Finally, we show the feasibility of the invariance test in a real application study on customer satisfaction in the mobile phone services sector, demonstrating that the test is sufficient for analyzing the measurement model structure of the three identified sub-models; in this sense, we can use the global model in all segments to compare the latent variables of individuals.

Figure 1. Path diagram of a PLS model with one exogenous and two endogenous latent variables

Figure 5. Comparison of the invariance measurement test by the distinct simulation factors of Model 2. Description: the plot shows the results of the invariance measurement test by the distinct simulation factors of Model 2.

Figure 7. Pathmox tree of mobile data. Description: the plot visualizes the split process carried out by Pathmox, giving information about the root and terminal nodes, the segmentation variables that produce the splits, and the significance of each split.

Table 2. Weights of the root node and the terminal segments. Description: the table shows the description of the manifest variables and the weights of the root node and the terminal segments for Customer Satisfaction and Loyalty.