A Gentle Introduction to Structural Equation Modeling (SEM), Part 1: The Simplest Case

This is the first in a multi-part series on structural equation models, or SEMs. In this post, you will learn to estimate a mean vector and a covariance matrix from data using the CALIS procedure.

How does SEM work?

Structural equation modeling (SEM) is a technique for explaining and testing hypotheses about complex relationships among variables (observed and unobserved) that make up a system or phenomenon. The models trace their lineage back to psychometrics, econometrics, and biometrics, and are especially useful for testing a complex hypothesis of interest directly, in a single analysis.

We can make comparisons between SEMs and linear models. The SEM represents your hypothesis, just as a linear model does. However, there are some key differences.

In most uses of linear models, the associations among variables, error terms, and variances are defined by the analysis. For example, multiple regression allows all the predictors to covary. While uncorrelated inputs make the model easier to interpret, regression keeps all of that correlation in the model through the parameter estimation method, b = (X’X)⁻¹X’y.
With SEM, you specify the nature of the hypothesized relationships among the variables that you are interested in. If your theory says that X1 and X2 are correlated but X1 and X3 are not, then the model has a structural (or fixed) zero for the association between X1 and X3. This puts a restriction on X’X.
SEM compares your hypothesized (restricted) model to a model in which all parameters and relationships are free to vary (the full model). So, you represent your theory as a null hypothesis, which can mean fixing certain parameters at zero, setting some parameters equal to one another, and so forth.
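
To make the idea of a structural zero concrete, here is a small sketch of the full and restricted covariance matrices for the X1, X2, X3 example. The symbols (Σ and the σ's) are notation assumed for illustration only, and the X2–X3 covariance is left free because the example theory says nothing about it.

```latex
\[
\Sigma_{\text{full}} =
\begin{pmatrix}
\sigma_{11} & \sigma_{12} & \sigma_{13} \\
\sigma_{12} & \sigma_{22} & \sigma_{23} \\
\sigma_{13} & \sigma_{23} & \sigma_{33}
\end{pmatrix},
\qquad
\Sigma_{\text{restricted}} =
\begin{pmatrix}
\sigma_{11} & \sigma_{12} & 0 \\
\sigma_{12} & \sigma_{22} & \sigma_{23} \\
0 & \sigma_{23} & \sigma_{33}
\end{pmatrix}
\]
```

The full model has six free parameters and the restricted model has five, so the null hypothesis that the data follow the restricted pattern is tested with one degree of freedom.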

We will see some comparisons of SEM to linear models in a future blog post. Today, let’s discuss a very simple application of SEM. In this series, I will be using simulated data that you can play with, too. See the end of the blog for the DATA step code.

Example 1: Estimate a variance-covariance matrix

One of the simplest examples of SEM is estimating a variance-covariance matrix for your data. The code to do this in PROC CALIS is very short:

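/* No DATA= option is given, so PROC CALIS uses the most recently created data set */
proc calis;
   mstruct var=y x1 x2 x3;   /* model the covariance matrix of these four variables directly */
run;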

The MSTRUCT statement is one of several different methods for specifying a model in PROC CALIS. The MSTRUCT statement enables you to directly specify the covariance matrices for your hypothesized model and is well-suited for situations where thinking about the model in terms of matrices is the most straightforward approach.

Here are the results for the estimated covariance matrix (partial output):

Simple Statistics
Variable	Mean	Std Dev
x1	-0.16467	25.63234
x2	0.07707	40.89412
X3	1.86254	57.22332
y	16.06475	56.62608

MSTRUCT _COV_ Matrix: Estimate / StdErr / t-value / p-value
	y	x1	x2	X3
y	3207 / 143.4714 / 22.3495 / <.0001	687.6537 / 50.8152 / 13.5324 / <.0001	1683 / 90.5659 / 18.5803 / <.0001	2017 / 120.7609 / 16.7036 / <.0001
x1	687.6537 / 50.8152 / 13.5324 / <.0001	657.0167 / 29.3974 / 22.3495 / <.0001	228.2159 / 33.9409 / 6.7239 / <.0001	37.4505 / 46.4216 / 0.8067 / 0.4198
x2	1683 / 90.5659 / 18.5803 / <.0001	228.2159 / 33.9409 / 6.7239 / <.0001	1672 / 74.8262 / 22.3495 / <.0001	29.1822 / 74.0432 / 0.3941 / 0.6935
X3	2017 / 120.7609 / 16.7036 / <.0001	37.4505 / 46.4216 / 0.8067 / 0.4198	29.1822 / 74.0432 / 0.3941 / 0.6935	3275 / 146.5137 / 22.3495 / <.0001

So, in this simple example, we have computed summary statistics (the means and standard deviations) and estimated a covariance matrix for a data set with 4 variables. The covariance matrix provides information about relationships between variables, or structure in the data. For example, in the top (Y) row of the covariance matrix, the estimates indicate that the covariance between Y and each of the three X variables is significantly different from zero (p < .0001). However, the second (X1) row shows that the covariance between X1 and X3 is not significantly different from zero (p = 0.4198), so the data are consistent with a zero covariance between those two variables.
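
As a quick check on how to read each cell of the covariance table, the t-value is the estimate divided by its standard error. The arithmetic below uses two entries copied from the output above; the notation itself is an assumption added for illustration.

```latex
\[
t = \frac{\hat{\sigma}_{ij}}{\mathrm{SE}(\hat{\sigma}_{ij})},
\qquad
t_{y,x1} = \frac{687.6537}{50.8152} \approx 13.53,
\qquad
t_{x1,x3} = \frac{37.4505}{46.4216} \approx 0.81
\]
```

The first ratio is far from zero, which matches the reported p-value of <.0001. The second is small, which matches the reported p-value of 0.4198.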

You can also estimate a covariance matrix in PROC CORR:

proc corr cov;
   var y x1 x2 x3;
run;

I've shown you a really simple example, but there are many compelling reasons to use an SEM to estimate a covariance matrix. Testing covariance patterns has been an important topic in multivariate statistical analysis. Traditionally, statisticians derive test statistics separately for different pattern hypotheses (e.g., sphericity, equi-covariance, and so on). With an SEM approach, all these tests are unified in a single framework. All you need to do is specify the covariance pattern you want to test, using software that supports direct covariance structure modeling, such as the MSTRUCT language in PROC CALIS!
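
As a rough illustration of what specifying a pattern can look like, here is a minimal sketch that tests whether the covariance between x1 and x3 is zero, using a MATRIX statement alongside MSTRUCT. The parameter names (var_y, cov_y_x1, and so on) are hypothetical labels chosen for this sketch, and every element is written out explicitly so that the example does not rely on any default behavior. Treat it as a starting point to check against the PROC CALIS documentation rather than a definitive specification.

```sas
/* Sketch: hypothesized pattern with cov(x1, x3) fixed at zero.          */
/* Positions follow the VAR= order: 1 = y, 2 = x1, 3 = x2, 4 = x3.       */
/* Parameter names are hypothetical labels for free parameters.          */
proc calis;                        /* no DATA= option: uses the most recently created data set */
   mstruct var=y x1 x2 x3;
   matrix _COV_
      [1,1] = var_y,               /* variances (diagonal)               */
      [2,2] = var_x1,
      [3,3] = var_x2,
      [4,4] = var_x3,
      [2,1] = cov_y_x1,            /* free covariances (lower triangle)  */
      [3,1] = cov_y_x2,
      [4,1] = cov_y_x3,
      [3,2] = cov_x1_x2,
      [4,3] = cov_x2_x3,
      [4,2] = 0.;                  /* structural zero: cov(x1, x3)       */
run;
```

This pattern has nine free parameters against ten distinct elements in the covariance matrix, so the model fit test should have one degree of freedom. A nonsignificant result would mean the data are consistent with the restriction.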

Want to see some more examples of testing covariance pattern hypotheses? Check out this article by our brilliant PROC CALIS developer:

Yung, Y.-F., Browne, M. W., & Zhang, W. (2015). Fitting direct covariance structures by the MSTRUCT modeling language of the CALIS procedure. British Journal of Mathematical and Statistical Psychology, 68, 178-193.

We’ll see lots of other SEM examples in the next few posts. In the meantime, have some fun playing with the code above, and stay tuned for more.

Next time, we will extend the SEM specification to linear regression and talk about diagrams.

SAS Communities Library