A Gentle Introduction to Structural Equation Modeling (SEM), Part 1: The Simplest Case

Started 08-30-2023, modified 08-30-2023

This is the first in a multi-part series on structural equation models, or SEMs. In this blog, you will learn to estimate a mean vector and a covariance matrix from data using the CALIS procedure.

How does SEM work?

A structural equation model (SEM) is a modeling technique for explaining and testing hypotheses about complex relationships among the variables (observed and unobserved) that make up a system or phenomenon. These models trace their lineage to psychometrics, econometrics, and biometrics, and they are especially useful for testing a complex hypothesis of interest directly, in one go.

 

We can make comparisons between SEMs and linear models. The SEM represents your hypothesis, just as a linear model does. However, there are some key differences.

 

  • In most uses of linear models, the associations among variables, error terms, and variances are defined by the analysis. For example, multiple regression allows all the predictors to covary. While having uncorrelated inputs makes the model easier to interpret, regression leaves all that correlation in the model through the parameter estimation method (for ordinary least squares, the estimator (X’X)⁻¹X’y).
  • With SEM, you specify the nature of the hypothesized relationships among the variables that you are interested in. This means that if your theory says that X1 and X2 are correlated, but X1 and X3 are not, then the model has a structural (or fixed) zero for the association between X1 and X3. This puts a restriction on X’X.
  • SEM compares your hypothesized (restricted) model to a model in which all parameters and relationships are free to vary (full model). So, you represent your theory as a null hypothesis, which can mean fixing certain parameters at zero, setting some parameters to be equal to one another, and so forth.
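To make the structural zero in the bullets above concrete, here is a small sketch of the covariance matrix of X1, X2, and X3 under the full model and under the restricted model. The σ notation is shorthand introduced for illustration, and leaving the X2-X3 covariance free is an assumption, since the example theory does not mention it.

```latex
\Sigma_{\text{full}} =
\begin{pmatrix}
\sigma_{11} & \sigma_{12} & \sigma_{13} \\
\sigma_{12} & \sigma_{22} & \sigma_{23} \\
\sigma_{13} & \sigma_{23} & \sigma_{33}
\end{pmatrix},
\qquad
\Sigma_{\text{restricted}} =
\begin{pmatrix}
\sigma_{11} & \sigma_{12} & 0 \\
\sigma_{12} & \sigma_{22} & \sigma_{23} \\
0 & \sigma_{23} & \sigma_{33}
\end{pmatrix}
```

The full model has six free parameters and the restricted model has five, so comparing the two tests the single restriction that the X1-X3 covariance is zero, with one degree of freedom.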

 

We will see some comparisons of SEM to linear models in a future blog post. Today, let’s discuss a very simple application of SEM. In this series, I will be using simulated data that you can play with, too. See the end of the blog for the DATA step code.

 

Example 1: Estimate a variance-covariance matrix

 

One of the simplest examples of SEM is estimating a variance-covariance matrix for your data. The PROC CALIS code for this is very short:

 

proc calis;
   /* VAR= lists the variables whose covariance matrix is modeled */
   mstruct var=y x1 x2 x3;
run;

 

The MSTRUCT statement is one of several different methods for specifying a model in PROC CALIS. The MSTRUCT statement enables you to directly specify the covariance matrices for your hypothesized model and is well-suited for situations where thinking about the model in terms of matrices is the most straightforward approach.
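The code above has no DATA= option, so PROC CALIS analyzes the most recently created data set. A slightly more explicit sketch names the data set on the PROC statement. Here work.semdata is a hypothetical name standing in for the simulated data.

```sas
/* Hypothetical data set name: replace work.semdata with your own data set */
proc calis data=work.semdata;
   mstruct var=y x1 x2 x3;   /* same unrestricted covariance model as above */
run;
```

Naming the data set keeps the step reproducible when other DATA steps or procedures run earlier in the same session.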

 

Here are the results for the estimated covariance matrix (partial output):

 

Simple Statistics
Variable Mean Std Dev
x1 -0.16467 25.63234
x2 0.07707 40.89412
X3 1.86254 57.22332
y 16.06475 56.62608

 

MSTRUCT _COV_ Matrix: Estimate/StdErr/t-value/p-value (the matrix is symmetric, so only the lower triangle is listed)
Row  Column  Estimate   StdErr     t-value   p-value
y    y       3207       143.4714   22.3495   <.0001
x1   y       687.6537   50.8152    13.5324   <.0001
x1   x1      657.0167   29.3974    22.3495   <.0001
x2   y       1683       90.5659    18.5803   <.0001
x2   x1      228.2159   33.9409    6.7239    <.0001
x2   x2      1672       74.8262    22.3495   <.0001
X3   y       2017       120.7609   16.7036   <.0001
X3   x1      37.4505    46.4216    0.8067    0.4198
X3   x2      29.1822    74.0432    0.3941    0.6935
X3   X3      3275       146.5137   22.3495   <.0001

 

 

So, in this simple example, we have estimated summary statistics (means and standard deviations) and a covariance matrix for a data set with four variables. The covariance matrix provides information about the relationships among the variables, that is, the structure in the data. For example, in the first (Y) row of the covariance matrix, the estimates indicate that the covariance between Y and each of the three X variables is significantly different from zero (p < .0001 in each case). However, the second (X1) row shows that the covariance between X1 and X3 is not significantly different from zero (estimate 37.4505, p = 0.4198).
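As a quick check on how to read each cell of the output, the t-value is the estimate divided by its standard error, and each diagonal element is the square of the corresponding standard deviation in the Simple Statistics table. The worked arithmetic below uses only numbers reported above. The symbols t and s are shorthand introduced here.

```latex
\begin{align*}
t_{y,\,x1}  &= \frac{687.6537}{50.8152} \approx 13.5324 \\
t_{x1,\,X3} &= \frac{37.4505}{46.4216} \approx 0.8067 \\
s_{x1}^{2}  &= (25.63234)^{2} \approx 657.02
\end{align*}
```

The small t-value for the X1-X3 covariance is what produces the large p-value of 0.4198.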

 

You can also estimate a covariance matrix in PROC CORR:

 

/* COV adds the covariance matrix to the default correlation output */
proc corr cov;
   var y x1 x2 x3;
run;
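If you want to keep the PROC CORR covariance estimates for comparison with the PROC CALIS output, one option is to write them to a data set with OUTP=. This is a minimal sketch. The names work.semdata and work.cov_est are hypothetical.

```sas
/* Hypothetical data set names: work.semdata (input), work.cov_est (output) */
proc corr data=work.semdata cov nosimple outp=work.cov_est;
   var y x1 x2 x3;
run;

/* The OUTP= data set stores several statistics; keep only the covariance rows */
proc print data=work.cov_est;
   where _type_ = 'COV';
run;
```

The printed rows should line up with the estimates in the MSTRUCT _COV_ matrix, since both procedures summarize the same four variables.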

 

I've shown you a really simple example, but there are many compelling reasons to use an SEM to estimate a covariance matrix. Testing covariance patterns has long been an important topic in multivariate statistical analysis. Traditionally, statisticians have derived test statistics separately for each pattern hypothesis (e.g., sphericity, equi-covariance, and so on). With an SEM approach, all these tests are unified in a single framework: all you need to do is specify the covariance pattern you want to test, using software that supports direct covariance structure modeling, such as the MSTRUCT language in PROC CALIS!
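To give a sense of what such pattern hypotheses look like, here are two of them written out for four variables. The symbols are introduced for illustration. Equi-covariance is taken here to mean that all covariances share one value while the variances stay free, which is an assumption, since some definitions also equate the variances.

```latex
\Sigma_{\text{sphericity}} = \sigma^{2} I_{4} =
\begin{pmatrix}
\sigma^{2} & 0 & 0 & 0 \\
0 & \sigma^{2} & 0 & 0 \\
0 & 0 & \sigma^{2} & 0 \\
0 & 0 & 0 & \sigma^{2}
\end{pmatrix},
\qquad
\Sigma_{\text{equi-covariance}} =
\begin{pmatrix}
\sigma_{11} & \theta & \theta & \theta \\
\theta & \sigma_{22} & \theta & \theta \\
\theta & \theta & \sigma_{33} & \theta \\
\theta & \theta & \theta & \sigma_{44}
\end{pmatrix}
```

An unrestricted 4 x 4 covariance matrix has 10 distinct elements. Under these definitions, sphericity leaves one free parameter and equi-covariance leaves five. Each pattern is therefore a restricted model that can be compared against the unrestricted one.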

 

Want to see some more examples of testing covariance pattern hypotheses? Check out this article by our brilliant PROC CALIS developer: 

Yung, Y.-F., Browne, M. W., & Zhang, W. (2015). Fitting direct covariance structures by the MSTRUCT modeling language of the CALIS procedure. British Journal of Mathematical and Statistical Psychology, 68, 178-193.

 

We’ll see lots of other SEM examples in the next few posts. In the meantime, have some fun playing with the code above, and stay tuned for more.

 

Next time, we will extend the SEM specification to linear regression and talk about diagrams.
