This is the first in a multi-part series on structural equation models, or SEMs. In this post, you will learn to estimate a mean vector and a covariance matrix from data using the CALIS procedure.
A structural equation model (SEM) is a modeling technique for explaining and testing hypotheses about complex relationships among variables (observed and unobserved) that make up a system or phenomenon. SEMs trace their lineage to psychometrics, econometrics, and biometrics. They are especially attractive because they let you test a complex hypothesis of interest directly, in a single model.
SEMs invite comparison with linear models. Like a linear model, an SEM represents your hypothesis. However, there are some key differences, and we will examine them in a future blog post.
Today, let’s discuss a very simple application of SEM. Throughout this series, I will use simulated data that you can play with, too. See the end of the blog for the DATA step code.
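One of the simplest examples of SEM is estimating the variance-covariance matrix of your data. This is a saturated model: every variance and covariance is a free parameter, so the model reproduces the sample covariance matrix exactly. The PROC CALIS code is short: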
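proc calis;                  /* no DATA= option: uses the most recently created data set */
   mstruct var=y x1 x2 x3;   /* saturated covariance structure for these four variables */
run;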
The MSTRUCT statement is one of several modeling languages in PROC CALIS (others include PATH, LINEQS, and RAM). It lets you specify the model's covariance matrix (and, optionally, its mean vector) directly. That makes it well suited to situations where the most straightforward way to think about the model is in terms of matrices. The VAR= option lists the variables whose covariance matrix is modeled.
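The run above models only the covariance structure. The means it prints are the sample means from the Simple Statistics table, not model parameters. If you also want to treat the mean vector as model parameters, the following minimal sketch adds the MEANSTR option to the PROC CALIS statement. We assume that the MSTRUCT defaults then leave every mean free, just as they leave every variance and covariance free in the original run. Check the _MEAN_ results in your own output to confirm this. The output shown below comes from the original run, without MEANSTR.

```sas
/* Sketch: analyze mean structures as well as covariance structures.          */
/* Assumption: MSTRUCT defaults leave all means, variances, covariances free. */
proc calis meanstr;          /* MEANSTR: include the mean vector in the model */
   mstruct var=y x1 x2 x3;   /* same four variables as before                 */
run;
```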
Here are the results for the estimated covariance matrix (partial output):
Simple Statistics

| Variable | Mean | Std Dev |
|---|---|---|
| x1 | -0.16467 | 25.63234 |
| x2 | 0.07707 | 40.89412 |
| X3 | 1.86254 | 57.22332 |
| y | 16.06475 | 56.62608 |

MSTRUCT _COV_ Matrix: Estimate / StdErr / t Value / p-value

| | y | x1 | x2 | X3 |
|---|---|---|---|---|
| y | 3207 / 143.4714 / 22.3495 / <.0001 | 687.6537 / 50.8152 / 13.5324 / <.0001 | 1683 / 90.5659 / 18.5803 / <.0001 | 2017 / 120.7609 / 16.7036 / <.0001 |
| x1 | 687.6537 / 50.8152 / 13.5324 / <.0001 | 657.0167 / 29.3974 / 22.3495 / <.0001 | 228.2159 / 33.9409 / 6.7239 / <.0001 | 37.4505 / 46.4216 / 0.8067 / 0.4198 |
| x2 | 1683 / 90.5659 / 18.5803 / <.0001 | 228.2159 / 33.9409 / 6.7239 / <.0001 | 1672 / 74.8262 / 22.3495 / <.0001 | 29.1822 / 74.0432 / 0.3941 / 0.6935 |
| X3 | 2017 / 120.7609 / 16.7036 / <.0001 | 37.4505 / 46.4216 / 0.8067 / 0.4198 | 29.1822 / 74.0432 / 0.3941 / 0.6935 | 3275 / 146.5137 / 22.3495 / <.0001 |
So, in this simple example, PROC CALIS reports summary statistics (the sample means and standard deviations) and estimates the covariance matrix of a data set with four variables. Each cell of the covariance matrix lists the estimate, its standard error, the t value, and the p-value. The covariance matrix describes the relationships, or structure, among the variables. For example, the first (y) row shows that the covariance between y and each of the three x variables is significantly different from zero (p < .0001). In contrast, the x1 row shows that the covariance between x1 and X3 is not significantly different from zero (estimate 37.4505, p = 0.4198). The same holds for the covariance between x2 and X3 (p = 0.6935).
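Where do the standard errors and t values come from? Under multivariate normality, the large-sample standard error of a sample covariance has the closed form below. The table appears consistent with it. This is our own hand check, and it assumes a divisor of $N-1$, because the sample size is not shown in this partial output.

$$
\widehat{\mathrm{SE}}(\hat{\sigma}_{jk}) = \sqrt{\frac{\hat{\sigma}_{jj}\,\hat{\sigma}_{kk} + \hat{\sigma}_{jk}^{2}}{N-1}},
\qquad
t_{jk} = \frac{\hat{\sigma}_{jk}}{\widehat{\mathrm{SE}}(\hat{\sigma}_{jk})}
$$

For a variance ($j = k$), the formula reduces to $\hat{\sigma}_{jj}\sqrt{2/(N-1)}$, so every variance has the same t value, $\sqrt{(N-1)/2}$. That is why all four diagonal cells show 22.3495. Since $22.3495^2 \approx 499.5$, this corresponds to $N - 1 = 999$. For the covariance between y and x1, using the unrounded variance of y ($56.62608^2 \approx 3206.5$):

$$
\widehat{\mathrm{SE}} \approx \sqrt{\frac{3206.5 \times 657.0167 + 687.6537^{2}}{999}} \approx 50.815,
\qquad
t \approx \frac{687.6537}{50.8152} \approx 13.53
$$

The reported p-values are also consistent with a two-sided standard normal reference. For example, $t = 0.8067$ gives $p \approx 0.42$.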
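You can also compute the sample covariance matrix with PROC CORR by adding the COV option. Note that PROC CORR does not report standard errors or tests for the covariances themselves. Its p-values refer to the Pearson correlations.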
proc corr cov;       /* COV: print the covariance matrix as well as the correlations */
   var y x1 x2 x3;
run;
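Both procedures start from the same sample quantities. For $N$ observations on variables $j = 1, \dots, p$, the sample mean and sample covariance are:

$$
\bar{x}_j = \frac{1}{N}\sum_{i=1}^{N} x_{ij},
\qquad
s_{jk} = \frac{1}{N-1}\sum_{i=1}^{N} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)
$$

The divisor $N-1$ is the PROC CORR default (VARDEF=DF). Because the MSTRUCT model above is saturated, its estimates should reproduce these sample values, and the output is consistent with that. For example, the x1 variance of 657.0167 equals the squared standard deviation from the Simple Statistics table: $25.63234^2 \approx 657.017$.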
I've shown you a really simple example, but there are many compelling reasons to use an SEM to estimate a covariance matrix. Testing covariance patterns has long been an important topic in multivariate statistical analysis. Traditionally, statisticians derived test statistics separately for each pattern hypothesis (e.g., sphericity, equi-covariance, and so on). An SEM approach unifies all these tests in a single framework. All you need to do is specify the covariance pattern you want to test, using software that supports direct covariance structure modeling, such as the MSTRUCT language in PROC CALIS!
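To illustrate this workflow, here is a hypothetical sketch that tests compound symmetry, that is, equal variances and equal covariances. It assumes two things: the MATRIX statement's [row,column] entry syntax, and the PROC CALIS convention that entries sharing a parameter name are constrained to be equal. The names vv and cc are arbitrary. Rows and columns follow the VAR= order (y, x1, x2, x3). Only the lower triangle is listed because _COV_ is symmetric. Please check the PROC CALIS documentation before relying on the exact syntax.

```sas
/* Hypothetical sketch: compound symmetry (equal variances, equal covariances). */
/* vv and cc are arbitrary names; reusing a name constrains entries to be equal. */
proc calis;
   mstruct var=y x1 x2 x3;
   matrix _COV_ [1,1] = vv, [2,2] = vv, [3,3] = vv, [4,4] = vv,  /* one common variance   */
                [2,1] = cc, [3,1] = cc, [4,1] = cc,              /* one common covariance */
                [3,2] = cc, [4,2] = cc,
                [4,3] = cc;
run;
```

This model has 2 free parameters for 10 distinct variances and covariances. The chi-square test in the fit summary therefore has 8 degrees of freedom and compares this pattern with the saturated model. The very unequal variances in the earlier output suggest that the pattern would be rejected for these data.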
Want to see some more examples of testing covariance pattern hypotheses? Check out this article by our brilliant PROC CALIS developer:
Yung, Y.-F., Browne, M. W., & Zhang, W. (2015). Fitting direct covariance structures by the MSTRUCT modeling language of the CALIS procedure. British Journal of Mathematical and Statistical Psychology, 68, 178-193.
We’ll see lots of other SEM examples in the next few posts. In the meantime, have some fun playing with the code above, and stay tuned for more.
Next time, we will extend the SEM specification to linear regression and talk about diagrams.