A Gentle Introduction to Structural Equation Models (SEM), Part 3: Measuring Latent Variables with Confirmatory Factor Analysis

The purpose of this post is to show you a simple example of confirmatory factor analysis in PROC CALIS. This is the third post in a multi-part series about Structural Equation Modeling in SAS with PROC CALIS.
Full SEMs can comprise observed variables, latent variables, and error terms whose relationships are characterized by direct and indirect paths, means, covariances, equality constraints, and more. So far in this series, we have looked at SEM as a covariance matrix, as a linear regression, and as regression with parameter constraints. SEMs are especially powerful because they can estimate latent variables, and that’s what we will do today.
Factor analysis is a class of methods for establishing the measurement of one or more latent variables. There are two common groups of methods: exploratory factor analysis and confirmatory factor analysis. Exploratory factor analysis (EFA) is mathematically related to principal components analysis (PCA), and usually entails eigenvalue decomposition of a (weighted, reduced) correlation matrix. The number of factors is determined by examining this preliminary decomposition, and the factors are then interpreted after applying a rotation technique, such as Varimax or Promax. Marc Huber has written about the similarities between EFA and regression here, and in more detail about EFA here.
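For contrast with the CFA that follows, here is a minimal sketch of what an EFA along those lines might look like in PROC FACTOR. It assumes the same stem data set and the indicator names I1-I5 and p1-p5 that appear in the PROC CALIS code later in this post. The choice of two factors and a Varimax rotation is made here for illustration only and is not taken from an actual analysis.

```sas
/* Exploratory sketch: principal factor analysis of the ten indicators.  */
/* Assumes the stem data set and variable names used later in this post. */
proc factor data=stem
            method=principal   /* extraction by eigenvalue decomposition        */
            priors=smc         /* squared multiple correlations on the diagonal, */
                               /* which gives the reduced correlation matrix     */
            nfactors=2         /* illustrative choice, normally based on         */
                               /* examining the eigenvalues first                */
            rotate=varimax;    /* orthogonal rotation to aid interpretation      */
   var I1-I5 p1-p5;
run;
```

Notice that nothing in this step says which indicators belong to which factor. That is the key difference from CFA, where you specify that structure up front.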
Confirmatory factor analysis (CFA) differs from EFA in that CFA is a structural equation model that typically uses maximum likelihood estimation to approximate the measurement of latent variables with specific constraints on the model. Unlike EFA, CFA enables you to test hypotheses about the measurement model, such as whether a particular number of factors, a correlation among factors, or a specific set of indicators for a factor fits the data better than an alternative model. In case it helps, here’s a quick review of some terminology for latent variable analysis: latent, manifest, and indicator variables.
Latent variables are constructs that you cannot directly measure, but that are hypothesized to exist. Examples from the social and behavioral sciences are plentiful and include intelligence, risk aversion, extraversion, conscientiousness, economic health, spending power, consumer confidence, customer satisfaction, and many more. For illustration purposes, examples from the social and behavioral sciences tend to be understandable to most readers. Let’s take Parental Support of Math Education [Figure 1] as an example.
[1]
It is not possible to directly measure parental support, but you can estimate it by looking at specific behaviors, asking self-report questions of students and of their parents, and so on. In a path diagram, it’s customary to show latent variables in oval shapes. Manifest variables are variables that you can observe, measure, and analyze directly. Gender, age, annual salary, and SAT composite score are all manifest variables. In a diagram, manifest variables are typically shown as rectangles. [Figures 2 and 3]
[2, 3]
Indicator variables are a particular type of manifest variable. Indicators indicate the presence of a latent variable. In the case of parental support of math education, the items in a battery of self-report questions asking for students’ perceptions of their parents’ support behaviors are indicators. Latent variables are imperfectly measured by indicators, and in SEM, it is important that each latent variable has at least two indicators (preferably more than two) to improve measurement. Here is a measurement model diagram with five indicators: [Figure 4]
[4]
It is more interesting to look at two factors than just one, so let’s add a second factor, the student’s perception of the importance of mathematical skills: [Figure 5]
[5]
The first thing you might notice is the direction of the arrows. They point from the factors to the indicators. This seems counterintuitive at first, but it is hypothesized this way because the theoretical predictive direction is from the factor to the manifest variable. An indicator indicates the presence of a factor, not the other way around. Each indicator also has an error term, which captures the variation in the indicator that is not explained by the factor. If you hypothesize that the factors are correlated with each other, the diagram shows this with a double-headed arrow between the factors.
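If it helps to see the diagram as equations, here is one way to write the two-factor measurement model. The Greek symbols are notation introduced here for illustration and do not come from PROC CALIS: ξ1 and ξ2 stand for the importance-of-math and parental-support factors, the λ's are factor loadings, the ε's are indicator errors, and I1 to I5 and p1 to p5 are the indicator names used in the PROC CALIS code in this post.

```latex
\[
\begin{aligned}
I_j &= \lambda_{1j}\,\xi_1 + \varepsilon_{1j}, \qquad j = 1,\dots,5 \\
p_j &= \lambda_{2j}\,\xi_2 + \varepsilon_{2j}, \qquad j = 1,\dots,5 \\
\operatorname{Cov}(\xi_1,\xi_2) &= \phi_{12}
\end{aligned}
\]
```

Each indicator sits on the left-hand side as the outcome, which matches the arrows pointing from factor to indicator. The double-headed arrow between the factors corresponds to the covariance in the last line.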
Here is PROC CALIS syntax for fitting the model:
proc calis data=stem;
   /* each factor points to its five indicators */
   path
      math_imp ---> I1 I2 I3 I4 I5,
      parents  ---> p1 p2 p3 p4 p5;
   /* fix both factor variances at 1 */
   pvar
      math_imp = 1,
      parents  = 1;
run;
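The PVAR statement fixes each factor variance at a constant of 1. A latent variable has no inherent scale, so a constraint like this is needed to give each factor a unit of measurement.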
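Fixing the factor variances is not the only way to scale a factor. A common alternative is to fix one loading per factor at 1 and leave the factor variances free. Here is a minimal sketch of that approach for the same model. Using I1 and p1 as the reference indicators is an arbitrary choice made here for illustration.

```sas
/* Alternative scaling sketch: one loading per factor is fixed at 1, */
/* and the factor variances are left as free parameters.             */
proc calis data=stem;
   path
      math_imp ---> I1 = 1,        /* reference indicator sets the scale */
      math_imp ---> I2 I3 I4 I5,   /* remaining loadings are estimated   */
      parents  ---> p1 = 1,
      parents  ---> p2 p3 p4 p5;
run;
```

With this scaling, the estimated covariance between the factors is no longer on a correlation metric, because the factor variances are not constrained to 1.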
The PATHDIAGRAM statement has lots of options to customize the look and richness of information shown. For this model, if we want to see the estimated parameters and flag those that are statistically different from 0, we can add this statement to PROC CALIS:
pathdiagram exogcov
   label=[math_imp="Importance of Math"
          parents="Parental Supp. of Math Ed."];
The EXOGCOV option displays covariances between exogenous variables on the diagram. The LABEL= option applies a label to the latent variables for ease of interpretation. [Figure 6]
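For reference, here is a sketch of the complete step with the PATHDIAGRAM statement in place. The ODS GRAPHICS ON statement is an addition that does not appear in the code above. It is included on the assumption that ODS Graphics might not already be enabled in your session, since path diagrams are produced through ODS Graphics.

```sas
ods graphics on;   /* assumption: not already enabled in your session */

proc calis data=stem;
   path
      math_imp ---> I1 I2 I3 I4 I5,
      parents  ---> p1 p2 p3 p4 p5;
   pvar
      math_imp = 1,
      parents  = 1;
   /* show the factor covariance and use readable factor labels */
   pathdiagram exogcov
      label=[math_imp="Importance of Math"
             parents="Parental Supp. of Math Ed."];
run;
```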
[6]
The estimated covariance between the factors is 0.35. Because the factors are scaled to have a variance of 1, this can also be interpreted as the estimated correlation between the factors. The flag with two asterisks shows that this covariance, like every other parameter estimated in the model, is significantly different from 0 at alpha = 0.01.
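The step from covariance to correlation is just the definition of a correlation with both variances set to 1. Writing ξ1 and ξ2 for the two factors (notation introduced here for illustration):

```latex
\[
\operatorname{Corr}(\xi_1,\xi_2)
  = \frac{\operatorname{Cov}(\xi_1,\xi_2)}
         {\sqrt{\operatorname{Var}(\xi_1)\,\operatorname{Var}(\xi_2)}}
  = \frac{0.35}{\sqrt{1 \cdot 1}}
  = 0.35
\]
```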
How about overall model fit? How does this compare with an alternative model? There are lots of ways to assess a model, and that will be the topic of the next post in this series. Until then, I hope you try out some CFAs in PROC CALIS! Have fun!