Re: SEM Polychoric Transformation and Exploratory Factor Analysis

David17 · Posted 05-05-2023 04:27 PM

I'm a little new to Structural Equation Modeling (SEM). I am working with a socio-behavioral survey with two latent predictor variables and one latent outcome variable with 3-4 questions each. Almost all the questions are on a 4- or 5-level Likert scale. My questions:

1) It seems like a polychoric transformation of each question would be useful prior to doing Exploratory Factor Analysis (EFA). This would allow us to "de-emphasize" strongly agree and agree, if there's not much difference between those two answers for an individual question, for example. However, several guides I've read seem to skip the polychoric correlation step, even with ordinal variables. Is doing the polychoric transformation first useful, or is it mostly redundant with EFA, mathematically?

2) Are proc prinqual and proc factor the best procedures to use for this, respectively, or would proc calis (or something else) be better? I'm using SAS 9.4.

3) I also have several demographic variables (age, race, location, background) which likely are associated with both the predictor and outcome latent variables. Our primary interest is what factors affect the outcome, but our secondary interest is if the demographic variables also affect the latent predictors. Is it okay to do this in two steps? First, proc GLMselect on the latent outcome followed by looking at the effect of the demographic variables on the latent predictors (separately). Or, is there a way (and is it better) to model this simultaneously? If the latter, how?

awesome_opossum · Posted 05-08-2023 12:23 PM

I would not recommend using polychoric correlations in SEM. It's not that it's impossible in principle; the issue is that it vastly complicates interpretation. Tetrachoric (binary variable) correlations can work, although even then I would encourage some extra caution about interpretation.

As well, there is limited SEM functionality for this in SAS (and in pretty much all other statistical software). As far as I know, the only way to do it in SAS is to derive poly/tetrachoric correlations from proc corr and then use that correlation matrix as input to proc factor. This will allow you to do exploratory factor analysis, although because the input is a correlation matrix rather than the actual observations, you cannot derive factor scores.
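A minimal sketch of that two-step route, assuming a SAS release in which proc corr accepts the polychoric and outplc= options. The data set name op and the item names are borrowed from the code posted later in this thread; op_plc is a made-up name. It is worth checking the log to confirm that proc factor picks up a sample size from the correlation data set, since method=ml relies on it.

```sas
/* Step 1: polychoric correlations among the ordinal items.             */
/* op_plc is a hypothetical output name; op and the item names come     */
/* from the thread.                                                     */
proc corr data=op polychoric outplc=op_plc noprint;
   var Dis1-Dis2 Know1-Know4 Stigma1-Stigma8;
run;

/* Step 2: EFA on the correlation matrix rather than the raw responses. */
/* No out= option here: factor scores need the raw observations.        */
proc factor data=op_plc method=ml priors=smc rotate=promax;
   var Dis1-Dis2 Know1-Know4 Stigma1-Stigma8;
run;
```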

I believe the same approach can be used in proc calis with lineqs, but I have not tried it myself. Then you could confirm your factor structure(s) and also enter endogenous and exogenous variables as you please. Again, though, that is exactly where the interpretation challenge comes in. Even if you had acceptable fit statistics on your confirmatory factors, what do the factors mean, and are you confident enough in that meaning to justify using them as either a predictor or an outcome?

While I understand some people's disdain for using Likert-style items as continuous variables, I would argue CFA and SEM are one specific case where the benefits of treating them as continuous greatly and undeniably outweigh the limitations.
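For the "simultaneous" option in question 3, here is a rough sketch with the items treated as continuous, using the path statement of proc calis rather than lineqs. Everything about the roles is an assumption for illustration only: F_Dis and F_Know stand in for the two latent predictors, F_Stigma for the latent outcome, and Age for a single numeric demographic variable (categorical ones such as race would need dummy coding first). The F_ names are hypothetical; names that are not in the data set are treated as latent factors.

```sas
/* Hypothetical roles: F_Dis and F_Know are latent predictors,        */
/* F_Stigma is the latent outcome, Age is a numeric covariate in op.  */
proc calis data=op method=ml;
   path
      /* Measurement part: first loading fixed at 1 to set each scale */
      F_Dis    ===> Dis1    = 1.0,
      F_Dis    ===> Dis2,
      F_Know   ===> Know1   = 1.0,
      F_Know   ===> Know2 Know3 Know4,
      F_Stigma ===> Stigma1 = 1.0,
      F_Stigma ===> Stigma2 Stigma3 Stigma4 Stigma5 Stigma6 Stigma7 Stigma8,
      /* Structural part: predictors -> outcome */
      F_Dis    ===> F_Stigma,
      F_Know   ===> F_Stigma,
      /* Demographic effects on predictors and outcome in one model */
      Age      ===> F_Dis F_Know F_Stigma;
run;
```

Fitting it in one step means the demographic effects on the latent predictors and on the outcome are estimated together, instead of carrying estimated factor scores into a second regression.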

David17 · Posted 05-09-2023 01:23 PM

Thanks for your help. I'm a little confused, because you say on the one hand that a polychoric transformation would confuse the interpretation, but on the other hand that treating a Likert scale as a continuous variable in EFA/CFA/SEM is useful. There is a SAS procedure that simply performs polychoric transformations of the answers, in addition to providing the polychoric correlation matrix:

proc prinqual data=op out=op_prinqual3 plot=all
      maxiter=100 standard scores n=3 replace;
   transform monotone (Dis1-Dis2 Know1-Know4 Yrs4gp OpFreq Stigma1-Stigma8);
   * maxiter: maximum iterations (default=30);
   * standard: standardize output to variance = 1. n=3: make 3 axes;
   * replace: replace original values;
   * scores: outputs principal component scores;
   * transform monotone for ordinal data;
   * transform opscore for nominal data;
run;

So this transforms the Likert answers from the flat 1, 2, 3, 4, 5 to 1.12, 2.28, 2.78, 4.51, 4.94 for example, with different numbers for each question. It's not the correlation matrix, which is also produced. Then, I would input the transformed values into the EFA - something along the lines of this:

proc factor data=op_p3 method=ml rotate=promax corr msa scree residuals preplot plot;
   var DisT1-DisT2 KnowT1-KnowT4 Yrs4gpT OpFreqT StigmaT1-StigmaT8;
   * DisT etc. are the new variables and answers from the polychoric transformation;
run;

From here, assuming the factors are interpretable and group as expected, they would be used in a General Linear Model analysis. This still allows factor scores to be produced, and I don't think it messes up interpretation at all. Obviously, interpretation is everything for us. Do you think this resolves the interpretation (and other) issues?

awesome_opossum · Posted 05-09-2023 01:36 PM

Indeed, I could see that working.

Keep in mind you need the priors=smc option in proc factor to make it a factor analysis; otherwise the default is principal component.
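A sketch of the earlier proc factor call with the prior communalities written out and factor scores saved. As far as I understand the documentation, method=ml already starts from squared multiple correlations, so priors=smc here mainly makes the choice explicit; it matters most when the method is left at its default. nfactors=3 is an assumption (two predictors plus one outcome), and op_scores is a made-up data set name.

```sas
/* Same data set and variables as the call above.                      */
/* priors=smc: squared multiple correlations as prior communalities.   */
/* nfactors=3 is assumed; out= requires nfactors= and raw data input.  */
proc factor data=op_p3 method=ml priors=smc rotate=promax
            nfactors=3 out=op_scores;
   var DisT1-DisT2 KnowT1-KnowT4 Yrs4gpT OpFreqT StigmaT1-StigmaT8;
run;
/* op_scores holds the input variables plus Factor1-Factor3. */
```

Which of Factor1-Factor3 corresponds to which construct has to be read off the rotated loadings before the scores go into a GLM.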