KR123
Fluorite | Level 6

Hello! I am conducting my first multiple imputation and I have a question regarding the creation of factor scores. 

 

I'm creating factor scores from variables that have missing data, and other variables that I will be using also have missing data.

I have been successful in creating factor scores using multiply imputed data in SAS. However, I also know that just because something runs doesn't mean it is correct. Is there a reason why I should not use factor scores generated from data that was imputed? I've been getting mixed messages.

 

If I should not be creating factor scores after imputation, is it reasonable to create the factor score prior to imputation and then impute?

 

I want to use the factor scores as predictors in a multinomial logistic regression model.

 

Thank you in advance for any help! 

Accepted Solutions
SAS_Rob
SAS Employee
Running a factor analysis with multiply imputed data is not always accepted because you cannot combine the factor scores using something like PROC MIANALYZE to get a single set of scores (the scores have no standard errors). An argument could be made, however, for the validity of doing something like this:

1. Run Multiple Imputation
2. Develop factor scores for each of the M imputations
3. Run M logit models, using the respective factor scores for each data set as predictors.
4. Combine the estimates from the logit models using Proc MIANALYZE
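
The four steps above can be sketched in SAS roughly as follows. This is a minimal sketch under stated assumptions, not a vetted analysis: `outcome` is a hypothetical name for the multinomial response (the thread never names it), `iexample` is assumed to be the PROC MI output from the question, and `Factor1` and `Factor2` are the default score names that PROC FACTOR writes to its OUT= data set. The survey weight from the original code is left out for brevity.

```sas
/* Step 1 is assumed done: iexample is the PROC MI output,
   stacked and sorted by _Imputation_. */

/* Step 2: compute factor scores separately within each imputation. */
proc factor data=iexample nfactors=2 method=prinit priors=smc
            out=ifactor_scores1 noprint;
   by _Imputation_;
   var mathtest mathtext mathmaterial mathassign mathskill
       engtest engtext engmaterial engassign engskill
       ilearn badgrades probwrong learnwell;
run;

/* Step 3: fit one generalized-logit model per imputation.
   "outcome" is a hypothetical response name. */
proc logistic data=ifactor_scores1;
   by _Imputation_;
   model outcome = Factor1 Factor2 / link=glogit;
   ods output ParameterEstimates=lgparms;
run;

/* Step 4: pool the M sets of estimates, one response level at a time. */
proc sort data=lgparms;
   by Response _Imputation_;
run;

proc mianalyze parms=lgparms;
   by Response;
   modeleffects Intercept Factor1 Factor2;
run;
```

One thing to check before pooling: factor solutions are only determined up to sign and ordering, so Factor1 in one imputation is not guaranteed to mean the same thing as Factor1 in another. Comparing the loadings across imputations first is a reasonable precaution.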

9 REPLIES
PaigeMiller
Diamond | Level 26

Are you imputing scores, or are you imputing the values of the original variables?

--
Paige Miller
KR123
Fluorite | Level 6

I'm currently imputing values of the original variables, then creating the factor scores using the imputed data. 

PaigeMiller
Diamond | Level 26

I guess I would need to understand how you are doing this imputation.

--
Paige Miller
KR123
Fluorite | Level 6

I am imputing all my missing data first. Then, using the data set created by the multiple imputation, I create the factor scores. I've pasted the sequence of my code below:

 

/* Create imputed data set */


proc mi data=l out=iexample nimpute=2 seed=9455;
class f1exp f1rgpp2 mathhisp mathteach f3attain enghisp engteach
mathtext mathtest engtext engmaterial ilearn mathmaterial badgrades engassign engtest engskill mathassign probwrong learnwell mathskill
pareduc male byexp race ;

fcs logistic (f1exp/details);
fcs logistic (f1rgpp2/details);
fcs logistic (mathhisp/details);
fcs logistic (mathteach/details);
fcs logistic (f3attain/details);
fcs logistic (enghisp/details);
fcs logistic (engteach/details);
fcs logistic (mathtext/details) ;
fcs logistic (mathtest/details) ;
fcs logistic (engtext/details) ;
fcs logistic (engmaterial/details) ;
fcs logistic (ilearn/details) ;
fcs logistic (mathmaterial/details);
fcs logistic (badgrades/details);
fcs logistic (engassign/details);
fcs logistic (engtest/details);
fcs logistic (engskill/details);
fcs logistic (mathassign/details);
fcs logistic (probwrong/details);
fcs logistic (learnwell/details);
fcs logistic (mathskill/details);

var f1exp f1rgpp2 mathhisp mathteach f3attain enghisp engteach mathtext mathtest engtext engmaterial ilearn
mathmaterial badgrades engassign engtest engskill mathassign probwrong learnwell mathskill
male byses1 race bytxrirr bytxmirr bytxmstd bytxcstd byexp byincome pareduc ;
run;

 

/* Run factor analysis on imputed data */


proc factor data=iexample nfactors=2 method=prinit priors=smc fuzz=.3 reorder scree plots=(initloadings scree) out=ifactor_scores1;
by _Imputation_;
var mathtest mathtext mathmaterial mathassign mathskill engtest engtext engmaterial engassign engskill
ilearn badgrades probwrong learnwell;
weight bystuwt;
run;

PaigeMiller
Diamond | Level 26

I have to admit that I am always uncomfortable when a multivariate procedure such as factor analysis or principal components uses imputation in a univariate manner. This could potentially destroy the correlation/covariance of the data. I would be happier to see a multivariate imputation somehow, that makes use of the correlations or covariances between the x-variables, in which case I would recommend using the EM statement and not the FCS statement.

 

I have to admit that I am uncomfortable when a factor analysis is performed when all of the x-variables are class variables. (Well, okay, they are class variables when you impute in PROC MI, but they are not class variables when you do PROC FACTOR? Could you explain that?)

 

I have to admit that I am uncomfortable performing a factor analysis as input to a regression or logistic regression, because factor analysis does not look for factors that are good predictors, so it may (or may not) find factors that predict poorly. To overcome this deficiency, I always recommend Partial Least Squares (PROC PLS), which finds factors that are predictive (as much as the data will allow), and which has a built-in EM algorithm to impute missing values. But ... there is an actual deficiency here ... PROC PLS only works on continuous Y, and there is no PLS version in SAS that handles the logistic case.
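
For reference, the PROC PLS route described above might look like the sketch below when the response is continuous. The names `y_cont` and `pls_scores` and the score prefix `xscr` are hypothetical, the input data set `l` is borrowed from the original code, and the predictors are assumed to be treated as numeric. As noted, this does not cover the multinomial outcome in this thread.

```sas
/* Partial least squares with EM handling of missing values.
   y_cont is a hypothetical continuous response. */
proc pls data=l method=pls missing=em nfac=2;
   model y_cont = mathtest mathtext mathmaterial mathassign mathskill
                  engtest engtext engmaterial engassign engskill
                  ilearn badgrades probwrong learnwell;
   /* Save the extracted X-scores (xscr1, xscr2) for inspection. */
   output out=pls_scores xscore=xscr;
run;
```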

 

So I don't really have a suggestion on what to do. Anything that you can do in SAS makes me uncomfortable. But I would definitely switch to the EM imputation of missing values among the (continuous?) x-variables.
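
A minimal sketch of the EM route suggested above, assuming the items are treated as continuous (no CLASS statement) and roughly multivariate normal, which is what the EM statement in PROC MI assumes. The data set names `em_filled` and `em_cov` are hypothetical; `l` is the input data set from the original code.

```sas
/* NIMPUTE=0 skips the multiple imputation step; the EM statement
   still estimates the mean vector and covariance matrix from the
   incomplete data. */
proc mi data=l nimpute=0;
   em out=em_filled      /* data with missing values replaced by EM expected values */
      outem=em_cov       /* EM estimates of the means and covariances */
      maxiter=500;
   var mathtest mathtext mathmaterial mathassign mathskill
       engtest engtext engmaterial engassign engskill
       ilearn badgrades probwrong learnwell;
run;
```

Note that `em_filled` is a single filled-in data set rather than a set of multiple imputations, so analyses run on it will not reflect the uncertainty due to the missing values.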

--
Paige Miller
KR123
Fluorite | Level 6
Okay, I will do more research taking into consideration your concerns. Yes, they are all class variables, just a mistake when pasting the code on here. Thank you for your input and help!
PaigeMiller
Diamond | Level 26

@KR123 wrote:
Okay, I will do more research taking into consideration your concerns. Yes, they are all class variables, just a mistake when pasting the code on here. Thank you for your input and help!

So what does it mean to do Factor analysis on all class variables? There isn't even a CLASS statement in PROC FACTOR. So I'm still not comfortable with your approach.

--
Paige Miller
KR123
Fluorite | Level 6
Thank you for your help!
