storing factor scores for use in regression

07-15-2016 01:01 PM

Okay, so I ran a proc factor and it turns out I have 2 factors...fine, great.

I then ran the proc score to create scores so that I could use them in my regression statements...one of the factors I could actually handle separately, but my SES variable has 2 continuous variables (hhincome, ageofmom), 1 binary (banked), and 1 categorical (highest)...and I need to use the SES factor as a predictor of my DVs.

So the code that works (including the proc factor that determined I had 2 factors, not just the one I thought):

```
proc factor rotate=varimax ev scree min=1;
var welfare ssi foodstamps banked ageofmom hhincome highest;
run;
proc factor data=nlsyproject outstat=Factout
method=prin rotate=varimax score;
var banked ageofmom hhincome highest;
title 'SESFactor Scoring';
run;
proc score data=nlsyproject score=factout out=Fscore;
var banked ageofmom hhincome highest;
run;
proc factor data=nlsyproject outstat=Factout
method=prin rotate=varimax score;
var welfare foodstamps ssi;
title 'GovtAssist Scoring';
run;
proc score data=nlsyproject score=factout out=Fscore;
var welfare foodstamps ssi;
run;
```

So, how do I use 'SESFactor' as a predictor in my regression statements?

I need to have some way to store the result and label it so I have some conciseness.

The GovtAssist factor I figure I can create a 0-3 scale for range of use 0=no assistance through 3=all types and then keep plugging along, but I have NO idea how to handle SES if not for this factor scoring thing that I can't seem to save the resulting score for use in the model.

Thanks in advance for any help.
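The 0-3 GovtAssist scale described above could be built in a DATA step along these lines. This is only a sketch: it assumes welfare, foodstamps, and ssi are each coded 0/1, which the post does not state, and the data sets `demo` and `demo_scaled` are hypothetical toy names used so the example runs on its own.

```sas
/* Hypothetical toy data: three 0/1 indicators (an assumption about the coding) */
data demo;
  input welfare foodstamps ssi;
  datalines;
0 0 0
1 0 0
1 1 0
1 1 1
;
run;

data demo_scaled;
  set demo;
  /* 0 = no assistance ... 3 = all three types.                    */
  /* Using + keeps govtassist missing if any indicator is missing. */
  govtassist = welfare + foodstamps + ssi;
run;

proc print data=demo_scaled;
run;
```

With real data, the `set` statement would point at nlsyproject instead of the toy data set.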

Accepted Solutions

Solution

07-15-2016
04:35 PM

07-15-2016 03:29 PM

Proc Score creates the factor columns for you, but it is up to you to provide nicer names like SES. As mentioned earlier, a single run of FACTOR and SCORE should suffice.

Here's a really simple example. You will need to modify it to use your own variables, factor names, and MODEL statement. My code is just for illustration.

```
proc factor data=sashelp.iris outstat=Factout nfactors=2
method=prin rotate=varimax score;
var petallength petalwidth sepallength sepalwidth;
run;
proc score data=sashelp.iris score=factout out=Fscore (rename=(factor1=someNewNameLikeSES factor2=someOtherNewName);
var petallength petalwidth sepallength sepalwidth;
run;
proc reg data = Fscore ;
model someNewNameLikeSES = someOtherNewName;
run;
```

All Replies

07-15-2016 02:33 PM - edited 07-15-2016 02:33 PM

Hi,

Why are you running two separate factor analyses? Can't it be done in a single run?

07-15-2016 02:49 PM

@stat_sas Yes, it can be done in one factor analysis, but that confused me even more. I need to ensure that I have two separate predictors defined (govtassist and SES). I don't really care how they're titled, but I can't figure out how to run the variables that construct my SES factor as one unit. Is this possible? I figured out a way to make govtassist by redefining my variable. But I don't know how to combine 3 continuous and 1 binary variable and label the resulting factor...to then use that factor as a predictor in my regressions/other statistical tests.

I know that hhincome, ageofmom, banked, and highest construct the SES variable but if I run:

proc reg;

model behavior=hhincome ageofmom banked highest;

run;

is it recognizing those 4 variables as my SES factor? And then when I add in male (to control for gender) black hispanic (compared to whites) will my results be different if I have a defined SES factor than if I have those 4 variables separately listed?

Ack!

Thanks in advance for any additional thoughts and/or feedback.

K8

07-15-2016 03:02 PM

Hi,

What do you mean by

"how to combine 3 continuous and 1 binary variable and label the resulting factor"?

Are hhincome ageofmom banked highest uncorrelated?

07-15-2016 03:05 PM - edited 07-15-2016 03:07 PM

@stat_sas You'll have to forgive me because I may not be explaining this correctly.

Yes, hhincome ageofmom banked and highest are correlated.

My concern is whether or not I have to combine them into one specifically defined factor variable to represent all of them in the regression.

So, I could run

proc reg;

model behavior= hhincome ageofmom banked highest

OR (if possible)

proc reg;

model behavior= SES (which encompasses the 4 listed above).

Is there a way to do that or am I just over-complicating this?

07-15-2016 03:10 PM

Take a look at PROC PLS--an excellent way of modeling multiple responses based on multiple predictive variables. Just a different way of trying to solve a similar problem.

Steve Denham
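For readers who want to try that suggestion, a minimal PROC PLS call might look like the following. It is a sketch only: the data set and variable names are taken from the question, `nfac=2` is an arbitrary choice for illustration, and nothing in the thread confirms that this specification suits the data.

```sas
/* Sketch: partial least squares with the four SES variables as predictors. */
/* nlsyproject and behavior come from the thread; nfac=2 is an assumption.  */
proc pls data=nlsyproject method=pls nfac=2;
  model behavior = hhincome ageofmom banked highest;
run;
```

PLS extracts its factors with the response in view, whereas PROC FACTOR looks only at the predictors, so the two approaches answer slightly different questions.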

07-15-2016 03:11 PM

I'll take a look, @SteveDenham. Thanks!

07-15-2016 04:08 PM

@rayIII Okay, I'm feeling good about this, but it won't let me rename them. When I type in "rename" like you did, it doesn't turn blue.

And am I reading your last section of code correctly that I would set my factors equal to one another in the regression statement? Why would I do that?

Thanks, K8

07-15-2016 04:33 PM - edited 07-15-2016 04:35 PM

I GOT IT!! I GOT IT!! I GOT IT!!!!

I used the factors to predict my DV and it worked!!!!

THANK YOU!!!

07-15-2016 04:35 PM

Glad you got it! Sorry, I left off a parenthesis. It should have been:

proc score data=sashelp.iris score=factout out=Fscore (rename=(factor1=someNewNameLikeSES factor2=someOtherNewName));

var petallength petalwidth sepallength sepalwidth;

run;

Also, my proc reg call was just an example to show how to use scored data in a regression analysis. You would use your own MODEL statement, like:

model behavior =someNewNameLikeSES someOtherNewName;
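Put together with the variables from the original question, the whole workflow might look roughly like this. Treat it as a sketch: the data set, `behavior`, and the control variables come from the thread, while `ses_score` and `govt_score` are hypothetical names, and which of Factor1 or Factor2 ends up being the SES factor has to be read off the rotated factor pattern.

```sas
/* Sketch only: nlsyproject and its variables come from the thread;  */
/* ses_score and govt_score are hypothetical names.                  */
proc factor data=nlsyproject outstat=Factout nfactors=2
            method=prin rotate=varimax score;
  var welfare ssi foodstamps banked ageofmom hhincome highest;
run;

/* Check the rotated factor pattern before renaming: Factor1 is not  */
/* guaranteed to be the SES factor. Swap the two names if needed.    */
proc score data=nlsyproject score=Factout
           out=Fscore(rename=(Factor1=ses_score Factor2=govt_score));
  var welfare ssi foodstamps banked ageofmom hhincome highest;
run;

/* The renamed scores are ordinary columns in Fscore and can be used */
/* like any other predictor.                                         */
proc reg data=Fscore;
  model behavior = ses_score govt_score male black hispanic;
run;
quit;
```

Because the OUT= data set keeps every original column alongside the scores, the dependent variable and the controls are available to PROC REG without a merge.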

07-15-2016 04:29 PM - edited 07-15-2016 04:30 PM

Hi,

If the variables are highly correlated, the first step after running the factor analysis is to look at the eigenvalues, which represent the variation explained by each factor. For example, if the first eigenvalue accounts for 80% of the variation, the first factor score is sufficient for subsequent analysis and can stand in for all four variables. If instead the top two eigenvalues together explain most of the variation in the original variables, you have to use two factor scores in the regression. Lastly, running separate factor analyses may produce factor scores that are correlated with each other, which can introduce overfitting and destabilize the parameter estimates.
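One way to act on this advice is to read the eigenvalue table that PROC FACTOR prints (it includes proportion and cumulative columns) and then check how correlated the resulting scores are. The sketch below reuses the sashelp.iris example from the accepted solution so that it runs anywhere; the choice of two factors is carried over from that example and is not a recommendation.

```sas
/* Step 1: the PROC FACTOR output includes the eigenvalues of the     */
/* correlation matrix with the proportion of variation each explains. */
proc factor data=sashelp.iris outstat=Factout nfactors=2
            method=prin rotate=varimax score;
  var petallength petalwidth sepallength sepalwidth;
run;

/* Step 2: compute the factor scores (columns Factor1 and Factor2). */
proc score data=sashelp.iris score=Factout out=Fscore;
  var petallength petalwidth sepallength sepalwidth;
run;

/* Step 3: check whether the scores are correlated with each other. */
proc corr data=Fscore;
  var Factor1 Factor2;
run;
```

Scores that come from two separate PROC FACTOR runs can be checked the same way once both sets are merged into one data set.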