ksmielitz
Quartz | Level 8

Dear All, 

 

It is entirely possible that I am thinking too much about this (I'm hoping)...but here's the DL.

 

I have conducted multiple factor analyses. In FA 1 I ran open, likelyafs, likelybudget, likelysave. Open and LikelyAFS loaded together very nicely (expected), as did likelybudget and likelysave (also expected). So, moving on through my data and coding, I also needed to see how totaltime, numincar, and offense loaded so I could determine whether I have to leave them as separate predictors or could create a factor score. Totaltime and numincar loaded together very nicely, and offense is hanging out there in the wind (that's fine).

 

So far, life is good.

 

I created the factor scores for open/likelyafs (bankincl) and likelybudget/likelysave (finmanage). Great...they work well! These are my DVs and I'm cruising along.

 

I'm working with a smaller sample (ah the joys of primary data) so I'm trying to keep my predictors to a minimum and said "AH! Since totaltime and numincar loaded so well together (.80, .82) on the same factor, I'll create a factor score for those as well." So, I did the exact same coding procedure I did for my DVs. But, when I run both proc factor/proc score combos (one for my DVs and one for my Aspects of Incarceration), my regressions won't run. HOWEVER, when I load them all into the mix together, the regressions run. This is what I have: 

 

proc factor data=dissert outstat=factout nfactors=3
		method=prin rotate=varimax score;
	var open likelyafs likelybudget likelysave totaltime numincar;
run;

proc score data=dissert score=factout out=fscore
	(rename= (factor1=bankincl factor2=finmanage factor3=AspectIncar));
	var open likelyafs likelybudget likelysave totaltime numincar;
run;

FYI (a little more back story): For the sake of trying to figure things out, I ran a proc factor with everything together:

 

proc factor rotate=varimax ev scree min=1;
var totaltime numincar open likelyafs likelybudget likelysave;
run;

And boy howdy did the factors show up whacked. I was *hoping* that they would separate, but alas, we can't have everything can we? So, I deleted that code and said, "Well, I have 3 clearly defined factors when I run the proc factor by the general subjects (DVs and aspects of incarceration), so I'll use those and load them into the proc factor/proc score to get everything labeled since trying to score and label everything separately blocks my DV factor scores from running (i.e. SAS says they don't exist)."

 

i.e. when I do the following my DVs "do not exist":

 

proc factor data=dissert score outstat=factout nfactors=1
		method=prin score;
	var totaltime numincar;
run;

proc score data=dissert score=factout out=fscore
	(rename= (factor1=AspectIncar));
	var totaltime numincar;
run;

proc factor data=dissert score outstat=factout nfactors=2
		method=prin rotate=varimax score;
	var open likelyafs likelybudget likelysave;
run;

proc score data=dissert score=factout out=fscore
	(rename= (factor1=bankincl factor2=finmanage ));
	var open likelyafs likelybudget likelysave ;
run;

proc reg;
model bankincl=age finknow aspectincar;
run;

proc reg;
model finmanage=age finknow aspectincar;
run;

 

When I run my regressions with the proc factor/proc score in the first example of this novel:

 

proc reg;
model bankincl=age finknow aspectincar;
run;

proc reg;
model finmanage=age finknow aspectincar;
run;

 They run...but since this is my dissertation I have a particularly vested interest in making sure that there is nothing wrong with what I've got so far.

 

With all of that information, it boils down to: Is there anything wrong with loading all of my factors into one scoring mechanism? I think there isn't because my DVs are separate factors but I'm *minorly* freaking out. 🙂

 

Thanks for sticking with me through the novel...maybe my dissertation will be as long. 😉

Kate

3 REPLIES
ksmielitz
Quartz | Level 8

For the record, in my actual code "open" does not turn blue, it's a defined variable. I don't know why it's doing it in my examples.

ksmielitz
Quartz | Level 8

As of this morning (I was checking some other code and forgot to hide the proc factor and the proc score) IT WORKED!!

 

Rotated Factor Pattern

                Factor1     Factor2     Factor3
Open            0.22955    -0.04384     0.81662
LikelyAFS      -0.26342     0.24516     0.62108
LikelyBudget    0.81318    -0.01412     0.13611
LikelySave      0.83163     0.07836    -0.11512
TotalTime       0.04161     0.76286     0.16007
NumIncar        0.01583     0.83121    -0.01679
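
For anyone who lands here with the same "variable does not exist" error from the separate PROC FACTOR / PROC SCORE version: a likely cause is that both PROC SCORE steps in the original post read data=dissert and write out=fscore, so the second step replaces the first output and only one set of scores survives. Both PROC FACTOR steps also write to the same outstat=factout. Below is a minimal sketch of one way around it, chaining the second PROC SCORE onto the output of the first. The data set names incar_stat, dv_stat, fscore_incar, and fscore_all are made up for illustration, and this has not been run against the actual data.

```sas
/* Sketch only: incar_stat, dv_stat, fscore_incar and fscore_all are
   hypothetical names, not from the original post. */

/* Step 1: one-factor solution for the incarceration items */
proc factor data=dissert outstat=incar_stat nfactors=1
        method=prin score;
    var totaltime numincar;
run;

/* Score dissert; the output keeps every variable in dissert plus the
   new score, renamed from Factor1 to AspectIncar */
proc score data=dissert score=incar_stat out=fscore_incar
    (rename=(factor1=AspectIncar));
    var totaltime numincar;
run;

/* Step 2: two-factor solution for the DV items, with its own OUTSTAT= */
proc factor data=dissert outstat=dv_stat nfactors=2
        method=prin rotate=varimax score;
    var open likelyafs likelybudget likelysave;
run;

/* Read fscore_incar (not dissert) so AspectIncar is carried along */
proc score data=fscore_incar score=dv_stat out=fscore_all
    (rename=(factor1=bankincl factor2=finmanage));
    var open likelyafs likelybudget likelysave;
run;

/* Name the data set explicitly instead of relying on the last one created */
proc reg data=fscore_all;
    model bankincl = age finknow AspectIncar;
run;
quit;
```

Naming the input on PROC REG with data= also avoids depending on whichever data set happened to be created last in the session.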

 

PaigeMiller
Diamond | Level 26

@ksmielitz wrote:

 

 

proc factor data=dissert score outstat=factout nfactors=1
		method=prin score;
	var totaltime numincar;
run;

proc score data=dissert score=factout out=fscore
	(rename= (factor1=AspectIncar));
	var totaltime numincar;
run;

proc factor data=dissert score outstat=factout nfactors=2
		method=prin rotate=varimax score;
	var open likelyafs likelybudget likelysave;
run;

proc score data=dissert score=factout out=fscore
	(rename= (factor1=bankincl factor2=finmanage ));
	var open likelyafs likelybudget likelysave ;
run;

proc reg;
model bankincl=age finknow aspectincar;
run;

proc reg;
model finmanage=age finknow aspectincar;
run;

 

When I run my regressions with the proc factor/proc score in the first example of this novel:

 

proc reg;
model bankincl=age finknow aspectincar;
run;

proc reg;
model finmanage=age finknow aspectincar;
run;

 They run...but since this is my dissertation I have a particularly vested interest in making sure that there is nothing wrong with what I've got so far.

 

My problem with this approach is that running principal components followed by a regression likely will not give you the best predictive model. The principal components that you find are created without using any knowledge of the independent variables: age, finknow, and aspectincar. And so it is entirely possible that the components found by PCA are not well predicted by age, finknow, and aspectincar.

 

It seems to me that a better approach is to use a method that finds the components of open, likelyafs, likelybudget, and likelysave that are well predicted (as well as the data will allow) by age, finknow, and aspectincar. Such a method should generally produce a higher R-squared than the method you have chosen. Of course, you will get different components, and different scores, than with PCA. The method is Partial Least Squares regression (in SAS, it is PROC PLS).
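
A rough sketch of what that could look like follows. Two things here are assumptions rather than anything from the original post: age and finknow are taken to be in dissert, and the raw items totaltime and numincar stand in for aspectincar so that everything can be read from one data set. The names pls_out, xscr, and yscr are made up for illustration.

```sas
/* Sketch only: assumes age, finknow, totaltime and numincar are in dissert.
   pls_out, xscr and yscr are hypothetical names. */
proc pls data=dissert method=pls cv=one cvtest;
    /* responses on the left, predictors on the right */
    model open likelyafs likelybudget likelysave
          = age finknow totaltime numincar;
    /* save the extracted predictor-side and response-side scores */
    output out=pls_out xscore=xscr yscore=yscr;
run;
```

Here cv=one asks for leave-one-out cross validation to help choose the number of factors, which may matter with a small sample, and cvtest adds a test comparing candidate numbers of factors.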

--
Paige Miller

