Statistical Procedures

Programming the statistical procedures from SAS
juliagscott
Calcite | Level 5

Hello!

I have run a principal components analysis of various wealth indicator variables such as 'owning a radio', 'type of toilet used by household', etc., with the goal of reducing the large number of variables. I used PROC FACTOR with the principal component method, which gave me 9 factors with an eigenvalue > 1. My question is, how do I go from 9 factors to one wealth indicator variable? I would also need to divide the wealth indicator variable into quintiles, so that each household's wealth is indicated by the quintile it belongs to. I am still learning SAS, so any advice would be helpful! Here is my code for reference:

 

/* Principal component extraction, keeping 9 factors */
proc factor data=bedlib.prinanalysis outstat=bedlib.prinfactor
            simple scree corr score nfact=9 method=principal;
    var PIPED_WATER UNDER_WATER OWNED_TOILET OWNED_PIT SHARED_PIT
        ELECT_ENERGY COAL_ENERGY WOOD_ENERGY GAS_ENERGY
        COWS_ MULES_ GOATS_ PIGS_ CHICKENS_ ANIMALCA_
        BICYCLE_ CAR_ MOTORCYCLE_ RADIO_ ELECTRIC_ TELEVISI_
        REFRIGER_ CELLPHON_ SOLARPAN_ COMPUTER_ STEREO_;
run;



 

PGStats
Opal | Level 21

PCA only identifies the combinations of variables that account for the greatest variability in your data. How would you interpret a factor containing 0.1 * <number of cows> - 0.02 * <number of goats>?

 

I think you would get a far better indicator of wealth by multiplying each variable by the approximate value of the feature that it represents, such as <number of televisions> * <average value of a television>. I understand that some of these values might be difficult to estimate, but I would suggest omitting those from your indicator, as most of them must be correlated anyway.

 

Such an indicator would be much easier to interpret and justify.
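
A minimal sketch of such a value-weighted indicator follows. It assumes the asset variables are counts (or 0/1 ownership flags). The unit values are made-up placeholders rather than real prices, and the output data set and variable names are hypothetical.

```sas
/* Sketch only: the unit values are placeholders, not real prices. */
data work.wealth_value;                  /* hypothetical output data set */
    set bedlib.prinanalysis;
    /* sum(x, 0) treats a missing count as zero */
    wealth_value = 150 * sum(TELEVISI_, 0)
                 +  20 * sum(RADIO_, 0)
                 +  80 * sum(BICYCLE_, 0)
                 + 400 * sum(COWS_, 0)
                 +  60 * sum(GOATS_, 0);
run;
```

Items whose value is hard to estimate are simply left out of the sum, as suggested above.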

 

I hope this helps.

PG
PaigeMiller
Diamond | Level 26

So you have reduced many original variables to 9 PCA variables, and the question is how to reduce this to one index? Well, as far as I know, there's no universal method or formula that gets you to the next step. Furthermore, the sign on the PCA vectors is arbitrary, and has no real meaning, so PCA vector 1 could be positive to indicate a high wealth index, or it could be negative to indicate a high wealth index, and there's no way in advance to know which it is. Or it could be that the principal components are completely or nearly completely unrelated to wealth; they don't necessarily have to have anything to do with wealth.
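
To illustrate the sign issue, here is a rough sketch. It assumes, purely for illustration, that someone decided to look at the first component score as a candidate index, which is not something this reply recommends. The data set names work.pc_scores and work.candidate_index and the macro variable flip are hypothetical.

```sas
/* Keep only the first component and write its score (Factor1) to OUT= */
proc factor data=bedlib.prinanalysis method=principal nfactors=1
            out=work.pc_scores noprint;
    var PIPED_WATER UNDER_WATER OWNED_TOILET OWNED_PIT SHARED_PIT
        ELECT_ENERGY COAL_ENERGY WOOD_ENERGY GAS_ENERGY
        COWS_ MULES_ GOATS_ PIGS_ CHICKENS_ ANIMALCA_
        BICYCLE_ CAR_ MOTORCYCLE_ RADIO_ ELECTRIC_ TELEVISI_
        REFRIGER_ CELLPHON_ SOLARPAN_ COMPUTER_ STEREO_;
run;

/* Check the direction: how does the score correlate with a few
   assets one would expect wealthier households to own? */
proc corr data=work.pc_scores;
    var Factor1;
    with ELECT_ENERGY CAR_ TELEVISI_ REFRIGER_;
run;

/* Set flip to -1 by hand if those correlations come out negative */
%let flip = 1;

data work.candidate_index;
    set work.pc_scores;
    wealth_index = &flip * Factor1;
run;
```

Even with the sign settled, the correlations only show what the component is associated with; they do not establish that it measures wealth.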

 

Typically, you might want to try to interpret the PCA vectors, gain some understanding of what vector 1 is telling you, and what vector 2 is telling you, etc. by looking to see which variables have the high loadings (either positive or negative) in a given dimension. For example, if dimension 1 has a lot of variables that have big loadings (either positive or negative) that are associated with education, then you could interpret dimension 1 as an education variable. But even after that, how to get a wealth indicator is not clear.

 

If you want a predictive measure of wealth, then something like Partial Least Squares regression would be useful; it's analogous to PCA but tries to find dimensions/vectors that are predictive. This assumes you have a Y variable to predict.
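
A minimal PROC PLS sketch, assuming a response variable were available. HH_EXPEND (household expenditure) is a hypothetical name that does not appear in the original post, as are the output data set, the score prefix, and the choice of two factors.

```sas
/* Sketch only: HH_EXPEND is a hypothetical Y variable */
proc pls data=bedlib.prinanalysis method=pls nfac=2;
    model HH_EXPEND = PIPED_WATER UNDER_WATER OWNED_TOILET OWNED_PIT
                      SHARED_PIT ELECT_ENERGY COAL_ENERGY WOOD_ENERGY
                      GAS_ENERGY COWS_ MULES_ GOATS_ PIGS_ CHICKENS_
                      ANIMALCA_ BICYCLE_ CAR_ MOTORCYCLE_ RADIO_
                      ELECTRIC_ TELEVISI_ REFRIGER_ CELLPHON_ SOLARPAN_
                      COMPUTER_ STEREO_;
    /* xscr1, xscr2 hold the X-scores; pred_expend the predicted Y */
    output out=work.pls_scores xscore=xscr predicted=pred_expend;
run;
```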

 

"I would also need to divide the wealth indicator variable into quintiles, indicating household wealth based on the quintile the household belongs to."

 

Use PROC RANK with the option GROUPS=5.
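
For example, assuming a data set work.wealth_scores that already holds one numeric index per household in a variable named wealth_index (both names are hypothetical), a sketch might look like this:

```sas
/* Split the index into 5 groups of roughly equal size */
proc rank data=work.wealth_scores out=work.wealth_quintiles groups=5;
    var wealth_index;
    ranks wealth_quintile;    /* group numbers run from 0 to 4 */
run;

/* Optional: relabel so quintiles run 1 (lowest) to 5 (highest) */
data work.wealth_quintiles;
    set work.wealth_quintiles;
    if not missing(wealth_quintile) then
        wealth_quintile = wealth_quintile + 1;
run;
```

With GROUPS=5, PROC RANK numbers the groups 0 through 4, hence the relabeling step. If many households share the same index value, the groups may end up unequal in size.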

 

--
Paige Miller
