Hello!
I have run a principal components analysis of various wealth indicator variables such as 'owning a radio', 'type of toilet used by household', etc., with the goal of reducing the large number of variables. I used PROC FACTOR with the principal components method, which gave me 9 factors with an eigenvalue > 1. My question is, how do I go from 9 factors to one wealth indicator variable? I would also need to divide the wealth indicator variable into quintiles, so that each household's wealth is indicated by the quintile it belongs to. I am still learning SAS, so any advice would be helpful! Here is my code for reference:
proc factor data = bedlib.prinanalysis outstat = bedlib.prinfactor
simple scree corr score nfact = 9 method = principal;
var PIPED_WATER UNDER_WATER OWNED_TOILET OWNED_PIT SHARED_PIT ELECT_ENERGY
COAL_ENERGY WOOD_ENERGY GAS_ENERGY COWS_ MULES_ GOATS_ PIGS_ CHICKENS_ ANIMALCA_ BICYCLE_ CAR_ MOTORCYCLE_ RADIO_ ELECTRIC_ TELEVISI_ REFRIGER_ CELLPHON_ SOLARPAN_ COMPUTER_ STEREO_;
run;
PCA will only identify which combinations of variables account for the greatest variability in your data. How will you interpret a factor containing 0.1 * <number of cows> - 0.02 * <number of goats>?
I think you would get a far better indicator of wealth by multiplying each variable by the approximate value of the feature that it represents, such as <number of televisions> * <average value of a television>. I understand that some of these values might be difficult to estimate, but I would suggest omitting those variables from your indicator, as most of them are likely to be correlated with the others anyway.
Such an indicator would be much easier to interpret and justify.
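As a minimal sketch of that idea, the data step below sums a few variables weighted by unit values. The unit values in the macro variables are purely hypothetical placeholders, the output data set name work.wealth_value and the variable asset_value are made up for illustration, and the code assumes the chosen variables hold counts (or 0/1 ownership flags).

```sas
/* Hypothetical unit values: replace with local estimates before use. */
%let val_cow   = 500;
%let val_goat  = 50;
%let val_tv    = 200;
%let val_radio = 20;

data work.wealth_value;               /* hypothetical output data set */
    set bedlib.prinanalysis;
    /* SUM() ignores missing arguments, so a missing count contributes nothing. */
    asset_value = sum(COWS_     * &val_cow,
                      GOATS_    * &val_goat,
                      TELEVISI_ * &val_tv,
                      RADIO_    * &val_radio);
run;
```

Only variables with a defensible value estimate need to appear in the sum; the rest can simply be left out.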
I hope this helps.
So you have reduced many original variables to 9 PCA variables, and the question is how to reduce this to one index? Well, as far as I know, there's no universal method or formula that gets you to the next step. Furthermore, the sign on the PCA vectors is arbitrary, and has no real meaning, so PCA vector 1 could be positive to indicate a high wealth index, or it could be negative to indicate a high wealth index, and there's no way in advance to know which it is. Or it could be that the principal components are completely or nearly completely unrelated to wealth; they don't necessarily have to have anything to do with wealth.
Typically, you might want to try to interpret the PCA vectors, gain some understanding of what vector 1 is telling you, and what vector 2 is telling you, etc. by looking to see which variables have the high loadings (either positive or negative) in a given dimension. For example, if dimension 1 has a lot of variables that have big loadings (either positive or negative) that are associated with education, then you could interpret dimension 1 as an education variable. But even after that, how to get a wealth indicator is not clear.
If you want a predictive measure of wealth, then something like Partial Least Squares regression would be useful; it's analogous to PCA but tries to find dimensions/vectors that are predictive. This assumes you have a Y variable to predict.
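A rough sketch of what that could look like with PROC PLS follows. It assumes the data set contains a response variable, here called HH_INCOME, which is hypothetical and does not appear in the original post. The choice of two factors, the shortened predictor list, and the output names are also just illustrative.

```sas
/* HH_INCOME is a hypothetical Y variable; the original data may not have one. */
proc pls data=bedlib.prinanalysis method=pls nfac=2;
    model HH_INCOME = PIPED_WATER OWNED_TOILET ELECT_ENERGY COWS_ GOATS_
                      BICYCLE_ CAR_ RADIO_ TELEVISI_ REFRIGER_ CELLPHON_;
    /* XSCORE= sets the prefix for the extracted score variables (xscr1, xscr2). */
    output out=work.pls_scores xscore=xscr predicted=pred_income;
run;
```

Unlike principal components, the extracted factors here are chosen for how well they explain the response, so the first score is more directly tied to the Y variable.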
"I would also need to divide the wealth indicator variable into quintiles, indicating household wealth based on the quintile the household belongs to."
For this, use PROC RANK with the option GROUPS=5.
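A minimal sketch, assuming the single index already exists as a variable named wealth_index in a data set work.wealth (both names are hypothetical):

```sas
/* GROUPS=5 assigns group numbers 0 through 4, lowest values in group 0. */
proc rank data=work.wealth out=work.wealth_quintiles groups=5;
    var wealth_index;          /* hypothetical index variable */
    ranks wealth_quintile;     /* new variable holding the quintile group */
run;
```

If quintiles numbered 1 to 5 are preferred, add 1 to wealth_quintile in a later data step.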
Paige Miller