Statistical Procedures

juliagscott · Posted 07-04-2020 08:30 PM

Hello!

I have run a principal components analysis of various wealth indicator variables, such as 'owning a radio', 'type of toilet used by household', etc., with the goal of reducing the large number of variables. I used PROC FACTOR with the principal components method, which gave me 9 factors with an eigenvalue > 1. My question is: how do I go from 9 factors to one wealth indicator variable? I would also need to divide the wealth indicator variable into quintiles, so that each household's wealth is indicated by the quintile it belongs to. I am still learning SAS, so any advice would be helpful! Here is my code for reference:

proc factor data = bedlib.prinanalysis outstat = bedlib.prinfactor
            simple scree corr score nfact = 9 method = principal;
   var PIPED_WATER UNDER_WATER OWNED_TOILET OWNED_PIT SHARED_PIT ELECT_ENERGY
       COAL_ENERGY WOOD_ENERGY GAS_ENERGY COWS_ MULES_ GOATS_ PIGS_ CHICKENS_
       ANIMALCA_ BICYCLE_ CAR_ MOTORCYCLE_ RADIO_ ELECTRIC_ TELEVISI_ REFRIGER_
       CELLPHON_ SOLARPAN_ COMPUTER_ STEREO_;
run;

PGStats · Posted 07-04-2020 10:56 PM

PCA will only identify which combinations of variables account for the greatest variability in your data. How will you interpret a factor containing 0.1 * <number of cows> - 0.02 * <number of goats>?

I think you would get a far better indicator of wealth by multiplying each variable by the approximate value of the feature that it represents, such as <number of televisions> * <average value of a television>. I understand that some of these values might be difficult to estimate, but I would suggest omitting those from your indicator, as most of them must be correlated anyway.

Such an indicator would be much easier to interpret and justify.
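As a rough sketch of that idea (not a tested recipe): the DATA step below builds a value-weighted index from a few of the asset variables in the original post. The unit values are hypothetical placeholders, not estimates, and the step assumes the asset variables hold counts (or 0/1 ownership flags). The output data set name work.asset_index and the variable wealth_value are made up for illustration.

```sas
/* Hypothetical unit values (placeholders only): replace with local estimates. */
%let val_tv      = 150;
%let val_radio   = 20;
%let val_bicycle = 60;
%let val_cow     = 400;

data work.asset_index;
   set bedlib.prinanalysis;
   /* SUM() skips missing terms instead of returning missing for the whole row. */
   wealth_value = sum(&val_tv      * TELEVISI_,
                      &val_radio   * RADIO_,
                      &val_bicycle * BICYCLE_,
                      &val_cow     * COWS_);
run;
```

Assets whose value is hard to estimate are simply left out of the sum, as suggested above.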

I hope this helps.

PG

PaigeMiller · Posted 07-05-2020 06:41 AM

So you have reduced many original variables to 9 PCA variables, and the question is how to reduce this to one index? Well, as far as I know, there's no universal method or formula that gets you to the next step. Furthermore, the sign on the PCA vectors is arbitrary, and has no real meaning, so PCA vector 1 could be positive to indicate a high wealth index, or it could be negative to indicate a high wealth index, and there's no way in advance to know which it is. Or it could be that the principal components are completely or nearly completely unrelated to wealth; they don't necessarily have to have anything to do with wealth.

Typically, you might want to try to interpret the PCA vectors, gain some understanding of what vector 1 is telling you, and what vector 2 is telling you, etc. by looking to see which variables have the high loadings (either positive or negative) in a given dimension. For example, if dimension 1 has a lot of variables that have big loadings (either positive or negative) that are associated with education, then you could interpret dimension 1 as an education variable. But even after that, how to get a wealth indicator is not clear.
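One way to make the loadings easier to read, sketched here with the variable list from the original post: the REORDER option sorts the variables by the factor they load on most heavily, and FUZZ= blanks out small loadings in the printed output. The 0.3 cutoff is an arbitrary choice for illustration, not a recommendation.

```sas
/* Variable list copied from the original post. */
%let assets = PIPED_WATER UNDER_WATER OWNED_TOILET OWNED_PIT SHARED_PIT
              ELECT_ENERGY COAL_ENERGY WOOD_ENERGY GAS_ENERGY COWS_ MULES_
              GOATS_ PIGS_ CHICKENS_ ANIMALCA_ BICYCLE_ CAR_ MOTORCYCLE_
              RADIO_ ELECTRIC_ TELEVISI_ REFRIGER_ CELLPHON_ SOLARPAN_
              COMPUTER_ STEREO_;

proc factor data=bedlib.prinanalysis method=principal nfactors=9
            reorder      /* group variables by the factor they load on most */
            fuzz=0.3;    /* print loadings below 0.3 in absolute value as blank */
   var &assets;
run;
```

The sign of each remaining loading shows which direction of the dimension goes with owning the asset, which matters because the overall sign of a component is arbitrary.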

If you want a predictive measure of wealth, then something like Partial Least Squares regression would be useful. It's analogous to PCA, but it tries to find dimensions/vectors that are predictive. This assumes you have a Y variable to predict.
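A minimal PROC PLS sketch, assuming the data set also contained a response such as household income. HH_INCOME is a hypothetical variable name (nothing in the original post says such a variable exists), and the choice of two factors and the output names are likewise for illustration only.

```sas
/* Variable list copied from the original post. */
%let assets = PIPED_WATER UNDER_WATER OWNED_TOILET OWNED_PIT SHARED_PIT
              ELECT_ENERGY COAL_ENERGY WOOD_ENERGY GAS_ENERGY COWS_ MULES_
              GOATS_ PIGS_ CHICKENS_ ANIMALCA_ BICYCLE_ CAR_ MOTORCYCLE_
              RADIO_ ELECTRIC_ TELEVISI_ REFRIGER_ CELLPHON_ SOLARPAN_
              COMPUTER_ STEREO_;

/* HH_INCOME is a hypothetical Y variable, assumed only for this sketch. */
proc pls data=bedlib.prinanalysis method=pls nfac=2;
   model HH_INCOME = &assets;
   /* Save predicted values and the X-scores (xscr1, xscr2) for each household. */
   output out=work.pls_out predicted=income_hat xscore=xscr;
run;
```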

"I would also need to divide the wealth indicator variable into quintiles, indicating household wealth based on the quintile the household belongs to."

Use PROC RANK with the option GROUPS=5.
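A sketch of the quintile step. It assumes, purely for illustration, that the first principal component score (Factor1, written by the OUT= option) is used as the index; as noted above, that component may not measure wealth and its sign is arbitrary, so check the loadings first. The data set names work.scores and work.quintiles and the variable wealth_quintile are made up.

```sas
/* Variable list copied from the original post. */
%let assets = PIPED_WATER UNDER_WATER OWNED_TOILET OWNED_PIT SHARED_PIT
              ELECT_ENERGY COAL_ENERGY WOOD_ENERGY GAS_ENERGY COWS_ MULES_
              GOATS_ PIGS_ CHICKENS_ ANIMALCA_ BICYCLE_ CAR_ MOTORCYCLE_
              RADIO_ ELECTRIC_ TELEVISI_ REFRIGER_ CELLPHON_ SOLARPAN_
              COMPUTER_ STEREO_;

/* OUT= needs raw data and NFACTORS=; it adds the score variable Factor1. */
proc factor data=bedlib.prinanalysis method=principal nfactors=1
            out=work.scores;
   var &assets;
run;

/* GROUPS=5 assigns group numbers 0 through 4 (0 = lowest scores). */
proc rank data=work.scores out=work.quintiles groups=5;
   var Factor1;
   ranks wealth_quintile;
run;

/* Shift to the more familiar 1-5 labelling; missing scores stay missing. */
data work.quintiles;
   set work.quintiles;
   if not missing(wealth_quintile) then wealth_quintile = wealth_quintile + 1;
run;

/* Check group sizes: heavy ties in the scores can make them unequal. */
proc freq data=work.quintiles;
   tables wealth_quintile;
run;
```

If the loadings show that high Factor1 values go with fewer assets, add the DESCENDING option to the PROC RANK statement so that the top group still means the wealthiest households.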

--
Paige Miller
