## proc mi for PCA

Hi all,

I have 1000 variables and want to run a principal component analysis (PCA). I am trying to use PROC MI for multiple imputation and PROC MIANALYZE to combine the multiple outputs, but I am having a hard time figuring it out. So far I have only done a single imputation before the PCA. Is there any way to combine the PCA results if I use NIMPUTE=10 in PROC MI? Here is a sample SAS data set. Thank you for your help!

```sas
data test;
   input id $ x1 x2 x3 x4 x5 x6;
   datalines;
1 1 10 . 8 1 0
2 2 15 . 5 0 1
3 3 22 . 13 1 1
4 4 22 3 6 0 1
5 5 11 8 3 . 0
6 1 10 5 2 . 0
7 2 10 1 3 . 0
8 3 15 6 . . 1
9 4 13 5 . . .
10 5 20 7 . 1 .
11 1 5 1 2 0 .
12 2 9 2 12 1 .
13 3 . 3 16 0 .
14 4 . 6 5 1 0
15 5 . 3 12 0 1
16 6 . 1 14 1 1
17 7 . . 48 0 1
;

proc print data=test;
run;

/* NIMPUTE=0: no imputation, only the missing-data pattern summary */
proc mi data=test nimpute=0 out=test1;
run;

/* Single imputation with FCS; x5 and x6 are binary, so they use logistic models */
proc mi data=test nimpute=1 seed=12345 out=impute_test;
   class x5 x6;
   var x1 x2 x3 x4 x5 x6;
   fcs logistic(x5 = x2 x3 x4);
   fcs logistic(x6 = x2 x3 x4);
run;

proc print data=impute_test;
run;

/* METHOD=EUCLID accepts only interval/ratio variables, so the 0/1 variables
   x5 and x6 are treated as interval here */
proc distance data=impute_test method=euclid out=dist_test;
   var interval(x1-x6 / std=std);
   id id;
run;

proc print data=dist_test;
run;

/* PROC PRINCOMP analyzes the correlation matrix by default, so it standardizes
   the variables itself; OUT= holds the component scores Prin1-Prin6 */
proc princomp data=impute_test out=pca_scores;
   var x1 x2 x3 x4 x5 x6;
run;
```

Thanks,

Bikash

## Re: proc mi for PCA

The problem with imputing missing values for PCA is that if the imputation does not take the correlations between the variables into account, the imputed values will distort the correlation structure and therefore alter the fitted PCA model. Your code takes this partially into account by using FCS LOGISTIC for x5 and x6, but it does not take into account the correlations among the other variables (such as the correlation between x2 and x3).
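As a sketch of an imputation model that conditions each incomplete variable on all the others (so that, for example, x2 enters the model for x3 and vice versa), the FCS statement can name a method and a full predictor list for every variable. This assumes the `test` data set from the question; the output name `impute_fcs` and NBITER=20 are illustrative choices, not recommendations. With only 17 observations, the logistic models with five predictors may fail to converge or run into separation; if that happens, shortening the predictor lists is the first thing to try.

```sas
/* Sketch: each incomplete variable is imputed from all other analysis variables.
   x1 has no missing values, so it needs no imputation model. */
proc mi data=test nimpute=10 seed=12345 out=impute_fcs;
   class x5 x6;
   var x1 x2 x3 x4 x5 x6;
   fcs nbiter=20
       reg(x2 = x1 x3 x4 x5 x6)
       reg(x3 = x1 x2 x4 x5 x6)
       reg(x4 = x1 x2 x3 x5 x6)
       logistic(x5 = x1 x2 x3 x4 x6)
       logistic(x6 = x1 x2 x3 x4 x5);
run;
```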

So, I don't really know how to do this with PROC MI. My suggestion is to use PROC PLS with the option MISSING=EM, which replaces missing values using an expectation-maximization (EM) algorithm. To get PCA results from PROC PLS, the trick is to specify the same variables as both the x-variables (predictors) and the y-variables (responses) of the PLS model.
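Following that suggestion, a minimal PROC PLS sketch might look like the one below. It assumes the `test` data set from the question; NFAC=3 and the names `pls_scores` and `pc` are arbitrary illustrative choices. PROC PLS centers and scales the variables by default, so the extracted factors should correspond to a correlation-matrix PCA, which is also what PROC PRINCOMP fits by default. Before relying on it, it would be worth checking on a data set with no missing values that the PLS loadings match the PROC PRINCOMP eigenvectors (up to sign).

```sas
/* Sketch: PCA via PROC PLS, with the same variables as responses and predictors;
   MISSING=EM fills in missing values with an EM algorithm */
proc pls data=test missing=em nfac=3;
   model x1 x2 x3 x4 x5 x6 = x1 x2 x3 x4 x5 x6;
   output out=pls_scores xscore=pc;   /* X scores named pc1, pc2, pc3 */
run;

proc print data=pls_scores;
run;
```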

But I don't know how well this will work on your data. In addition, you have a lot of missing data here (over 20% of the values), so I would worry that the amount of missingness is itself a problem.
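To put a number on that: in the posted sample, 22 of the 102 values in x1-x6 are missing (about 21.6%), and 16 of the 17 rows have at least one missing value, so a complete-case analysis would keep only one row. The short sketch below reproduces these counts, assuming the `test` data set from the question; the variable names in the DATA step are hypothetical.

```sas
/* Count missing cells in x1-x6 and the number of incomplete rows */
data _null_;
   set test end=last;
   missing_cells + nmiss(of x1-x6);
   total_cells + 6;
   incomplete_rows + (nmiss(of x1-x6) > 0);
   if last then do;
      pct_missing = 100 * missing_cells / total_cells;
      put missing_cells= total_cells= pct_missing=5.1 incomplete_rows= _n_=;
   end;
run;

/* Per-variable counts of nonmissing and missing values */
proc means data=test n nmiss;
   var x1 x2 x3 x4 x5 x6;
run;
```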

--
Paige Miller