bikashten
Fluorite | Level 6

Hi all, 

I have 1000 variables and want to run a principal component analysis (PCA). I am trying to use PROC MI for multiple imputation and PROC MIANALYZE to combine the results from the imputed data sets, but I am having a hard time figuring it out. So far I have only done a single imputation before running the PCA. Is there any way to combine the results if I set NIMPUTE=10 in PROC MI? Here is a sample SAS data set. Thank you for your help!

 

data test;
infile datalines dlm='2009'x; /* values are separated by blanks or tabs */
input id $ x1 x2 x3 x4 x5 x6;
datalines;
1 1	10	.	8	1	0
2 2	15	.	5	0	1
3 3	22	.	13	1	1
4 4	22	3	6	0	1
5 5	11	8	3	.	0
6 1	10	5	2	.	0
7 2	10	1	3	.	0
8 3	15	6	.	.	1
9 4	13	5	.	.	.
10 5	20	7	.	1	.
11 1	5	1	2	0	.
12 2	9	2	12	1	.
13 3	.	3	16	0	.
14 4	.	6	5	1	0
15 5	.	3	12	0	1
16 6	.	1	14	1	1
17 7	.	.	48	0	1
;

proc print data=test;
run;

/* NIMPUTE=0 imputes nothing; it only reports the missing-data patterns */
proc mi data=test nimpute=0 out=test1;
run;

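/* Single imputation (NIMPUTE=1) by fully conditional specification (FCS);
   the binary CLASS variables x5 and x6 are imputed with logistic models on x2-x4 */
proc mi data=test nimpute=1 seed=12345 out=impute_test;
	class x5 x6;
	var x1 x2 x3 x4 x5 x6;
	fcs logistic (x5=x2 x3 x4);
	fcs logistic (x6=x2 x3 x4);
run;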

proc print data=impute_test;
run;

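/* Standardized Euclidean distances between subjects;
   this output is not used by PROC PRINCOMP below */
proc distance data=impute_test method=euclid out=std_test;
	var interval (x1-x4/ std=std);
	var nominal (x5 x6 /std=std);
	id id;
run;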

proc print data=std_test;
run;

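/* PCA on the single imputed data set. PRINCOMP analyzes the correlation
   matrix by default, and OUT= adds the component scores Prin1-Prin6.
   A separate OUT= name keeps the PROC DISTANCE output from being overwritten */
proc princomp data=impute_test out=pc_scores;
	var x1 x2 x3 x4 x5 x6;
run;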

Thanks,

Bikash

1 reply
PaigeMiller
Diamond | Level 26

The problem with doing missing-value imputation for PCA is that if the imputation does not take into account the correlations between the variables, the imputed values will distort those correlations, and therefore the fitted PCA model. Your code does take this partially into account by using FCS LOGISTIC for two variables, but it does not take into account the correlations between the other variables (like the correlation between x2 and x3).
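One way to act on this within PROC MI, sketched here as an assumption rather than an established recipe: let each FCS model use all of the other VAR variables as predictors (FCS does this by default when no effects are listed after the imputed variable), create ten imputed data sets, average their correlation matrices, and run PROC PRINCOMP on the pooled matrix. Pooling the correlation matrix is only one of several options; it avoids averaging eigenvectors directly, whose signs and order can change between imputed data sets, whereas PROC MIANALYZE is built to combine parameter estimates and standard errors rather than eigenvectors. The data set names MI_TEST, CORR_BY_IMP, CORR_AVG, and PCA_POOLED are hypothetical, the code assumes the TEST data set from the question already exists, and with only 17 rows the logistic imputation models may issue convergence warnings.

```sas
/* Hypothetical extension to NIMPUTE=10. No "= effects" list is given,  */
/* so each FCS model uses all other VAR variables as predictors.        */
proc mi data=test nimpute=10 seed=12345 out=mi_test;
   class x5 x6;
   var x1 x2 x3 x4 x5 x6;
   fcs reg(x2);
   fcs reg(x3);
   fcs reg(x4);
   fcs logistic(x5);
   fcs logistic(x6);
run;

/* One Pearson correlation matrix per imputed data set (TYPE=CORR output) */
proc corr data=mi_test outp=corr_by_imp noprint;
   by _Imputation_;
   var x1-x6;
run;

/* Average each row (MEAN, STD, N, and CORR rows) across the imputations */
proc sql;
   create table corr_avg as
   select _TYPE_, _NAME_,
          mean(x1) as x1, mean(x2) as x2, mean(x3) as x3,
          mean(x4) as x4, mean(x5) as x5, mean(x6) as x6
   from corr_by_imp
   group by _TYPE_, _NAME_;
quit;

/* Mark the averaged data set as TYPE=CORR so PRINCOMP reads it as a matrix */
data corr_avg(type=corr);
   set corr_avg;
run;

/* PCA of the pooled correlation matrix: eigenvalues and eigenvectors */
proc princomp data=corr_avg outstat=pca_pooled;
   var x1-x6;
run;
```

Because every averaged correlation matrix has ones on its diagonal, the pooled eigenvalues still sum to the number of variables, which is a quick sanity check on the pooling step.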

 

So, I don't really know how to do this using PROC MI. My suggestion is to use PROC PLS where you set the option MISSING=EM (an expectation maximization algorithm is used to replace missing values). To use PROC PLS to get PCA results, the trick is that you have to specify that the x-variables in the PLS model are identical to the y-variables in the PLS model.
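A minimal sketch of that suggestion, assuming the TEST data set from the question: the same six variables appear on both sides of the MODEL statement, MISSING=EM fills in the missing values, and XSCORE= writes the factor scores. The output name PLS_SCORES, the score prefix PC, and NFAC=3 are illustrative choices, not part of the original suggestion. PROC PLS centers and scales the variables by default, so the extracted factors should correspond (up to sign) to principal components of the correlation matrix, the same default PROC PRINCOMP uses. Since there is no CLASS statement, x5 and x6 are treated as numeric 0/1 variables.

```sas
/* PCA through PROC PLS: identical X and Y variables,            */
/* with missing values filled in by the EM algorithm             */
proc pls data=test method=pls missing=em nfac=3 details;
   model x1-x6 = x1-x6;
   output out=pls_scores xscore=pc;   /* scores named pc1-pc3 */
run;
```

With X identical to Y, the PLS weight vector for each factor is driven by the same cross-product matrix that PCA decomposes, which is why this trick recovers the principal component directions.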

 

But I don't know how well this will work on your data. In addition, you have a lot of missing data here, over 20% of the values, so I would worry that the amount of missing data is itself a problem.
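To quantify how much is missing, a short DATA step with the CMISS function can count the missing cells and the complete rows. This assumes the TEST data set from the question, and the output name MISS_SUMMARY is hypothetical. For the 17-row sample, 22 of the 102 values are missing (about 22%), and only the row with id 4 has no missing values, so a complete-case PCA would be left with a single observation.

```sas
/* Count missing cells and complete rows across x1-x6 */
data miss_summary(keep=n_missing n_cells pct_missing n_complete);
   set test end=last;
   row_missing = cmiss(of x1-x6);   /* missing cells in this row       */
   n_missing + row_missing;         /* running total of missing cells  */
   n_cells + 6;                     /* six analysis variables per row  */
   n_complete + (row_missing = 0);  /* rows with no missing values     */
   if last then do;
      pct_missing = 100 * n_missing / n_cells;
      output;
   end;
run;

proc print data=miss_summary noobs;
   format pct_missing 5.1;
run;
```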

--
Paige Miller
