kai_cody
Fluorite | Level 6

I am trying to run a principal component analysis on variables that are all binary rather than continuous. There does not seem to be a consensus online as to whether proc factor or proc princomp will accept binary variables and, if so, whether the data must be supplied as a tetrachoric correlation matrix.

 

Do I need to first create a tetrachoric correlation matrix and then use this matrix dataset as input to run the proc factor or proc princomp? 

 

Or can I just input the raw original dataset into proc factor or proc princomp and it will create a tetrachoric correlation matrix behind the scenes, recognizing that the variables are binary?

 

Any ideas?

 

Thanks!

 

 

Accepted Solutions
PGStats
Opal | Level 21

From what I could gather online (where I also found warnings that the correlation matrix may need to be smoothed when it is not positive definite), it could be done like this (without the smoothing). For binary variables, the polychoric correlation requested from proc corr is the tetrachoric correlation:

 

/* Create fake data */
data test;
array x x1-x10;
call streaminit (65858);
do i = 1 to 100;
	do j = 1 to dim(x);
		x{j} = rand("BERNOULLI", 0.2);
		end;
	output;
	end;
drop i j;
run;

/* Compute the correlations */
proc corr data=test polychoric;
var x:;
ods output PolychoricCorr=pc;
run; 

/* Reformat the proc corr output into a correlation matrix */
proc transpose data=pc out=pcCorrTable(drop=_label_);
by var;
id withvar;
var corr;
run;

data pcCorr(type="CORR");
set pcCorrTable end=done;
array x x1-x10;
_NAME_ = var;
x{_n_} = 1;
output;
if done then do;
	call missing(of x{*});
	_NAME_ = "x10";
	x10 = 1;
	output;
	end;
run;

/* Get the first 3 principal components */
proc princomp data=pcCorr n=3;
var x1-x10;
run;
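
The warning about positive definiteness arises because each tetrachoric correlation is estimated pair by pair, so the assembled matrix is not guaranteed to be positive definite. A minimal sketch of one way to check, assuming SAS/IML is licensed; the 3x3 matrix below is a made-up illustration, not output from the code above:

```sas
proc iml;
   /* Hypothetical matrix of pairwise correlations (illustration only) */
   R = {1.0  0.9 -0.9,
        0.9  1.0  0.9,
       -0.9  0.9  1.0};

   /* A correlation matrix is positive definite only if
      all of its eigenvalues are strictly positive */
   minEig = min(eigval(R));
   print minEig;

   if minEig <= 0 then print "Matrix is not positive definite";
quit;
```

If the smallest eigenvalue is zero or negative, the matrix would need smoothing before the component analysis; the smoothing itself is not shown here.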
PG


2 REPLIES
PaigeMiller
Diamond | Level 26

The issue in my mind is not whether or not you can force your data through PROC PRINCOMP or PROC FACTOR. Of course you can.

 

The issue is ... does it make even the slightest bit of sense to perform an analysis designed for continuous data on binary data, and my answer is ... it makes no sense at all.

 

You might want to consider a procedure that was designed for binary data, such as Correspondence analysis (PROC CORRESP).
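
As a rough sketch of what that could look like, multiple correspondence analysis can be requested with the MCA option of PROC CORRESP. The data set name binary_demo and the simulated variables x1-x10 are assumptions made up for illustration; substitute your own data set and variable names:

```sas
/* Hypothetical demo data: 100 rows of 10 independent binary variables */
data binary_demo;
   array x x1-x10;
   call streaminit(65858);
   do i = 1 to 100;
      do j = 1 to dim(x);
         x{j} = rand("BERNOULLI", 0.2);
      end;
      output;
   end;
   drop i j;
run;

/* Multiple correspondence analysis: with the MCA option, the TABLES
   statement lists only the categorical (here binary) variables */
proc corresp data=binary_demo mca dimens=2;
   tables x1 x2 x3 x4 x5 x6 x7 x8 x9 x10;
run;
```

DIMENS=2 requests a two-dimensional solution; raise it if more dimensions are needed.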

--
Paige Miller
