bkq32
Quartz | Level 8

I'm trying to identify which of my variables are perfectly correlated so that I can remove one of them in order to run a principal component analysis. The large number of variables in my dataset makes it hard to find based on visual inspection alone, so is there an easier way for me to identify those pairs?

 

proc corr data = sashelp.applianc outp = have;
 var units_:;
run;

Accepted Solutions
PGStats
Opal | Level 21

Here is a simple way.

 

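/* Save only the Pearson correlation rows (_type_="CORR") of the matrix */
proc corr data = sashelp.applianc outp = have(where=(_type_="CORR")) noprint;
 var units_:;
run;

/* Turn each matrix row into one observation per variable pair.        */
/* _name_ > _name2_ keeps each pair once and drops the diagonal;       */
/* col1 > 0.99 keeps near-perfect positive correlations only.          */
proc transpose data=have name=_name2_ 
    out=want(where=(_name_ > _name2_ and col1>0.99) drop=_label_);
by _name_ notsorted;
var units_:;
run;

proc print data=want noobs; run;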
_NAME_      _name2_     COL1
units_4     units_16    0.99991
units_4     units_17    0.99989
units_4     units_18    0.99989
units_17    units_16    0.99990
units_18    units_16    0.99991
units_18    units_17    0.99989
units_20    units_19    1.00000
units_22    units_19    0.99999
units_22    units_20    0.99999
units_24    units_19    1.00000
units_24    units_20    1.00000
units_24    units_22    0.99999

Note that this will not identify other types of collinearity, such as when z = x + y.

 

 

PG


bkq32
Quartz | Level 8
@PGStats Thank you! Do you mind explaining why this wouldn't identify other types of collinearity?
PaigeMiller
Diamond | Level 26

There can be linear combinations of variables that are perfectly correlated. For example, if x1 + 3*x3 = x5 - 7*x8 + 2*x10, then you have perfect correlation there, but PROC CORR only looks at correlations between pairs of variables, so it will not flag it.
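
To make this concrete, here is a minimal sketch using a made-up data set (TOY, with hypothetical variables x, y and z; none of this comes from sashelp.applianc). z is an exact linear combination of x and y, yet no pairwise correlation should come out near 1. For independent x and y with equal variance, the correlation between x and z is about 0.71 in expectation.

```sas
/* Hypothetical toy data: z is an exact linear combination of x and y */
data toy;
   call streaminit(12345);      /* arbitrary seed, for reproducibility */
   do i = 1 to 200;
      x = rand("normal");
      y = rand("normal");       /* generated independently of x */
      z = x + y;                /* perfect collinearity across three variables */
      output;
   end;
   drop i;
run;

/* Pairwise correlations only: none of them flags the dependency */
proc corr data=toy;
   var x y z;
run;
```

A pairwise filter such as col1 > 0.99 would return no rows here, even though the three variables are exactly dependent.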

 

If you want to find these linear combinations that are correlated, you can use PROC PRINCOMP and look for the eigenvectors associated with zero (or nearly zero) eigenvalues.
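
A rough sketch of that approach, reusing the data set and variable list from the question (the output data set name PCSTAT is just a placeholder). The OUTSTAT= data set stores the eigenvalues in the row with _TYPE_="EIGENVAL" and the eigenvectors in the rows with _TYPE_="SCORE", one row per component.

```sas
/* Principal components of the correlation matrix (the PROC PRINCOMP default) */
proc princomp data=sashelp.applianc outstat=pcstat noprint;
   var units_:;
run;

/* Eigenvalues, in component order: the first column holds the largest one. */
/* Values at or very near zero signal a linear dependency.                  */
proc print data=pcstat(where=(_type_="EIGENVAL")) noobs; run;

/* Eigenvectors: _NAME_ is Prin1, Prin2, ... and each column is a variable. */
/* In a row whose eigenvalue is near zero, the variables with coefficients  */
/* clearly different from zero are the ones involved in the dependency.     */
proc print data=pcstat(where=(_type_="SCORE")) noobs; run;
```

How small an eigenvalue has to be before you treat it as zero is a judgment call; with real data, rounding usually leaves a tiny positive number rather than an exact zero.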

--
Paige Miller
bkq32
Quartz | Level 8
Got it, that makes sense. Thank you very much for the explanation.
