I'm trying to identify which of my variables are perfectly correlated so that I can remove one from each pair before running a principal component analysis. My dataset has so many variables that these pairs are hard to find by visual inspection alone. Is there an easier way to identify them?
proc corr data = sashelp.applianc out = have;
var units_:;
run;
Here is a simple way.
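/* Write the Pearson correlation matrix to a data set, keeping only the correlation rows */
proc corr data = sashelp.applianc outp = have(where=(_type_="CORR")) noprint;
   var units_:;
run;
/* Turn each matrix row into one observation per variable pair.            */
/* _name_ > _name2_ keeps each pair once and drops the diagonal;           */
/* col1 > 0.99 keeps only strong positive correlations.                    */
proc transpose data=have name=_name2_
      out=want(where=(_name_ > _name2_ and col1>0.99) drop=_label_);
   by _name_ notsorted;
   var units_:;
run;
proc print data=want noobs; run;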
_NAME_      _name2_     COL1
units_4     units_16    0.99991
units_4     units_17    0.99989
units_4     units_18    0.99989
units_17    units_16    0.99990
units_18    units_16    0.99991
units_18    units_17    0.99989
units_20    units_19    1.00000
units_22    units_19    0.99999
units_22    units_20    0.99999
units_24    units_19    1.00000
units_24    units_20    1.00000
units_24    units_22    0.99999
Note that this will not identify other types of collinearity, such as when z = x + y.
Pairs are not the whole story: linear combinations of several variables can also be perfectly correlated. For example, if x1 + 3*x3 = x5 - 7*x8 + 2*x10, then you have perfect collinearity, but PROC CORR will not show it because it only looks at correlations between pairs of variables.
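If you want to find these linear combinations, you can use PROC PRINCOMP and look for the eigenvectors associated with zero (or, in practice, near-zero) eigenvalues. The coefficients of such an eigenvector describe a combination of the standardized variables that has essentially no variance.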
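As a rough sketch of that idea, something like the following could work. It assumes the same sashelp.applianc variables used earlier in the thread; the data set names pcstat, eigval and smallcomp and the 1e-4 cutoff are made up for illustration, and the cutoff in particular would need tuning for your data.

```sas
/* Eigen-decomposition of the correlation matrix.                      */
/* OUTSTAT= stores eigenvalues (_TYPE_="EIGENVAL") and eigenvectors    */
/* (_TYPE_="SCORE", one row per component: Prin1, Prin2, ...).         */
proc princomp data=sashelp.applianc outstat=pcstat noprint;
   var units_:;
run;

/* The k-th VAR variable of the EIGENVAL row holds the k-th eigenvalue, */
/* so transposing gives one row per component, in component order.      */
proc transpose data=pcstat(where=(_type_="EIGENVAL"))
      out=eigval(rename=(col1=eigenvalue));
   var units_:;
run;

/* Keep the components whose eigenvalue is close to zero.               */
/* The 1e-4 threshold is an arbitrary assumption, not a recommendation. */
data smallcomp;
   set eigval;
   length component $32;
   component = cats("Prin", _n_);   /* row k corresponds to Prin k */
   if eigenvalue < 1e-4;
   keep component eigenvalue;
run;

/* List the eigenvectors for those components. Large coefficients mark  */
/* the variables involved in each near-exact linear dependency.         */
proc sql;
   select c.eigenvalue, s.*
   from pcstat as s inner join smallcomp as c
     on upcase(s._name_) = upcase(c.component)
   where s._type_ = "SCORE";
quit;
```

Because PROC PRINCOMP analyzes the correlation matrix by default, the coefficients apply to the standardized variables. If no rows come back, try a looser threshold or inspect the eigval data set directly.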