BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
RandoDando
Pyrite | Level 9

I am dealing with a few dozen variables in the Proc Corr procedure.  I used the array function with the colon to specify the variables.  The variables have the same prefix, but the number following them is not sequential.  It's just a 5 digit number.  I need a way to filter out small and insignificant correlation values.  Is there a way to do this in the procedure itself?  

 

I've tried outputting the correlation data set so I can filter there, but the WHERE clause in PROC PRINT and PROC SQL does not appear to support the : array operand.  

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
*calculate correlation matrix for the data;
ods output PearsonCorr=Corr_P;

PROC CORR DATA=sashelp.cars;
    VAR MSRP Horsepower MPG_City MPG_Highway Weight;
RUN;

*sort for transposes;
proc sort data=Corr_P;
    by VARIABLE;
run;

*restructure data so that it's in a long format for graphing;
*need to transpose the correlation and p-values separately;
proc transpose data=Corr_P out=CorrLong1(rename=(COL1=Correlation)) 
        name=CorrelationID;
    var MSRP Horsepower MPG_City MPG_Highway Weight;
    by Variable;
run;

proc transpose data=Corr_P out=CorrLong2(rename=(COL1=p_value)) name=PvalueID;
    var PMSRP PHorsepower PMPG_City PMPG_Highway PWeight;
    by Variable;
run;

*merge data sets to get p-value and correlation in right places;
data CorrLong;
    merge CorrLong1 CorrLong2(drop=PvalueID );
    by Variable;
    LABEL CorrelationID="Correlations"; run;

*sort for graphing;
proc sort data=CorrLong;
    by VARIABLE CorrelationID;
run;


*create a heat map graph with P-Values in the squares;
proc sgplot data=CorrLong noautolegend;
heatmap x=Variable y=CorrelationID / colorresponse=Correlation name="nope1" discretex discretey x2axis colormodel=ThreeColorRamp; *Colorresponse allows discrete squares for each correlation. x2axis bring the label to the top;
text x=Variable y=CorrelationID text=p_value  / textattrs=(size=10pt) x2axis name='nope2'; /*To overlay significance, create a variable that contans that info and set text=VARIABLE */
label correlation='Pearson Correlation';
yaxis reverse display=(nolabel);
x2axis display=(nolabel);
gradlegend;

run;

View solution in original post

12 REPLIES 12
Reeza
Super User

Here is a reference that illustrates how to refer to variables and datasets in a short cut list:
https://blogs.sas.com/content/iml/2018/05/29/6-easy-ways-to-specify-a-list-of-variables-in-sas.html

 

For your question you may have to provide more specific details.

 

ballardw
Super User

First it may help to show the actual code you used. That lets us know all of the options and the likely layout of the output data set(s).

Second what to do you mean by "filter out small"?

 

Most of the output sets from Proc Corr are in a specific form that likely needs to be transposed in some form to get what you want. You may consider using one of the small SAS supplied data sets, such as SASHELP.CLASS to generate output and then tell us what the rules with that output would be to get what you want.

Rick_SAS
SAS Super FREQ

First, use PROC CORR and the OUTP= option to get the correlations in a data set.

Then convert the correlations from wide to long format.

You can then use the WHERE clause to eliminate the small correlations. If the following isn't exactly what you want, it should point you in the right direction:

 

data Have(keep=A:);
set sashelp.cars(
rename=(
   MSRP=A123
   Invoice=A432
   EngineSize=A847
   Cylinders=A9212 
   Horsepower=A23
   MPG_City=A439
   MPG_Highway=A8736)
);
run;

proc corr data=Have outp=CorrOut noprint;
   var A:;
run;

proc print data=CorrOut;
run;

data Want;
set CorrOut(rename=(_NAME_=Var1) where=(_TYPE_='CORR'));
length Var2 $32.;
array A[*] A:;
do j = 1 to dim(A);
   Var2 = vname(A[j]);
   Corr = A[j];
   output;
end;
keep Var1 Var2 Corr;
run;

%let cutoff = 0.5;
proc print data=Want;
where abs(Corr) > &cutoff AND Corr^=1;;
run;

 

 

ed_sas_member
Meteorite | Level 14

Hi @RandoDando 

As proposed by @WarrenKuhfeld , graphic solutions can be vey useful to identify most correlated variables.

E.g.:

proc format;
	value CorrSignif -0.0-<0.2 = "red"
		 			 0.2-<0.4, -0.4-<-0.2 = "orange"
		 			 0.4-<0.6, -0.6-<-0.4 = "yellow"
		 			 0.6-<0.8, -0.8-<-0.6 = "lightgreen"
		 			 0.8-<1.0, -1.0-<-0.8= "forestgreen"
		 			 1, -1 = "White";
run;

proc corr data=Have outp=CorrOut (where=(_type_='CORR')) noprint;
run;

proc print data=CorrOut (drop=_type_ rename=(_name_=Variable))
		   style(column)={backgroundcolor= CorrSignif.} noobs;
run;

Capture d’écran 2020-05-06 à 14.37.34.png

 Best,

Reeza
Super User
*calculate correlation matrix for the data;
ods output PearsonCorr=Corr_P;

PROC CORR DATA=sashelp.cars;
    VAR MSRP Horsepower MPG_City MPG_Highway Weight;
RUN;

*sort for transposes;
proc sort data=Corr_P;
    by VARIABLE;
run;

*restructure data so that it's in a long format for graphing;
*need to transpose the correlation and p-values separately;
proc transpose data=Corr_P out=CorrLong1(rename=(COL1=Correlation)) 
        name=CorrelationID;
    var MSRP Horsepower MPG_City MPG_Highway Weight;
    by Variable;
run;

proc transpose data=Corr_P out=CorrLong2(rename=(COL1=p_value)) name=PvalueID;
    var PMSRP PHorsepower PMPG_City PMPG_Highway PWeight;
    by Variable;
run;

*merge data sets to get p-value and correlation in right places;
data CorrLong;
    merge CorrLong1 CorrLong2(drop=PvalueID );
    by Variable;
    LABEL CorrelationID="Correlations"; run;

*sort for graphing;
proc sort data=CorrLong;
    by VARIABLE CorrelationID;
run;


*create a heat map graph with P-Values in the squares;
proc sgplot data=CorrLong noautolegend;
heatmap x=Variable y=CorrelationID / colorresponse=Correlation name="nope1" discretex discretey x2axis colormodel=ThreeColorRamp; *Colorresponse allows discrete squares for each correlation. x2axis bring the label to the top;
text x=Variable y=CorrelationID text=p_value  / textattrs=(size=10pt) x2axis name='nope2'; /*To overlay significance, create a variable that contans that info and set text=VARIABLE */
label correlation='Pearson Correlation';
yaxis reverse display=(nolabel);
x2axis display=(nolabel);
gradlegend;

run;
RandoDando
Pyrite | Level 9

Thank you @Reeza .  I have 49 variables in my correlation matrix, so I had to adjust the size of the graph in the ods settings, as well as the font size for the correlation values.  It's big and it's busy, but it's only being used for analytical purposes, and it tells me what I need to know.  Much easier than looking through a matrix without any indicators to highlight higher values.  

MichaelL_SAS
SAS Employee

I see this has an accepted solution, but I'll just comment that if all you want is information about the largest correlation coefficients you can use the RANK option in the PROC CORR statement to request the correlation coefficients be displayed in order from highest to lowest absolute value, or the BEST=n option which requests an ordered display of only the n largest correlation coefficients for each variable. In both these cases, each column of the display no longer corresponds to the same variable, but instead is the "Bestn" variable that gives the nth largest correlation coefficient for that row variable.  

Rick_SAS
SAS Super FREQ

The CORRELATION procedure is part of SAS Viya. It is included as part of SAS Visual Statistics.

MichaelL_SAS
SAS Employee
You were quicker than I was in catching my mistake and editing the post. PROC CORR also has both of these options.
LuanOliveira
Calcite | Level 5

Hi! Thanks for this answer, it was very helpful. 

 

I have a question: I want to use the Pearson values in the lower triangle, and p-values in the upper triangle, how I would do this?

 

Thanks!

 

Luan.

Reeza
Super User

Modify the data set before it goes to the graphing procedure. 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 12 replies
  • 4841 views
  • 12 likes
  • 8 in conversation