Statistical Procedures

RandoDando · Posted 05-05-2020 01:15 PM

I am dealing with a few dozen variables in the Proc Corr procedure. I used the array function with the colon to specify the variables. The variables have the same prefix, but the number following them is not sequential. It's just a 5 digit number. I need a way to filter out small and insignificant correlation values. Is there a way to do this in the procedure itself?

I've tried outputting the correlation data set so I can filter there, but the WHERE clause in PROC PRINT and PROC SQL does not appear to support the : array operand.

Reeza · Posted 05-06-2020 11:41 AM

*calculate correlation matrix for the data;
ods output PearsonCorr=Corr_P;

PROC CORR DATA=sashelp.cars;
    VAR MSRP Horsepower MPG_City MPG_Highway Weight;
RUN;

*sort for transposes;
proc sort data=Corr_P;
    by VARIABLE;
run;

*restructure data so that it's in a long format for graphing;
*need to transpose the correlation and p-values separately;
proc transpose data=Corr_P out=CorrLong1(rename=(COL1=Correlation)) 
        name=CorrelationID;
    var MSRP Horsepower MPG_City MPG_Highway Weight;
    by Variable;
run;

proc transpose data=Corr_P out=CorrLong2(rename=(COL1=p_value)) name=PvalueID;
    var PMSRP PHorsepower PMPG_City PMPG_Highway PWeight;
    by Variable;
run;

*merge data sets to get p-value and correlation in right places;
data CorrLong;
    merge CorrLong1 CorrLong2(drop=PvalueID );
    by Variable;
    LABEL CorrelationID="Correlations"; run;

*sort for graphing;
proc sort data=CorrLong;
    by VARIABLE CorrelationID;
run;


*create a heat map graph with P-Values in the squares;
proc sgplot data=CorrLong noautolegend;
heatmap x=Variable y=CorrelationID / colorresponse=Correlation name="nope1" discretex discretey x2axis colormodel=ThreeColorRamp; *Colorresponse allows discrete squares for each correlation. x2axis bring the label to the top;
text x=Variable y=CorrelationID text=p_value  / textattrs=(size=10pt) x2axis name='nope2'; /*To overlay significance, create a variable that contans that info and set text=VARIABLE */
label correlation='Pearson Correlation';
yaxis reverse display=(nolabel);
x2axis display=(nolabel);
gradlegend;

run;

View solution in original post

Reeza · Posted 05-05-2020 01:31 PM

Here is a reference that illustrates how to refer to variables and datasets in a short cut list:
https://blogs.sas.com/content/iml/2018/05/29/6-easy-ways-to-specify-a-list-of-variables-in-sas.html

For your question you may have to provide more specific details.

ballardw · Posted 05-05-2020 02:00 PM

First it may help to show the actual code you used. That lets us know all of the options and the likely layout of the output data set(s).

Second what to do you mean by "filter out small"?

Most of the output sets from Proc Corr are in a specific form that likely needs to be transposed in some form to get what you want. You may consider using one of the small SAS supplied data sets, such as SASHELP.CLASS to generate output and then tell us what the rules with that output would be to get what you want.

Rick_SAS · Posted 05-05-2020 03:27 PM

First, use PROC CORR and the OUTP= option to get the correlations in a data set.

Then convert the correlations from wide to long format.

You can then use the WHERE clause to eliminate the small correlations. If the following isn't exactly what you want, it should point you in the right direction:

data Have(keep=A:);
set sashelp.cars(
rename=(
   MSRP=A123
   Invoice=A432
   EngineSize=A847
   Cylinders=A9212 
   Horsepower=A23
   MPG_City=A439
   MPG_Highway=A8736)
);
run;

proc corr data=Have outp=CorrOut noprint;
   var A:;
run;

proc print data=CorrOut;
run;

data Want;
set CorrOut(rename=(_NAME_=Var1) where=(_TYPE_='CORR'));
length Var2 $32.;
array A[*] A:;
do j = 1 to dim(A);
   Var2 = vname(A[j]);
   Corr = A[j];
   output;
end;
keep Var1 Var2 Corr;
run;

%let cutoff = 0.5;
proc print data=Want;
where abs(Corr) > &cutoff AND Corr^=1;;
run;

WarrenKuhfeld · Posted 05-06-2020 01:34 AM

Perhaps you would like techniques near the bottom of this post.

https://blogs.sas.com/content/graphicallyspeaking/2017/12/11/displaying-upper-lower-triangle-correla...

ed_sas_member · Posted 05-06-2020 08:38 AM

Hi @RandoDando

As proposed by @WarrenKuhfeld , graphic solutions can be vey useful to identify most correlated variables.

E.g.:

proc format;
	value CorrSignif -0.0-<0.2 = "red"
		 			 0.2-<0.4, -0.4-<-0.2 = "orange"
		 			 0.4-<0.6, -0.6-<-0.4 = "yellow"
		 			 0.6-<0.8, -0.8-<-0.6 = "lightgreen"
		 			 0.8-<1.0, -1.0-<-0.8= "forestgreen"
		 			 1, -1 = "White";
run;

proc corr data=Have outp=CorrOut (where=(_type_='CORR')) noprint;
run;

proc print data=CorrOut (drop=_type_ rename=(_name_=Variable))
		   style(column)={backgroundcolor= CorrSignif.} noobs;
run;

Capture d’écran 2020-05-06 à 14.37.34.png

Best,

Reeza · Posted 05-06-2020 11:41 AM

*calculate correlation matrix for the data;
ods output PearsonCorr=Corr_P;

PROC CORR DATA=sashelp.cars;
    VAR MSRP Horsepower MPG_City MPG_Highway Weight;
RUN;

*sort for transposes;
proc sort data=Corr_P;
    by VARIABLE;
run;

*restructure data so that it's in a long format for graphing;
*need to transpose the correlation and p-values separately;
proc transpose data=Corr_P out=CorrLong1(rename=(COL1=Correlation)) 
        name=CorrelationID;
    var MSRP Horsepower MPG_City MPG_Highway Weight;
    by Variable;
run;

proc transpose data=Corr_P out=CorrLong2(rename=(COL1=p_value)) name=PvalueID;
    var PMSRP PHorsepower PMPG_City PMPG_Highway PWeight;
    by Variable;
run;

*merge data sets to get p-value and correlation in right places;
data CorrLong;
    merge CorrLong1 CorrLong2(drop=PvalueID );
    by Variable;
    LABEL CorrelationID="Correlations"; run;

*sort for graphing;
proc sort data=CorrLong;
    by VARIABLE CorrelationID;
run;


*create a heat map graph with P-Values in the squares;
proc sgplot data=CorrLong noautolegend;
heatmap x=Variable y=CorrelationID / colorresponse=Correlation name="nope1" discretex discretey x2axis colormodel=ThreeColorRamp; *Colorresponse allows discrete squares for each correlation. x2axis bring the label to the top;
text x=Variable y=CorrelationID text=p_value  / textattrs=(size=10pt) x2axis name='nope2'; /*To overlay significance, create a variable that contans that info and set text=VARIABLE */
label correlation='Pearson Correlation';
yaxis reverse display=(nolabel);
x2axis display=(nolabel);
gradlegend;

run;

RandoDando · Posted 05-06-2020 12:32 PM

Thank you @Reeza . I have 49 variables in my correlation matrix, so I had to adjust the size of the graph in the ods settings, as well as the font size for the correlation values. It's big and it's busy, but it's only being used for analytical purposes, and it tells me what I need to know. Much easier than looking through a matrix without any indicators to highlight higher values.

MichaelL_SAS · Posted 05-08-2020 11:12 AM

I see this has an accepted solution, but I'll just comment that if all you want is information about the largest correlation coefficients you can use the RANK option in the PROC CORR statement to request the correlation coefficients be displayed in order from highest to lowest absolute value, or the BEST=n option which requests an ordered display of only the n largest correlation coefficients for each variable. In both these cases, each column of the display no longer corresponds to the same variable, but instead is the "Bestn" variable that gives the nth largest correlation coefficient for that row variable.

Rick_SAS · Posted 05-08-2020 11:16 AM

The CORRELATION procedure is part of SAS Viya. It is included as part of SAS Visual Statistics.

MichaelL_SAS · Posted 05-08-2020 11:17 AM

You were quicker than I was in catching my mistake and editing the post. PROC CORR also has both of these options.

LuanOliveira · Posted 12-28-2021 12:18 PM

Hi! Thanks for this answer, it was very helpful.

I have a question: I want to use the Pearson values in the lower triangle, and p-values in the upper triangle, how I would do this?

Thanks!

Luan.

Reeza · Posted 12-29-2021 11:27 AM

Modify the data set before it goes to the graphing procedure.

Statistical Procedures

Filtering Correlation Matrix

Re: Filtering Correlation Matrix

Re: Filtering Correlation Matrix

Re: Filtering Correlation Matrix

Re: Filtering Correlation Matrix

Re: Filtering Correlation Matrix

Re: Filtering Correlation Matrix

Re: Filtering Correlation Matrix

Re: Filtering Correlation Matrix

Re: Filtering Correlation Matrix

Re: Filtering Correlation Matrix

Re: Filtering Correlation Matrix

Re: Filtering Correlation Matrix

Re: Filtering Correlation Matrix

Stepping into the Matrix in SAS

SAS Customer Intelligence 360: Custom Filters

Hybrid Filtering

Collaborative Filtering

Filtering in Visual Analytics

Follow Us

What is...

Statistical Procedures

Our biggest data and AI event of the year.

Follow Us

What is...