Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- Analytics
- /
- Stat Procs
- /
- Filtering Correlation Matrix

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

🔒 This topic is **solved** and **locked**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 05-05-2020 01:15 PM
(4679 views)

I am dealing with a few dozen variables in the Proc Corr procedure. I used the array function with the colon to specify the variables. The variables have the same prefix, but the number following them is not sequential. It's just a 5 digit number. I need a way to filter out small and insignificant correlation values. Is there a way to do this in the procedure itself?

I've tried outputting the correlation data set so I can filter there, but the WHERE clause in PROC PRINT and PROC SQL does not appear to support the : array operand.

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

```
*calculate correlation matrix for the data;
ods output PearsonCorr=Corr_P;
PROC CORR DATA=sashelp.cars;
VAR MSRP Horsepower MPG_City MPG_Highway Weight;
RUN;
*sort for transposes;
proc sort data=Corr_P;
by VARIABLE;
run;
*restructure data so that it's in a long format for graphing;
*need to transpose the correlation and p-values separately;
proc transpose data=Corr_P out=CorrLong1(rename=(COL1=Correlation))
name=CorrelationID;
var MSRP Horsepower MPG_City MPG_Highway Weight;
by Variable;
run;
proc transpose data=Corr_P out=CorrLong2(rename=(COL1=p_value)) name=PvalueID;
var PMSRP PHorsepower PMPG_City PMPG_Highway PWeight;
by Variable;
run;
*merge data sets to get p-value and correlation in right places;
data CorrLong;
merge CorrLong1 CorrLong2(drop=PvalueID );
by Variable;
LABEL CorrelationID="Correlations"; run;
*sort for graphing;
proc sort data=CorrLong;
by VARIABLE CorrelationID;
run;
*create a heat map graph with P-Values in the squares;
proc sgplot data=CorrLong noautolegend;
heatmap x=Variable y=CorrelationID / colorresponse=Correlation name="nope1" discretex discretey x2axis colormodel=ThreeColorRamp; *Colorresponse allows discrete squares for each correlation. x2axis bring the label to the top;
text x=Variable y=CorrelationID text=p_value / textattrs=(size=10pt) x2axis name='nope2'; /*To overlay significance, create a variable that contans that info and set text=VARIABLE */
label correlation='Pearson Correlation';
yaxis reverse display=(nolabel);
x2axis display=(nolabel);
gradlegend;
run;
```

12 REPLIES 12

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Here is a reference that illustrates how to refer to variables and datasets in a short cut list:

https://blogs.sas.com/content/iml/2018/05/29/6-easy-ways-to-specify-a-list-of-variables-in-sas.html

For your question you may have to provide more specific details.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

First it may help to show the actual code you used. That lets us know all of the options and the likely layout of the output data set(s).

Second what to do you mean by "filter out small"?

Most of the output sets from Proc Corr are in a specific form that likely needs to be transposed in some form to get what you want. You may consider using one of the small SAS supplied data sets, such as SASHELP.CLASS to generate output and then tell us what the rules with that output would be to get what you want.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

First, use PROC CORR and the OUTP= option to get the correlations in a data set.

Then convert the correlations from wide to long format.

You can then use the WHERE clause to eliminate the small correlations. If the following isn't exactly what you want, it should point you in the right direction:

```
data Have(keep=A:);
set sashelp.cars(
rename=(
MSRP=A123
Invoice=A432
EngineSize=A847
Cylinders=A9212
Horsepower=A23
MPG_City=A439
MPG_Highway=A8736)
);
run;
proc corr data=Have outp=CorrOut noprint;
var A:;
run;
proc print data=CorrOut;
run;
data Want;
set CorrOut(rename=(_NAME_=Var1) where=(_TYPE_='CORR'));
length Var2 $32.;
array A[*] A:;
do j = 1 to dim(A);
Var2 = vname(A[j]);
Corr = A[j];
output;
end;
keep Var1 Var2 Corr;
run;
%let cutoff = 0.5;
proc print data=Want;
where abs(Corr) > &cutoff AND Corr^=1;;
run;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Perhaps you would like techniques near the bottom of this post.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hi @RandoDando

As proposed by @WarrenKuhfeld , graphic solutions can be vey useful to identify most correlated variables.

E.g.:

```
proc format;
value CorrSignif -0.0-<0.2 = "red"
0.2-<0.4, -0.4-<-0.2 = "orange"
0.4-<0.6, -0.6-<-0.4 = "yellow"
0.6-<0.8, -0.8-<-0.6 = "lightgreen"
0.8-<1.0, -1.0-<-0.8= "forestgreen"
1, -1 = "White";
run;
proc corr data=Have outp=CorrOut (where=(_type_='CORR')) noprint;
run;
proc print data=CorrOut (drop=_type_ rename=(_name_=Variable))
style(column)={backgroundcolor= CorrSignif.} noobs;
run;
```

Best,

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

```
*calculate correlation matrix for the data;
ods output PearsonCorr=Corr_P;
PROC CORR DATA=sashelp.cars;
VAR MSRP Horsepower MPG_City MPG_Highway Weight;
RUN;
*sort for transposes;
proc sort data=Corr_P;
by VARIABLE;
run;
*restructure data so that it's in a long format for graphing;
*need to transpose the correlation and p-values separately;
proc transpose data=Corr_P out=CorrLong1(rename=(COL1=Correlation))
name=CorrelationID;
var MSRP Horsepower MPG_City MPG_Highway Weight;
by Variable;
run;
proc transpose data=Corr_P out=CorrLong2(rename=(COL1=p_value)) name=PvalueID;
var PMSRP PHorsepower PMPG_City PMPG_Highway PWeight;
by Variable;
run;
*merge data sets to get p-value and correlation in right places;
data CorrLong;
merge CorrLong1 CorrLong2(drop=PvalueID );
by Variable;
LABEL CorrelationID="Correlations"; run;
*sort for graphing;
proc sort data=CorrLong;
by VARIABLE CorrelationID;
run;
*create a heat map graph with P-Values in the squares;
proc sgplot data=CorrLong noautolegend;
heatmap x=Variable y=CorrelationID / colorresponse=Correlation name="nope1" discretex discretey x2axis colormodel=ThreeColorRamp; *Colorresponse allows discrete squares for each correlation. x2axis bring the label to the top;
text x=Variable y=CorrelationID text=p_value / textattrs=(size=10pt) x2axis name='nope2'; /*To overlay significance, create a variable that contans that info and set text=VARIABLE */
label correlation='Pearson Correlation';
yaxis reverse display=(nolabel);
x2axis display=(nolabel);
gradlegend;
run;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

*n* option which requests an ordered display of only the *n* largest correlation coefficients for each variable. In both these cases, each column of the display no longer corresponds to the same variable, but instead is the "Best*n" *variable that gives the *n*th largest correlation coefficient for that row variable.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

The CORRELATION procedure is part of SAS Viya. It is included as part of SAS Visual Statistics.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

You were quicker than I was in catching my mistake and editing the post. PROC CORR also has both of these options.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hi! Thanks for this answer, it was very helpful.

I have a question: I want to use the Pearson values in the lower triangle, and p-values in the upper triangle, how I would do this?

Thanks!

Luan.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Modify the data set before it goes to the graphing procedure.

Are you ready for the spotlight? We're accepting content ideas for **SAS Innovate 2025** to be held May 6-9 in Orlando, FL. The call is **open **until September 25. Read more here about **why** you should contribute and **what is in it** for you!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.