BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
DomUk
Fluorite | Level 6

Hello,

 

I have 20 years of data and for every year, I estimated the correlations for all the variables using the following code.

proc corr data=... out=pearson_corr;
var x1 x2 x3 x4;
with x1 x2 x3 x4;

by year
run;

 

In the next step I would like to get a correlation matrix which shows the mean of this 20 correlation matrixes.

Does anyone have an idea how this is possible? Using proc means I dont get the right result.

 

Thanks a lot for your help!

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

@DomUk wrote:
Thanks a lot. I get already the right numbers, however on the x-axis the varaibles are not sorted (so the diagonal "1-line" is not there, instead the 1-values are distributed thorughout the whole table.

I see. Your variable names are not alphabetically sorted (unlike x1, x2, x3, x4), are they? In this case you may want to retrieve their sort order from dataset pearson_corr and finally use this information to sort the output dataset from PROC SUMMARY.

 

As an example, let's assume the variable names are h, d, s, q.

%let vars=h d s q;

/* Compute correlation coefficients by year */

proc corr data=have out=pearson_corr noprint;
var &vars;
by year;
run;

/* Retrieve sort order of analysis variables */

data _vars(keep=_seqno _name_);
set pearson_corr(firstobs=4);
_seqno=_n_;
if _type_~=:'C' then stop;
run;

/* Compute mean values of correlation coefficients */

proc summary data=pearson_corr nway;
class _name_;
var &vars;
output out=_tmp(drop=_type_ _freq_) mean=;
run;

/* Sort the resulting dataset */

proc sql;
create table want as
select a.* from _tmp a, _vars b
where a._name_=b._name_
order by _seqno;
drop table _vars, _tmp;
quit;

EDIT: If you store the variable names in macro variable VARS (as shown above), you can also use this macro variable to obtain the desired sort order. That is, you can omit the DATA step (data _vars ...) and replace the PROC SQL step by this modified version:

proc sql;
create table want as
select * from _tmp
order by whichc(upcase(_name_),"%sysfunc(tranwrd(%upcase(&vars),%str( ),%str(",")))");
drop table _tmp;
quit;

 

View solution in original post

19 REPLIES 19
Rick_SAS
SAS Super FREQ

If you have SAS/IML, you can do this in a few lines of code:

  1. Read the data WHERE _TYPE_="CORR";
  2. Flatten the data so that each 4x4 matrix becomes a 1x16 row vector
  3. FInd the average of the 16 columns. This average is a 1x16 row vector.
  4. Reshape the answer into a 4x4 matrix.

 

data Have;
call streaminit(12345);
do year = 1 to 20;
   do i = 1 to 100;
      x1 = rand("Normal");
      x2 = x1 + rand("Normal");
      x3 = x2 + rand("Normal");
      x4 = x3 + rand("Normal");
      output;
   end;
end;
run;

proc corr data=Have out=pearson_corr noprint;
by year;
var x1 x2 x3 x4;
with x1 x2 x3 x4;
run;

proc iml;
use pearson_corr(where=(_TYPE_='CORR'));
read all var {x1 x2 x3 x4} into X;
close;

/* "flatten" the 4x4 matrices into a 1 x 16 row vector */
Y = shape(X, 0, 16);
/* compute the mean of each column */
mean = mean(Y);
/* reshape into 4x4 matrix */
meanCorr = shape(mean, 0, 4);
print meanCorr;

If you don't have SAS/IML, you can do the same operations, but you'll need to use the DATA step and arrays to reshape the data, then use PROC MEANS to find the means of the 16-variable data.

DomUk
Fluorite | Level 6
Thank you very much,
how can I keep automatically the variable names ( actually I have 30 varaibles, so to rename them step by step would be taff) and bring them in an folder, after your code they are renamed in col 1... and row1... and only shown in the result viewer.

FreelanceReinh
Jade | Level 19

Hello @DomUk,

 

Try this:

proc summary data=pearson_corr nway;
class _name_;
var x:;
output out=want(keep=_name_ x:) mean=;
run;

[Edit: Removed redundant statement where _type_=:'C';]

DomUk
Fluorite | Level 6
Thanks a lot. I get already the right numbers, however on the x-axis the varaibles are not sorted (so the diagonal "1-line" is not there, instead the 1-values are distributed thorughout the whole table.
FreelanceReinh
Jade | Level 19

@DomUk wrote:
Thanks a lot. I get already the right numbers, however on the x-axis the varaibles are not sorted (so the diagonal "1-line" is not there, instead the 1-values are distributed thorughout the whole table.

I see. Your variable names are not alphabetically sorted (unlike x1, x2, x3, x4), are they? In this case you may want to retrieve their sort order from dataset pearson_corr and finally use this information to sort the output dataset from PROC SUMMARY.

 

As an example, let's assume the variable names are h, d, s, q.

%let vars=h d s q;

/* Compute correlation coefficients by year */

proc corr data=have out=pearson_corr noprint;
var &vars;
by year;
run;

/* Retrieve sort order of analysis variables */

data _vars(keep=_seqno _name_);
set pearson_corr(firstobs=4);
_seqno=_n_;
if _type_~=:'C' then stop;
run;

/* Compute mean values of correlation coefficients */

proc summary data=pearson_corr nway;
class _name_;
var &vars;
output out=_tmp(drop=_type_ _freq_) mean=;
run;

/* Sort the resulting dataset */

proc sql;
create table want as
select a.* from _tmp a, _vars b
where a._name_=b._name_
order by _seqno;
drop table _vars, _tmp;
quit;

EDIT: If you store the variable names in macro variable VARS (as shown above), you can also use this macro variable to obtain the desired sort order. That is, you can omit the DATA step (data _vars ...) and replace the PROC SQL step by this modified version:

proc sql;
create table want as
select * from _tmp
order by whichc(upcase(_name_),"%sysfunc(tranwrd(%upcase(&vars),%str( ),%str(",")))");
drop table _tmp;
quit;

 

DomUk
Fluorite | Level 6
hmm I dont know why but this code does not work.
%let vars=earn cfo c_sales c_cogs c_oe c_int c_tax
c_othert del_ar del_inv del_ap depr amort
other_acc_s_t acc_s_t div_s_t BV_s_t at_s_t negEE_s_t NEGE DivD;

/* Compute correlation coefficients by year */

proc corr data=mylibf1.endversion out=pearson_corr noprint;
var &vars;
by houyear;
run;

/* Retrieve sort order of analysis variables */

data _vars(keep=_seqno _name_);
set pearson_corr(firstobs=21);
_seqno=_n_;
if _type_~=:'C' then stop;
run;

/* Compute mean values of correlation coefficients */

proc summary data=pearson_corr nway;
class _name_;
var &vars;
output out=_tmp(drop=_type_ _freq_) mean=;
run;

/* Sort the resulting dataset */

proc sql;
create table want as
select a.* from _tmp a, _vars b
where a._name_=b._name_
order by _seqno;
drop table _vars, _tmp;
quit;

Now I have four variables in the rows and all variables in the columns.
Sry Iam a very new user of sas.
FreelanceReinh
Jade | Level 19

You changed 

set pearson_corr(firstobs=4);

to

set pearson_corr(firstobs=21);

probably assuming that the "4" meant the number of variables -- but that was pure coincidence. The SET statement must start at the fourth observation irrespective of the number of variables.


DomUk
Fluorite | Level 6
thank you! Now it works.
If I run the correlation by year, then I get a pvalue for every variable-combination.
Is it possible to become the average pvalues as well?
FreelanceReinh
Jade | Level 19

Yes, it's possible to obtain average p-values as well.

 

Here's a complete example:

data have;
call streaminit(27182818);
do year=1 to 20;
  do _n_=1 to 100;
    d=rand('uniform');
    h=rand('uniform');
    s=rand('uniform');
    q=rand('uniform');
    output;
  end;
end;
run;

%let vars=h d s q; /* Variable order may be defined differently than in the dataset. */

/* Compute correlation coefficients by year */

ods output pearsoncorr=pearson_corr;
proc corr data=have;
var &vars;
by year;
run;

/* Compute mean values of correlation coefficients and p-values */

proc summary data=pearson_corr nway;
class Variable;
var _numeric_;
output out=_tmp(drop=year _:) mean=;
run;

/* Sort the resulting dataset */

proc sql;
create table want as
select * from _tmp
order by whichc(upcase(Variable),"%sysfunc(tranwrd(%upcase(&vars),%str( ),%str(",")))");
drop table _tmp;
quit;

Result:

Variable           h           d           s           q          Ph          Pd          Ps          Pq

   h         1.00000     0.00485     0.01906     0.01108       .          0.5860      0.4575      0.5198
   d         0.00485     1.00000     0.00465    -0.00434      0.5860       .          0.4407      0.4343
   s         0.01906     0.00465     1.00000    -0.01984      0.4575      0.4407       .          0.5118
   q         0.01108    -0.00434    -0.01984     1.00000      0.5198      0.4343      0.5118       .

The variable list _numeric_ in the PROC SUMMARY step includes year (which is then dropped from dataset _tmp). Alternatively, you could use a different variable list:

proc summary data=pearson_corr nway;
class Variable;
var &vars p:;
output out=_tmp(drop=_:) mean=;
run;

This works in the current example (where none of the variable names in &vars starts with "P"). In other cases you might need to construct the variable list for the VAR statement differently (e.g. similarly to the TRANWRD technique used in the PROC SQL step).

DomUk
Fluorite | Level 6
Hi again 🙂
If I run the code, I get the following note message: NOTE: The query as specified involves ordering by an item that doesn't appear in its SELECT clause. I dont exactly understand the meaning of this statement. Can I ignore it? The exact log statement is the following:
187 proc sql;
188 create table Avg_korr2 as
189 select * from _tmp
190 order by whichc(upcase(Variable),"%sysfunc(tranwrd(%upcase(&vars),%str( ),%str(",")))");
NOTE: The query as specified involves ordering by an item that doesn't appear in its SELECT clause.
NOTE: Table WORK.AVG_KORR2 created, with 21 rows and 43 columns.

191 drop table _tmp;
NOTE: Table WORK._TMP has been dropped.
192 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.07 seconds
cpu time 0.09 seconds

FreelanceReinh
Jade | Level 19

Hi @DomUk,

 

Yes, you can ignore that note. It refers to the WHICHC expression (in the ORDER BY clause), which indeed "doesn't appear in [the] SELECT clause" -- but exactly this is intended.

DomUk
Fluorite | Level 6
Hi again,
is it possible to run the same code for spearman correlation? If I substitute pearson corr with spearmancorr it does not work. YOu can find the code below.
Thanks again for an answer!

%let vars=earn_s_t cfo_s_t c_sales_s_t c_cogs_s_t c_oe_s_t c_int_s_t c_tax_s_t
c_other_s_t del_ar_s_t del_inv_s_t del_ap_s_t depr_s_t amort_s_t
oth_acc_s_t acc_s_t div_s_t BV_s_t at_s_t negEE_s_t NEGE DivD;

ods output SpearmanCorr=spearman_Korr;
proc corr data=mylib.endversion;
var &vars;
by houyear;
run;

proc summary data=Spearman_Korr nway;
class Variable;
var _numeric_;
output out=_tmp(drop=houyear _:) mean=;
run;

proc sql;
create table mylib.Avg_korr_Spearman as
select * from _tmp
order by whichc(upcase(Variable),"%sysfunc(tranwrd(%upcase(&vars),%str( ),%str(",")))");
drop table _tmp;
quit;
PaigeMiller
Diamond | Level 26

What doesn't work? Explain. Show us.

--
Paige Miller
FreelanceReinh
Jade | Level 19

Of course, you can apply the same techniques to Spearman correlation coefficients that we used for Pearson correlation coefficients. Just specify the appropriate option SPEARMAN in the PROC CORR statement.

 

proc corr data=mylib.endversion spearman;

Otherwise you will get this warning in the log:

WARNING: Output 'SpearmanCorr' was not created.  Make sure that the output object name, label, or path is spelled correctly.  Also, verify that the
         appropriate procedure options are used to produce the requested output object.  For example, verify that the NOPRINT option is not used.

The request to "verify that the appropriate procedure options are used to produce the requested output object" means: Check column "Option" in the table of ODS table names, which is found under "Details" in the procedure documentation. For PROC CORR: https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/procstat/procstat_corr_details24.htm.

 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 19 replies
  • 2943 views
  • 8 likes
  • 4 in conversation