I'm trying to create a dataset that lists the correlation between pairs of company's returns and the identifies the companies. Basically, I compute the correlation matrix of three companies' (A, B, and C) returns and can output the data, but I don't know how to later identify which correlation is which pair of companies. Any ideas on how to identify each pair? Here's an example:
data companies;
infile cards;
input permno $ date $ ret;
cards;
A 199901 0.10
A 199902 0.20
A 199903 -0.10
A 199904 -0.20
A 199905 0.15
A 199906 -0.15
A 199907 0.12
A 199908 -0.12
A 199909 0.05
A 199910 -0.05
B 199901 -0.05
B 199902 -0.20
B 199903 0.09
B 199904 -0.10
B 199905 0.05
B 199906 0.05
B 199907 -0.07
B 199908 0.07
B 199909 0.01
B 199910 -0.01
C 199901 0.20
C 199902 0.10
C 199903 -0.05
C 199904 -0.05
C 199905 0.19
C 199906 -0.11
C 199907 0.12
C 199908 -0.12
C 199909 0.05
C 199910 -0.05
;
proc contents; run;
proc sort data=companies; by date;
proc transpose data=companies out=want; id permno;
by date;
var ret;
run;
proc iml;
use want;
read all var _NUM_ into x[colname=permno];
close want;
corr=corr(x);
mattrib corr rowname=permno colname=permno;
print corr;
create rj var {corr};
append ;
close rj;
quit;
run;
data junk; set rj; proc print; run;
/**********
data rj lists all 9 correlations but I can't identify the individual pairs.
Specifically, I get
Obs CORR
1 1.00000
2 -0.43578
3 0.88046
4 -0.43578
5 1.00000
6 -0.37909
7 0.88046
8 -0.37909
9 1.0000
And I'd like to get
A A 1
A B -0.43578
A C 0.88046
B A -0.43578
B B 1
B C -0.379087
C A 0.8804574
C B -0.379087
C C 1
And really, I'd just like the bottom half of the matrix (i.e., just the 3 unique correlations)
A B -0.43578
A C 0.88046
B C -0.379087
Any ideas? Thanks!
You have lots of choices:
1) use PROC TRANSPOSE again
2) use the DATA step
3) write a loop in SAS/IML
4) Use the ideas in http://blogs.sas.com/content/iml/2012/02/29/defining-banded-and-triangular-matrices/
Depending upon how you intend to use the correlations, you might want to output a TYPE=CORR data set:
create R(type=corr) from Corr[r=permno c=permno];
append from Corr[r=permno];
close R;
If you really want the lower triangular data in a "flat" format, use the SYMSQR function .
Thanks Rick -
Not quite what I need, however. I want to use the correlations as dependent variables in a later regression (hence, I need it in "flat" format - each correlation is an observation). So I need to have each correlation as a single row with the identifiers of the two companies. I also tried symsqr, but again can't seem to figure out how to identify the pairs.
e.g., I'd like the output to look like this:
firm1_firm2 corr
A_B -0.43578
A_C 0.88046
B_C -0.379087
Thanks,
Rick
You have lots of choices:
1) use PROC TRANSPOSE again
2) use the DATA step
3) write a loop in SAS/IML
4) Use the ideas in http://blogs.sas.com/content/iml/2012/02/29/defining-banded-and-triangular-matrices/
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.
Find more tutorials on the SAS Users YouTube channel.