I'm trying to create a dataset that lists the correlation between pairs of company's returns and the identifies the companies. Basically, I compute the correlation matrix of three companies' (A, B, and C) returns and can output the data, but I don't know how to later identify which correlation is which pair of companies. Any ideas on how to identify each pair? Here's an example:
data companies;
infile cards;
input permno $ date $ ret;
cards;
A 199901 0.10
A 199902 0.20
A 199903 -0.10
A 199904 -0.20
A 199905 0.15
A 199906 -0.15
A 199907 0.12
A 199908 -0.12
A 199909 0.05
A 199910 -0.05
B 199901 -0.05
B 199902 -0.20
B 199903 0.09
B 199904 -0.10
B 199905 0.05
B 199906 0.05
B 199907 -0.07
B 199908 0.07
B 199909 0.01
B 199910 -0.01
C 199901 0.20
C 199902 0.10
C 199903 -0.05
C 199904 -0.05
C 199905 0.19
C 199906 -0.11
C 199907 0.12
C 199908 -0.12
C 199909 0.05
C 199910 -0.05
;
proc contents; run;
proc sort data=companies; by date;
proc transpose data=companies out=want; id permno;
by date;
var ret;
run;
proc iml;
use want;
read all var _NUM_ into x[colname=permno];
close want;
corr=corr(x);
mattrib corr rowname=permno colname=permno;
print corr;
create rj var {corr};
append ;
close rj;
quit;
run;
data junk; set rj; proc print; run;
/**********
data rj lists all 9 correlations but I can't identify the individual pairs.
Specifically, I get
Obs CORR
1 1.00000
2 -0.43578
3 0.88046
4 -0.43578
5 1.00000
6 -0.37909
7 0.88046
8 -0.37909
9 1.0000
And I'd like to get
A A 1
A B -0.43578
A C 0.88046
B A -0.43578
B B 1
B C -0.379087
C A 0.8804574
C B -0.379087
C C 1
And really, I'd just like the bottom half of the matrix (i.e., just the 3 unique correlations)
A B -0.43578
A C 0.88046
B C -0.379087
Any ideas? Thanks!
You have lots of choices:
1) use PROC TRANSPOSE again
2) use the DATA step
3) write a loop in SAS/IML
4) Use the ideas in http://blogs.sas.com/content/iml/2012/02/29/defining-banded-and-triangular-matrices/
Depending upon how you intend to use the correlations, you might want to output a TYPE=CORR data set:
create R(type=corr) from Corr[r=permno c=permno];
append from Corr[r=permno];
close R;
If you really want the lower triangular data in a "flat" format, use the SYMSQR function .
Thanks Rick -
Not quite what I need, however. I want to use the correlations as dependent variables in a later regression (hence, I need it in "flat" format - each correlation is an observation). So I need to have each correlation as a single row with the identifiers of the two companies. I also tried symsqr, but again can't seem to figure out how to identify the pairs.
e.g., I'd like the output to look like this:
firm1_firm2 corr
A_B -0.43578
A_C 0.88046
B_C -0.379087
Thanks,
Rick
You have lots of choices:
1) use PROC TRANSPOSE again
2) use the DATA step
3) write a loop in SAS/IML
4) Use the ideas in http://blogs.sas.com/content/iml/2012/02/29/defining-banded-and-triangular-matrices/
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.
Find more tutorials on the SAS Users YouTube channel.