I have a data set of daily prices for each ID that looks like the following example.
data have;
input id $1. date :yymmdd10. price 5.;
format date yymmdd10.;
datalines;
A 2000-01-02 3
A 2000-01-03 4
A 2000-01-04 6
A 2000-01-05 4
A 2000-01-06 7
A 2000-01-07 8
A 2000-01-08 9
A 2000-01-09 6
A 2000-01-10 7
A 2000-01-11 5
A 2000-01-12 6
A 2000-01-13 5
B 2000-01-02 4
B 2000-01-03 6
B 2000-01-04 7
B 2000-01-05 8
B 2000-01-06 9
B 2000-01-07 10
B 2000-01-08 12
B 2000-01-09 11
B 2000-01-10 9
B 2000-01-11 8
B 2000-01-12 9
B 2000-01-13 7
C 2000-01-02 3
C 2000-01-03 2
C 2000-01-04 4
C 2000-01-05 3
C 2000-01-06 4
C 2000-01-07 5
C 2000-01-08 4
C 2000-01-09 3
C 2000-01-10 4
C 2000-01-11 6
C 2000-01-12 6
C 2000-01-13 5
;
run;
data have; set have; week = intnx('week', date, 0, 'e'); format week yymmdd10.; run;
I want to calculate correlation between every pair of IDs' prices: i.e., correlation between prices of (A,B), (B,C), (A,C), over each week. The result I want looks like the following:
data want;
input id1 $ id2 $ week :yymmdd10. corr 5.;
format week yymmdd10.;
datalines;
A B 2000-01-08 0.3
A C 2000-01-08 0.5
B C 2000-01-08 0.4
A B 2000-01-15 0.2
A C 2000-01-15 0.4
B C 2000-01-15 0.5
;
run;
Is there a way to do this efficiently, especially when I have A, B, C, ..., Z and many weeks?
Ok, let's give this a try. You will have to change this code in the third row, so instead of a--c you have a valid SAS list of variable names for your problem, whatever they are, and that they are also in alphabetical order for the code to work. Naturally, we can program them to be in alphabetical order if that's not an easy thing for you to do manually. EDIT: ignore the alphabetical order comment, the code works even on variable lists where the variables are NOT in alphabetical order.
data want;
set corr;
array x(*) a--c;
do j=1 to dim(x);
thisvar=vname(x(j));
if thisvar>_name_ then do;
corr=vvaluex(thisvar);
output;
end;
end;
keep week _name_ thisvar corr;
run;
Correlations in SAS are defined between variables, not between rows of the data set. So you will have to transpose the data so that you have variables A B and C, and not rows that represent A B and C.
proc sort data=have;
by date;
run;
proc transpose data=have out=have_t;
by date;
var price;
id id;
run;
data have_t;
set have_t;
week = intnx('week', date, 0, 'e');
format week mmddyy10.;
run;
proc corr data=have_t;
var a b c;
by week;
run;
When I do
proc corr data = have_t out = corr (where = (_type_ = 'CORR')) noprint;
var a b c;
by week;
run;
I get a correlation matrix. Is there a simple way to conver the matrix to what I want?
@xyxu wrote:
proc corr data = have_t out = corr (where = (_type_ = 'CORR')) noprint; var a b c; by week; run;
I get a correlation matrix. Is there a simple way to conver the matrix to what I want?
I don't know what you mean. I don't know what you want, or why the above code doesn't meet your needs.
Sorry if my question wasn't clear. In my original question, there is a dataset "want", which describes my goal. I was wondering how to convert the correlation matrix to that dataset.
It has to be that form, and no other form?
Let me ask another real-world question ... are there only three variables A B C or does the real-world problem you are working on have many more than three variables?
Yeah I am trying to define a pair-level identifier (e.g., "AB", "AC", "BC") and use the pairwise correlation for later steps. In the real-world problem, there are around 20 IDs.
Ok, let's give this a try. You will have to change this code in the third row, so instead of a--c you have a valid SAS list of variable names for your problem, whatever they are, and that they are also in alphabetical order for the code to work. Naturally, we can program them to be in alphabetical order if that's not an easy thing for you to do manually. EDIT: ignore the alphabetical order comment, the code works even on variable lists where the variables are NOT in alphabetical order.
data want;
set corr;
array x(*) a--c;
do j=1 to dim(x);
thisvar=vname(x(j));
if thisvar>_name_ then do;
corr=vvaluex(thisvar);
output;
end;
end;
keep week _name_ thisvar corr;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.