I have data as followed:
Co. ID Year VAR1 VAR2
1 1996 a b
1 1997 c d
1 1998 e f
1 1999 g h
2 1996 i j
2 1997 k l
2 1998 m n
2 1999 o p
I want to calculate the correlation between var1 and var2 using the previous three-year observations for the company and output to a file, e.g., for year 1998 company 1, I want to use company 1 year 1996-1998 observations of var1 and var2 to calculate the correlation between var1 and var2. I know I can use proc corr like this:
proc sort data=have;
by co. ID year;
run;
proc corr data=have out=want noprint;
var var1 var2;
by co.ID;
run;
But this will give me the correlation using all the available observations for each company. Please help!!!
You could calculate the correlations in a data step. This doesn't look very elegant, but is fairly straightforward.
/* Create test data */
data have;
input CoID Year VAR1 VAR2;
cards;
1 1996 10 10
1 1997 15 11
1 1998 20 12
1 1999 25 11
2 1996 23 25
2 1997 24 24
2 1998 25 23
2 1999 80 23
3 1996 40 30
3 1998 45 31
3 1999 46 36
;
/* Compute correlations for 3 consecutive years */
data want;
set have;
by CoID year;
if first.CoID then y=1;
else y+1;
lag11=lag1(var1);
lag21=lag2(var1);
lag12=lag1(var2);
lag22=lag2(var2);
if y>=3 & lag(year)=year-1 & lag2(year)=year-2 & n(of lag:, var1, var2)=6 then do;
mean1=mean(var1, lag11, lag21);
mean2=mean(var2, lag12, lag22);
std1=std(var1, lag11, lag21);
std2=std(var2, lag12, lag22);
if std1 & std2 then corr3y=(( var1-mean1)*( var2-mean2)
+(lag11-mean1)*(lag12-mean2)
+(lag21-mean1)*(lag22-mean2))/2
/(std1*std2);
end;
drop y lag: mean: std:;
run;
proc print data=want;
run;
Please note that the above code computes the correlations only if the three years are consecutive and all six values involved are non-missing. Feel free to weaken these conditions (and adapt the formula accordingly) if you think this is too restrictive.
Assuming there is no gap between two obs.
data have;
input CoID Year VAR1 VAR2 ;
cards;
1 1996 2 5
1 1997 4 8
1 1998 21 32
1 1999 42 6
2 1996 7 4
2 1997 8 14
2 1998 9 21
2 1999 12 8
;
run;
%let window_size=3;
data temp;
set have;
by CoID;
array x{&window_size} _temporary_;
array y{&window_size} _temporary_;
if first.CoID then do;n=0;call missing(of x{*} y{*});end;
n+1;
x{mod(n,&window_size)+1}=var1;
y{mod(n,&window_size)+1}=var2;
if n ge &window_size then do;
do i=1 to &window_size;
var1=x{i};
var2=y{i};
output;
end;
end;
run;
proc corr data=temp out=want noprint;
var var1 var2;
by CoID Year ;
run;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and save with the early bird rate—just $795!
Use this tutorial as a handy guide to weigh the pros and cons of these commonly used machine learning algorithms.
Find more tutorials on the SAS Users YouTube channel.