I have data as followed:
Co. ID Year VAR1 VAR2
1 1996 a b
1 1997 c d
1 1998 e f
1 1999 g h
2 1996 i j
2 1997 k l
2 1998 m n
2 1999 o p
I want to calculate the correlation between var1 and var2 using the previous three-year observations for the company and output to a file, e.g., for year 1998 company 1, I want to use company 1 year 1996-1998 observations of var1 and var2 to calculate the correlation between var1 and var2. I know I can use proc corr like this:
proc sort data=have;
by co. ID year;
run;
proc corr data=have out=want noprint;
var var1 var2;
by co.ID;
run;
But this will give me the correlation using all the available observations for each company. Please help!!!
You could calculate the correlations in a data step. This doesn't look very elegant, but is fairly straightforward.
/* Create test data */
data have;
input CoID Year VAR1 VAR2;
cards;
1 1996 10 10
1 1997 15 11
1 1998 20 12
1 1999 25 11
2 1996 23 25
2 1997 24 24
2 1998 25 23
2 1999 80 23
3 1996 40 30
3 1998 45 31
3 1999 46 36
;
/* Compute correlations for 3 consecutive years */
data want;
set have;
by CoID year;
if first.CoID then y=1;
else y+1;
lag11=lag1(var1);
lag21=lag2(var1);
lag12=lag1(var2);
lag22=lag2(var2);
if y>=3 & lag(year)=year-1 & lag2(year)=year-2 & n(of lag:, var1, var2)=6 then do;
mean1=mean(var1, lag11, lag21);
mean2=mean(var2, lag12, lag22);
std1=std(var1, lag11, lag21);
std2=std(var2, lag12, lag22);
if std1 & std2 then corr3y=(( var1-mean1)*( var2-mean2)
+(lag11-mean1)*(lag12-mean2)
+(lag21-mean1)*(lag22-mean2))/2
/(std1*std2);
end;
drop y lag: mean: std:;
run;
proc print data=want;
run;
Please note that the above code computes the correlations only if the three years are consecutive and all six values involved are non-missing. Feel free to weaken these conditions (and adapt the formula accordingly) if you think this is too restrictive.
Assuming there is no gap between two obs.
data have;
input CoID Year VAR1 VAR2 ;
cards;
1 1996 2 5
1 1997 4 8
1 1998 21 32
1 1999 42 6
2 1996 7 4
2 1997 8 14
2 1998 9 21
2 1999 12 8
;
run;
%let window_size=3;
data temp;
set have;
by CoID;
array x{&window_size} _temporary_;
array y{&window_size} _temporary_;
if first.CoID then do;n=0;call missing(of x{*} y{*});end;
n+1;
x{mod(n,&window_size)+1}=var1;
y{mod(n,&window_size)+1}=var2;
if n ge &window_size then do;
do i=1 to &window_size;
var1=x{i};
var2=y{i};
output;
end;
end;
run;
proc corr data=temp out=want noprint;
var var1 var2;
by CoID Year ;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Use this tutorial as a handy guide to weigh the pros and cons of these commonly used machine learning algorithms.
Find more tutorials on the SAS Users YouTube channel.