BookmarkSubscribeRSS Feed
OceanDream
Fluorite | Level 6

I have data as followed:

 

Co. ID    Year    VAR1    VAR2

1             1996     a            b

1              1997    c            d

1              1998    e            f

1              1999    g           h

2               1996    i            j

2              1997     k           l

2              1998     m         n

2               1999    o          p

 

I want to calculate the correlation between var1 and var2 using the previous three-year observations for the company and output to a file, e.g., for year 1998 company 1, I want to use company 1 year 1996-1998 observations of var1 and var2 to calculate the correlation between var1 and var2. I know I can use proc corr like this:

proc sort data=have;

by co. ID year;

run;

proc corr data=have out=want noprint;

var var1 var2;

by co.ID;

run;

 

But this will give me the correlation using all the available observations for each company. Please help!!!

 

2 REPLIES 2
FreelanceReinh
Jade | Level 19

You could calculate the correlations in a data step. This doesn't look very elegant, but is fairly straightforward.

 

/* Create test data */
data have;
input CoID Year VAR1 VAR2;
cards;
1 1996 10 10
1 1997 15 11
1 1998 20 12
1 1999 25 11
2 1996 23 25
2 1997 24 24
2 1998 25 23
2 1999 80 23
3 1996 40 30
3 1998 45 31
3 1999 46 36
;

/* Compute correlations for 3 consecutive years */
data want;
set have;
by CoID year;
if first.CoID then y=1;
else y+1;
lag11=lag1(var1);
lag21=lag2(var1);
lag12=lag1(var2);
lag22=lag2(var2);
if y>=3 & lag(year)=year-1 & lag2(year)=year-2 & n(of lag:, var1, var2)=6 then do;
  mean1=mean(var1, lag11, lag21);
  mean2=mean(var2, lag12, lag22);
  std1=std(var1, lag11, lag21);
  std2=std(var2, lag12, lag22);
  if std1 & std2 then corr3y=(( var1-mean1)*( var2-mean2)
                             +(lag11-mean1)*(lag12-mean2)
                             +(lag21-mean1)*(lag22-mean2))/2
                             /(std1*std2);
end;
drop y lag: mean: std:;
run;

proc print data=want;
run;

 

Please note that the above code computes the correlations only if the three years are consecutive and all six values involved are non-missing. Feel free to weaken these conditions (and adapt the formula accordingly) if you think this is too restrictive.

Ksharp
Super User

Assuming there is no gap between two obs.

 

 

data have;
input CoID    Year    VAR1    VAR2 ;
cards;
1             1996     2            5
1              1997    4            8
1              1998    21            32
1              1999    42           6
2               1996    7            4
2              1997     8           14
2              1998     9         21
2               1999    12          8
;
run;

%let window_size=3;
data temp;
 set have;
 by CoID;
 array x{&window_size} _temporary_;
 array y{&window_size} _temporary_;
 if first.CoID then do;n=0;call missing(of x{*} y{*});end;
 n+1;
 x{mod(n,&window_size)+1}=var1;
 y{mod(n,&window_size)+1}=var2;
 if n ge &window_size then do;
  do i=1 to &window_size;
   var1=x{i};
   var2=y{i};
   output;
  end;
 end;
run;
proc corr data=temp out=want noprint;
var var1 var2;
by CoID Year    ;
run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to choose a machine learning algorithm

Use this tutorial as a handy guide to weigh the pros and cons of these commonly used machine learning algorithms.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 696 views
  • 0 likes
  • 3 in conversation