BookmarkSubscribeRSS Feed
OceanDream
Fluorite | Level 6

I have data as followed:

 

Co. ID    Year    VAR1    VAR2

1             1996     a            b

1              1997    c            d

1              1998    e            f

1              1999    g           h

2               1996    i            j

2              1997     k           l

2              1998     m         n

2               1999    o          p

 

I want to calculate the correlation between var1 and var2 using the previous three-year observations for the company and output to a file, e.g., for year 1998 company 1, I want to use company 1 year 1996-1998 observations of var1 and var2 to calculate the correlation between var1 and var2. I know I can use proc corr like this:

proc sort data=have;

by co. ID year;

run;

proc corr data=have out=want noprint;

var var1 var2;

by co.ID;

run;

 

But this will give me the correlation using all the available observations for each company. Please help!!!

 

2 REPLIES 2
FreelanceReinh
Jade | Level 19

You could calculate the correlations in a data step. This doesn't look very elegant, but is fairly straightforward.

 

/* Create test data */
data have;
input CoID Year VAR1 VAR2;
cards;
1 1996 10 10
1 1997 15 11
1 1998 20 12
1 1999 25 11
2 1996 23 25
2 1997 24 24
2 1998 25 23
2 1999 80 23
3 1996 40 30
3 1998 45 31
3 1999 46 36
;

/* Compute correlations for 3 consecutive years */
data want;
set have;
by CoID year;
if first.CoID then y=1;
else y+1;
lag11=lag1(var1);
lag21=lag2(var1);
lag12=lag1(var2);
lag22=lag2(var2);
if y>=3 & lag(year)=year-1 & lag2(year)=year-2 & n(of lag:, var1, var2)=6 then do;
  mean1=mean(var1, lag11, lag21);
  mean2=mean(var2, lag12, lag22);
  std1=std(var1, lag11, lag21);
  std2=std(var2, lag12, lag22);
  if std1 & std2 then corr3y=(( var1-mean1)*( var2-mean2)
                             +(lag11-mean1)*(lag12-mean2)
                             +(lag21-mean1)*(lag22-mean2))/2
                             /(std1*std2);
end;
drop y lag: mean: std:;
run;

proc print data=want;
run;

 

Please note that the above code computes the correlations only if the three years are consecutive and all six values involved are non-missing. Feel free to weaken these conditions (and adapt the formula accordingly) if you think this is too restrictive.

Ksharp
Super User

Assuming there is no gap between two obs.

 

 

data have;
input CoID    Year    VAR1    VAR2 ;
cards;
1             1996     2            5
1              1997    4            8
1              1998    21            32
1              1999    42           6
2               1996    7            4
2              1997     8           14
2              1998     9         21
2               1999    12          8
;
run;

%let window_size=3;
data temp;
 set have;
 by CoID;
 array x{&window_size} _temporary_;
 array y{&window_size} _temporary_;
 if first.CoID then do;n=0;call missing(of x{*} y{*});end;
 n+1;
 x{mod(n,&window_size)+1}=var1;
 y{mod(n,&window_size)+1}=var2;
 if n ge &window_size then do;
  do i=1 to &window_size;
   var1=x{i};
   var2=y{i};
   output;
  end;
 end;
run;
proc corr data=temp out=want noprint;
var var1 var2;
by CoID Year    ;
run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to choose a machine learning algorithm

Use this tutorial as a handy guide to weigh the pros and cons of these commonly used machine learning algorithms.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 721 views
  • 0 likes
  • 3 in conversation