Building models with SAS Enterprise Miner, SAS Factory Miner, SAS Visual Data Mining and Machine Learning or just with programming

Correlations using the previous three-year observations

Reply
Occasional Contributor
Posts: 10

Correlations using the previous three-year observations

I have data as followed:

 

Co. ID    Year    VAR1    VAR2

1             1996     a            b

1              1997    c            d

1              1998    e            f

1              1999    g           h

2               1996    i            j

2              1997     k           l

2              1998     m         n

2               1999    o          p

 

I want to calculate the correlation between var1 and var2 using the previous three-year observations for the company and output to a file, e.g., for year 1998 company 1, I want to use company 1 year 1996-1998 observations of var1 and var2 to calculate the correlation between var1 and var2. I know I can use proc corr like this:

proc sort data=have;

by co. ID year;

run;

proc corr data=have out=want noprint;

var var1 var2;

by co.ID;

run;

 

But this will give me the correlation using all the available observations for each company. Please help!!!

 

Trusted Advisor
Posts: 1,115

Re: Correlations using the previous three-year observations

You could calculate the correlations in a data step. This doesn't look very elegant, but is fairly straightforward.

 

/* Create test data */
data have;
input CoID Year VAR1 VAR2;
cards;
1 1996 10 10
1 1997 15 11
1 1998 20 12
1 1999 25 11
2 1996 23 25
2 1997 24 24
2 1998 25 23
2 1999 80 23
3 1996 40 30
3 1998 45 31
3 1999 46 36
;

/* Compute correlations for 3 consecutive years */
data want;
set have;
by CoID year;
if first.CoID then y=1;
else y+1;
lag11=lag1(var1);
lag21=lag2(var1);
lag12=lag1(var2);
lag22=lag2(var2);
if y>=3 & lag(year)=year-1 & lag2(year)=year-2 & n(of lag:, var1, var2)=6 then do;
  mean1=mean(var1, lag11, lag21);
  mean2=mean(var2, lag12, lag22);
  std1=std(var1, lag11, lag21);
  std2=std(var2, lag12, lag22);
  if std1 & std2 then corr3y=(( var1-mean1)*( var2-mean2)
                             +(lag11-mean1)*(lag12-mean2)
                             +(lag21-mean1)*(lag22-mean2))/2
                             /(std1*std2);
end;
drop y lag: mean: std:;
run;

proc print data=want;
run;

 

Please note that the above code computes the correlations only if the three years are consecutive and all six values involved are non-missing. Feel free to weaken these conditions (and adapt the formula accordingly) if you think this is too restrictive.

Super User
Posts: 9,687

Re: Correlations using the previous three-year observations

[ Edited ]

Assuming there is no gap between two obs.

 

 

data have;
input CoID    Year    VAR1    VAR2 ;
cards;
1             1996     2            5
1              1997    4            8
1              1998    21            32
1              1999    42           6
2               1996    7            4
2              1997     8           14
2              1998     9         21
2               1999    12          8
;
run;

%let window_size=3;
data temp;
 set have;
 by CoID;
 array x{&window_size} _temporary_;
 array y{&window_size} _temporary_;
 if first.CoID then do;n=0;call missing(of x{*} y{*});end;
 n+1;
 x{mod(n,&window_size)+1}=var1;
 y{mod(n,&window_size)+1}=var2;
 if n ge &window_size then do;
  do i=1 to &window_size;
   var1=x{i};
   var2=y{i};
   output;
  end;
 end;
run;
proc corr data=temp out=want noprint;
var var1 var2;
by CoID Year    ;
run;
Ask a Question
Discussion stats
  • 2 replies
  • 199 views
  • 0 likes
  • 3 in conversation