Contributor
Posts: 44

# help on rolling regressions needed

I need to run a rolling time-series regression to test my regression model. I found a suitable example related to this (link below). The idea is to run the monthly regression over 5-year windows, moving forward 1 year at a time. My regression has the following form: Identity = meanHML meanMOM. My dates are in the form 1990-07, 1990-08, ... through 2008-12. If someone could help me adapt the code, I would be grateful.

Valued Guide
Posts: 653

## help on rolling regressions needed

I am not sure about your question, but I did notice that in your code above you have:

%if %lowcase(%substr(&date,1,4))= year %then %let year_date=1;      *Don't know if should put something here?;

%else %let year_date=0;

You may not place a comment or other statement between the %IF and its %ELSE.
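
A minimal sketch of the corrected layout, assuming the test sits inside a macro (the macro name `set_year_flag` and the sample argument `Year2008` are hypothetical, not from the original code). The comment is moved above the %IF so that the %ELSE follows the %IF/%THEN statement directly:

```sas
%macro set_year_flag(date);   /* hypothetical wrapper macro, for illustration only */
    %local year_date;

    /* Any comment goes here, before the %IF, not between %IF and %ELSE */
    %if %lowcase(%substr(&date,1,4)) = year %then %let year_date = 1;
    %else %let year_date = 0;

    %put NOTE: year_date=&year_date;
%mend set_year_flag;

%set_year_flag(Year2008)
```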

Contributor
Posts: 44

## help on rolling regressions needed

Thanks for the hint. However, I decided to move away from the macro approach, since it was quite difficult for me to understand. I've edited the original question accordingly.

Contributor
Posts: 44

## Re: help on rolling regressions needed

Could someone help me out with this rolling regression issue? I've tried to modify the code found on sascommunity.org/wiki, but unsuccessfully. Here's what I've written. I know it's not correct, but it's at least a starting point. The data are already sorted by date (dataset tt). I hope someone can steer the code in a more sensible direction. Thanks!!

%let nmonths=60;

data expanded(drop = nn);

length span $ 13;

set tt;

do nn = 1 to &nmonths;

span = catx(   '-'

, put(intnx('month',199007,nn-&nmonths),yymm6.)

, put(intnx('month',200812,nn-1     ),yymm6.)

);

output;

end;

run;

*Re-order. ;

proc sort data=expanded;

by span date;

run;

*Finally, run all of the regressions independently in a single step;


proc reg data=expanded noprint outest=regout;

by span;

model Identity = meanHML meanMOM;

run;

Posts: 1,345

## Re: help on rolling regressions needed

You can establish a 60-row (60-month) array. The first 60 months of your data go into this array and are then written out to make data for the first 5-year window. Then the next 12 months can be read in, overwriting rows 1 through 12 of the array (i.e. replacing year 1 data with year 6 data). At that point write out the full array again, yielding data for the 2nd window. Read another 12 months into rows 13-24 (replacing year 2 data with year 7 data), and then write out the 3rd 60-month window. And so on. No subsequent sorting is required, which (in combination with making a data VIEW instead of a data FILE) should make things faster.

data want (keep=id endyear endmo y x1 x2)/ view=want;

array vars {60,3}  /*60 rows for Y, x1 and x2 */;

do n=1 by 1 until (last.id);

set have (keep=id date y x1 x2);

by id;

row=mod(n-1,60)+1;   /* for N=1 .. 60 , row=1..60, then N=61==>row=1, N=62==>row=2 etc. */

vars{row,1}=y;  vars{row,2}=x1;  vars{row,3}=x2;

if n>=60 and mod(n,12)=0 then do;  ** If we are at least at obs 60 and it is divisible by 12 ... **;

endyear=year(date);  endmo=month(date);

do row=1 to 60;

y=vars{row,1}; x1=vars{row,2}; x2=vars{row,3};

output;

end;

end;   /* closes the IF ... THEN DO block */

end;   /* closes the DO N=1 ... UNTIL (LAST.ID) loop */

run;

proc reg data=want noprint outest=regout;

by id endyear endmo;

model y=x1 x2;

quit;

Now you might be concerned that while the data for the first window (year1-year5) will have 60 rows in chronological order, not all windows will (for instance, for the second window the data will be ordered year6, year2 ... year5). But PROC REG gives the same result no matter the order.
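
Why the order does not matter can be sketched from the least-squares normal equations. The notation here is standard OLS notation, not something taken from this thread: write $z_i = (1, x_{1i}, x_{2i})^\top$ for row $i$ of a 60-row window. Then

$$
\hat{\beta} = \Big(\sum_{i=1}^{60} z_i z_i^\top\Big)^{-1} \sum_{i=1}^{60} z_i\, y_i
$$

Both sums run over every row of the window, and a sum is unchanged when its terms are reordered, so any ordering of the same 60 rows gives the same estimates.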

But if you do care, then, instead of the "do row=1 to 60" block, you can do this:

do J=N-59 to N;

row=mod(J-1,60)+1;

y=vars{row,1};  x1=vars{row,2};  x2=vars{row,3};

output;

end;

Now even though WANT is a data VIEW and not a data FILE, this can take a long time. It would likely be a good deal faster if you could use the DATA step to make rolling sum-of-squares-and-cross-products (SSCP) matrices, which can be read by PROC REG. Not only is less data passed from one step to the next, but making a rolling SSCP in the data step ought to be faster. I'm working on a paper to show this, and hope to present it at NESUG 2012.
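
As a rough illustration of the rolling SSCP idea (a sketch of the general principle in assumed notation, not the method of the paper mentioned above), let $w_i = (1, x_{1i}, x_{2i}, y_i)^\top$ and let $S_t$ be the SSCP matrix of window $t$, which covers months $12(t-1)+1$ through $12(t-1)+60$:

$$
S_t = \sum_{i=12(t-1)+1}^{12(t-1)+60} w_i w_i^\top,
\qquad
S_{t+1} = S_t + \sum_{i=12(t-1)+61}^{12(t-1)+72} w_i w_i^\top - \sum_{i=12(t-1)+1}^{12(t-1)+12} w_i w_i^\top
$$

Moving the window forward one year therefore needs only the 12 incoming and the 12 outgoing months, rather than a fresh pass over all 60 rows, and each window is passed to the regression as one small matrix instead of 60 observations.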
