I would like to do regression for rolling windows of 5 years for a company. I used a brute force approach like the following:
data test;
ws = 5;
nwin = nrecs - ws +1;
do w=1 to nwin;
do p=w to w + ws -1;
set test point=p nobs=nrecs;
output;
end;
end;
stop;
run;
data lib.test; set test; run;
proc reg data=test noprint outest=myests;
by w;
model y=x;
quit;
data lib.myests; set myests; run;
However, when I checked my data, I realized that if a cluster/group does not have enough observations of 5, it takes an observation of another company to make it 5. For example:
The group w = 1 is fine. However, in the group w = 2, when A & A Foods LTD has only 4 observations, it takes an observation (the first one) of AAON Inc. I would like something like, if a cluster/group has fewer than 5 observations, then it should be removed (not considered), like the case of w = 2 above. How may I do this?
Like this?
data TEST2;
WS = 5;
NWIN = NRECS - WS +1;
do W=1 TO NWIN;
C=COMPANY;
do P=W TO W + WS -1;
set TEST point=P nobs=NRECS;
if COMPANY = C then output;
else leave;
end;
end;
stop;
run;
> I would like something like, if a cluster/group has fewer than 5 observations, then it should be removed (not considered),
Actually, my code only removes the rogue company, not the whole group.
You need to iterate twice to remove small groups: once to count, and once again to output.
Why do the first 2 records look identical?
If you have no instance of interior "holes" in a permno series, but sometimes have short series, then this will work. And it will be far more efficient than using POINT= in a set statement, which will read each obs 5 times. Below each obs is read once, stored in an array, and subsequently output five times, once fpr each window that contains it.
By the way, you say (emphasis mine):
@kain wrote:
I would like to do regression for rolling windows of 5 years for a company.
but your regression, if the data were correctly created. would do one regression for ALL companies belonging to each five-year window.
Which is it? regression for EACH company in a window (i.e. Nwindows * Ncompany regressions), or is it ALL companies in each window (Nwindows regressions)?
The code below creates the data you need:
data windows / view=windows;
do until (last.permno);
set test;
by permno;
/* Make an array for each variable in the regression */
/* Make lower and upper bounds to span all your series */
array y_history {2001:2020} ;
array x_history {2001:2020} ;
if first.permno then min_year=year(date);
y_history{year(date)}=y;
x_history{year(date)}=x;
end;
max_year=year(date);
if max_year>=min_year+4 then do w=min_year to max_year-4;
do year=w to w+4;
y=y_history{year};
x=x_history{year};
output;
end;
end;
run;
Now if it's one regression per window per company, then just use:
proc reg data=windows;
by permno w;
model y=x;
quit;
But if it's one regression covering ALL companies in a window, then
proc sort data=windows out=sorted_data;
by w permno;
run;
proc reg data=sorted_data;
by w;
model y=x;
quit;
Dear Kain,
Hope you are doing well.
Could you please reply to me if this problem has been solved?
I have the same issue here and I try many time, the different companies will always interact with each other in the last observations for the former one. This issue is killing me 😞
Best regards,
@Whitepepper this has been answered in your other thread.
Also, @Whitepepper this current thread has a solution from @mkeintz
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.