I would like to do regression for rolling windows of 5 years for a company. I used a brute force approach like the following:
data test;
 ws = 5;
 nwin = nrecs - ws +1;
 do w=1 to nwin;
 do p=w to w + ws -1;
 set test point=p nobs=nrecs;
 output;
 end;
 end;
 stop;
run;
data lib.test; set test; run;
proc reg data=test noprint outest=myests;
 by w;
 model y=x;
quit;
data lib.myests; set myests; run;However, when I checked my data, I realized that if a cluster/group does not have enough observations of 5, it takes an observation of another company to make it 5. For example:
The group w = 1 is fine. However, in the group w = 2, when A & A Foods LTD has only 4 observations, it takes an observation (the first one) of AAON Inc. I would like something like, if a cluster/group has fewer than 5 observations, then it should be removed (not considered), like the case of w = 2 above. How may I do this?
Like this?
data TEST2;
 WS = 5;
 NWIN = NRECS - WS +1;
 do W=1 TO NWIN;
   C=COMPANY;
   do P=W TO W + WS -1;
     set TEST point=P nobs=NRECS;
     if COMPANY = C then output;
     else leave;
   end;
 end;
 stop;
run;
> I would like something like, if a cluster/group has fewer than 5 observations, then it should be removed (not considered),
Actually, my code only removes the rogue company, not the whole group.
You need to iterate twice to remove small groups: once to count, and once again to output.
Why do the first 2 records look identical?
If you have no instance of interior "holes" in a permno series, but sometimes have short series, then this will work. And it will be far more efficient than using POINT= in a set statement, which will read each obs 5 times. Below each obs is read once, stored in an array, and subsequently output five times, once fpr each window that contains it.
By the way, you say (emphasis mine):
@kain wrote:
I would like to do regression for rolling windows of 5 years for a company.
but your regression, if the data were correctly created. would do one regression for ALL companies belonging to each five-year window.
Which is it? regression for EACH company in a window (i.e. Nwindows * Ncompany regressions), or is it ALL companies in each window (Nwindows regressions)?
The code below creates the data you need:
data windows / view=windows;
  do until (last.permno);
    set test;
    by permno;
    /* Make an array for each variable in the regression  */
    /* Make lower and upper bounds to span all your series */
    array y_history {2001:2020} ;
    array x_history {2001:2020} ;
    if first.permno then min_year=year(date);
    y_history{year(date)}=y;
    x_history{year(date)}=x;
  end;
  max_year=year(date);
  if max_year>=min_year+4 then do w=min_year to max_year-4;
    do year=w to w+4;
      y=y_history{year};
      x=x_history{year};
      output;
    end;
  end;
run;
Now if it's one regression per window per company, then just use:
proc reg data=windows;
  by permno w;
  model y=x;
quit;But if it's one regression covering ALL companies in a window, then
proc sort data=windows out=sorted_data;
  by w permno;
run;
proc reg data=sorted_data;
  by w;
  model y=x;
quit;
Dear Kain,
Hope you are doing well.
Could you please reply to me if this problem has been solved?
I have the same issue here and I try many time, the different companies will always interact with each other in the last observations for the former one. This issue is killing me 😞
Best regards,
@Whitepepper this has been answered in your other thread.
Also, @Whitepepper this current thread has a solution from @mkeintz
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
