BookmarkSubscribeRSS Feed
kain
Fluorite | Level 6

I would like to do regression for rolling windows of 5 years for a company. I used a brute force approach like the following:

data test;
 ws = 5;
 nwin = nrecs - ws +1;
 do w=1 to nwin;
 do p=w to w + ws -1;
 set test point=p nobs=nrecs;
 output;
 end;
 end;
 stop;
run;
data lib.test; set test; run;

proc reg data=test noprint outest=myests;
 by w;
 model y=x;
quit;
data lib.myests; set myests; run;

However, when I checked my data, I realized that if a cluster/group does not have enough observations of 5, it takes an observation of another company to make it 5. For example:

image (1).png

 The group w = 1 is fine. However, in the group w = 2, when A & A Foods LTD has only 4 observations, it takes an observation (the first one) of AAON Inc. I would like something like, if a cluster/group has fewer than 5 observations, then it should be removed (not considered), like the case of w = 2 above. How may I do this?

6 REPLIES 6
ChrisNZ
Tourmaline | Level 20

Like this?

data TEST2;
 WS = 5;
 NWIN = NRECS - WS +1;
 do W=1 TO NWIN;
   C=COMPANY;
   do P=W TO W + WS -1;
     set TEST point=P nobs=NRECS;
     if COMPANY = C then output;
     else leave;
   end;
 end;
 stop;
run;

 

ChrisNZ
Tourmaline | Level 20

> I would like something like, if a cluster/group has fewer than 5 observations, then it should be removed (not considered), 

Actually, my code only removes the rogue company, not the whole group.

You need to iterate twice to remove small groups: once to count, and once again to output.

 

Why do the first 2 records look identical?

 

 

mkeintz
PROC Star

If you have no instance of interior "holes" in a permno series, but sometimes have short series, then this will work.  And it will be far more efficient than using POINT= in a set statement, which will read each obs 5 times.  Below each obs is read once, stored in an array, and subsequently output five times, once fpr each window that contains it.

 

By the way, you say (emphasis mine):


@kain wrote:

I would like to do regression for rolling windows of 5 years for a company

but your regression, if the data were correctly created. would do one regression for ALL companies belonging to each five-year window.  

 

Which is it?  regression for EACH company in a window  (i.e. Nwindows * Ncompany regressions), or is it ALL companies in each window (Nwindows regressions)?

 

The code below creates the data you need:

 

data windows / view=windows;
  do until (last.permno);
    set test;
    by permno;

    /* Make an array for each variable in the regression  */
    /* Make lower and upper bounds to span all your series */
    array y_history {2001:2020} ;
    array x_history {2001:2020} ;

    if first.permno then min_year=year(date);
    y_history{year(date)}=y;
    x_history{year(date)}=x;
  end;

  max_year=year(date);

  if max_year>=min_year+4 then do w=min_year to max_year-4;
    do year=w to w+4;
      y=y_history{year};
      x=x_history{year};
      output;
    end;
  end;
run;

Now if it's one regression per window per company, then just use:

 

proc reg data=windows;
  by permno w;
  model y=x;
quit;

But if it's one regression covering ALL companies in a window, then

proc sort data=windows out=sorted_data;
  by w permno;
run;
proc reg data=sorted_data;
  by w;
  model y=x;
quit;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Whitepepper
Calcite | Level 5

Dear Kain,

 

Hope you are doing well. 

 

Could you please reply to me if this problem has been solved?

I have the same issue here and I try many time, the different companies will always interact with each other in the last observations for the former one. This issue is killing me 😞 

 

Best regards,

 

PaigeMiller
Diamond | Level 26

@Whitepepper this has been answered in your other thread.

--
Paige Miller
PaigeMiller
Diamond | Level 26

Also, @Whitepepper this current thread has a solution from @mkeintz 

--
Paige Miller

sas-innovate-white.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.

 

Early bird rate extended! Save $200 when you sign up by March 31.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 2063 views
  • 1 like
  • 5 in conversation