BookmarkSubscribeRSS Feed
kain
Fluorite | Level 6

I would like to do regression for rolling windows of 5 years for a company. I used a brute force approach like the following:

data test;
 ws = 5;
 nwin = nrecs - ws +1;
 do w=1 to nwin;
 do p=w to w + ws -1;
 set test point=p nobs=nrecs;
 output;
 end;
 end;
 stop;
run;
data lib.test; set test; run;

proc reg data=test noprint outest=myests;
 by w;
 model y=x;
quit;
data lib.myests; set myests; run;

However, when I checked my data, I realized that if a cluster/group does not have enough observations of 5, it takes an observation of another company to make it 5. For example:

image (1).png

 The group w = 1 is fine. However, in the group w = 2, when A & A Foods LTD has only 4 observations, it takes an observation (the first one) of AAON Inc. I would like something like, if a cluster/group has fewer than 5 observations, then it should be removed (not considered), like the case of w = 2 above. How may I do this?

6 REPLIES 6
ChrisNZ
Tourmaline | Level 20

Like this?

data TEST2;
 WS = 5;
 NWIN = NRECS - WS +1;
 do W=1 TO NWIN;
   C=COMPANY;
   do P=W TO W + WS -1;
     set TEST point=P nobs=NRECS;
     if COMPANY = C then output;
     else leave;
   end;
 end;
 stop;
run;

 

ChrisNZ
Tourmaline | Level 20

> I would like something like, if a cluster/group has fewer than 5 observations, then it should be removed (not considered), 

Actually, my code only removes the rogue company, not the whole group.

You need to iterate twice to remove small groups: once to count, and once again to output.

 

Why do the first 2 records look identical?

 

 

mkeintz
PROC Star

If you have no instance of interior "holes" in a permno series, but sometimes have short series, then this will work.  And it will be far more efficient than using POINT= in a set statement, which will read each obs 5 times.  Below each obs is read once, stored in an array, and subsequently output five times, once fpr each window that contains it.

 

By the way, you say (emphasis mine):


@kain wrote:

I would like to do regression for rolling windows of 5 years for a company

but your regression, if the data were correctly created. would do one regression for ALL companies belonging to each five-year window.  

 

Which is it?  regression for EACH company in a window  (i.e. Nwindows * Ncompany regressions), or is it ALL companies in each window (Nwindows regressions)?

 

The code below creates the data you need:

 

data windows / view=windows;
  do until (last.permno);
    set test;
    by permno;

    /* Make an array for each variable in the regression  */
    /* Make lower and upper bounds to span all your series */
    array y_history {2001:2020} ;
    array x_history {2001:2020} ;

    if first.permno then min_year=year(date);
    y_history{year(date)}=y;
    x_history{year(date)}=x;
  end;

  max_year=year(date);

  if max_year>=min_year+4 then do w=min_year to max_year-4;
    do year=w to w+4;
      y=y_history{year};
      x=x_history{year};
      output;
    end;
  end;
run;

Now if it's one regression per window per company, then just use:

 

proc reg data=windows;
  by permno w;
  model y=x;
quit;

But if it's one regression covering ALL companies in a window, then

proc sort data=windows out=sorted_data;
  by w permno;
run;
proc reg data=sorted_data;
  by w;
  model y=x;
quit;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Whitepepper
Calcite | Level 5

Dear Kain,

 

Hope you are doing well. 

 

Could you please reply to me if this problem has been solved?

I have the same issue here and I try many time, the different companies will always interact with each other in the last observations for the former one. This issue is killing me 😞 

 

Best regards,

 

PaigeMiller
Diamond | Level 26

@Whitepepper this has been answered in your other thread.

--
Paige Miller
PaigeMiller
Diamond | Level 26

Also, @Whitepepper this current thread has a solution from @mkeintz 

--
Paige Miller

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Update

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 2494 views
  • 1 like
  • 5 in conversation