Re: Regressions for rolling window for each company

kain · Posted 01-17-2021 09:09 PM

I would like to do regression for rolling windows of 5 years for a company. I used a brute force approach like the following:

data test;
 ws = 5;
 nwin = nrecs - ws +1;
 do w=1 to nwin;
 do p=w to w + ws -1;
 set test point=p nobs=nrecs;
 output;
 end;
 end;
 stop;
run;
data lib.test; set test; run;

proc reg data=test noprint outest=myests;
 by w;
 model y=x;
quit;
data lib.myests; set myests; run;

However, when I checked my data, I realized that if a cluster/group does not have enough observations of 5, it takes an observation of another company to make it 5. For example:

The group w = 1 is fine. However, in the group w = 2, when A & A Foods LTD has only 4 observations, it takes an observation (the first one) of AAON Inc. I would like something like, if a cluster/group has fewer than 5 observations, then it should be removed (not considered), like the case of w = 2 above. How may I do this?

ChrisNZ · Posted 01-17-2021 11:09 PM

Like this?

data TEST2;
 WS = 5;
 NWIN = NRECS - WS +1;
 do W=1 TO NWIN;
   C=COMPANY;
   do P=W TO W + WS -1;
     set TEST point=P nobs=NRECS;
     if COMPANY = C then output;
     else leave;
   end;
 end;
 stop;
run;

High-Performance SAS Coding - Third Edition

ChrisNZ · Posted 01-17-2021 11:23 PM

> I would like something like, if a cluster/group has fewer than 5 observations, then it should be removed (not considered),

Actually, my code only removes the rogue company, not the whole group.

You need to iterate twice to remove small groups: once to count, and once again to output.

Why do the first 2 records look identical?

High-Performance SAS Coding - Third Edition

mkeintz · Posted 01-17-2021 11:55 PM

If you have no instance of interior "holes" in a permno series, but sometimes have short series, then this will work. And it will be far more efficient than using POINT= in a set statement, which will read each obs 5 times. Below each obs is read once, stored in an array, and subsequently output five times, once fpr each window that contains it.

By the way, you say (emphasis mine):

@kain wrote:

I would like to do regression for rolling windows of 5 years for a company.

but your regression, if the data were correctly created. would do one regression for ALL companies belonging to each five-year window.

Which is it? regression for EACH company in a window (i.e. Nwindows * Ncompany regressions), or is it ALL companies in each window (Nwindows regressions)?

The code below creates the data you need:

data windows / view=windows;
  do until (last.permno);
    set test;
    by permno;

    /* Make an array for each variable in the regression  */
    /* Make lower and upper bounds to span all your series */
    array y_history {2001:2020} ;
    array x_history {2001:2020} ;

    if first.permno then min_year=year(date);
    y_history{year(date)}=y;
    x_history{year(date)}=x;
  end;

  max_year=year(date);

  if max_year>=min_year+4 then do w=min_year to max_year-4;
    do year=w to w+4;
      y=y_history{year};
      x=x_history{year};
      output;
    end;
  end;
run;

Now if it's one regression per window per company, then just use:

proc reg data=windows;
  by permno w;
  model y=x;
quit;

But if it's one regression covering ALL companies in a window, then

proc sort data=windows out=sorted_data;
  by w permno;
run;
proc reg data=sorted_data;
  by w;
  model y=x;
quit;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Whitepepper · Posted 06-06-2024 05:49 AM

Dear Kain,

Hope you are doing well.

Could you please reply to me if this problem has been solved?

I have the same issue here and I try many time, the different companies will always interact with each other in the last observations for the former one. This issue is killing me 😞

Best regards,

PaigeMiller · Posted 06-06-2024 07:07 AM

@Whitepepper this has been answered in your other thread.

--
Paige Miller

PaigeMiller · Posted 06-06-2024 07:31 AM

Also, @Whitepepper this current thread has a solution from @mkeintz

--
Paige Miller

Regressions for rolling window for each company