BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
bera00
Obsidian | Level 7

Hello all,

I need your help please because I am having some troubles with the sas script to estimate rolling window betas .

I want to estimate for each cusip code  in each month , the  betai and betam using the previous four years of monthly returns data:

Rt = alpha + betai Rti  + betam Rtm +  e

 

I have 20000 stocks in my dataset for a period of 2001-2016.  The starting period of my study is 01/01/2005 so I would like to  estimate the betas using the previous 48 months date.  The problem is that not all the stocks have previous complete 48 month data returns. I have trying the following code but it doesn't work well. I would like to know your suggestions please.  my dataset that looks like that:

CUSIP Rt  Rtm Rti Report_date

and I would like to obtain

 

REPORT_DATE CUSIP betai betam

 

 Thank you in advance

 

data nstockregfinal5  / view= nstockregfinal5;
 array _X1 {48} _temporary_ ;
 array _X2 {48} _temporary_ ;
 array _Y {48} _temporary_ ;
 set price.nstockregfinal3;
 by  CUSIP;
 retain N 0;
 N = ifn(first.CUSIP,1,N+1);
 I=mod(N-1,48)+1;
 _X1{I}=Rtm;
 _X2{I}=Rti;
 _Y{I}=Rt;
 if N>=48 then do I= 1 to 48;
 Rtm=_X1{I};
 Rti=_X2{I};
 Rt=_Y{I};
 output;
 end;
run;
proc reg data= nstockregfinal5  noprint outest=myests;
 by CUSIP REPORT_DATE;
 model Rt = Rtm Rti;
 run;
 quit;

  

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

As to the uniintialized variable I, that's becuase I used I as the index of arrays in the second loop, even though the loop index was M.  Replace the {I} occurence with {M}.

 

So now you want a minumum window size criterion.  In that case I suggest you add anew array: _SEQ_N{}, of size 192 in your case.  For each month index M, it will have the "rank" of the corresponding data record.  For instance if your first 4 observations are:

 

    1/2001 with Y=101
    2/2001  y=101.5

    4/2001  y=99.3

    5/2000  y=99.4

 

Then _Y{1}=101, _Y{2}=101.5, _Y{3}=., _Y{4}=99.3, and _Y{5}=99.4.   Note _Y{3} is a missing value

 

But the new array will have values

   _SEQ_N{1}=1   _SEQ_N{2}=2   _SEQ_N{3}=.   _SEQ_N{4}=3   _SEQ_N{5}=4

 

Just as _Y{3}=., so does _SEQ_N{3}.   BUT ... _SEQ_N{4}=3, because it is the 3rd non-missing value in the series.  Later on when you get to months 48, 49, 50, 51, etc. you can just subtract the pertinent _SEQ_N values to get the number of non-missing months available in the 48-month range.  Here's the modified program.  Note the new macrovar MIN_MCOUNT, which provides the minimum acceptable count of months in your window (36 in this example):

 

 

%let time_range=%eval(12*(1+2016-2001));
%put &=time_range;
%let min_mcount=36;

data nstockregfinal5  / view= nstockregfinal5;
  array _X1 {&time_range} _temporary_ ;
  array _X2 {&time_range} _temporary_ ;
  array _Y {&time_range} _temporary_ ;
  array _seq_N {&time_range} _temporary_;

  set price.nstockregfinal3;
  by  CUSIP;

  retain first_date ;
  if first.cusip then do;
    call missing(of _x1{*},of _x2{*},of _y{*},of _seq_N{*});
    first_date=report_date;
    _seq=0;
  end;

  monthnum=1+intck('month',first_date,report_date);
  _seq+1;  /*Increment valid record count */

  _X1{monthnum}=Rtm;
  _X2{monthnum}=Rti;
  _Y{monthnum}=Rt;
  _seq_N{monthnum}=_seq;

  _mcount=.;   /* Initialize valid count of months for this window*/
  if monthnum>=48 then do;
    /* Find the earliest non-missing _Y */
    do m=monthnum-47 to monthnum while (_Y{m}=.);
    end;
    /* Now get count of non-missing months in this 48-month window*/
    _mcount=1+_seq_N{monthnum}-_seq_N{m};

    if _mcount>= &min_mcount then do m=monthnum-47 to monthnum;
      if _Y{m}=. then continue;
      Rtm=_X1{m}; 
      Rti=_X2{m};
      Rt=_Y{m};
      output;
    end;
  end;
run;

proc reg data= nstockregfinal5  noprint outest=myests;
  by CUSIP REPORT_DATE;
  model Rt = Rtm Rti;
  run;
quit;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

5 REPLIES 5
mkeintz
PROC Star

You need to check whether the most-recent 48 records comprise a complete 48-month window.  You can use the lag function for this purpose.

 

Instead of incrementing N by 1 for each incoming record, calculate MONTHNUM=1+number of months since first report_date for the given CUSIP.  Whereas the series N values would never have "holes", the MONTHNUM series would have holes in the case of a missing month.   You can still calculate I=mod(monthnum-1,48)+1 to specify the array element. 

 

But now, only if current monthnum=47+lag47(monthnum) will you have a complete 48-month windows.  And of course lag47(cusip)=cusip to make sure you are not combining series from two cusips.

 

I modified your program to use this test.  The code in italics is new.  The code in strikeout font is your code that should be removed in this version.

 

data nstockregfinal5 / view= nstockregfinal5;

  array _X1 {48} _temporary_ ;

  array _X2 {48} _temporary_ ;

  array _Y {48} _temporary_ ;

  set price.nstockregfinal3;

  by CUSIP;

  retain N 0;

  N = ifn(first.CUSIP,1,N+1);

  I=mod(N-1,48)+1;

  retain first_date;

  if first.cusip then first_date=report_date;

  monthnum=1+intck('month',first_date,report_date);

  I=mod(monthnum-1,48)+1;

  _X1{I}=Rtm;

  _X2{I}=Rti;

  _Y{I}=Rt;

  OKwindow = (lag47(month)+47=month) and (lag47(cusip)=cusip);

  if N>=48 then do I= 1 to 48;

  if OKwindow then do I=1 to 48;

    Rtm=_x1{I};

    Rti=_x2{I};

    Rt=_Y{I};

    output;

  end;

run;

proc reg data= nstockregfinal5 noprint outest=myests;

  by CUSIP REPORT_DATE;

  model Rt = Rtm Rti;

  run;

quit;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
bera00
Obsidian | Level 7

Thank you so much for your answer.  If i undestood well, this code  keeps only the stocks that have previous 48 month records?

However in my study I need to keep all the stocks and for those that don't have 48 then take the available records.. is that possible?

I have been trying this script but it works only for one cusip not for many and it doesn t give me what I want : beta coefficients for each cusip at each date.

data nstockregfinal4 / view= nstockregfinal4;
do grp = 0 to nrecs-48;
do j = 1 + grp to 48 + grp;
set nstockregfinal3 nobs=nrecs point=j;
output;
end;
end;
stop;
run;

proc reg data=nstockregfinal4 outest=stats noprint;
by grp;
model Rt = Rtm Rti;
run;
quit;
mkeintz
PROC Star

OK, so now it is clear you want

  1. Every Rolling window whose report_date is >= first_date+47 months
  2. Any size window up to 48 months  (no minimum size?)

Then I'd suggest not bothering with arrays of size 48, with all the attendant MOD function usage.  Instead have an array of size 192 (12 months for years 2001-2016).  This will allow direct indexing of the arrays on MONTHNUM (= 1 + n of months since firstdate).  Easier to store in the array and easier to extract.

 

%let time_range=%eval(12*(1+2016-2001));
%put &=time_range;

data nstockregfinal5  / view= nstockregfinal5;
  array _X1 {&time_range} _temporary_ ;
  array _X2 {&time_range} _temporary_ ;
  array _Y {&time_range} _temporary_ ;

  set price.nstockregfinal3;
  by  CUSIP;

  retain first_date;
  if first.cusip then do;
    call missing(of _x1{*},of _x2{*},of _y{*});
    first_date=report_date;
  end;

  monthnum=1+intck('month',first_date,report_date);

  _X1{monthnum}=Rtm;
  _X2{monthnum}=Rti;
  _Y{monthnum}=Rt;

  if monthnum>=48 then do m=monthnum-47 to monthnum;
    if _Y{m}=. then continue;
    Rtm=_X1{m};   /*editted correction - use M instead of I for index*/
    Rti=_X2{m};
    Rt=_Y{m};
    output;
  end;
run;

proc reg data= nstockregfinal5  noprint outest=myests;
  by CUSIP REPORT_DATE;
  model Rt = Rtm Rti;
  run;
quit;

 

Note the "if _Y{M}=. then continue" statement says to skip the M'th iteration of the loop if the M'th observation of _Y is missing.  That's the way to prevent outputting missing months.
   

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
bera00
Obsidian | Level 7

Hello

thank you again for your answer . If I would like to set a minimum of obs then how can I do it? In addition

I am getting in the log window :


NOTE: Variable I is uninitialized.
ERROR: Array subscript out of range at line 46 column 9.
Cusip=000360206 Rt=10.433725466 Rtm=-3.146277904 Rti=-1.741508366 REPORT_DATE=04/30/2005 FIRST.Cusip=0 LAST.Cusip=0
first_date=15126 monthnum=48 m=1 I=. _ERROR_=1 _N_=68

Thank you

mkeintz
PROC Star

As to the uniintialized variable I, that's becuase I used I as the index of arrays in the second loop, even though the loop index was M.  Replace the {I} occurence with {M}.

 

So now you want a minumum window size criterion.  In that case I suggest you add anew array: _SEQ_N{}, of size 192 in your case.  For each month index M, it will have the "rank" of the corresponding data record.  For instance if your first 4 observations are:

 

    1/2001 with Y=101
    2/2001  y=101.5

    4/2001  y=99.3

    5/2000  y=99.4

 

Then _Y{1}=101, _Y{2}=101.5, _Y{3}=., _Y{4}=99.3, and _Y{5}=99.4.   Note _Y{3} is a missing value

 

But the new array will have values

   _SEQ_N{1}=1   _SEQ_N{2}=2   _SEQ_N{3}=.   _SEQ_N{4}=3   _SEQ_N{5}=4

 

Just as _Y{3}=., so does _SEQ_N{3}.   BUT ... _SEQ_N{4}=3, because it is the 3rd non-missing value in the series.  Later on when you get to months 48, 49, 50, 51, etc. you can just subtract the pertinent _SEQ_N values to get the number of non-missing months available in the 48-month range.  Here's the modified program.  Note the new macrovar MIN_MCOUNT, which provides the minimum acceptable count of months in your window (36 in this example):

 

 

%let time_range=%eval(12*(1+2016-2001));
%put &=time_range;
%let min_mcount=36;

data nstockregfinal5  / view= nstockregfinal5;
  array _X1 {&time_range} _temporary_ ;
  array _X2 {&time_range} _temporary_ ;
  array _Y {&time_range} _temporary_ ;
  array _seq_N {&time_range} _temporary_;

  set price.nstockregfinal3;
  by  CUSIP;

  retain first_date ;
  if first.cusip then do;
    call missing(of _x1{*},of _x2{*},of _y{*},of _seq_N{*});
    first_date=report_date;
    _seq=0;
  end;

  monthnum=1+intck('month',first_date,report_date);
  _seq+1;  /*Increment valid record count */

  _X1{monthnum}=Rtm;
  _X2{monthnum}=Rti;
  _Y{monthnum}=Rt;
  _seq_N{monthnum}=_seq;

  _mcount=.;   /* Initialize valid count of months for this window*/
  if monthnum>=48 then do;
    /* Find the earliest non-missing _Y */
    do m=monthnum-47 to monthnum while (_Y{m}=.);
    end;
    /* Now get count of non-missing months in this 48-month window*/
    _mcount=1+_seq_N{monthnum}-_seq_N{m};

    if _mcount>= &min_mcount then do m=monthnum-47 to monthnum;
      if _Y{m}=. then continue;
      Rtm=_X1{m}; 
      Rti=_X2{m};
      Rt=_Y{m};
      output;
    end;
  end;
run;

proc reg data= nstockregfinal5  noprint outest=myests;
  by CUSIP REPORT_DATE;
  model Rt = Rtm Rti;
  run;
quit;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 2961 views
  • 2 likes
  • 2 in conversation