BookmarkSubscribeRSS Feed
Saba1
Quartz | Level 8

Hi 

 

I have a data set with monthly returns for about 4000 different firms. I want to create a new column with "cumulative past returns from month t – 12 to t − 2" for each firm separately, excluding the most recent one i.e. "t − 1". Here firm is identified by PERMNO.

 

Please guide me how to take cumulative lag (t-12, t-2) by firm? 

 

See a sample of my data set below. Thanks.

Data have;
input Date: mmddyy10. PERMNO Ret;
format Date mmddyy10.;
datalines;
01/31/2001    21227    -0.03
02/28/2001    21227    -0.06
03/31/2001    21227    -0.04
04/30/2001    21227    0.04
05/31/2001    21227    0.12
06/30/2001    21227    -0.08
07/31/2001    21227    0.12
08/31/2001    21227    0.02
09/30/2001    21227    0.11
10/31/2001    21227    0.17
11/30/2001    21227    -0.11
12/31/2001    21227    -0.04
01/31/2002    21227    -0.2
02/28/2002    21227    0.11
03/31/2002    21227    -0.06
04/30/2002    21227    -0.03
06/30/2005    22100    -0.01
07/31/2005    22100    0.02
08/31/2005    22100    0.09
09/30/2005    22100    0.006
10/31/2005    22100    -0.07
11/30/2005    22100    0.06
12/31/2005    22100    0.09
01/31/2006    22100    0.07
02/28/2006    22100    -0.15
03/31/2006    22100    0.05
04/30/2006    22100    0.08
05/31/2006    22100    -0.01
06/30/2006    22100    -0.02
07/31/2006    22100    -0.14
08/31/2006    22100    -0.10
09/30/2006    22100    0.007
10/31/2006    22100    0.05
11/30/2006    22100    0.06
12/31/2006    22100    0.12
run;
11 REPLIES 11
RM6
Obsidian | Level 7 RM6
Obsidian | Level 7
could not under stand "month t – 12 to t − 2" can you be more clear
Saba1
Quartz | Level 8

@RM6 To run a regression, I need to control for past returns. One of the controls is "past year return" and literature uses "lag t-12, lag t-2" for this variable. This means "cumulative return of past 11 months excluding the most recent month".  Hope it clarifies.

PeterClemmensen
Tourmaline | Level 20

Two questions:

 

Why do you want Cumulative past monthly returns from time t-12 to t-2 at time t? Seems counter intuitive to me?

 

Do you have SAS/ETS? If so, PROC EXPAND might be worth looking into 🙂

Saba1
Quartz | Level 8

@PeterClemmensen

1- I need to measure "past year return" and literature uses "lag t-12, lag t-2" for this variable. This means "cumulative return of past 11 months excluding the most recent month". Cumulative t-12,t-2 is the sum of lag-returns from lag 2 to lag 12.

 

2- Yes, I do have SAS/ETS.

user24feb
Barite | Level 11

Data Want (Drop = MyCumSum_dummy);
  Retain MyCumSum_dummy Nr;
  Set have;
  By PERMNO;
  If First.PERMNO Then Nr = 0;
  Nr + 1;
  MyCumSum_dummy = Sum(MyCumSum_dummy, Lag2(Ret), IfN(Nr >= 12, -Lag13(Ret), 0));
  MyCumSum = IfN(Nr >= 12, MyCumSum_dummy, .);
Run;

* OR;
Proc TimeData Data = have
              Out = _NULL_
			  OutArray = want2;
  By PERMNO;
  ID Date Interval = MONTH;
  Var Ret;
  OutArray MyCumSum;
  Do t = 1 To _LENGTH_;
    MyCumSum[t] = Ret[t-12]+Ret[t-11]+Ret[t-10]+Ret[t-9]+
                  Ret[t-8]+Ret[t-7]+Ret[t-6]+Ret[t-5]+
                  Ret[t-4]+Ret[t-3]+Ret[t-2];
  End;
Run;
Peter_C
Rhodochrosite | Level 12
@Saba1
What is your formula for cululative return?
Saba1
Quartz | Level 8

@Peter_C I need to use something like 

Ret[t-12]+Ret[t-11]+Ret[t-10]+Ret[t-9]+
                  Ret[t-8]+Ret[t-7]+Ret[t-6]+Ret[t-5]+
                  Ret[t-4]+Ret[t-3]+Ret[t-2]

where "t-n" are lags. But the problem is that I need to keep rolling it for each month "t" by PERMNO.

s_lassen
Meteorite | Level 14

When doing rolling sums/means, the easiest is often to put the values in a temporary array. Only here we have to use LAG because we do not want the newest values in:

data want;
  array values (0:10) 8 _temporary_;
  call missing(of values(*));
  do _N_=0 by 1 until(last.permno);
    set have;
    by permno;
    sum=sum(of values(*));
    output;
    values(mod(_N_,11))=ifn(_N_=0,.,lag(Ret)); /* use IFN so that LAG is always called */
    end;
run;

If you do not want the sum before you have all 11 values, use "if _N_>=10 then sum=sum(of values(*));"

Peter_C
Rhodochrosite | Level 12
Inside a DO loop like that, a sneeky trick to work with lags without using the current value is to execute the SET after the calculation (not before)
Ksharp
Super User

If I understand what you mean.

 

Data have;
input Date: mmddyy10. PERMNO Ret;
format Date mmddyy10.;
datalines;
01/31/2001    21227    -0.03
02/28/2001    21227    -0.06
03/31/2001    21227    -0.04
04/30/2001    21227    0.04
05/31/2001    21227    0.12
06/30/2001    21227    -0.08
07/31/2001    21227    0.12
08/31/2001    21227    0.02
09/30/2001    21227    0.11
10/31/2001    21227    0.17
11/30/2001    21227    -0.11
12/31/2001    21227    -0.04
01/31/2002    21227    -0.2
02/28/2002    21227    0.11
03/31/2002    21227    -0.06
04/30/2002    21227    -0.03
06/30/2005    22100    -0.01
07/31/2005    22100    0.02
08/31/2005    22100    0.09
09/30/2005    22100    0.006
10/31/2005    22100    -0.07
11/30/2005    22100    0.06
12/31/2005    22100    0.09
01/31/2006    22100    0.07
02/28/2006    22100    -0.15
03/31/2006    22100    0.05
04/30/2006    22100    0.08
05/31/2006    22100    -0.01
06/30/2006    22100    -0.02
07/31/2006    22100    -0.14
08/31/2006    22100    -0.10
09/30/2006    22100    0.007
10/31/2006    22100    0.05
11/30/2006    22100    0.06
12/31/2006    22100    0.12 
;
run;
proc sql;
create table want as
 select *,(select sum(ret) from have where PERMNO=a.PERMNO and
 Date between intnx('month',a.date,-12,'e') and intnx('month',a.date,-2,'e')) as sum
  from have as a;
quit;
mkeintz
PROC Star

For each t, you want rolling window returns,  where the window size is 11 months, and the window end-point is t-2.   Keep a rolling window of the sum of the log(ret+1) for lagged returns,,, and then just exp that sum to get cumulative returns:

 

Data have;
input Date: mmddyy10. PERMNO Ret;
format Date mmddyy10.;
datalines;
01/31/2001    21227    -0.03
02/28/2001    21227    -0.06
03/31/2001    21227    -0.04
04/30/2001    21227    0.04
05/31/2001    21227    0.12
06/30/2001    21227    -0.08
07/31/2001    21227    0.12
08/31/2001    21227    0.02
09/30/2001    21227    0.11
10/31/2001    21227    0.17
11/30/2001    21227    -0.11
12/31/2001    21227    -0.04
01/31/2002    21227    -0.2
02/28/2002    21227    0.11
03/31/2002    21227    -0.06
04/30/2002    21227    -0.03
06/30/2005    22100    -0.01
07/31/2005    22100    0.02
08/31/2005    22100    0.09
09/30/2005    22100    0.006
10/31/2005    22100    -0.07
11/30/2005    22100    0.06
12/31/2005    22100    0.09
01/31/2006    22100    0.07
02/28/2006    22100    -0.15
03/31/2006    22100    0.05
04/30/2006    22100    0.08
05/31/2006    22100    -0.01
06/30/2006    22100    -0.02
07/31/2006    22100    -0.14
08/31/2006    22100    -0.10
09/30/2006    22100    0.007
10/31/2006    22100    0.05
11/30/2006    22100    0.06
12/31/2006    22100    0.12
run;
data want (drop=_:);
  set have ;
  by permno;
  _month+1;
  if first.permno then _month=1;

  _logret=log(1+ret);
  _sumlog+lag2(_logret);
  _sumlog=_sumlog -ifn(_n_>12,0,lag12(_logret));

  if _month>12 then window_12_2_ret=exp(_sumlog)-1;
run;

Notes:

 

  1. The _sumlog=_sumlog-ifn(_n_>12,0,lag12(_logret) statement uses the IFN function, which unconditional updates the lag12(_logret) queue, but conditionally  returns it value.  Speciificlly the value is returned only starting at the 13th overall observation in dataset HAVE. 
  2. Use _month to determine how advanced within each pemno a record is.  Note the variable WINDOW_12_2_RET is only set when the rolling windows are completely populated.
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1443 views
  • 2 likes
  • 8 in conversation