Hi
I have a data set with monthly returns for about 4000 different firms. I want to create a new column with "cumulative past returns from month t – 12 to t − 2" for each firm separately, excluding the most recent one i.e. "t − 1". Here firm is identified by PERMNO.
Please guide me how to take cumulative lag (t-12, t-2) by firm?
See a sample of my data set below. Thanks.
Data have;
input Date: mmddyy10. PERMNO Ret;
format Date mmddyy10.;
datalines;
01/31/2001 21227 -0.03
02/28/2001 21227 -0.06
03/31/2001 21227 -0.04
04/30/2001 21227 0.04
05/31/2001 21227 0.12
06/30/2001 21227 -0.08
07/31/2001 21227 0.12
08/31/2001 21227 0.02
09/30/2001 21227 0.11
10/31/2001 21227 0.17
11/30/2001 21227 -0.11
12/31/2001 21227 -0.04
01/31/2002 21227 -0.2
02/28/2002 21227 0.11
03/31/2002 21227 -0.06
04/30/2002 21227 -0.03
06/30/2005 22100 -0.01
07/31/2005 22100 0.02
08/31/2005 22100 0.09
09/30/2005 22100 0.006
10/31/2005 22100 -0.07
11/30/2005 22100 0.06
12/31/2005 22100 0.09
01/31/2006 22100 0.07
02/28/2006 22100 -0.15
03/31/2006 22100 0.05
04/30/2006 22100 0.08
05/31/2006 22100 -0.01
06/30/2006 22100 -0.02
07/31/2006 22100 -0.14
08/31/2006 22100 -0.10
09/30/2006 22100 0.007
10/31/2006 22100 0.05
11/30/2006 22100 0.06
12/31/2006 22100 0.12
run;
@RM6 To run a regression, I need to control for past returns. One of the controls is "past year return" and literature uses "lag t-12, lag t-2" for this variable. This means "cumulative return of past 11 months excluding the most recent month". Hope it clarifies.
Two questions:
Why do you want Cumulative past monthly returns from time t-12 to t-2 at time t? Seems counter intuitive to me?
Do you have SAS/ETS? If so, PROC EXPAND might be worth looking into 🙂
1- I need to measure "past year return" and literature uses "lag t-12, lag t-2" for this variable. This means "cumulative return of past 11 months excluding the most recent month". Cumulative t-12,t-2 is the sum of lag-returns from lag 2 to lag 12.
2- Yes, I do have SAS/ETS.
Data Want (Drop = MyCumSum_dummy);
Retain MyCumSum_dummy Nr;
Set have;
By PERMNO;
If First.PERMNO Then Nr = 0;
Nr + 1;
MyCumSum_dummy = Sum(MyCumSum_dummy, Lag2(Ret), IfN(Nr >= 12, -Lag13(Ret), 0));
MyCumSum = IfN(Nr >= 12, MyCumSum_dummy, .);
Run;
* OR;
Proc TimeData Data = have
Out = _NULL_
OutArray = want2;
By PERMNO;
ID Date Interval = MONTH;
Var Ret;
OutArray MyCumSum;
Do t = 1 To _LENGTH_;
MyCumSum[t] = Ret[t-12]+Ret[t-11]+Ret[t-10]+Ret[t-9]+
Ret[t-8]+Ret[t-7]+Ret[t-6]+Ret[t-5]+
Ret[t-4]+Ret[t-3]+Ret[t-2];
End;
Run;
@Peter_C I need to use something like
Ret[t-12]+Ret[t-11]+Ret[t-10]+Ret[t-9]+
Ret[t-8]+Ret[t-7]+Ret[t-6]+Ret[t-5]+
Ret[t-4]+Ret[t-3]+Ret[t-2]
where "t-n" are lags. But the problem is that I need to keep rolling it for each month "t" by PERMNO.
When doing rolling sums/means, the easiest is often to put the values in a temporary array. Only here we have to use LAG because we do not want the newest values in:
data want; array values (0:10) 8 _temporary_; call missing(of values(*)); do _N_=0 by 1 until(last.permno); set have; by permno; sum=sum(of values(*)); output; values(mod(_N_,11))=ifn(_N_=0,.,lag(Ret)); /* use IFN so that LAG is always called */ end; run;
If you do not want the sum before you have all 11 values, use "if _N_>=10 then sum=sum(of values(*));"
If I understand what you mean.
Data have;
input Date: mmddyy10. PERMNO Ret;
format Date mmddyy10.;
datalines;
01/31/2001 21227 -0.03
02/28/2001 21227 -0.06
03/31/2001 21227 -0.04
04/30/2001 21227 0.04
05/31/2001 21227 0.12
06/30/2001 21227 -0.08
07/31/2001 21227 0.12
08/31/2001 21227 0.02
09/30/2001 21227 0.11
10/31/2001 21227 0.17
11/30/2001 21227 -0.11
12/31/2001 21227 -0.04
01/31/2002 21227 -0.2
02/28/2002 21227 0.11
03/31/2002 21227 -0.06
04/30/2002 21227 -0.03
06/30/2005 22100 -0.01
07/31/2005 22100 0.02
08/31/2005 22100 0.09
09/30/2005 22100 0.006
10/31/2005 22100 -0.07
11/30/2005 22100 0.06
12/31/2005 22100 0.09
01/31/2006 22100 0.07
02/28/2006 22100 -0.15
03/31/2006 22100 0.05
04/30/2006 22100 0.08
05/31/2006 22100 -0.01
06/30/2006 22100 -0.02
07/31/2006 22100 -0.14
08/31/2006 22100 -0.10
09/30/2006 22100 0.007
10/31/2006 22100 0.05
11/30/2006 22100 0.06
12/31/2006 22100 0.12
;
run;
proc sql;
create table want as
select *,(select sum(ret) from have where PERMNO=a.PERMNO and
Date between intnx('month',a.date,-12,'e') and intnx('month',a.date,-2,'e')) as sum
from have as a;
quit;
For each t, you want rolling window returns, where the window size is 11 months, and the window end-point is t-2. Keep a rolling window of the sum of the log(ret+1) for lagged returns,,, and then just exp that sum to get cumulative returns:
Data have;
input Date: mmddyy10. PERMNO Ret;
format Date mmddyy10.;
datalines;
01/31/2001 21227 -0.03
02/28/2001 21227 -0.06
03/31/2001 21227 -0.04
04/30/2001 21227 0.04
05/31/2001 21227 0.12
06/30/2001 21227 -0.08
07/31/2001 21227 0.12
08/31/2001 21227 0.02
09/30/2001 21227 0.11
10/31/2001 21227 0.17
11/30/2001 21227 -0.11
12/31/2001 21227 -0.04
01/31/2002 21227 -0.2
02/28/2002 21227 0.11
03/31/2002 21227 -0.06
04/30/2002 21227 -0.03
06/30/2005 22100 -0.01
07/31/2005 22100 0.02
08/31/2005 22100 0.09
09/30/2005 22100 0.006
10/31/2005 22100 -0.07
11/30/2005 22100 0.06
12/31/2005 22100 0.09
01/31/2006 22100 0.07
02/28/2006 22100 -0.15
03/31/2006 22100 0.05
04/30/2006 22100 0.08
05/31/2006 22100 -0.01
06/30/2006 22100 -0.02
07/31/2006 22100 -0.14
08/31/2006 22100 -0.10
09/30/2006 22100 0.007
10/31/2006 22100 0.05
11/30/2006 22100 0.06
12/31/2006 22100 0.12
run;
data want (drop=_:);
set have ;
by permno;
_month+1;
if first.permno then _month=1;
_logret=log(1+ret);
_sumlog+lag2(_logret);
_sumlog=_sumlog -ifn(_n_>12,0,lag12(_logret));
if _month>12 then window_12_2_ret=exp(_sumlog)-1;
run;
Notes:
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.