- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have monthly returns data, Return, a date variable, Price_Date, and a firm identification variable, Firm_ID.
There are several years of returns data for multiple firms.
Some months are missing entirely (no values for any variables including the date).
Alternatively for some of the months that do exist (i.e. the date variable exists) the return variable and/or the firm identification variable have missing values (i.e. ".").
I want to calculate a new variable that is the rolling X-month (e.g. 3 months, 12 months) sum of the returns.
The new variable, Rolling_Sum, should equal the current month's return plus the previous (X-1) months returns (for a total of X months of returns).
However I also want to take into account missing observations such that when any of the returns in any particular month's rolling sum are missing then the rolling sum for that month should return a missing value also.
How can I achieve this?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Do you have SAS/ETS?
data time1 ;
input date yymmdd10. return ;
format date yymm7.;
cards;
2012-11-01 .
2013-01-01 100
2013-02-01 100
2013-03-01 100
2013-04-01 100
2013-05-01 100
2013-07-01 100
2013-08-01 100
2013-09-01 100
2013-10-01 100
;;;;
proc expand data=time1 out=time2 method=none from=month to=month ;
id date ;
run;
proc expand data=time2 out=time3 method=none from=month to=month ;
id date;
convert return = roll3 / transformout=(nomiss movsum 3);
convert return = roll4 / transformout=(nomiss movsum 4);
run;
proc print;
run;
Obs date roll3 roll4 return
1 2012M11 . . .
2 2012M12 . . .
3 2013M01 . . 100
4 2013M02 . . 100
5 2013M03 300 . 100
6 2013M04 300 400 100
7 2013M05 300 400 100
8 2013M06 . . .
9 2013M07 . . 100
10 2013M08 . . 100
11 2013M09 300 . 100
12 2013M10 300 400 100
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes, I am using SAS 9.
Thank you for replying Tom. Your solution works well when the returns are for a single firm, however when I modify the data to include a firm_id and add data for second firm I receive errors.
The modified code with firm_id is as follows...
data time1 ;
input firm_id $6. date yymmdd10. return ;
format date yymm7.;
cards;
firmA 2012-11-01 .
firmA 2013-01-01 100
firmA 2013-02-01 100
firmA 2013-03-01 100
firmA 2013-04-01 100
firmA 2013-05-01 100
firmA 2013-07-01 100
firmA 2013-08-01 100
firmA 2013-09-01 100
firmA 2013-10-01 100
firmB 2012-11-01 .
firmB 2013-01-01 100
firmB 2013-02-01 100
firmB 2013-03-01 100
firmB 2013-04-01 100
firmB 2013-05-01 100
firmB 2013-07-01 100
firmB 2013-08-01 100
firmB 2013-09-01 100
firmB 2013-10-01 100
;;;;
proc expand data=time1 out=time2 method=none from=month to=month ;
id date;
run;
proc expand data=time2 out=time3 method=none from=month to=month ;
id date;
convert return = roll3 / transformout=(nomiss movsum 3);
convert return = roll4 / transformout=(nomiss movsum 4);
run;
...After the first proc expand the log reports several warnings about omitted observations and an error, as follows...
WARNING: 1 observations have been omitted before observation number 2 in data set WORK.TIME1
according to the FROM=MONTH specification and the ID variable values. The current ID is
date=2013M01, the previous is date=2012M11.
WARNING: 1 observations have been omitted before observation number 7 in data set WORK.TIME1
according to the FROM=MONTH specification and the ID variable values. The current ID is
date=2013M07, the previous is date=2013M05.
ERROR: The data set WORK.TIME1 is not sorted by the ID variable. At observation number 11, date=
2012M11, but date=2013M10 for the previous observation.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.TIME2 may be incomplete. When this step was stopped there were 0
observations and 3 variables.
NOTE: PROCEDURE EXPAND used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
The data set Time2 then has 0 observations, as does Time3.
How can the code be modified to restrict the rolling sum to each firm?
Also, could you briefly explain the logic of using two proc expand statements rather than one?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You probably want to use a BY FIRM_ID statement so that PROC EXPAND will treat each firm separately.
Do you want the data for a firm expanded beyond the time interval that the firm's data covers? If so then you will need to use something else to "flesh out" the data records.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Tom, thank you. It works great!
I'm also not familiar with proc expand. Perhaps someone else can help explain what proc expand is doing to achieve the desired outcome?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Tom, I spoke too soon.
When testing the code on my actual data I noticed a problem that occurs whenever the firm_id changes and the first observation for that firm_id does not have a missing Return value. A value for rolling sum is returned for the first observation even though the first X observations for each firm should be missing since there is insufficient data (where X is the length of the rolling sum).
For example trying your code, modified with the by firm_id lines and the above-mentioned change in the data, results as follows...
data time1 ;
input firm_id $6. date yymmdd10. return ;
format date yymm7.;
cards;
firmA 2012-11-01 .
firmA 2013-01-01 100
firmA 2013-02-01 100
firmA 2013-03-01 100
firmA 2013-04-01 100
firmA 2013-05-01 100
firmA 2013-07-01 100
firmA 2013-08-01 100
firmA 2013-09-01 100
firmA 2013-10-01 100
firmB 2012-11-01 100
firmB 2013-01-01 100
firmB 2013-02-01 100
firmB 2013-03-01 100
firmB 2013-04-01 100
firmB 2013-05-01 100
firmB 2013-07-01 100
firmB 2013-08-01 100
firmB 2013-09-01 100
firmB 2013-10-01 100
;;;;
proc expand data=time1 out=time2 method=none from=month to=month ;
by firm_id;
id date;
run;
proc expand data=time2 out=time3 method=none from=month to=month ;
by firm_id;
id date;
convert return = roll13 / transformout=(nomiss movsum 3);
run;
proc print data=time3;
run;
The SAS System | 08:19 Thursday, February 19, 2015 1 |
Obs | firm_id | date | roll3 | return |
1 | firmA | 2012M11 | . | . | |
2 | firmA | 2012M12 | . | . | |
3 | firmA | 2013M01 | . | 100 | |
4 | firmA | 2013M02 | . | 100 | |
5 | firmA | 2013M03 | 300 | 100 | |
6 | firmA | 2013M04 | 300 | 100 | |
7 | firmA | 2013M05 | 300 | 100 | |
8 | firmA | 2013M06 | . | . | |
9 | firmA | 2013M07 | . | 100 | |
10 | firmA | 2013M08 | . | 100 | |
11 | firmA | 2013M09 | 300 | 100 | |
12 | firmA | 2013M10 | 300 | 100 | |
13 | firmB | 2012M11 | 100 | 100 | |
14 | firmB | 2012M12 | . | . | |
15 | firmB | 2013M01 | . | 100 | |
16 | firmB | 2013M02 | . | 100 | |
17 | firmB | 2013M03 | 300 | 100 | |
18 | firmB | 2013M04 | 300 | 100 | |
19 | firmB | 2013M05 | 300 | 100 | |
20 | firmB | 2013M06 | . | . | |
21 | firmB | 2013M07 | . | 100 | |
22 | firmB | 2013M08 | . | 100 | |
23 | firmB | 2013M09 | 300 | 100 | |
24 | firmB | 2013M10 | 300 | 100 |
The first observation for firmB has a value of "100" for roll3 when it should be "." since not enough months are available to form the 3 month sum.
How do I get around this?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Not sure. What I did in my example was add the empty record that was two months before the first non-missing value.
Or you could just use the method that was posted below using temporary arrays to generate the rolling sum.
proc expand data=time1 out=time2 method=none from=month to=month ;
by firm_id;
id date;
run;
data time3 ;
set time2 ;
by firm_id ;
array x3 (3) _temporary_;
array x12 (12) _temporary_;
if first.firm_id then call missing(of x3(*) x12(*));
if first.firm_id then month=0;
month+1;
x3(mod(month,3)+1)=return;
x12(mod(month,12)+1)=return;
if nmiss(of x3(*))=0 then roll3 = sum(of x3(*));
if nmiss(of x12(*))=0 then roll12 = sum(of x12(*));
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I used two calls because when I tried to do it in one it generated non-missing values for RETURN for the missing dates. I am not that familiar with PROC EXPAND so there may be a way to do it in one step.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
PROC EXPAND surely is more than sufficient for the job, but if all you need is the rolling sum or you don't have SAS/ETS, consider a simple data step solution:
data time1;
input firm_id $6. date yymmdd10. return;
format date yymm7.;
cards;
firmA 2012-11-01 .
firmA 2013-01-01 100
firmA 2013-02-01 100
firmA 2013-03-01 100
firmA 2013-04-01 100
firmA 2013-05-01 100
firmA 2013-07-01 100
firmA 2013-08-01 100
firmA 2013-09-01 100
firmA 2013-10-01 100
firmB 2012-11-01 .
firmB 2013-01-01 100
firmB 2013-02-01 100
firmB 2013-03-01 100
firmB 2013-04-01 100
firmB 2013-05-01 100
firmB 2013-07-01 100
firmB 2013-08-01 100
firmB 2013-09-01 100
firmB 2013-10-01 100
;;;;
%let mon=3;
data want;
array sum_rol(0:%eval(&mon.-1)) _temporary_;
do _n_=1 by 1 until (last.firm_id);
set time1;
by firm_id;
sum_rol(mod(_n_,&mon.))= return;
if n(of sum_rol(*))=&mon. then
rolling_sum=sum(of sum_rol(*));
else rolling_sum=.;
output;
end;
call missing (of sum_rol(*));
run;
Message was edited by: haikuo bian
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Hai.kuo, Thanks for the code.
It does reset the rolling sum when the data goes from one firm to the next and is outputting missing values when the Return is missing within any particular sum (both as desired), but it isn't accounting for the missing observations/months (e.g. 2012-12-01 and 2013-06-01). So around those dates the rolling sum is summing consecutive observations in the data, rather than consecutive months.
Can your code be modified to take that into account?
I like the convenience of just changing the value in the "%let mon=3" line to change the length of the rolling sum, which is why I am asking.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This is the updated version and I hope it meets what you need. For the existing code, it is easier just to add another array.
data time1;
input firm_id $6. date yymmdd10. return;
format date yymm7.;
cards;
firmA 2012-11-01 .
firmA 2013-01-01 100
firmA 2013-02-01 100
firmA 2013-03-01 100
firmA 2013-08-01 100
firmA 2013-09-01 100
firmA 2013-10-01 100
firmB 2012-11-01 .
firmB 2013-01-01 100
firmB 2013-02-01 100
firmB 2013-03-01 100
firmB 2013-04-01 100
firmB 2013-05-01 100
firmB 2013-07-01 100
firmB 2013-08-01 100
firmB 2013-09-01 100
firmB 2013-10-01 100
;;;;
%let mon=3;
data want;
array sum_rol(0:%eval(&mon.-1)) _temporary_;
array mth (0:%eval(&mon.-1)) _temporary_;
do _n_=1 by 1 until (last.firm_id);
set time1;
by firm_id;
sum_rol(mod(_n_,&mon.))= return;
mth (mod(_n_,&mon.))= date;
if n(of sum_rol(*))=&mon. and intck('month', min(of mth(*)), max(of mth(*))) = 2 then
rolling_sum=sum(of sum_rol(*));
else rolling_sum=.;
output;
end;
call missing (of sum_rol(*));
call missing (of mth(*));
run;