BookmarkSubscribeRSS Feed
AndrewMyers
Calcite | Level 5

I have monthly returns data, Return, a date variable, Price_Date, and a firm identification variable, Firm_ID.

There are several years of returns data for multiple firms.

Some months are missing entirely (no values for any variables including the date).

Alternatively for some of the months that do exist (i.e. the date variable exists) the return variable and/or the firm identification variable have missing values (i.e. ".").

I want to calculate a new variable that is the rolling X-month (e.g. 3 months, 12 months) sum of the returns.

The new variable, Rolling_Sum, should equal the current month's return plus the previous (X-1) months returns (for a total of X months of returns).

However I also want to take into account missing observations such that when any of the returns in any particular month's rolling sum are missing then the rolling sum for that month should return a missing value also.

How can I achieve this?

10 REPLIES 10
Tom
Super User Tom
Super User

Do you have SAS/ETS?

data time1 ;

  input date yymmdd10. return ;

  format date yymm7.;

cards;

2012-11-01 .

2013-01-01 100

2013-02-01 100

2013-03-01 100

2013-04-01 100

2013-05-01 100

2013-07-01 100

2013-08-01 100

2013-09-01 100

2013-10-01 100

;;;;

proc expand data=time1 out=time2 method=none from=month to=month ;

id date ;

run;

proc expand data=time2 out=time3 method=none from=month to=month ;

id date;

convert return = roll3 / transformout=(nomiss movsum 3);

convert return = roll4 / transformout=(nomiss movsum 4);

run;

proc print;

run;

Obs       date    roll3    roll4    return

  1    2012M11       .        .         .

  2    2012M12       .        .         .

  3    2013M01       .        .       100

  4    2013M02       .        .       100

  5    2013M03     300        .       100

  6    2013M04     300      400       100

  7    2013M05     300      400       100

  8    2013M06       .        .         .

  9    2013M07       .        .       100

10    2013M08       .        .       100

11    2013M09     300        .       100

12    2013M10     300      400       100

AndrewMyers
Calcite | Level 5

Yes, I am using SAS 9.

Thank you for replying Tom. Your solution works well when the returns are for a single firm, however when I modify the data to include a firm_id and add data for second firm I receive errors.

The modified code with firm_id is as follows...

data time1 ;

  input firm_id $6. date yymmdd10. return ;

  format date yymm7.;

cards;

firmA 2012-11-01 .

firmA 2013-01-01 100

firmA 2013-02-01 100

firmA 2013-03-01 100

firmA 2013-04-01 100

firmA 2013-05-01 100

firmA 2013-07-01 100

firmA 2013-08-01 100

firmA 2013-09-01 100

firmA 2013-10-01 100

firmB 2012-11-01 .

firmB 2013-01-01 100

firmB 2013-02-01 100

firmB 2013-03-01 100

firmB 2013-04-01 100

firmB 2013-05-01 100

firmB 2013-07-01 100

firmB 2013-08-01 100

firmB 2013-09-01 100

firmB 2013-10-01 100

;;;;

proc expand data=time1 out=time2 method=none from=month to=month ;

id date;

run;

proc expand data=time2 out=time3 method=none from=month to=month ;

id date;

convert return = roll3 / transformout=(nomiss movsum 3);

convert return = roll4 / transformout=(nomiss movsum 4);

run;

...After the first proc expand the log reports several warnings about omitted observations and an error, as follows...


WARNING: 1 observations have been omitted before observation number 2 in data set WORK.TIME1

         according to the FROM=MONTH specification and the ID variable values. The current ID is

         date=2013M01, the previous is date=2012M11.

WARNING: 1 observations have been omitted before observation number 7 in data set WORK.TIME1

         according to the FROM=MONTH specification and the ID variable values. The current ID is

         date=2013M07, the previous is date=2013M05.

ERROR: The data set WORK.TIME1 is not sorted by the ID variable. At observation number 11, date=

       2012M11, but date=2013M10 for the previous observation.

NOTE: The SAS System stopped processing this step because of errors.

WARNING: The data set WORK.TIME2 may be incomplete.  When this step was stopped there were 0

         observations and 3 variables.

NOTE: PROCEDURE EXPAND used (Total process time):

      real time           0.01 seconds

      cpu time            0.01 seconds

The data set Time2 then has 0 observations, as does Time3.


How can the code be modified to restrict the rolling sum to each firm?

Also, could you briefly explain the logic of using two proc expand statements rather than one?

Tom
Super User Tom
Super User

You probably want to use a BY FIRM_ID statement so that PROC EXPAND will treat each firm separately.

Do you want the data for a firm expanded beyond the time interval that the firm's data covers? If so then you will need to use something else to "flesh out" the data records.

AndrewMyers
Calcite | Level 5

@Tom, thank you. It works great!

I'm also not familiar with proc expand. Perhaps someone else can help explain what proc expand is doing to achieve the desired outcome?

AndrewMyers
Calcite | Level 5

Tom, I spoke too soon.

When testing the code on my actual data I noticed a problem that occurs whenever the firm_id changes and the first observation for that firm_id does not have a missing Return value. A value for rolling sum is returned for the first observation even though the first X observations for each firm should be missing since there is insufficient data (where X is the length of the rolling sum).

For example trying your code, modified with the by firm_id lines and the above-mentioned change in the data, results as follows...

data time1 ;

  input firm_id $6. date yymmdd10. return ;

  format date yymm7.;

cards;

firmA 2012-11-01 .

firmA 2013-01-01 100

firmA 2013-02-01 100

firmA 2013-03-01 100

firmA 2013-04-01 100

firmA 2013-05-01 100

firmA 2013-07-01 100

firmA 2013-08-01 100

firmA 2013-09-01 100

firmA 2013-10-01 100

firmB 2012-11-01 100

firmB 2013-01-01 100

firmB 2013-02-01 100

firmB 2013-03-01 100

firmB 2013-04-01 100

firmB 2013-05-01 100

firmB 2013-07-01 100

firmB 2013-08-01 100

firmB 2013-09-01 100

firmB 2013-10-01 100

;;;;

proc expand data=time1 out=time2 method=none from=month to=month ;

by firm_id;

id date;

run;

proc expand data=time2 out=time3 method=none from=month to=month ;

by firm_id;

id date;

convert return = roll13 / transformout=(nomiss movsum 3);

run;

proc print data=time3;

run;

The SAS System 08:19 Thursday, February 19, 2015   1
Obsfirm_id   dateroll3return
  1 firmA 2012M11    .     .
  2 firmA 2012M12    .     .
  3 firmA 2013M01    .   100
  4 firmA 2013M02    .   100
  5 firmA 2013M03  300   100
  6 firmA 2013M04  300   100
  7 firmA 2013M05  300   100
  8 firmA 2013M06    .     .
  9 firmA 2013M07    .   100
10 firmA 2013M08    .   100
11 firmA 2013M09  300   100
12 firmA 2013M10  300   100
13 firmB 2012M11  100   100
14 firmB 2012M12    .     .
15 firmB 2013M01    .   100
16 firmB 2013M02    .   100
17 firmB 2013M03  300   100
18 firmB 2013M04  300   100
19 firmB 2013M05  300   100
20 firmB 2013M06    .     .
21 firmB 2013M07    .   100
22 firmB 2013M08    .   100
23 firmB 2013M09  300   100
24 firmB 2013M10  300   100

The first observation for firmB has a value of "100" for roll3 when it should be "." since not enough months are available to form the 3 month sum.

How do I get around this?

Tom
Super User Tom
Super User

Not sure.  What I did in my example was add the empty record that was two months before the first non-missing value.

Or you could just use the method that was posted below using temporary arrays to generate the rolling sum.

proc expand data=time1 out=time2 method=none from=month to=month ;

by firm_id;

id date;

run;

data time3 ;

  set time2 ;

  by firm_id ;

  array x3 (3) _temporary_;

  array x12 (12) _temporary_;

  if first.firm_id then call missing(of x3(*) x12(*));

  if first.firm_id then month=0;

  month+1;

  x3(mod(month,3)+1)=return;

  x12(mod(month,12)+1)=return;

  if nmiss(of x3(*))=0 then roll3 = sum(of x3(*));

  if nmiss(of x12(*))=0 then roll12 = sum(of x12(*));

run;

Tom
Super User Tom
Super User

I used two calls because when I tried to do it in one it generated non-missing values for RETURN for the missing dates.  I am not that familiar with PROC EXPAND so there may be a way to do it in one step.

Haikuo
Onyx | Level 15

PROC EXPAND surely is more than sufficient for the job, but if all you need is the rolling sum or you don't have SAS/ETS, consider a simple data step solution:

data time1;

     input firm_id $6. date yymmdd10. return;

     format date yymm7.;

     cards;

firmA 2012-11-01 .

firmA 2013-01-01 100

firmA 2013-02-01 100

firmA 2013-03-01 100

firmA 2013-04-01 100

firmA 2013-05-01 100

firmA 2013-07-01 100

firmA 2013-08-01 100

firmA 2013-09-01 100

firmA 2013-10-01 100

firmB 2012-11-01 .

firmB 2013-01-01 100

firmB 2013-02-01 100

firmB 2013-03-01 100

firmB 2013-04-01 100

firmB 2013-05-01 100

firmB 2013-07-01 100

firmB 2013-08-01 100

firmB 2013-09-01 100

firmB 2013-10-01 100

;;;;

%let mon=3;

data want;

     array sum_rol(0:%eval(&mon.-1)) _temporary_;

     do _n_=1 by 1 until (last.firm_id);

           set time1;

           by firm_id;

           sum_rol(mod(_n_,&mon.))= return;

           if n(of sum_rol(*))=&mon. then

                rolling_sum=sum(of sum_rol(*));

           else rolling_sum=.;

           output;

     end;

call missing (of sum_rol(*));

run;

Message was edited by: haikuo bian

AndrewMyers
Calcite | Level 5

@

Can your code be modified to take that into account?

I like the convenience of just changing the value in the "%let mon=3" line to change the length of the rolling sum, which is why I am asking.

Haikuo
Onyx | Level 15

This is the updated version and I hope it meets what you need. For the existing code, it is easier just to add another array.

data time1;

input firm_id $6. date yymmdd10. return;

format date yymm7.;

cards;

firmA 2012-11-01 .

firmA 2013-01-01 100

firmA 2013-02-01 100

firmA 2013-03-01 100

firmA 2013-08-01 100

firmA 2013-09-01 100

firmA 2013-10-01 100

firmB 2012-11-01 .

firmB 2013-01-01 100

firmB 2013-02-01 100

firmB 2013-03-01 100

firmB 2013-04-01 100

firmB 2013-05-01 100

firmB 2013-07-01 100

firmB 2013-08-01 100

firmB 2013-09-01 100

firmB 2013-10-01 100

;;;;

%let mon=3;

data want;

array sum_rol(0:%eval(&mon.-1)) _temporary_;

      array mth (0:%eval(&mon.-1)) _temporary_;

do _n_=1 by 1 until (last.firm_id);

           set time1;

           by firm_id;

           sum_rol(mod(_n_,&mon.))= return;

              mth (mod(_n_,&mon.))= date;

          if n(of sum_rol(*))=&mon. and intck('month', min(of mth(*)), max(of mth(*))) = 2 then

                rolling_sum=sum(of sum_rol(*));

           else rolling_sum=.;

           output;

end;

call missing (of sum_rol(*));

call missing (of mth(*));

run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 3347 views
  • 6 likes
  • 3 in conversation