BookmarkSubscribeRSS Feed
Quartz | Level 8

## Rolling Sum

Rolling Sum:

How to calculating rolling 3 month sum and then set a flag to 'Y' if the sum euqls to 50. Acutaully I want to do arolling 12 month sum, but as an example I'm taking a 3 month.

in the second dataset,id 1 gets a flag 'Y' on 01May2010. The date of join for id 1 is 01jan2010. The rolling sum from that date which equals to 50 happens on 01May2010. Similarly for id 2 as well.

id dateofjoin
1  01Jan2010
2  01Mar2009

id   spend     date            Flag
1     20      01JAN2010
1     10      01FEB2010
1      0      01Mar2010
1     10      01Apr2010
1     40      01May2010         Y
2     20      01Mar2009
2     10      01Apr2009
2     10      01May2009
2     10      01jun2009
2      5      01jul2009
2     15      01Aug2009         Y

25 REPLIES 25
Calcite | Level 5

## Re: Rolling Sum

Hi, I am not clear with the understanding. Can you please elaborate a little more by letting know of your input and output data. Perhaps, others have understood better and hopefully you might get a sooner response, but if you could pen down a little more, would be of great help.

q1. rolling sum for continuous dates for the enitire 3 or 12 months?/*3 or 12 months doesn't make any difference if it's the same year i suppose*/

q2.Id 1 is all 2009 and Id 2 is all 2010?

Comment: is it that simple to filter using by groups, retain& sum statements and check for last.id and sum>50?

The more you can be elaborate the better for NOT so smart people like me to try.

Thanks,

Naveen

Super User

## Re: Rolling Sum

Do you want to restart the rolling sum for each ID group? (Implied but not explicitly stated in your question)

And do you want the flag set for exactly 50 or 50 and greater?

You state that you're using 3 months for the example but for id 2 setting the flag at 01Aug2009 requires at least 5 months to get 50. Which if that is the case, why not at 01Jun2009 where the 01Mar to 01Jun spend totals to 50?

How does dateofjoin come into this at all?

PROC Star

## Re: Rolling Sum

As long as we're asking questions ...

What if the first two months add up to 50?  Would that count, or do we have to add 3 months?   (Would your answer change if it were the first two months for an ID, vs. two months from the middle of the data?)

Can we rely on the order to the data?  For example, for a single ID is it possible that there is a missing month in the middle of the data?  It's a lot easier to program if we can interpret "3 consecutive months" as meaning "3 consecutive observations" rather than having to actually check the dates.

Despite all the questions, the programming is likely to be straightforward.  It's understanding the problem that is the more difficult part.  With the right problem definition, here would be part of the solution:

data want;

set have;

by id;

if first.id then obsno=1;

else obsno + 1;

back2 = lag2(spend);

back1 = lag1(spend);

* more would go here, depending on the problem definition;

run;

Good luck.

Quartz | Level 8

## Re: Rolling Sum

12 continous rolling month sum is requried. and in the above case to keep things simple,I resorted to 3 months. say for instance, if the sum of jan,feb and mar is less than 50 then we dont set the flag.we will continue checking feb,mar,apr.if the sum if 50 or greater at that point we will set a flag.

PROC Star

## Re: Rolling Sum

OK, I'll keep the answer short by sticking to exactly 3 months.  Also important here ... I'm assuming that the data is complete.  There is exactly one observation for each relevant ID/month.

data want;

set have;

by id;

if first.id then obsno=1;

else obsno + 1;

back2 = lag2(amount);

back1 = lag1(amount);

if obsno >= 3 then do;

if sum(amount, back1, back2) = 50 then flag='Y';

end;

drop obsno back1 back2;

run;

If you have any questions about expanding from 3 to 12 months, just ask.

Quartz | Level 8

## Re: Rolling Sum

from 3 to 12, I will have to create 12  back variables?

PROC Star

## Re: Rolling Sum

Yes, actually 11 is enough since AMOUNT is part of the rolling average.

Then add to the list of variables in the SUM statement and the DROP statement.

And change the comparison to OBSNO >= 12.

Rhodochrosite | Level 12

## Re: Rolling Sum

* Since the problem solution might need to be over 3 or 12 months

I will use a macro variable to hold that period.

* There is also a macro variable to hold the threshold ;

* to support missing months and transactional data (more than one

“spend” in a month I’ll use PROC SUMMARY to aggregate and a data

step to insert the missing monthly rows

;

data transactions ;

input id spend  date :date9.  ;

LIST;CARDS;

1     20 01JAN2010

1     10 01FEB2010

1     10 01Apr2010

1     40 01May2010

2     20 01Mar2009

2     10 01Apr2009

2     10 01May2009

2     -5 01May2009

2      5 01Apr2009

2     35 01Aug2009

;;;;

proc summary nway data= transactions( keep= id spend date ) ;

class   id date ;

format  date monyy7. ;

var     spend ;

output  sum= out= summary1 ;

run ;

data monthly_data  ;* add rows for months with no spend ;

set summary1 ;

by id ;

priordate = lag( date ) ;

if first.id then priordate= date ;

output ;

spend = . ;

target = date ;

date = priordate ;

do while( intck( 'month', date, target) > 1) ;

date= intnx( 'month', date, 1) ;

output ;

end ;

keep id spend date ;

run ;

%let period= 3;

%let threshold = 30 ;

proc sort ;

by id date ;

run ;

data flagged ;

set monthly_data ;

by id ;

array period(0:&period ) _temporary_ ;

if first.id then call missing( of period(*) ) ;

this_month = mod(_n_,&period ) ;

period( this_month) = spend ;

if sum( of period(*) ) >= &threshold then flag='Y' ;

else flag=' ' ;

run ;

proc print ;

run ;

hope that helps

Quartz | Level 8

## Re: Rolling Sum

what does this_month variable do?

Rhodochrosite | Level 12

## Re: Rolling Sum

The period array will hold each amount of SPEND for the months of the rolling period.

The statement

this_month = mod(_n_,&period ) ;

provides a pointer to the next slot in the array.

Although &period could support any rolling period you need, for longer periods the sum( of period(*)) becomes less effocient and you might maintain an accumulation of SPEND, deducting the value in the array element you are about to replace before inserting the current value of SPEND.

Calcite | Level 5

## Re: Rolling Sum

Hi Astounding,

I am using a similar approach (createing and using Lag variables) to create a sum for a rolling 6,12 and 24 months in a single data step. So I have created about 24 Lag variables.

In my data the continuity of the periods is also not guranteed so i have also created a logic to handle that. But I am facing problems in resetting the Lag variables for each BY group.

The data has 1 row per customerid per period. It is not guranteed that for each customer we will have all the periods. So the task now is to sum the DPD_30_flag variable for a rolling 6 ,12 and 24 months. using the following code I am able to do the sum, but Can't reset it for every by group (which is what is needed).

Also i am trying to use a do loop so that i dont have to write the same lines of code 24 times, but that not working either.

Here is the Code

%macro lag_status(var=,day180=,day365=,day730=);

Data &var.(keep=customerid accountingperiod In_default dpd30_flag dpd60_flag dpd90_flag DPD_30_180 DPD_30_365 DPD_30_730

DPD_60_180 DPD_60_365 DPD_60_730 DPD_90_180 DPD_90_365 DPD_90_730);

Set DPD_flags;

By Customerid;

/*Creating 24 Lag variables for AccountingPeriod*/

Period1=lag1(accountingperiod);

Period2=lag2(accountingperiod);

Period3=lag3(accountingperiod);

Period4=lag4(accountingperiod);

Period5=lag5(accountingperiod);

Period6=lag6(accountingperiod);

Period7=lag7(accountingperiod);

Period8=lag8(accountingperiod);

Period9=lag9(accountingperiod);

Period10=lag10(accountingperiod);

Period11=lag11(accountingperiod);

Period12=lag12(accountingperiod);

Period13=lag13(accountingperiod);

Period14=lag14(accountingperiod);

Period15=lag15(accountingperiod);

Period16=lag16(accountingperiod);

Period17=lag17(accountingperiod);

Period18=lag18(accountingperiod);

Period19=lag19(accountingperiod);

Period20=lag20(accountingperiod);

Period21=lag21(accountingperiod);

Period22=lag22(accountingperiod);

Period23=lag23(accountingperiod);

Period24=lag24(accountingperiod);

/*Lag Variables created for 180,365 and 730 day flag*/

DPD_back24 = lag24(&var.);

DPD_back23 = lag23(&var.);

DPD_back22 = lag22(&var.);

DPD_back21 = lag21(&var.);

DPD_back20 = lag20(&var.);

DPD_back19 = lag19(&var.);

DPD_back18 = lag18(&var.);

DPD_back17 = lag17(&var.);

DPD_back16 = lag16(&var.);

DPD_back15 = lag15(&var.);

DPD_back14 = lag14(&var.);

DPD_back13 = lag13(&var.);

DPD_back12 = lag12(&var.);

DPD_back11 = lag11(&var.);

DPD_back10 = lag10(&var.);

DPD_back9 = lag9(&var.);

DPD_back8 = lag8(&var.);

DPD_back7 = lag7(&var.);

DPD_back6 = lag6(&var.);

DPD_back5 = lag5(&var.);

DPD_back4 = lag4(&var.);

DPD_back3 = lag3(&var.);

DPD_back2 = lag2(&var.);

DPD_back1 = lag1(&var.);

end;

/*Creating the Flag count Variables after checking that they fall in the last continous 6,12 or 24 months*/

Need help here

%do i = 1 %to 24;

if Period&i. = . then DPD_&day180._flag&i. = 0;

else if Period&i. >= AccountingPeriod_6 then DPD_&day180._flag&i. = DPD_back&i. ;

else if DPD_&day180._flag&i. = 0;

%end;

/*Summary variables for 180,365 and 730 day flag*/

DPD_&day180.=sum(DPD_&day180._flag1,DPD_&day180._flag2,DPD_&day180._flag3,DPD_&day180._flag4,DPD_&day180._flag5,DPD_&day180._flag6);

DPD_&day365.=sum(DPD_&day180._flag1,DPD_&day180._flag2,DPD_&day180._flag3,DPD_&day180._flag4,DPD_&day180._flag5,DPD_&day180._flag6,

DPD_&day180._flag7,DPD_&day180._flag8,DPD_&day180._flag9,DPD_&day180._flag10,DPD_&day180._flag11,DPD_&day180._flag12);

DPD_&day730.=sum(DPD_&day180._flag1,DPD_&day180._flag2,DPD_&day180._flag3,DPD_&day180._flag4,DPD_&day180._flag5,DPD_&day180._flag6,

DPD_&day180._flag7,DPD_&day180._flag8,DPD_&day180._flag9,DPD_&day180._flag10,DPD_&day180._flag11,DPD_&day180._flag12,

DPD_&day180._flag13,DPD_&day180._flag14,DPD_&day180._flag15,DPD_&day180._flag16,DPD_&day180._flag17,DPD_&day180._flag18,

DPD_&day180._flag19,DPD_&day180._flag20,DPD_&day180._flag21,DPD_&day180._flag22,DPD_&day180._flag23,DPD_&day180._flag24);

run;

%mend lag_status;

/*Calling the Macro 3 times once each for 30 DPD, 60 DPD and 90 DPD*/

%lag_status(var=DPD30_flag,day180=30_180,day365=30_365,day730=30_730);

And here is the sample data

 CustomerID AccountingPeriod In_default DPD30_flag DPD60_flag DPD90_flag Accountingperiod_6 Accountingperiod_12 Accountingperiod_24 2180840 200912 No 0 0 0 200906 200812 200712 2180840 201001 No 0 0 0 200907 200901 200801 2180840 201002 No 0 0 0 200908 200902 200802 2180840 201003 No 0 0 0 200909 200903 200803 2180840 201004 No 0 0 0 200910 200904 200804 2180840 201005 No 0 0 0 200911 200905 200805 2180840 201006 No 0 0 0 200912 200906 200806 2180840 201007 No 0 0 0 201001 200907 200807 2180840 201008 No 0 0 0 201002 200908 200808 2180840 201009 No 0 0 0 201003 200909 200809 2180840 201010 No 0 0 0 201004 200910 200810 2180840 201011 No 0 0 0 201005 200911 200811 2180840 201012 No 0 0 0 201006 200912 200812 2180840 201101 No 0 0 0 201007 201001 200901 2180840 201102 No 0 0 0 201008 201002 200902 2180840 201103 No 0 0 0 201009 201003 200903 2180840 201104 No 0 0 0 201010 201004 200904 2180840 201105 No 0 0 0 201011 201005 200905 2180840 201106 No 0 0 0 201012 201006 200906 2180840 201107 No 0 0 0 201101 201007 200907 2180840 201108 No 0 0 0 201102 201008 200908 2180840 201109 No 0 0 0 201103 201009 200909 2180840 201110 No 0 0 0 201104 201010 200910 2180840 201111 No 0 0 0 201105 201011 200911 2180840 201112 No 0 0 0 201106 201012 200912 2180840 201201 Yes 1 0 0 201107 201101 201001 2180840 201202 Yes 0 0 0 201108 201102 201002 2180840 201203 No 0 0 0 201109 201103 201003 2180840 201204 No 0 0 0 201110 201104 201004 2180840 201205 No 0 0 0 201111 201105 201005 2180840 201206 Yes 0 0 0 201112 201106 201006 2180840 201207 No 0 0 0 201201 201107 201007 2180840 201208 No 0 0 0 201202 201108 201008 2180840 201209 No 0 0 0 201203 201109 201009 2180840 201210 No 0 0 0 201204 201110 201010 2180840 201211 No 1 0 0 201205 201111 201011 2180840 201212 No 1 1 0 201206 201112 201012 2180840 201301 Yes 1 1 1 201207 201201 201101 2180840 201302 Yes 1 1 1 201208 201202 201102 2180840 201303 Yes 1 1 0 201209 201203 201103 2180840 201304 Yes 1 1 1 201210 201204 201104 2180840 201305 Yes 1 1 0 201211 201205 201105 2180840 201306 Yes 1 1 1 201212 201206 201106 2180840 201307 Yes 1 1 1 201301 201207 201107 2180840 201308 Yes 1 1 1 201302 201208 201108 2180840 201309 Yes 1 1 1 201303 201209 201109 2180840 201310 Yes 0 0 0 201304 201210 201110 2180840 201311 Yes 0 0 0 201305 201211 201111 2180840 201312 Yes 0 0 0 201306 201212 201112 2180840 201401 No 0 0 0 201307 201301 201201 2180840 201402 No 0 0 0 201308 201302 201202 2180840 201403 No 0 0 0 201309 201303 201203 2180840 201404 Yes 0 0 0 201310 201304 201204 2180840 201405 Yes 1 1 1 201311 201305 201205 2180840 201406 Yes 1 1 1 201312 201306 201206 2180840 201407 Yes 0 0 0 201401 201307 201207 2180840 201408 Yes 0 0 0 201402 201308 201208 2259914 200801 No 0 0 0 200707 200701 200601 2259914 200802 No 0 0 0 200708 200702 200602 2259914 200803 No 0 0 0 200709 200703 200603 2259914 200804 No 0 0 0 200710 200704 200604 2259914 200805 No 0 0 0 200711 200705 200605 2259914 200806 No 0 0 0 200712 200706 200606 2259914 200807 No 0 0 0 200801 200707 200607 2259914 200808 No 0 0 0 200802 200708 200608 2259914 200809 No 0 0 0 200803 200709 200609 2259914 200810 No 0 0 0 200804 200710 200610 2259914 200811 No 0 0 0 200805 200711 200611 2259914 200812 No 0 0 0 200806 200712 200612 2259914 200901 No 0 0 0 200807 200801 200701 2259914 200902 No 1 0 0 200808 200802 200702 2259914 200903 No 0 0 0 200809 200803 200703 2259914 200904 No 1 0 0 200810 200804 200704 2259914 200905 No 0 0 0 200811 200805 200705

Can you help me?

Thanks

Gaurav

Lapis Lazuli | Level 10

## Re: Rolling Sum

I don't understand why you have a Y for ID 2.  Here's my best guess for what you're trying to do:

proc sql;

create table want as

select

t1.id,

t1.date,

t1.spend,

sum(t2.spend) as Rolling_Spend,

case when sum(t2.spend) >=50 then 'Y' end as Flag

from

have t1

left  join have t2

on t1.id=t2.id

and intck('month',t2.date,t1.date) between 0 and 2

group by t1.id, t1.date, t1.spend

order by t1.id, t1.date;

quit;

 id date spend Rolling_Spend Flag 1 01Jan2010 20 20 1 01Feb2010 10 30 1 01Mar2010 0 30 1 01Apr2010 10 20 1 01May2010 40 50 Y 2 01Mar2009 20 20 2 01Apr2009 10 30 2 01May2009 10 40 2 01Jun2009 10 30 2 01Jul2009 5 25 2 01Aug2009 15 30
Super User

## Re: Rolling Sum

Do you have SAS/ETS, if so use proc expand to calculate your moving average and then flag if the average is over 50

SAS/ETS(R) 9.2 User's Guide

Calcite | Level 5

## Re: Rolling Sum

Please refer to

25027 - Compute the moving average of a variable

There are codes with examples

Discussion stats
• 25 replies
• 8862 views
• 1 like
• 11 in conversation