DATA Step, Macro, Functions and more

Rolling Sum

Reply
Super Contributor
Posts: 647

Rolling Sum


Rolling Sum:

How to calculating rolling 3 month sum and then set a flag to 'Y' if the sum euqls to 50. Acutaully I want to do arolling 12 month sum, but as an example I'm taking a 3 month.

in the second dataset,id 1 gets a flag 'Y' on 01May2010. The date of join for id 1 is 01jan2010. The rolling sum from that date which equals to 50 happens on 01May2010. Similarly for id 2 as well.

id dateofjoin   
1  01Jan2010
2  01Mar2009


id   spend     date            Flag
1     20      01JAN2010
1     10      01FEB2010
1      0      01Mar2010
1     10      01Apr2010
1     40      01May2010         Y     
2     20      01Mar2009
2     10      01Apr2009  
2     10      01May2009
2     10      01jun2009
2      5      01jul2009
2     15      01Aug2009         Y

Occasional Contributor
Posts: 18

Re: Rolling Sum

Hi, I am not clear with the understanding. Can you please elaborate a little more by letting know of your input and output data. Perhaps, others have understood better and hopefully you might get a sooner response, but if you could pen down a little more, would be of great help.

q1. rolling sum for continuous dates for the enitire 3 or 12 months?/*3 or 12 months doesn't make any difference if it's the same year i suppose*/

q2.Id 1 is all 2009 and Id 2 is all 2010?

Comment: is it that simple to filter using by groups, retain& sum statements and check for last.id and sum>50?

The more you can be elaborate the better for NOT so smart people like me to try.

Thanks,

Naveen

Super User
Posts: 10,500

Re: Rolling Sum

Do you want to restart the rolling sum for each ID group? (Implied but not explicitly stated in your question)

And do you want the flag set for exactly 50 or 50 and greater?

You state that you're using 3 months for the example but for id 2 setting the flag at 01Aug2009 requires at least 5 months to get 50. Which if that is the case, why not at 01Jun2009 where the 01Mar to 01Jun spend totals to 50?

How does dateofjoin come into this at all?

Super User
Posts: 5,084

Re: Rolling Sum

As long as we're asking questions ...

What if the first two months add up to 50?  Would that count, or do we have to add 3 months?   (Would your answer change if it were the first two months for an ID, vs. two months from the middle of the data?)

Can we rely on the order to the data?  For example, for a single ID is it possible that there is a missing month in the middle of the data?  It's a lot easier to program if we can interpret "3 consecutive months" as meaning "3 consecutive observations" rather than having to actually check the dates.

Despite all the questions, the programming is likely to be straightforward.  It's understanding the problem that is the more difficult part.  With the right problem definition, here would be part of the solution:

data want;

   set have;

   by id;

   if first.id then obsno=1;

   else obsno + 1;

   back2 = lag2(spend);

   back1 = lag1(spend);

   * more would go here, depending on the problem definition;

run;

Good luck.

Super Contributor
Posts: 647

Re: Rolling Sum

12 continous rolling month sum is requried. and in the above case to keep things simple,I resorted to 3 months. say for instance, if the sum of jan,feb and mar is less than 50 then we dont set the flag.we will continue checking feb,mar,apr.if the sum if 50 or greater at that point we will set a flag.

Super User
Posts: 5,084

Re: Rolling Sum

OK, I'll keep the answer short by sticking to exactly 3 months.  Also important here ... I'm assuming that the data is complete.  There is exactly one observation for each relevant ID/month.

data want;

  set have;

  by id;

  if first.id then obsno=1;

  else obsno + 1;

  back2 = lag2(amount);

  back1 = lag1(amount);

  if obsno >= 3 then do;

     if sum(amount, back1, back2) = 50 then flag='Y';

  end;

  drop obsno back1 back2;

run;

If you have any questions about expanding from 3 to 12 months, just ask.

Super Contributor
Posts: 647

Re: Rolling Sum

from 3 to 12, I will have to create 12  back variables?

Super User
Posts: 5,084

Re: Rolling Sum

Yes, actually 11 is enough since AMOUNT is part of the rolling average. 

Then add to the list of variables in the SUM statement and the DROP statement.

And change the comparison to OBSNO >= 12.

Valued Guide
Posts: 2,175

Re: Rolling Sum


* Since the problem solution might need to be over 3 or 12 months

  I will use a macro variable to hold that period.

* There is also a macro variable to hold the threshold ;

* to support missing months and transactional data (more than one

“spend” in a month I’ll use PROC SUMMARY to aggregate and a data

  step to insert the missing monthly rows

;

data transactions ;

input id spend  date :date9.  ;

LIST;CARDS;

1     20 01JAN2010

1     10 01FEB2010

1     10 01Apr2010

1     40 01May2010          

2     20 01Mar2009

2     10 01Apr2009  

2     10 01May2009

2     -5 01May2009

2      5 01Apr2009

2     35 01Aug2009      

;;;;

proc summary nway data= transactions( keep= id spend date ) ;

    class   id date ;

    format  date monyy7. ;

    var     spend ;

    output  sum= out= summary1 ;

run ;

data monthly_data  ;* add rows for months with no spend ;

    set summary1 ;

    by id ;

    priordate = lag( date ) ;

    if first.id then priordate= date ;

    output ;

    spend = . ;

    target = date ;

    date = priordate ;

    do while( intck( 'month', date, target) > 1) ;

         date= intnx( 'month', date, 1) ;

         output ;

    end ;

    keep id spend date ;

run ;

%let period= 3;

%let threshold = 30 ;

proc sort ;

  by id date ;

run ;

data flagged ;

    set monthly_data ;

    by id ;

    array period(0:&period ) _temporary_ ;

    if first.id then call missing( of period(*) ) ;

    this_month = mod(_n_,&period ) ;

    period( this_month) = spend ;

    if sum( of period(*) ) >= &threshold then flag='Y' ;

    else flag=' ' ;

run ;

proc print ;

run ;

hope that helps

Super Contributor
Posts: 647

Re: Rolling Sum

what does this_month variable do?

Valued Guide
Posts: 2,175

Re: Rolling Sum

The period array will hold each amount of SPEND for the months of the rolling period.

The statement

this_month = mod(_n_,&period ) ;

provides a pointer to the next slot in the array.

Although &period could support any rolling period you need, for longer periods the sum( of period(*)) becomes less effocient and you might maintain an accumulation of SPEND, deducting the value in the array element you are about to replace before inserting the current value of SPEND.

Occasional Contributor
Posts: 5

Re: Rolling Sum

Hi Astounding,

I am using a similar approach (createing and using Lag variables) to create a sum for a rolling 6,12 and 24 months in a single data step. So I have created about 24 Lag variables.

In my data the continuity of the periods is also not guranteed so i have also created a logic to handle that. But I am facing problems in resetting the Lag variables for each BY group.

The data has 1 row per customerid per period. It is not guranteed that for each customer we will have all the periods. So the task now is to sum the DPD_30_flag variable for a rolling 6 ,12 and 24 months. using the following code I am able to do the sum, but Can't reset it for every by group (which is what is needed).

Also i am trying to use a do loop so that i dont have to write the same lines of code 24 times, but that not working either.

Here is the Code

%macro lag_status(var=,day180=,day365=,day730=);

Data &var.(keep=customerid accountingperiod In_default dpd30_flag dpd60_flag dpd90_flag DPD_30_180 DPD_30_365 DPD_30_730

             DPD_60_180 DPD_60_365 DPD_60_730 DPD_90_180 DPD_90_365 DPD_90_730);

     Set DPD_flags;

     By Customerid; 

          /*Creating 24 Lag variables for AccountingPeriod*/

     Period1=lag1(accountingperiod);

     Period2=lag2(accountingperiod);

     Period3=lag3(accountingperiod);

     Period4=lag4(accountingperiod);

     Period5=lag5(accountingperiod);

     Period6=lag6(accountingperiod);

     Period7=lag7(accountingperiod);

     Period8=lag8(accountingperiod);

     Period9=lag9(accountingperiod);

     Period10=lag10(accountingperiod);

     Period11=lag11(accountingperiod);

     Period12=lag12(accountingperiod);

     Period13=lag13(accountingperiod);

     Period14=lag14(accountingperiod);

     Period15=lag15(accountingperiod);

     Period16=lag16(accountingperiod);

     Period17=lag17(accountingperiod);

     Period18=lag18(accountingperiod);

     Period19=lag19(accountingperiod);

     Period20=lag20(accountingperiod);

     Period21=lag21(accountingperiod);

     Period22=lag22(accountingperiod);

     Period23=lag23(accountingperiod);

     Period24=lag24(accountingperiod);

    

  /*Lag Variables created for 180,365 and 730 day flag*/

  DPD_back24 = lag24(&var.);

  DPD_back23 = lag23(&var.);

  DPD_back22 = lag22(&var.);

  DPD_back21 = lag21(&var.);

  DPD_back20 = lag20(&var.);

  DPD_back19 = lag19(&var.);

  DPD_back18 = lag18(&var.);

  DPD_back17 = lag17(&var.);

  DPD_back16 = lag16(&var.);

  DPD_back15 = lag15(&var.);

  DPD_back14 = lag14(&var.);

  DPD_back13 = lag13(&var.);

  DPD_back12 = lag12(&var.);

  DPD_back11 = lag11(&var.);

  DPD_back10 = lag10(&var.);

  DPD_back9 = lag9(&var.);

  DPD_back8 = lag8(&var.);

  DPD_back7 = lag7(&var.);

  DPD_back6 = lag6(&var.);

  DPD_back5 = lag5(&var.);

  DPD_back4 = lag4(&var.);

  DPD_back3 = lag3(&var.);

  DPD_back2 = lag2(&var.);

  DPD_back1 = lag1(&var.);

  end;

     /*Creating the Flag count Variables after checking that they fall in the last continous 6,12 or 24 months*/

Need help here

%do i = 1 %to 24;

     if Period&i. = . then DPD_&day180._flag&i. = 0;

          else if Period&i. >= AccountingPeriod_6 then DPD_&day180._flag&i. = DPD_back&i. ;

          else if DPD_&day180._flag&i. = 0;

     %end;

         

/*Summary variables for 180,365 and 730 day flag*/

 

  DPD_&day180.=sum(DPD_&day180._flag1,DPD_&day180._flag2,DPD_&day180._flag3,DPD_&day180._flag4,DPD_&day180._flag5,DPD_&day180._flag6);

  DPD_&day365.=sum(DPD_&day180._flag1,DPD_&day180._flag2,DPD_&day180._flag3,DPD_&day180._flag4,DPD_&day180._flag5,DPD_&day180._flag6,

                      DPD_&day180._flag7,DPD_&day180._flag8,DPD_&day180._flag9,DPD_&day180._flag10,DPD_&day180._flag11,DPD_&day180._flag12);

  DPD_&day730.=sum(DPD_&day180._flag1,DPD_&day180._flag2,DPD_&day180._flag3,DPD_&day180._flag4,DPD_&day180._flag5,DPD_&day180._flag6,

                        DPD_&day180._flag7,DPD_&day180._flag8,DPD_&day180._flag9,DPD_&day180._flag10,DPD_&day180._flag11,DPD_&day180._flag12,

                        DPD_&day180._flag13,DPD_&day180._flag14,DPD_&day180._flag15,DPD_&day180._flag16,DPD_&day180._flag17,DPD_&day180._flag18,

                        DPD_&day180._flag19,DPD_&day180._flag20,DPD_&day180._flag21,DPD_&day180._flag22,DPD_&day180._flag23,DPD_&day180._flag24);

run;

%mend lag_status;

/*Calling the Macro 3 times once each for 30 DPD, 60 DPD and 90 DPD*/

%lag_status(var=DPD30_flag,day180=30_180,day365=30_365,day730=30_730);

And here is the sample data

CustomerIDAccountingPeriodIn_defaultDPD30_flagDPD60_flagDPD90_flagAccountingperiod_6Accountingperiod_12Accountingperiod_24
2180840200912No000200906200812200712
2180840201001No000200907200901200801
2180840201002No000200908200902200802
2180840201003No000200909200903200803
2180840201004No000200910200904200804
2180840201005No000200911200905200805
2180840201006No000200912200906200806
2180840201007No000201001200907200807
2180840201008No000201002200908200808
2180840201009No000201003200909200809
2180840201010No000201004200910200810
2180840201011No000201005200911200811
2180840201012No000201006200912200812
2180840201101No000201007201001200901
2180840201102No000201008201002200902
2180840201103No000201009201003200903
2180840201104No000201010201004200904
2180840201105No000201011201005200905
2180840201106No000201012201006200906
2180840201107No000201101201007200907
2180840201108No000201102201008200908
2180840201109No000201103201009200909
2180840201110No000201104201010200910
2180840201111No000201105201011200911
2180840201112No000201106201012200912
2180840201201Yes100201107201101201001
2180840201202Yes000201108201102201002
2180840201203No000201109201103201003
2180840201204No000201110201104201004
2180840201205No000201111201105201005
2180840201206Yes000201112201106201006
2180840201207No000201201201107201007
2180840201208No000201202201108201008
2180840201209No000201203201109201009
2180840201210No000201204201110201010
2180840201211No100201205201111201011
2180840201212No110201206201112201012
2180840201301Yes111201207201201201101
2180840201302Yes111201208201202201102
2180840201303Yes110201209201203201103
2180840201304Yes111201210201204201104
2180840201305Yes110201211201205201105
2180840201306Yes111201212201206201106
2180840201307Yes111201301201207201107
2180840201308Yes111201302201208201108
2180840201309Yes111201303201209201109
2180840201310Yes000201304201210201110
2180840201311Yes000201305201211201111
2180840201312Yes000201306201212201112
2180840201401No000201307201301201201
2180840201402No000201308201302201202
2180840201403No000201309201303201203
2180840201404Yes000201310201304201204
2180840201405Yes111201311201305201205
2180840201406Yes111201312201306201206
2180840201407Yes000201401201307201207
2180840201408Yes000201402201308201208
2259914200801No000200707200701200601
2259914200802No000200708200702200602
2259914200803No000200709200703200603
2259914200804No000200710200704200604
2259914200805No000200711200705200605
2259914200806No000200712200706200606
2259914200807No000200801200707200607
2259914200808No000200802200708200608
2259914200809No000200803200709200609
2259914200810No000200804200710200610
2259914200811No000200805200711200611
2259914200812No000200806200712200612
2259914200901No000200807200801200701
2259914200902No100200808200802200702
2259914200903No000200809200803200703
2259914200904No100200810200804200704
2259914200905No000200811200805200705

Can you help me?

Thanks

Gaurav

Super Contributor
Posts: 578

Re: Rolling Sum

I don't understand why you have a Y for ID 2.  Here's my best guess for what you're trying to do:

proc sql;

create table want as

select   

    t1.id,

    t1.date,

    t1.spend,

    sum(t2.spend) as Rolling_Spend,

    case when sum(t2.spend) >=50 then 'Y' end as Flag

from   

    have t1

    left  join have t2

        on t1.id=t2.id

            and intck('month',t2.date,t1.date) between 0 and 2

group by t1.id, t1.date, t1.spend

order by t1.id, t1.date;

quit;

iddatespendRolling_SpendFlag
101Jan20102020
101Feb20101030
101Mar2010030
101Apr20101020
101May20104050Y
201Mar20092020
201Apr20091030
201May20091040
201Jun20091030
201Jul2009525
201Aug20091530
Super User
Posts: 17,840

Re: Rolling Sum

Do you have SAS/ETS, if so use proc expand to calculate your moving average and then flag if the average is over 50

SAS/ETS(R) 9.2 User's Guide

Frequent Contributor
Frequent Contributor
Posts: 83

Re: Rolling Sum

Please refer to

25027 - Compute the moving average of a variable

There are codes with examples

Ask a Question
Discussion stats
  • 25 replies
  • 4150 views
  • 0 likes
  • 11 in conversation