Rolling Sum:
How to calculating rolling 3 month sum and then set a flag to 'Y' if the sum euqls to 50. Acutaully I want to do arolling 12 month sum, but as an example I'm taking a 3 month.
in the second dataset,id 1 gets a flag 'Y' on 01May2010. The date of join for id 1 is 01jan2010. The rolling sum from that date which equals to 50 happens on 01May2010. Similarly for id 2 as well.
id dateofjoin
1 01Jan2010
2 01Mar2009
id spend date Flag
1 20 01JAN2010
1 10 01FEB2010
1 0 01Mar2010
1 10 01Apr2010
1 40 01May2010 Y
2 20 01Mar2009
2 10 01Apr2009
2 10 01May2009
2 10 01jun2009
2 5 01jul2009
2 15 01Aug2009 Y
Hi, I am not clear with the understanding. Can you please elaborate a little more by letting know of your input and output data. Perhaps, others have understood better and hopefully you might get a sooner response, but if you could pen down a little more, would be of great help.
q1. rolling sum for continuous dates for the enitire 3 or 12 months?/*3 or 12 months doesn't make any difference if it's the same year i suppose*/
q2.Id 1 is all 2009 and Id 2 is all 2010?
Comment: is it that simple to filter using by groups, retain& sum statements and check for last.id and sum>50?
The more you can be elaborate the better for NOT so smart people like me to try.
Thanks,
Naveen
Do you want to restart the rolling sum for each ID group? (Implied but not explicitly stated in your question)
And do you want the flag set for exactly 50 or 50 and greater?
You state that you're using 3 months for the example but for id 2 setting the flag at 01Aug2009 requires at least 5 months to get 50. Which if that is the case, why not at 01Jun2009 where the 01Mar to 01Jun spend totals to 50?
How does dateofjoin come into this at all?
As long as we're asking questions ...
What if the first two months add up to 50? Would that count, or do we have to add 3 months? (Would your answer change if it were the first two months for an ID, vs. two months from the middle of the data?)
Can we rely on the order to the data? For example, for a single ID is it possible that there is a missing month in the middle of the data? It's a lot easier to program if we can interpret "3 consecutive months" as meaning "3 consecutive observations" rather than having to actually check the dates.
Despite all the questions, the programming is likely to be straightforward. It's understanding the problem that is the more difficult part. With the right problem definition, here would be part of the solution:
data want;
set have;
by id;
if first.id then obsno=1;
else obsno + 1;
back2 = lag2(spend);
back1 = lag1(spend);
* more would go here, depending on the problem definition;
run;
Good luck.
12 continous rolling month sum is requried. and in the above case to keep things simple,I resorted to 3 months. say for instance, if the sum of jan,feb and mar is less than 50 then we dont set the flag.we will continue checking feb,mar,apr.if the sum if 50 or greater at that point we will set a flag.
OK, I'll keep the answer short by sticking to exactly 3 months. Also important here ... I'm assuming that the data is complete. There is exactly one observation for each relevant ID/month.
data want;
set have;
by id;
if first.id then obsno=1;
else obsno + 1;
back2 = lag2(amount);
back1 = lag1(amount);
if obsno >= 3 then do;
if sum(amount, back1, back2) = 50 then flag='Y';
end;
drop obsno back1 back2;
run;
If you have any questions about expanding from 3 to 12 months, just ask.
from 3 to 12, I will have to create 12 back variables?
Yes, actually 11 is enough since AMOUNT is part of the rolling average.
Then add to the list of variables in the SUM statement and the DROP statement.
And change the comparison to OBSNO >= 12.
* Since the problem solution might need to be over 3 or 12 months
I will use a macro variable to hold that period.
* There is also a macro variable to hold the threshold ;
* to support missing months and transactional data (more than one
“spend” in a month I’ll use PROC SUMMARY to aggregate and a data
step to insert the missing monthly rows
;
data transactions ;
input id spend date :date9. ;
LIST;CARDS;
1 20 01JAN2010
1 10 01FEB2010
1 10 01Apr2010
1 40 01May2010
2 20 01Mar2009
2 10 01Apr2009
2 10 01May2009
2 -5 01May2009
2 5 01Apr2009
2 35 01Aug2009
;;;;
proc summary nway data= transactions( keep= id spend date ) ;
class id date ;
format date monyy7. ;
var spend ;
output sum= out= summary1 ;
run ;
data monthly_data ;* add rows for months with no spend ;
set summary1 ;
by id ;
priordate = lag( date ) ;
if first.id then priordate= date ;
output ;
spend = . ;
target = date ;
date = priordate ;
do while( intck( 'month', date, target) > 1) ;
date= intnx( 'month', date, 1) ;
output ;
end ;
keep id spend date ;
run ;
%let period= 3;
%let threshold = 30 ;
proc sort ;
by id date ;
run ;
data flagged ;
set monthly_data ;
by id ;
array period(0:&period ) _temporary_ ;
if first.id then call missing( of period(*) ) ;
this_month = mod(_n_,&period ) ;
period( this_month) = spend ;
if sum( of period(*) ) >= &threshold then flag='Y' ;
else flag=' ' ;
run ;
proc print ;
run ;
hope that helps
what does this_month variable do?
The period array will hold each amount of SPEND for the months of the rolling period.
The statement
this_month = mod(_n_,&period ) ;
provides a pointer to the next slot in the array.
Although &period could support any rolling period you need, for longer periods the sum( of period(*)) becomes less effocient and you might maintain an accumulation of SPEND, deducting the value in the array element you are about to replace before inserting the current value of SPEND.
Hi Astounding,
I am using a similar approach (createing and using Lag variables) to create a sum for a rolling 6,12 and 24 months in a single data step. So I have created about 24 Lag variables.
In my data the continuity of the periods is also not guranteed so i have also created a logic to handle that. But I am facing problems in resetting the Lag variables for each BY group.
The data has 1 row per customerid per period. It is not guranteed that for each customer we will have all the periods. So the task now is to sum the DPD_30_flag variable for a rolling 6 ,12 and 24 months. using the following code I am able to do the sum, but Can't reset it for every by group (which is what is needed).
Also i am trying to use a do loop so that i dont have to write the same lines of code 24 times, but that not working either.
Here is the Code
%macro lag_status(var=,day180=,day365=,day730=);
Data &var.(keep=customerid accountingperiod In_default dpd30_flag dpd60_flag dpd90_flag DPD_30_180 DPD_30_365 DPD_30_730
DPD_60_180 DPD_60_365 DPD_60_730 DPD_90_180 DPD_90_365 DPD_90_730);
Set DPD_flags;
By Customerid;
/*Creating 24 Lag variables for AccountingPeriod*/
Period1=lag1(accountingperiod);
Period2=lag2(accountingperiod);
Period3=lag3(accountingperiod);
Period4=lag4(accountingperiod);
Period5=lag5(accountingperiod);
Period6=lag6(accountingperiod);
Period7=lag7(accountingperiod);
Period8=lag8(accountingperiod);
Period9=lag9(accountingperiod);
Period10=lag10(accountingperiod);
Period11=lag11(accountingperiod);
Period12=lag12(accountingperiod);
Period13=lag13(accountingperiod);
Period14=lag14(accountingperiod);
Period15=lag15(accountingperiod);
Period16=lag16(accountingperiod);
Period17=lag17(accountingperiod);
Period18=lag18(accountingperiod);
Period19=lag19(accountingperiod);
Period20=lag20(accountingperiod);
Period21=lag21(accountingperiod);
Period22=lag22(accountingperiod);
Period23=lag23(accountingperiod);
Period24=lag24(accountingperiod);
/*Lag Variables created for 180,365 and 730 day flag*/
DPD_back24 = lag24(&var.);
DPD_back23 = lag23(&var.);
DPD_back22 = lag22(&var.);
DPD_back21 = lag21(&var.);
DPD_back20 = lag20(&var.);
DPD_back19 = lag19(&var.);
DPD_back18 = lag18(&var.);
DPD_back17 = lag17(&var.);
DPD_back16 = lag16(&var.);
DPD_back15 = lag15(&var.);
DPD_back14 = lag14(&var.);
DPD_back13 = lag13(&var.);
DPD_back12 = lag12(&var.);
DPD_back11 = lag11(&var.);
DPD_back10 = lag10(&var.);
DPD_back9 = lag9(&var.);
DPD_back8 = lag8(&var.);
DPD_back7 = lag7(&var.);
DPD_back6 = lag6(&var.);
DPD_back5 = lag5(&var.);
DPD_back4 = lag4(&var.);
DPD_back3 = lag3(&var.);
DPD_back2 = lag2(&var.);
DPD_back1 = lag1(&var.);
end;
/*Creating the Flag count Variables after checking that they fall in the last continous 6,12 or 24 months*/
Need help here
%do i = 1 %to 24;
if Period&i. = . then DPD_&day180._flag&i. = 0;
else if Period&i. >= AccountingPeriod_6 then DPD_&day180._flag&i. = DPD_back&i. ;
else if DPD_&day180._flag&i. = 0;
%end;
/*Summary variables for 180,365 and 730 day flag*/
DPD_&day180.=sum(DPD_&day180._flag1,DPD_&day180._flag2,DPD_&day180._flag3,DPD_&day180._flag4,DPD_&day180._flag5,DPD_&day180._flag6);
DPD_&day365.=sum(DPD_&day180._flag1,DPD_&day180._flag2,DPD_&day180._flag3,DPD_&day180._flag4,DPD_&day180._flag5,DPD_&day180._flag6,
DPD_&day180._flag7,DPD_&day180._flag8,DPD_&day180._flag9,DPD_&day180._flag10,DPD_&day180._flag11,DPD_&day180._flag12);
DPD_&day730.=sum(DPD_&day180._flag1,DPD_&day180._flag2,DPD_&day180._flag3,DPD_&day180._flag4,DPD_&day180._flag5,DPD_&day180._flag6,
DPD_&day180._flag7,DPD_&day180._flag8,DPD_&day180._flag9,DPD_&day180._flag10,DPD_&day180._flag11,DPD_&day180._flag12,
DPD_&day180._flag13,DPD_&day180._flag14,DPD_&day180._flag15,DPD_&day180._flag16,DPD_&day180._flag17,DPD_&day180._flag18,
DPD_&day180._flag19,DPD_&day180._flag20,DPD_&day180._flag21,DPD_&day180._flag22,DPD_&day180._flag23,DPD_&day180._flag24);
run;
%mend lag_status;
/*Calling the Macro 3 times once each for 30 DPD, 60 DPD and 90 DPD*/
%lag_status(var=DPD30_flag,day180=30_180,day365=30_365,day730=30_730);
And here is the sample data
CustomerID | AccountingPeriod | In_default | DPD30_flag | DPD60_flag | DPD90_flag | Accountingperiod_6 | Accountingperiod_12 | Accountingperiod_24 |
2180840 | 200912 | No | 0 | 0 | 0 | 200906 | 200812 | 200712 |
2180840 | 201001 | No | 0 | 0 | 0 | 200907 | 200901 | 200801 |
2180840 | 201002 | No | 0 | 0 | 0 | 200908 | 200902 | 200802 |
2180840 | 201003 | No | 0 | 0 | 0 | 200909 | 200903 | 200803 |
2180840 | 201004 | No | 0 | 0 | 0 | 200910 | 200904 | 200804 |
2180840 | 201005 | No | 0 | 0 | 0 | 200911 | 200905 | 200805 |
2180840 | 201006 | No | 0 | 0 | 0 | 200912 | 200906 | 200806 |
2180840 | 201007 | No | 0 | 0 | 0 | 201001 | 200907 | 200807 |
2180840 | 201008 | No | 0 | 0 | 0 | 201002 | 200908 | 200808 |
2180840 | 201009 | No | 0 | 0 | 0 | 201003 | 200909 | 200809 |
2180840 | 201010 | No | 0 | 0 | 0 | 201004 | 200910 | 200810 |
2180840 | 201011 | No | 0 | 0 | 0 | 201005 | 200911 | 200811 |
2180840 | 201012 | No | 0 | 0 | 0 | 201006 | 200912 | 200812 |
2180840 | 201101 | No | 0 | 0 | 0 | 201007 | 201001 | 200901 |
2180840 | 201102 | No | 0 | 0 | 0 | 201008 | 201002 | 200902 |
2180840 | 201103 | No | 0 | 0 | 0 | 201009 | 201003 | 200903 |
2180840 | 201104 | No | 0 | 0 | 0 | 201010 | 201004 | 200904 |
2180840 | 201105 | No | 0 | 0 | 0 | 201011 | 201005 | 200905 |
2180840 | 201106 | No | 0 | 0 | 0 | 201012 | 201006 | 200906 |
2180840 | 201107 | No | 0 | 0 | 0 | 201101 | 201007 | 200907 |
2180840 | 201108 | No | 0 | 0 | 0 | 201102 | 201008 | 200908 |
2180840 | 201109 | No | 0 | 0 | 0 | 201103 | 201009 | 200909 |
2180840 | 201110 | No | 0 | 0 | 0 | 201104 | 201010 | 200910 |
2180840 | 201111 | No | 0 | 0 | 0 | 201105 | 201011 | 200911 |
2180840 | 201112 | No | 0 | 0 | 0 | 201106 | 201012 | 200912 |
2180840 | 201201 | Yes | 1 | 0 | 0 | 201107 | 201101 | 201001 |
2180840 | 201202 | Yes | 0 | 0 | 0 | 201108 | 201102 | 201002 |
2180840 | 201203 | No | 0 | 0 | 0 | 201109 | 201103 | 201003 |
2180840 | 201204 | No | 0 | 0 | 0 | 201110 | 201104 | 201004 |
2180840 | 201205 | No | 0 | 0 | 0 | 201111 | 201105 | 201005 |
2180840 | 201206 | Yes | 0 | 0 | 0 | 201112 | 201106 | 201006 |
2180840 | 201207 | No | 0 | 0 | 0 | 201201 | 201107 | 201007 |
2180840 | 201208 | No | 0 | 0 | 0 | 201202 | 201108 | 201008 |
2180840 | 201209 | No | 0 | 0 | 0 | 201203 | 201109 | 201009 |
2180840 | 201210 | No | 0 | 0 | 0 | 201204 | 201110 | 201010 |
2180840 | 201211 | No | 1 | 0 | 0 | 201205 | 201111 | 201011 |
2180840 | 201212 | No | 1 | 1 | 0 | 201206 | 201112 | 201012 |
2180840 | 201301 | Yes | 1 | 1 | 1 | 201207 | 201201 | 201101 |
2180840 | 201302 | Yes | 1 | 1 | 1 | 201208 | 201202 | 201102 |
2180840 | 201303 | Yes | 1 | 1 | 0 | 201209 | 201203 | 201103 |
2180840 | 201304 | Yes | 1 | 1 | 1 | 201210 | 201204 | 201104 |
2180840 | 201305 | Yes | 1 | 1 | 0 | 201211 | 201205 | 201105 |
2180840 | 201306 | Yes | 1 | 1 | 1 | 201212 | 201206 | 201106 |
2180840 | 201307 | Yes | 1 | 1 | 1 | 201301 | 201207 | 201107 |
2180840 | 201308 | Yes | 1 | 1 | 1 | 201302 | 201208 | 201108 |
2180840 | 201309 | Yes | 1 | 1 | 1 | 201303 | 201209 | 201109 |
2180840 | 201310 | Yes | 0 | 0 | 0 | 201304 | 201210 | 201110 |
2180840 | 201311 | Yes | 0 | 0 | 0 | 201305 | 201211 | 201111 |
2180840 | 201312 | Yes | 0 | 0 | 0 | 201306 | 201212 | 201112 |
2180840 | 201401 | No | 0 | 0 | 0 | 201307 | 201301 | 201201 |
2180840 | 201402 | No | 0 | 0 | 0 | 201308 | 201302 | 201202 |
2180840 | 201403 | No | 0 | 0 | 0 | 201309 | 201303 | 201203 |
2180840 | 201404 | Yes | 0 | 0 | 0 | 201310 | 201304 | 201204 |
2180840 | 201405 | Yes | 1 | 1 | 1 | 201311 | 201305 | 201205 |
2180840 | 201406 | Yes | 1 | 1 | 1 | 201312 | 201306 | 201206 |
2180840 | 201407 | Yes | 0 | 0 | 0 | 201401 | 201307 | 201207 |
2180840 | 201408 | Yes | 0 | 0 | 0 | 201402 | 201308 | 201208 |
2259914 | 200801 | No | 0 | 0 | 0 | 200707 | 200701 | 200601 |
2259914 | 200802 | No | 0 | 0 | 0 | 200708 | 200702 | 200602 |
2259914 | 200803 | No | 0 | 0 | 0 | 200709 | 200703 | 200603 |
2259914 | 200804 | No | 0 | 0 | 0 | 200710 | 200704 | 200604 |
2259914 | 200805 | No | 0 | 0 | 0 | 200711 | 200705 | 200605 |
2259914 | 200806 | No | 0 | 0 | 0 | 200712 | 200706 | 200606 |
2259914 | 200807 | No | 0 | 0 | 0 | 200801 | 200707 | 200607 |
2259914 | 200808 | No | 0 | 0 | 0 | 200802 | 200708 | 200608 |
2259914 | 200809 | No | 0 | 0 | 0 | 200803 | 200709 | 200609 |
2259914 | 200810 | No | 0 | 0 | 0 | 200804 | 200710 | 200610 |
2259914 | 200811 | No | 0 | 0 | 0 | 200805 | 200711 | 200611 |
2259914 | 200812 | No | 0 | 0 | 0 | 200806 | 200712 | 200612 |
2259914 | 200901 | No | 0 | 0 | 0 | 200807 | 200801 | 200701 |
2259914 | 200902 | No | 1 | 0 | 0 | 200808 | 200802 | 200702 |
2259914 | 200903 | No | 0 | 0 | 0 | 200809 | 200803 | 200703 |
2259914 | 200904 | No | 1 | 0 | 0 | 200810 | 200804 | 200704 |
2259914 | 200905 | No | 0 | 0 | 0 | 200811 | 200805 | 200705 |
Can you help me?
Thanks
Gaurav
I don't understand why you have a Y for ID 2. Here's my best guess for what you're trying to do:
proc sql;
create table want as
select
t1.id,
t1.date,
t1.spend,
sum(t2.spend) as Rolling_Spend,
case when sum(t2.spend) >=50 then 'Y' end as Flag
from
have t1
left join have t2
on t1.id=t2.id
and intck('month',t2.date,t1.date) between 0 and 2
group by t1.id, t1.date, t1.spend
order by t1.id, t1.date;
quit;
id | date | spend | Rolling_Spend | Flag |
1 | 01Jan2010 | 20 | 20 | |
1 | 01Feb2010 | 10 | 30 | |
1 | 01Mar2010 | 0 | 30 | |
1 | 01Apr2010 | 10 | 20 | |
1 | 01May2010 | 40 | 50 | Y |
2 | 01Mar2009 | 20 | 20 | |
2 | 01Apr2009 | 10 | 30 | |
2 | 01May2009 | 10 | 40 | |
2 | 01Jun2009 | 10 | 30 | |
2 | 01Jul2009 | 5 | 25 | |
2 | 01Aug2009 | 15 | 30 |
Do you have SAS/ETS, if so use proc expand to calculate your moving average and then flag if the average is over 50
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.