Super User

## Re: Rolling Sum

Take three months for example.

```data have ;
input id spend  date :date9.  ;
format date date9.;
CARDS;
1     20      01JAN2010
1     10      01FEB2010
1      0      01Mar2010
1     10      01Apr2010
1     40      01May2010
2     20      01Mar2009
2     10      01Apr2009
2     10      01May2009
2     10      01jun2009
2      5      01jul2009
2     15      01Aug2009
;;;;
run;
proc sql;
create table want as
select *,case when (select sum(spend) from have where id=a.id and date between intnx('month',a.date,-2,'s') and a.date) eq 50 then 'Y' else 'N' end as flag
from have as a;
quit;

```

Xia Keshan

Calcite | Level 5

## Re: Rolling Sum

It does work, thanks for that but the self join is killing the server when i run it for 12 and 24 months on the complete table containing 700000 rows.

Do you think we can somehow use the Lag logic I was working with?

Thanks

Gaurav

SAS Employee

## Re: Rolling Sum

There are data step based solutions in it.

SAS Employee

## Re: Rolling Sum

Hi Gaurav,

I had some time to work on this problem. Still calculating the DPD_XX_730 variable is an excercise for you (you will need to create one more view)

I hope AccountingPeriod is a SAS date variable. If not, you need to change the intnx() function.

proc sort data=have;

by customerid accountingperiod;

run;

data have_exp6 / view=have_exp6;

set have;

AccountingPeriod=intnx("month",AccountingPeriod,6,'s');

run;

data have_exp12 / view=have_exp12;

set have;

AccountingPeriod=intnx("month",AccountingPeriod,12,'s');

run;

data want;

set have_exp6(in=in6) have_exp12(in=in12) have(in=inT);/*interleaving original and expiry datasets Gaurav: maybe here you need to adust the order of the datasets! Maybe have comes first. Experiment!*/

by Firm CustomerID AccountingPeriod;

if first.Firm then do;/*initializing counters and sums at beginning of a group*/

DPD_30_180=0; DPD_60_180=0; DPD_90_180=0;

DPD_30_360=0; DPD_60_360=0; DPD_90_360=0;

end;

if inT then do;/*if transaction, then first calculate, then output*/

DPD_30_180+DPD30_flag; DPD_60_180+DPD60_flag; DPD_90_180+DPD90_flag;

DPD_30_360+DPD30_flag; DPD_60_360+DPD60_flag; DPD_90_360+DPD90_flag;

output;

end;

if in6 do;/*if 6 month expiry, then calculate*/

DPD_30_180+(-DPD30_flag); DPD_60_180+(-DPD60_flag); DPD_90_180+(-DPD90_flag);

end;

if in12 do;/*if 12 month expiry, then calculate*/

DPD_30_360+(-DPD30_flag); DPD_60_360+(-DPD60_flag); DPD_90_360+(-DPD90_flag);

end;

run;

Message was edited by: Gergely Bathó

Calcite | Level 5

## Re: Rolling Sum

Hey GB, Well In the input table I already have calculated 2 variables accountingperiod_6 (containing the month going back 6 months in time from current value of accountingperiod) and accountingperiod_12 (same logic but 12 months in time).

So I have everything in 1 input table now. I dont have to create seperate tables as you have shown.

I am working now with your code to see if it solves my problem.

Thanks

Gaurav

SAS Employee

## Re: Rolling Sum

You are right! You don't need the additional views then.

Just change the set statement:

set have_exp6(in=in6 rename=(Accountingperiod_6=AccountingPeriod) drop=AccountingPeriod) have_exp12(rename=(Accountingperiod_12=AccountingPeriod) drop=AccountingPeriod in=in12) have(in=inT);

Good luck!

Calcite | Level 5

## Re: Rolling Sum

Thanks GB I got your code to work to suit my requirements.

I had a question although, why are you using the following expression

DPD-30-180+(-DPD30_flag), why are you using the negative '-' sign?

Gaurav

SAS Employee

## Re: Rolling Sum

DPD_30-180+(-DPD30_flag);   is equvivalent to

DPD_30_180=DPD_30_180-DPD30_flag;   ( plus:  DPD_30_180 becomes retained )

I just wanted to make that line very similar to the previous sum statment (and I also copy-pasted it from there):

DPD_30_180+180_DPD30_flag;

But this one would be syntactically wrong:

DPD_30-180-DPD30_flag;    - it is not a sum statement.

My code interleaves the original dataset and 2 shifted datasets.

When I read a record from the original dataset, I need to add to the cummulative variables.

When I read from the shifted dataset, I need to substract. It is because after 6 (or 12) months the effect of the DPD_XX variable "expires".

Super User

## Re: Rolling Sum

700000 rows is not a big deal .  I guess it will take you about six hours to get it . and faster version code ? I think you can use Array or Hash Table.

Xia Keshan

Calcite | Level 5

## Re: Rolling Sum

Ksharp, unfortunately time is th critcal factor here since I have to create 6,12 and 24 month flags and that too for 3 different variables (which comes to 9 self joins) and its been running now for the last 4 hours and i dont know when it is going to end.

Gergely, the example you gave isnt accumulating the data for the last 6 or 12 months as i wanted. It is just copying the data from one variable into the sum variable.

I have been playing around with an array and have got it to work as well, but again something is missing since I am not able to reinitialise the sum value once the BY group value changes. Here is the code

%let period = 6;

Data want_6;

set have;

by customerid;

array summed[&period] _temporary;

If X = &period then X=1;

else X+1;

summed = DPD30_flag;

if accountingperiod > accountingperiod_6 then do;

if _n_ >= &period then do;

DPD30_flag_sum = sum(of summed

• );
•      end;

end;

run;

Accountingperiod is the column holding the yyyymm value for the current observation and accountingperiod_6 is the varaible holding the yyyymm value for the 6th previous month.

Can you guys see what i am missing?

Gaurav

Super User

## Re: Rolling Sum

There are a couple of questions, Is there a missing month like 01FEB2012  01JUN2012  , Is there some other day in the same month like 02FEB2012 04FEB2012 ?

Still take three months for example :

```

data have ;
input id spend  date :date9.  ;
format date date9.;
CARDS;
1     20      01JAN2010
1     10      01FEB2010
1      0      01Mar2010
1     10      01Apr2010
1     40      01May2010
2     20      01Mar2009
2     10      01Apr2009
2     10      01May2009
2     10      01jun2009
2      5      01jul2009
2     15      01Aug2009
;;;;
run;
data want(drop=_date sum);
if _n_ eq 1 then do;
if 0 then set have(rename=(date=_date));
declare hash ha(dataset:'have(rename=(date=_date))',hashexp:20);
ha.definekey('id','_date');
ha.definedata('spend');
ha.definedone();
end;
set have;
sum=0;
do _date=intnx('month',date,-2,'s') to date;
if ha.find()=0 then sum+spend;
end;
flag=ifc( sum eq 50 , 'Y' , 'N');
run;
```

Xia Keshan

Discussion stats
• 25 replies
• 8899 views
• 1 like
• 11 in conversation