
## Cumulative Amount of consecutive days

Hello,

I want to find the sum of a value over X consecutive days, by ID.

Consecutive days are days that follow one another without breaks. The program should work for different values of X.

For instance, see the data below:

``````
data have;
  input ID $ DT:date9. Amount Consecutive_Amount;
  format DT date9.;
  datalines;
A 09JUL2021 3600 3600
A 03AUG2021 456 489
A 04AUG2021 33 33
A 06AUG2021 235 335
A 07AUG2021 100 100
A 09AUG2021 86 86
A 12AUG2021 456 456
A 24AUG2021 22 1925
A 25AUG2021 987 1984
A 26AUG2021 916 997
A 27AUG2021 81 81
B 07AUG2021 554 992
B 08AUG2021 193 438
B 09AUG2021 245 245
;
run;
``````

Assume that we want to sum the amount for the next 3 consecutive days.

On 24/08/2021, the amount for the next 3 consecutive days is 1925 (22 + 987 + 916).
On 07/08/2021 (ID A), we cannot calculate the amount for 3 consecutive days because there is a gap between 07/08/2021 and 09/08/2021, so the amount is just 100.

Any assistance would be much appreciated.

Accepted Solutions

## Re: Cumulative Amount of consecutive days

@Zatere wrote:

I have done some testing and I think it works! Thanks a lot.

It is also quite efficient on large volumes of data.

I will leave the post open for a day or so, because if you need to do the same analysis for, say, 15 consecutive days, you would have to add statements and calculations accordingly, which is not very flexible.

So you want maybe some other fixed limit than 3.  I already sent code to do accumulations over any number of consecutive days.  But if you still need a fixed upper limit on size-of-consecutive-sequence, then:

``````
data have;
  input ID $ DT:date9. Amount Consecutive_Amount;
  format DT date9.;
  datalines;
A 09JUL2021 3600 3600
A 03AUG2021 456 489
A 04AUG2021 33 33
A 06AUG2021 235 335
A 07AUG2021 100 100
A 09AUG2021 86 86
A 12AUG2021 456 456
A 24AUG2021 22 1925
A 25AUG2021 987 1984
A 26AUG2021 916 997
A 27AUG2021 81 81
B 07AUG2021 554 992
B 08AUG2021 193 438
B 09AUG2021 245 245
;
run;

data want (drop=group_size nxt_:);
  /* Read and increment TOTAL until a date gap or next id is upcoming or group_size limit */
  merge have
        have (firstobs=2 keep=id dt rename=(id=nxt_id dt=nxt_dt));
  total+amount;
  group_size+1;
  /* Then reread the same observations, output, and decrement TOTAL */
  if id^=nxt_id or dt+1 ^= nxt_dt or group_size=3 then do until (group_size=0);
    set have;
    output;
    total=total-amount;
    group_size=group_size-1;
  end;
run;
``````

All you have to do is change the number 3 in the

``  if id^=nxt_id or dt+1 ^= nxt_dt or group_size=3 then do until (group_size=0);``

statement to whatever fixed upper size limit you want.
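
To avoid editing the statement by hand, the limit could be held in a macro variable. The following is a minimal sketch of the same step, assuming the `have` data set above; `max_size` is a hypothetical name, not part of the original code.

```sas
%let max_size = 3;   /* hypothetical macro variable: upper limit on group size */

data want (drop=group_size nxt_:);
  merge have
        have (firstobs=2 keep=id dt rename=(id=nxt_id dt=nxt_dt));
  total + amount;
  group_size + 1;
  /* same stop conditions as above, with the literal 3 replaced by &max_size */
  if id ^= nxt_id or dt + 1 ^= nxt_dt or group_size = &max_size then
    do until (group_size = 0);
      set have;
      output;
      total = total - amount;
      group_size = group_size - 1;
    end;
run;
```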

Note that this program passes through each group twice (where group "size" can be from 1 to 3 observations in your original specification).  The first pass builds the group total and looks for gaps or fixed group size limits.  The second re-reads the data, outputs it, and decrements the total.
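
Because a group is closed as soon as it reaches the size limit, the groups in this step do not overlap: for 24AUG2021 through 27AUG2021 it yields 1925, 1903, 916 and 81, whereas the Consecutive_Amount column in the question has 1925, 1984, 997 and 81. If a window that restarts at every row is what is wanted, one possible alternative (a sketch, not the code posted above) is to tag each unbroken run of dates and then sum within the run. It assumes `have` is sorted by ID and DT with at most one row per ID and date; the macro variable `n` and the names `runs`, `run_id` and `gap` are made up for illustration.

```sas
%let n = 3;   /* hypothetical macro variable: window length in days */

/* Step 1: give every unbroken run of daily dates within an ID its own number */
data runs;
  set have;
  by id;
  gap = dif(dt);                              /* days since the previous row */
  if first.id or gap ne 1 then run_id + 1;    /* new ID or a date gap starts a new run */
  drop gap;
run;

/* Step 2: for each row, add up the rows of the same run dated dt .. dt+n-1 */
proc sql;
  create table want as
  select a.id, a.dt, a.amount, sum(b.amount) as total
  from runs as a
       inner join runs as b
       on  a.id = b.id
       and a.run_id = b.run_id
       and b.dt between a.dt and a.dt + &n - 1
  group by a.id, a.dt, a.amount
  order by a.id, a.dt;
quit;
```

Within a run the dates are one day apart, so the date range in the join picks up at most `n` rows and stops at the first gap. With `%let n = 4;` the row for 24AUG2021 would get 22 + 987 + 916 + 81 = 2006, the figure mentioned elsewhere in the thread.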

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

## Re: Cumulative Amount of consecutive days

Is it always 3 consecutive days that are of interest? If so, the solution is shown below. If it could be 3 when you run the program today and 12 when you run it tomorrow, that's much more difficult.

``````
data want;
  merge have
        have(firstobs=2 rename=(id=id1 dt=dt1 amount=amount1))
        have(firstobs=3 rename=(id=id2 dt=dt2 amount=amount2));
  if id^=id1 or dt1-dt^=1 then consecutive_amount=amount;
  if id=id1 and dt1-dt=1 then do;
    consecutive_amount=sum(amount,amount1);
    if id1=id2 and dt2-dt1=1 then consecutive_amount=sum(consecutive_amount,amount2);
  end;
run;
``````
--
Paige Miller

## Re: Cumulative Amount of consecutive days

``````
data have;
  input ID $ DT:date9. Amount Consecutive_Amount;
  format DT date9.;
  datalines;
A 09JUL2021 3600 3600
A 03AUG2021 456 489
A 04AUG2021 33 33
A 06AUG2021 235 335
A 07AUG2021 100 100
A 09AUG2021 86 86
A 12AUG2021 456 456
A 24AUG2021 22 1925
A 25AUG2021 987 1984
A 26AUG2021 916 997
A 27AUG2021 81 81
B 07AUG2021 554 992
B 08AUG2021 193 438
B 09AUG2021 245 245
;
run;

/* using lag function to check consecutive days */
data have1;
  set have;
  by id;
  format lagdt1 lagdt2 date9.;
  array x(*) lagdt1-lagdt2;
  lagdt1=lag1(dt);
  lagdt2=lag2(dt);
  if first.id then count=1;
  do i=count to dim(x);
    x(i)=.;
  end;
  count+1;
  if n(dt, lagdt1)=2 then diff1=dt-lagdt1;
  if n(lagdt1, lagdt2)=2 then diff2=lagdt1-lagdt2;
run;

data consec1;
  set have1;
  if diff1=1 and diff2=1;
run;

proc sort data=consec1; by id dt; run;

data dtrange; /* date ranges of 3 consecutive dates */
  set consec1;
  by id dt;
  if first.id;
  mindt=lagdt2;
  maxdt=dt;
  keep id mindt maxdt;
run;

data consec3;
  merge have1 dtrange;
  by id;
  if mindt<=dt<=maxdt;
  keep id dt amount;
run;

proc sql;
  create table want as
  select id, dt, amount, sum(amount) as consec3_sum
  from consec3
  group by id;
quit;
``````

Comments: This works for 3 consecutive days. I don't think there is an easy way to do it for X consecutive days, since the lag for each would be different.


## Re: Cumulative Amount of consecutive days

A DOW-based solution which only needs a slight tweak has been posted here: https://communities.sas.com/t5/SAS-Programming/Cumulative-Amount-based-on-sequential-dates/td-p/7918...

## Re: Cumulative Amount of consecutive days

Why, in your eighth observation, do you have consecutive_amount=1925? Shouldn't it be 2006 if you really want the total for all consecutive dates (the 4-date sum of 22, 987, 916, 81 for 24AUG2021 through 27AUG2021)? Assuming that is an error, this will work (I use variable TOTAL to replicate your variable CONSECUTIVE_AMOUNT):

``````
data have;
  input ID $ DT:date9. Amount Consecutive_Amount;
  format DT date9.;
  datalines;
A 09JUL2021 3600 3600
A 03AUG2021 456 489
A 04AUG2021 33 33
A 06AUG2021 235 335
A 07AUG2021 100 100
A 09AUG2021 86 86
A 12AUG2021 456 456
A 24AUG2021 22 1925
A 25AUG2021 987 1984
A 26AUG2021 916 997
A 27AUG2021 81 81
B 07AUG2021 554 992
B 08AUG2021 193 438
B 09AUG2021 245 245
;
run;

data want (drop=nxt_:);
  /* Read and increment TOTAL until a date gap or next id is upcoming */
  merge have
        have (firstobs=2 keep=id dt rename=(id=nxt_id dt=nxt_dt));
  total+amount;

  /* Then reread the same observations, output, and decrement TOTAL */
  if id^=nxt_id or dt+1 ^= nxt_dt then do until (total=0);
    set have;
    output;
    total=total-amount;
  end;
run;
``````


## Re: Cumulative Amount of consecutive days

This is really close!

In the 8th observation, where DT = 24AUG2021, the consecutive_amount should be 1925 because I only want the total for 3 consecutive days.

Another way to phrase this problem would be to say: I want the total amount for the period of the next 3 days if and only if there exist 3 consecutive days.

Also, it would be great to be able to change the number of consecutive days.

For example, if I want the total for 4 consecutive days, then the consecutive amount for the 8th observation, where DT = 24AUG2021, will be 2006.

## Re: Cumulative Amount of consecutive days

If you're adding only up to 3 consecutive amounts, subject to gap or id change boundaries, then:

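``````
data want (drop=nxt:);  /* drop every look-ahead variable (names start with nxt) */
  merge have
        have (firstobs=2 keep=id dt amount rename=(id=nxt1_id dt=nxt1_dt amount=nxt1_amt))
        have (firstobs=3 keep=id dt amount rename=(id=nxt2_id dt=nxt2_dt amount=nxt2_amt));

  /* next row is another id or not the next day: only the current amount */
  if nxt1_id^=id or nxt1_dt^=dt+1 then total=amount;
  /* row after that breaks the sequence: current amount plus the next one */
  else if nxt2_id^=id or nxt2_dt^=dt+2 then total=amount+nxt1_amt;
  /* three consecutive days: sum all three */
  else total=sum(amount,nxt1_amt,nxt2_amt);
run;
``````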

## Re: Cumulative Amount of consecutive days

I have done some testing and I think it works! Thanks a lot.

It is also quite efficient on large volumes of data.

I will leave the post open for a day or so, because if you need to do the same analysis for, say, 15 consecutive days, you would have to add statements and calculations accordingly, which is not very flexible.
