Solved: Re: Cumulative Amount of consecutive days

Zatere · Posted 07-18-2023 04:10 PM

Hello,

I would like to ask for your help with the following.

I want to find the sum of values over X consecutive days by ID.

Consecutive days are days that follow one another without gaps. The program should work for different values of X.

For instance, see the data below:

data have;
input ID $ DT:date9. Amount Consecutive_Amount;
format DT date9.;
datalines;
A 09JUL2021 3600 3600
A 03AUG2021 456 489
A 04AUG2021 33 33
A 06AUG2021 235 335
A 07AUG2021 100 100
A 09AUG2021 86 86
A 12AUG2021 456 456
A 24AUG2021 22 1925
A 25AUG2021 987 1984
A 26AUG2021 916 997
A 27AUG2021 81 81
B 07AUG2021 554 992
B 08AUG2021 193 438
B 09AUG2021 245 245
;
run;

Assume that we want to sum the amount for the next 3 consecutive days.

On 24/08/2021 the amount for the next 3 consecutive days is 1925 (22 + 987 + 916).
On 07/08/2021 we cannot sum 3 consecutive days because there is a gap between 07/08/2021 and 09/08/2021, so the value is just that day's amount (100).

Any assistance would be much appreciated.

PaigeMiller · Posted 07-18-2023 05:36 PM

Is it always 3 consecutive that is of interest? If so, the solution is shown below. If it could be 3 when you run the program today, and 12 when you run the program tomorrow, that's much more difficult.

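data want;
    /* Look-ahead merge (no BY statement): pair each row with the next two rows */
    merge have
          have(firstobs=2 rename=(id=id1 dt=dt1 amount=amount1))
          have(firstobs=3 rename=(id=id2 dt=dt2 amount=amount2));
    /* Next row is another ID or not the next day: only this day's amount */
    if id^=id1 or dt1-dt^=1 then consecutive_amount=amount;
    /* Next row is the next day: add it, and the third day too if it follows */
    if id=id1 and dt1-dt=1 then do;
        consecutive_amount=sum(amount,amount1);
        if id1=id2 and dt2-dt1=1 then consecutive_amount=sum(consecutive_amount,amount2);
    end;
run;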

--
Paige Miller

Seadrago · Posted 07-18-2023 06:05 PM

data have;
input ID $ DT:date9. Amount Consecutive_Amount;
format DT date9.;
datalines;
A 09JUL2021 3600 3600
A 03AUG2021 456 489
A 04AUG2021 33 33
A 06AUG2021 235 335
A 07AUG2021 100 100
A 09AUG2021 86 86
A 12AUG2021 456 456
A 24AUG2021 22 1925
A 25AUG2021 987 1984
A 26AUG2021 916 997
A 27AUG2021 81 81
B 07AUG2021 554 992
B 08AUG2021 193 438
B 09AUG2021 245 245
;
run;

/*using lag function to check consecutive days*/
data have1;
  set have;
  by id;
  format lagdt1 lagdt2 date9.;
  array x(*) lagdt1-lagdt2;
  lagdt1=lag1(dt);
  lagdt2=lag2(dt);
  if first.id then count=1;
  do i=count to dim(x);
    x(i)=.;
  end;
  count+1;
  if n(dt, lagdt1)=2 then diff1=dt-lagdt1;
  if n(lagdt1, lagdt2)=2 then diff2=lagdt1-lagdt2;
run;

data consec1;
  set have1;
  if diff1=1 and diff2=1;
run;

proc sort data=consec1; by id dt; run;

data dtrange; /*date ranges of 3 consecutive dates*/
  set consec1;
  by id dt;
  if first.id;
  mindt=lagdt2;
  maxdt=dt;
  keep id mindt maxdt;
run;

data consec3;
  merge have1 dtrange;
  by id;
  if mindt<=dt<=maxdt;
  keep id dt amount;
run;

proc sql;
  create table want as
  select id, dt, amount, sum(amount) as consec3_sum
  from consec3
  group by id;
quit;

/*Comments: This works for 3 consecutive days. I don't think there is an easy way to do this for x consecutive days, since the lags needed would be different for each x*/

Kurt_Bremser · Posted 07-19-2023 02:53 AM

A DOW-based solution which only needs a slight tweak has been posted here: https://communities.sas.com/t5/SAS-Programming/Cumulative-Amount-based-on-sequential-dates/td-p/7918...

mkeintz · Posted 07-18-2023 07:44 PM

Why, in your eighth observation, do you have consecutive_amount=1925? Shouldn't it be 2006 if you really want the total for all consecutive dates (the 4-date sum of 22, 987, 916, 81 for 24AUG2021 through 27AUG2021)? Assuming that is an error, then this will work (I use variable TOTAL to replicate your variable CONSECUTIVE_AMOUNT):

data have;
input ID $ DT:date9. Amount Consecutive_Amount;
format DT date9.;
datalines;
A 09JUL2021 3600 3600
A 03AUG2021 456 489
A 04AUG2021 33 33
A 06AUG2021 235 335
A 07AUG2021 100 100
A 09AUG2021 86 86
A 12AUG2021 456 456
A 24AUG2021 22 1925
A 25AUG2021 987 1984
A 26AUG2021 916 997
A 27AUG2021 81 81
B 07AUG2021 554 992
B 08AUG2021 193 438
B 09AUG2021 245 245
run;
data want (drop=nxt_:);

  /* Read and increment TOTAL until a date gap or next id is upcoming */
  merge have   have (firstobs=2 keep=id dt rename=(id=nxt_id dt=nxt_dt));
  total+amount;

  /*Then reread the same observations, output, and decrement TOTAL */
  if id^=nxt_id or dt+1 ^= nxt_dt then do until (total=0);
    set have;
    output;
    total=total-amount;
  end;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Zatere · Posted 07-19-2023 02:51 AM

Hi, thanks for your reply.

This is really close!

In the 8th observation, where DT = 24AUG2021, the consecutive_amount should be 1925 because I only want the total for 3 consecutive days.

Another way to phrase this problem would be to say: I want the total amount for the period of the next 3 days if and only if there exist 3 consecutive days.

Also, it would be great to be able to change the number of consecutive days.

For example, if I want the total for 4 consecutive days, then the consecutive amount for the 8th observation, where DT = 24AUG2021, will be 2006.

mkeintz · Posted 07-19-2023 08:09 AM

If you're adding only up to 3 consecutive amounts, subject to gap or id change boundaries, then:

data want (drop=nxt:);  /* prefix nxt: matches the nxt1_* and nxt2_* look-ahead variables */
  merge have
        have (firstobs=2 keep=id dt amount rename=(id=nxt1_id dt=nxt1_dt amount=nxt1_amt))
        have (firstobs=3 keep=id dt amount rename=(id=nxt2_id dt=nxt2_dt amount=nxt2_amt));

  if nxt1_id^=id or nxt1_dt^=dt+1 then total=amount;          else
  if nxt2_id^=id or nxt2_dt^=dt+2 then total=amount+nxt1_amt; else
  total=sum(amount,nxt1_amt,nxt2_amt);
run;
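
Because HAVE already carries the hand-computed Consecutive_Amount column, a result like this can be checked by listing the rows where the two values disagree. A small sketch, assuming WANT comes from the step above and still contains both TOTAL and Consecutive_Amount; the dataset name MISMATCHES is made up here:

```sas
/* Keep only rows where the computed total differs from the expected value */
data mismatches;
  set want;
  if total ne Consecutive_Amount;
run;

/* List any disagreements; an empty MISMATCHES means every row agrees */
proc print data=mismatches noobs;
  var id dt amount Consecutive_Amount total;
run;
```

For the sample data the list should come back empty.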

Zatere · Posted 07-19-2023 09:37 AM

I have done some testing and I think it works! Thanks a lot.

It is also quite efficient on large volumes of data.

I will leave the post open for a day or so, because if you need to do the same analysis for, say, 15 consecutive days, you have to add statements and calculations accordingly, which is not very flexible.

mkeintz · Posted 07-19-2023 11:51 AM

@Zatere wrote:

I have done some testing and I think it works! Thanks a lot.

It is also quite efficient on large volumes of data.

I will leave the post open for a day or so, because if you need to do the same analysis for, say, 15 consecutive days, you have to add statements and calculations accordingly, which is not very flexible.

So perhaps you want some fixed limit other than 3. I already posted code that accumulates over any number of consecutive days. But if you still need a fixed upper limit on the size of a consecutive sequence, then:

data have;
input ID $ DT:date9. Amount Consecutive_Amount;
format DT date9.;
datalines;
A 09JUL2021 3600 3600
A 03AUG2021 456 489
A 04AUG2021 33 33
A 06AUG2021 235 335
A 07AUG2021 100 100
A 09AUG2021 86 86
A 12AUG2021 456 456
A 24AUG2021 22 1925
A 25AUG2021 987 1984
A 26AUG2021 916 997
A 27AUG2021 81 81
B 07AUG2021 554 992
B 08AUG2021 193 438
B 09AUG2021 245 245
run;
data want (drop=group_size nxt_:);

  /* Read and increment TOTAL until a date gap or next id is upcoming or group_size limit */
  merge have   have (firstobs=2 keep=id dt rename=(id=nxt_id dt=nxt_dt));
  total+amount;
  group_size+1;
  /*Then reread the same observations, output, and decrement TOTAL */
  if id^=nxt_id or dt+1 ^= nxt_dt or group_size=3 then do until (group_size=0);
    set have;
    output;
    total=total-amount;
    group_size=group_size-1;
  end;
run;

All you have to do is change the number 3 in the

  if id^=nxt_id or dt+1 ^= nxt_dt or group_size=3 then do until (group_size=0);

statement to whatever fixed upper size limit you want.

Note that this program passes through each group twice (where group "size" can be from 1 to 3 observations in your original specification). The first pass builds the group total and looks for gaps or fixed group size limits. The second re-reads the data, outputs it, and decrements the total.
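
For a window that restarts at every row, which is how the Consecutive_Amount column in the sample data is built (25AUG2021 is 987 + 916 + 81 = 1984), one option is to look ahead with a second SET statement and POINT=. The step above instead splits each run into non-overlapping blocks, so rows inside a block carry the block's remaining total (1903 for 25AUG2021). What follows is only a sketch under assumptions: HAVE is sorted by ID and DT with one row per ID and date, and the macro variable n and the underscore-prefixed helper variables are hypothetical names, not part of the code above.

```sas
%let n = 3;   /* window length in days (hypothetical macro variable) */

data want (drop=_:);
  /* Sequential read; _N_ is the current row number because no rows are skipped */
  set have nobs=_nobs;
  total    = amount;
  _prev_dt = dt;

  /* Look ahead at most &n-1 rows without running past the last row */
  do _k = 1 to &n - 1 while (_N_ + _k <= _nobs);
    _p = _N_ + _k;
    /* Direct-access read of the look-ahead row into renamed variables */
    set have (keep=id dt amount rename=(id=_id dt=_dt amount=_amt)) point=_p;
    /* Stop at an ID change or as soon as a day is missing */
    if _id ne id or _dt ne _prev_dt + 1 then leave;
    total    = sum(total, _amt);
    _prev_dt = _dt;
  end;
run;
```

With n set to 3, TOTAL should match Consecutive_Amount on every row of the sample data; setting `%let n = 4;` should give 2006 for 24AUG2021, the figure mentioned earlier in the thread. Direct access with POINT= re-reads up to n-1 rows per observation, so the cost grows with the window length.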
