Cumulative Amount of consecutive days

☑ This topic is **solved**.

Posted 07-18-2023 04:10 PM
(699 views)

Hello,

I would like to ask for your help with the following.

I want to find the sum of the values for X consecutive days, by ID.

Consecutive days means days that happen one after the other without breaks. The program should work for different values of X.

For instance, see the data below:

```
data have;
  input ID $ DT:date9. Amount Consecutive_Amount;
  format DT date9.;
datalines;
A 09JUL2021 3600 3600
A 03AUG2021 456 489
A 04AUG2021 33 33
A 06AUG2021 235 335
A 07AUG2021 100 100
A 09AUG2021 86 86
A 12AUG2021 456 456
A 24AUG2021 22 1925
A 25AUG2021 987 1984
A 26AUG2021 916 997
A 27AUG2021 81 81
B 07AUG2021 554 992
B 08AUG2021 193 438
B 09AUG2021 245 245
;
run;
```

Assume that we want to sum the amount for the next 3 consecutive days.

For ID A on 24AUG2021, the amount for the next 3 consecutive days is 22 + 987 + 916 = 1925.

For ID A on 07AUG2021, we cannot calculate a 3-day amount because the next date is 09AUG2021, which leaves a gap; the expected value in the data above is just that day's amount, 100.

Any assistance would be much appreciated.

1 ACCEPTED SOLUTION

@Zatere wrote:

I have done some testing and I think that it works! Thanks a lot.

It is also quite efficient with large volumes of data.

I will leave the post open for a day or so, because if you need to do the same analysis for, say, 15 consecutive days, then you will have to add statements and calculations accordingly, which is not very flexible.

So maybe you want some fixed limit other than 3. I already sent code to do accumulations over any number of consecutive days. But if you still need a *fixed* upper limit on the size of a consecutive sequence, then:

```
data have;
  input ID $ DT:date9. Amount Consecutive_Amount;
  format DT date9.;
datalines;
A 09JUL2021 3600 3600
A 03AUG2021 456 489
A 04AUG2021 33 33
A 06AUG2021 235 335
A 07AUG2021 100 100
A 09AUG2021 86 86
A 12AUG2021 456 456
A 24AUG2021 22 1925
A 25AUG2021 987 1984
A 26AUG2021 916 997
A 27AUG2021 81 81
B 07AUG2021 554 992
B 08AUG2021 193 438
B 09AUG2021 245 245
;
run;

data want (drop=group_size nxt_:);
  /* Read and increment TOTAL until a date gap or next id is upcoming or group_size limit */
  merge have have (firstobs=2 keep=id dt rename=(id=nxt_id dt=nxt_dt));
  total+amount;
  group_size+1;
  /* Then reread the same observations, output, and decrement TOTAL */
  if id^=nxt_id or dt+1 ^= nxt_dt or group_size=3 then do until (group_size=0);
    set have;
    output;
    total=total-amount;
    group_size=group_size-1;
  end;
run;
```

All you have to do is change the number 3 in the

` if id^=nxt_id or dt+1 ^= nxt_dt or group_size=3 then do until (group_size=0);`

statement to whatever fixed upper size limit you want.

Note that this program passes through each group twice (where group "size" can be from 1 to 3 observations in your original specification). The first pass builds the group total and looks for gaps or fixed group size limits. The second re-reads the data, outputs it, and decrements the total.
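
A quick way to check a WANT data set against the expected column is to list the rows where the two differ. The sketch below assumes WANT was produced by the step above and still carries both TOTAL and Consecutive_Amount; the data set name `mismatches` is made up for illustration. Tracing the step by hand, the fixed-size groups do not overlap, so 25AUG2021 and 26AUG2021 appear to come out as 1903 and 916 rather than the expected 1984 and 997. Running the check is the way to confirm that.

```sas
/* List rows where the computed TOTAL differs from the expected column. */
/* Assumes WANT contains both TOTAL and Consecutive_Amount.             */
data mismatches;   /* hypothetical data set name */
  set want;
  if total ne Consecutive_Amount;   /* keep only the differing rows */
run;

proc print data=mismatches noobs;
  var id dt amount Consecutive_Amount total;
run;
```

An empty listing would mean the totals agree with the expected values on every row.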

--------------------------

The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for

Allow PROC SORT to output multiple datasets

--------------------------

8 REPLIES

Is it always 3 consecutive days that are of interest? If so, the solution is shown below. If it could be 3 when you run the program today and 12 when you run it tomorrow, that's much more difficult.

```
data want;
  merge have
        have(firstobs=2 rename=(id=id1 dt=dt1 amount=amount1))
        have(firstobs=3 rename=(id=id2 dt=dt2 amount=amount2));
  if id^=id1 or dt1-dt^=1 then consecutive_amount=amount;
  if id=id1 and dt1-dt=1 then do;
    consecutive_amount=sum(amount,amount1);
    if id1=id2 and dt2-dt1=1 then consecutive_amount=sum(consecutive_amount,amount2);
  end;
run;
```

--

Paige Miller

```
data have;
  input ID $ DT:date9. Amount Consecutive_Amount;
  format DT date9.;
datalines;
A 09JUL2021 3600 3600
A 03AUG2021 456 489
A 04AUG2021 33 33
A 06AUG2021 235 335
A 07AUG2021 100 100
A 09AUG2021 86 86
A 12AUG2021 456 456
A 24AUG2021 22 1925
A 25AUG2021 987 1984
A 26AUG2021 916 997
A 27AUG2021 81 81
B 07AUG2021 554 992
B 08AUG2021 193 438
B 09AUG2021 245 245
;
run;

/* using lag function to check consecutive days */
data have1;
  set have;
  by id;
  format lagdt1 lagdt2 date9.;
  array x(*) lagdt1-lagdt2;
  lagdt1=lag1(dt);
  lagdt2=lag2(dt);
  if first.id then count=1;
  do i=count to dim(x);
    x(i)=.;
  end;
  count+1;
  if n(dt, lagdt1)=2 then diff1=dt-lagdt1;
  if n(lagdt1, lagdt2)=2 then diff2=lagdt1-lagdt2;
run;

data consec1;
  set have1;
  if diff1=1 and diff2=1;
run;

proc sort data=consec1; by id dt; run;

data dtrange; /* date ranges of 3 consecutive dates */
  set consec1;
  by id dt;
  if first.id;
  mindt=lagdt2;
  maxdt=dt;
  keep id mindt maxdt;
run;

data consec3;
  merge have1 dtrange;
  by id;
  if mindt<=dt<=maxdt;
  keep id dt amount;
run;

proc sql;
  create table want as
  select id, dt, amount, sum(amount) as consec3_sum
  from consec3
  group by id;
quit;

/* Comments: This works for 3 consecutive days. I don't think there is an easy way to do this for x number of consecutive days since the lag for each would be different */
```

Why, in your eighth observation, do you have consecutive_amount=1925? Shouldn't it be 2006 if you really want the total for *all* consecutive dates (the 4-date sum of 22, 987, 916, 81 for 24AUG2021 through 27AUG2021)? Assuming that is an error, then this will work (I use variable TOTAL to replicate your variable CONSECUTIVE_AMOUNT):

```
data have;
  input ID $ DT:date9. Amount Consecutive_Amount;
  format DT date9.;
datalines;
A 09JUL2021 3600 3600
A 03AUG2021 456 489
A 04AUG2021 33 33
A 06AUG2021 235 335
A 07AUG2021 100 100
A 09AUG2021 86 86
A 12AUG2021 456 456
A 24AUG2021 22 1925
A 25AUG2021 987 1984
A 26AUG2021 916 997
A 27AUG2021 81 81
B 07AUG2021 554 992
B 08AUG2021 193 438
B 09AUG2021 245 245
;
run;

data want (drop=nxt_:);
  /* Read and increment TOTAL until a date gap or next id is upcoming */
  merge have have (firstobs=2 keep=id dt rename=(id=nxt_id dt=nxt_dt));
  total+amount;
  /* Then reread the same observations, output, and decrement TOTAL */
  if id^=nxt_id or dt+1 ^= nxt_dt then do until (total=0);
    set have;
    output;
    total=total-amount;
  end;
run;
```
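
To see where the consecutive runs begin and end, one option is to give each run its own number and summarize by it. This is a sketch under assumptions, not part of the code above: `runs`, `run_id`, `_gap` and `run_totals` are illustrative names, and HAVE is assumed to be sorted by ID and DT with one row per ID and day.

```sas
/* Number each run of consecutive dates within an ID. */
data runs;
  set have;
  by id;
  _gap = dif(dt);                            /* days since the previous row */
  if first.id or _gap ne 1 then run_id + 1;  /* new ID or a date gap starts a new run */
  drop _gap;
run;

/* One row per run: its date range, length, and total amount. */
proc sql;
  create table run_totals as
  select id, run_id,
         min(dt)     as first_dt format=date9.,
         max(dt)     as last_dt  format=date9.,
         count(*)    as n_days,
         sum(amount) as run_total
  from runs
  group by id, run_id;
quit;
```

On the sample data, the run from 24AUG2021 to 27AUG2021 should show 4 days and the 2006 total discussed above.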

Hi, thanks for your reply.

This is really close!

In the 8th observation, where DT = 24AUG2021, the consecutive_amount should be 1925 because I only want the total for 3 consecutive days.

Another way to phrase this problem would be to say: I want the total amount for the period of the next 3 days if and only if there exist 3 **consecutive** days.

Also, it would be great to be able to change the number of consecutive days.

For example, if I want the total for 4 consecutive days, then the consecutive amount for the 8th observation, where DT = 24AUG2021, will be 2006.

If you're adding only up to 3 consecutive amounts, subject to gap or id change boundaries, then:

```
data want (drop=nxt_:);
  merge have
        have (firstobs=2 keep=id dt amount rename=(id=nxt1_id dt=nxt1_dt amount=nxt1_amt))
        have (firstobs=3 keep=id dt amount rename=(id=nxt2_id dt=nxt2_dt amount=nxt2_amt));
  if nxt1_id^=id or nxt1_dt^=dt+1 then total=amount;
  else if nxt2_id^=id or nxt2_dt^=dt+2 then total=amount+nxt1_amt;
  else total=sum(amount,nxt1_amt,nxt2_amt);
run;
```
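
The look-ahead merge needs one extra copy of HAVE for each additional day. A sketch that takes the window length from a macro variable instead could use direct access with POINT= to peek at the following observations. The assumptions here are that HAVE is sorted by ID and DT with at most one row per ID and day; the macro variable `x`, the data set `want_x`, and the `nx_`, `_p` and `_nobs` names are made up for illustration.

```sas
%let x = 3;   /* hypothetical macro variable: window length in days */

data want_x (drop=nx_:);
  set have nobs=_nobs;      /* sequential read; _N_ is the current row number */
  total = amount;           /* the window always includes the current day */
  nx_last_dt = dt;
  /* Peek at up to &x-1 following rows; stop at an ID change or a date gap */
  do _p = _n_ + 1 to min(_n_ + &x - 1, _nobs);
    set have (keep=id dt amount
              rename=(id=nx_id dt=nx_dt amount=nx_amt)) point=_p;
    if nx_id ne id or nx_dt ne nx_last_dt + 1 then leave;
    total = sum(total, nx_amt);
    nx_last_dt = nx_dt;
  end;
run;
```

With `%let x = 3;` TOTAL should match Consecutive_Amount on the sample data, and with `%let x = 4;` the 24AUG2021 row should give the 2006 mentioned earlier in the thread. Each row re-reads up to &x-1 following rows, so the cost grows with the window length.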

I have done some testing and I think that it works! Thanks a lot.

It is also quite efficient with large volumes of data.

I will leave the post open for a day or so, because if you need to do the same analysis for, say, 15 consecutive days, then you will have to add statements and calculations accordingly, which is not very flexible.

