☑ This topic is solved.
Zatere
Quartz | Level 8

Hello,

I would like to ask for your help with the problem below.

 

I want to find the sum of the values over X consecutive days, by ID.

 

Consecutive days means days that follow one another without breaks. The program should work for different values of X.

 

For instance, see the data below:

data have;
input ID $ DT:date9. Amount Consecutive_Amount;
format DT date9.;
datalines;
A 09JUL2021 3600 3600
A 03AUG2021 456 489
A 04AUG2021 33 33
A 06AUG2021 235 335
A 07AUG2021 100 100
A 09AUG2021 86 86
A 12AUG2021 456 456
A 24AUG2021 22 1925
A 25AUG2021 987 1984
A 26AUG2021 916 997
A 27AUG2021 81 81
B 07AUG2021 554 992
B 08AUG2021 193 438
B 09AUG2021 245 245
;
run;


 

Assume that we want to sum the amount for the next 3 consecutive days.

For ID A on 24/08/2021, the amount for the next 3 consecutive days is 1925 (22 + 987 + 916).
For ID A on 07/08/2021, we cannot calculate the amount for the next 3 consecutive days because there is a gap between 07/08/2021 and 09/08/2021.

 

Any assistance would be much appreciated.

 

 

 

Accepted Solutions
mkeintz
PROC Star

@Zatere wrote:

I have done some testing and I think it works! Thanks a lot.

It is also quite efficient on large volumes of data.

I will leave the post open for a day or so, because if you need to do the same analysis for, say, 15 consecutive days, you have to add statements and calculations accordingly, which is not very flexible.


So you may want a fixed limit other than 3. I already posted code that accumulates over any number of consecutive days. But if you still need a fixed upper limit on the size of a consecutive sequence, then:

 

data have;
input ID $ DT:date9. Amount Consecutive_Amount;
format DT date9.;
datalines;
A 09JUL2021 3600 3600
A 03AUG2021 456 489
A 04AUG2021 33 33
A 06AUG2021 235 335
A 07AUG2021 100 100
A 09AUG2021 86 86
A 12AUG2021 456 456
A 24AUG2021 22 1925
A 25AUG2021 987 1984
A 26AUG2021 916 997
A 27AUG2021 81 81
B 07AUG2021 554 992
B 08AUG2021 193 438
B 09AUG2021 245 245
;
run;

data want (drop=group_size nxt_:);

  /* Read and increment TOTAL until a date gap or next id is upcoming or group_size limit */
  merge have   have (firstobs=2 keep=id dt rename=(id=nxt_id dt=nxt_dt));
  total+amount;
  group_size+1;
  /*Then reread the same observations, output, and decrement TOTAL */
  if id^=nxt_id or dt+1 ^= nxt_dt or group_size=3 then do until (group_size=0);
    set have;
    output;
    total=total-amount;
    group_size=group_size-1;
  end;
run;

All you have to do is change the number 3 in the

  if id^=nxt_id or dt+1 ^= nxt_dt or group_size=3 then do until (group_size=0);

statement to whatever fixed upper size limit you want.

 

Note that this program passes through each group twice (where group "size" can be from 1 to 3 observations in your original specification).  The first pass builds the group total and looks for gaps or fixed group size limits.  The second re-reads the data, outputs it, and decrements the total.
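
To avoid editing the IF statement by hand, the limit could be held in a macro variable. The sketch below is the same step with the literal 3 replaced by &max_size; the names max_size and want_blocks are placeholders of mine, not from the posts. One thing to keep in mind, as far as I can trace it by hand: this approach splits each run of consecutive dates into non-overlapping groups, so within a group TOTAL is the amount still remaining in that group rather than a rolling sum. On the sample data, 25AUG2021 would get 1903 (987 + 916) rather than the 1984 shown in the Consecutive_Amount column.

```sas
%let max_size = 3;   /* hypothetical macro variable: upper limit on group size */

data want_blocks (drop=group_size nxt_:);
  /* First pass: accumulate TOTAL until a gap, an ID change, or the size limit */
  merge have
        have (firstobs=2 keep=id dt rename=(id=nxt_id dt=nxt_dt));
  total + amount;
  group_size + 1;

  /* Second pass: re-read the same rows, output them, and back the amounts out */
  if id ne nxt_id or dt + 1 ne nxt_dt or group_size = &max_size then
    do until (group_size = 0);
      set have;
      output;
      total = total - amount;
      group_size = group_size - 1;
    end;
run;
```

Changing the %let line to another value, for example 15, should be the only edit needed for a different limit.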

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------


8 REPLIES
PaigeMiller
Diamond | Level 26

Is it always 3 consecutive that is of interest? If so, the solution is shown below. If it could be 3 when you run the program today, and 12 when you run the program tomorrow, that's much more difficult.

 

data want;
    merge have
          have(firstobs=2 rename=(id=id1 dt=dt1 amount=amount1))
          have(firstobs=3 rename=(id=id2 dt=dt2 amount=amount2));
    if id^=id1 or dt1-dt^=1 then consecutive_amount=amount;
    if id=id1 and dt1-dt=1 then do;
        consecutive_amount=sum(amount,amount1);
        if id1=id2 and dt2-dt1=1 then consecutive_amount=sum(consecutive_amount,amount2);
    end;
run;
--
Paige Miller
Seadrago
Obsidian | Level 7

data have;
input ID $ DT:date9. Amount Consecutive_Amount;
format DT date9.;
datalines;
A 09JUL2021 3600 3600
A 03AUG2021 456 489
A 04AUG2021 33 33
A 06AUG2021 235 335
A 07AUG2021 100 100
A 09AUG2021 86 86
A 12AUG2021 456 456
A 24AUG2021 22 1925
A 25AUG2021 987 1984
A 26AUG2021 916 997
A 27AUG2021 81 81
B 07AUG2021 554 992
B 08AUG2021 193 438
B 09AUG2021 245 245
;
run;

/*using lag function to check consecutive days*/
data have1;
  set have;
  by id;
  format lagdt1 lagdt2 date9.;

  array x(*) lagdt1-lagdt2;
  lagdt1=lag1(dt);
  lagdt2=lag2(dt);

  if first.id then count=1;
  do i=count to dim(x);
     x(i)=.;
  end;
  count+1;

  if n(dt, lagdt1)=2 then diff1=dt-lagdt1;
  if n(lagdt1, lagdt2)=2 then diff2=lagdt1-lagdt2;
run;

data consec1;
  set have1;
  if diff1=1 and diff2=1;
run;

proc sort data=consec1; by id dt; run;

data dtrange; /*date ranges of 3 consecutive dates*/
  set consec1;
  by id dt;
  if first.id;
  mindt=lagdt2;
  maxdt=dt;
  keep id mindt maxdt;
run;

data consec3;
  merge have1 dtrange;
  by id;
  if mindt<=dt<=maxdt;
  keep id dt amount;
run;

proc sql;
  create table want as
  select id, dt, amount, sum(amount) as consec3_sum
  from consec3
  group by id;
quit;

/*Comments: This works for 3 consecutive days. I don't think there is an easy way to do for x number of consecutive days since the lag for each would be different*/

mkeintz
PROC Star

Why, in your eighth observation, do you have consecutive_amount=1925? Shouldn't it be 2006 if you really want the total for all consecutive dates (the 4-date sum of 22, 987, 916, 81 for 24AUG2021 through 27AUG2021)? Assuming that is an error, then this will work (I use variable TOTAL to replicate your variable CONSECUTIVE_AMOUNT):

 

data have;
input ID $ DT:date9. Amount Consecutive_Amount;
format DT date9.;
datalines;
A 09JUL2021 3600 3600
A 03AUG2021 456 489
A 04AUG2021 33 33
A 06AUG2021 235 335
A 07AUG2021 100 100
A 09AUG2021 86 86
A 12AUG2021 456 456
A 24AUG2021 22 1925
A 25AUG2021 987 1984
A 26AUG2021 916 997
A 27AUG2021 81 81
B 07AUG2021 554 992
B 08AUG2021 193 438
B 09AUG2021 245 245
;
run;

data want (drop=nxt_:);

  /* Read and increment TOTAL until a date gap or next id is upcoming */
  merge have   have (firstobs=2 keep=id dt rename=(id=nxt_id dt=nxt_dt));
  total+amount;

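  /* Then reread the same observations, output, and decrement TOTAL.      */
  /* The loop stops when TOTAL is back to 0, so this assumes every AMOUNT */
  /* is positive and the subtraction returns exactly to 0 (e.g. integers). */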
  if id^=nxt_id or dt+1 ^= nxt_dt then do until (total=0);
    set have;
    output;
    total=total-amount;
  end;
run;

Zatere
Quartz | Level 8

Hi, thanks for your reply.

This is really close!

In the 8th observation, where DT = 24AUG2021, the consecutive_amount should be 1925 because I only want the total for 3 consecutive days.

Another way to phrase this problem would be to say: I want the total amount for the period of the next 3 days if and only if there exist 3 consecutive days.

Also, it would be great to be able to change the number of consecutive days. 

For example, if I want the total for 4 consecutive days, then the consecutive amount for the 8th observation, where DT = 24AUG2021, will be 2006.
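
A rolling version of this requirement, with the number of days as a parameter, could be sketched with direct access through the POINT= option of the SET statement. This is only one possible approach, not something proposed in the thread. It assumes HAVE is sorted by ID and DT with at most one row per ID and date, and that the main SET has no WHERE= option, so that _N_ matches the row number. The macro variable x, the data set want_roll, and the underscore-prefixed helper variables are names of my own.

```sas
%let x = 3;   /* hypothetical macro variable: number of consecutive days to sum */

data want_roll (drop=_:);
  set have nobs=_nobs;
  total = amount;

  /* Look ahead at most &x - 1 rows; stop at the first date gap or ID change */
  do _k = 1 to &x - 1;
    _p = _n_ + _k;
    if _p > _nobs then leave;             /* no more rows to look at */
    set have (keep=id dt amount
              rename=(id=_id dt=_dt amount=_amt)) point=_p;
    if _id ne id or _dt ne dt + _k then leave;
    total = sum(total, _amt);
  end;
run;
```

Traced by hand on the sample data with x set to 3, TOTAL agrees with the Consecutive_Amount column (for example 1925 on 24AUG2021 and 1984 on 25AUG2021); with x set to 4, 24AUG2021 becomes 2006. Because every row may re-read up to x - 1 following rows, this is likely slower than the merge-based steps for large x.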

mkeintz
PROC Star

If you're adding only up to 3 consecutive amounts, subject to gap or id change boundaries, then:

 

data want (drop=nxt_:);
  merge have
        have (firstobs=2 keep=id dt amount rename=(id=nxt1_id dt=nxt1_dt amount=nxt1_amt))
        have (firstobs=3 keep=id dt amount rename=(id=nxt2_id dt=nxt2_dt amount=nxt2_amt));

  if nxt1_id^=id or nxt1_dt^=dt+1 then total=amount;          else
  if nxt2_id^=id or nxt2_dt^=dt+2 then total=amount+nxt1_amt; else
  total=sum(amount,nxt1_amt,nxt2_amt);
run;
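
Since the sample data already carries a hand-computed Consecutive_Amount column, a quick check of any of these steps is to list the rows where the computed TOTAL differs from it. The sketch below assumes the output data set is named WANT and still contains both TOTAL and CONSECUTIVE_AMOUNT; the data set name mismatches is a placeholder of mine.

```sas
/* Keep only rows where the computed TOTAL differs from the hand-entered column */
data mismatches;
  set want;
  if total ne consecutive_amount;
run;

/* An empty listing means the step reproduces the sample column */
proc print data=mismatches noobs;
  var id dt amount consecutive_amount total;
run;
```

As far as I can trace by hand, the three-way merge above should leave MISMATCHES empty on the sample data.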
Zatere
Quartz | Level 8

I have done some testing and I think it works! Thanks a lot.

It is also quite efficient on large volumes of data.

I will leave the post open for a day or so, because if you need to do the same analysis for, say, 15 consecutive days, you have to add statements and calculations accordingly, which is not very flexible.
