Solved: Re: Cumulative Amount based on sequential dates

Zatere · Posted 01-24-2022 10:03 AM

Hello there,

I would like to calculate the cumulative amount based on sequential dates.

Please see below the data I have:

data have;
input ID $ DT:date9. Amount;
format DT date9.;
datalines;
A 09JUL2021 3600
A 03AUG2021 456
A 04AUG2021 33
A 06AUG2021 235
A 07AUG2021 100
A 09AUG2021 86
A 12AUG2021 456
A 24AUG2021 22
A 25AUG2021 987
A 26AUG2021 916
A 27AUG2021 81
;
run;

I want a variable that represents the cumulative amount of the sequential dates. E.g., 03AUG2021 & 04AUG2021

are sequential and therefore the cumulative amount is 489.

Please see below the data I want:

data want;
input ID $ DT:date9. Amount Variable_Wanted;
format DT date9.;
datalines;
A 09JUL2021 3600 0
A 03AUG2021 456 0
A 04AUG2021 33 489
A 06AUG2021 235 0
A 07AUG2021 100 335
A 09AUG2021 86 0
A 12AUG2021 456 0
A 24AUG2021 22 0
A 25AUG2021 987 1009
A 26AUG2021 916 1903
A 27AUG2021 81 997
;
run;

Ideally, the user should be able to select the number of sequential dates to include.

Please see below the closest I can make:

data wanted;
	set have;
	new_Amount = 0;
	by ID DT;
	if dif(DT) <= 1 then /*how many sequential*/
		do;
			new_Amount = Amount;
		end;
run;

Any help would much be appreciated.

Thanks.

japelin · Posted 01-24-2022 10:35 AM

how about this code.

proc sort data=have;
  by id dt;
run;

data want;
  set have;
  by id dt;
  retain sum amt dt0;
  if dt=sum(dt0,1) then sum=sum(amt,amount);
  else sum=0;
  amt=amount;
  dt0=dt;
  drop amt dt0;
run;

View solution in original post

japelin · Posted 01-24-2022 10:35 AM

how about this code.

proc sort data=have;
  by id dt;
run;

data want;
  set have;
  by id dt;
  retain sum amt dt0;
  if dt=sum(dt0,1) then sum=sum(amt,amount);
  else sum=0;
  amt=amount;
  dt0=dt;
  drop amt dt0;
run;

Zatere · Posted 01-25-2022 05:52 AM

Hi, I didnt consider everyting and I am wondering if I could ask some additinal help. As is, the program calculates the cumulative amount only for 1 sequential dates. For instance, 03AUG2021 & 04AUG2021 which is 489. I also wanted to be able to calculate the cumulative amount for more than 1 sequential dates. For example: - for two sequential dates: 04AUG2021 & 06AUG2021 = 268 - for three sequential dates: 25AUG2021 & 26AUG2021 & 27AUG2021 = 3909. I tried to change the dt=sum(dt0,2), but it doesnt work because it will not consider any dates between, meaning it will calculate the cumulative amount for only the pairs: 04AUG2021,06AUG2021 and 07AUG2021,09AUG2021. Any help please?

Kurt_Bremser · Posted 01-25-2022 06:46 AM

Your initial question (summing only two days) can be solved like this:

data want;
set have;
by id dt;
variable_wanted = ifn(
  first.id or lag(dt) ne dt - 1,
  0,
  amount + lag(amount)
);
run;

A running total over any arbitrary number of consecutive days goes like this:

data want;
set have;
by id dt;
if first.id or lag(dt) ne dt - 1
then do;
  variable_wanted = 0;
  output;
  variable_wanted = amount;
end;
else do;
  variable_wanted + amount;
  output;
end;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Zatere · Posted 01-25-2022 07:43 AM

Hi, thank you so much for your reply. It is not exaclty what I wanted. How can I get the sum of the next, lets say, 2 days of each day? This is what I was looking for (when I want to find the sum of 2 days in a row): data what_I_want; input ID $ DT:date9. Amount Sum_of_Next_2_Days; format DT date9.; datalines; A 09JUL2021 3600 0 A 03AUG2021 456 0 A 04AUG2021 33 489 A 06AUG2021 235 268 A 07AUG2021 100 335 A 09AUG2021 86 186 A 12AUG2021 456 0 A 24AUG2021 22 0 A 25AUG2021 987 0 A 26AUG2021 916 1925 A 27AUG2021 81 1984 ; run; It should work similarly if I want to find the sum of 3 days in a row and so on. Thanks.

Kurt_Bremser · Posted 01-26-2022 02:59 AM

Is this to be based on the same example data you have in your initial post?

And please don't forget to use code boxes for your code, otherwise it gets scrambled, as you can see 😉

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Zatere · Posted 01-26-2022 04:14 AM

Hello,

Below my data:

data what_I_want; 
input ID $ DT:date9. Amount Sum_of_Next_2_Days; 
format DT date9.; 
datalines; 
A 09JUL2021 3600 3600
A 03AUG2021 456 489
A 04AUG2021 33 268
A 06AUG2021 235 335
A 07AUG2021 100 186
A 09AUG2021 86 86
A 12AUG2021 456 456
A 24AUG2021 22 1925
A 25AUG2021 987 1984
A 26AUG2021 916 1551
A 27AUG2021 81 828
B 07AUG2021 554 992
B 08AUG2021 193 438
B 09AUG2021 245 245
; 
run;

Along the line I would like something like that.

For each date, I want to find the sum of its next 2 days.					
Find for next 2 dates:					
For 09Jul, sum Amount where DT Between 09Jul and 11Jul					
For 03Aug, sum Amount where DT Between 03Aug and 05Aug					
For 04Aug, sum Amount where DT Between 04Aug and 06Aug					
For 06Aug, sum Amount where DT Between 06Aug and 08Aug					
For 07Aug, sum Amount where DT Between 07Aug and 09Aug					
For 09Aug, sum Amount where DT Between 09Aug and 11Aug					
For 12Aug, sum Amount where DT Between 12Aug and 14Aug					
For 24Aug, sum Amount where DT Between 24Aug and 26Aug					
For 25Aug, sum Amount where DT Between 25Aug and 27Aug					
For 26Aug, sum Amount where DT Between 26Aug and 28Aug					
For 27Aug, sum Amount where DT Between 27Aug and 29Aug					
For 07Aug, sum Amount where DT Between 07Aug and 09Aug					
For 08Aug, sum Amount where DT Between 08Aug and 10Aug					
For 09Aug, sum Amount where DT Between 09Aug and 11Aug

Kurt_Bremser · Posted 01-26-2022 04:26 AM

Simple solution in SQL:

proc sql;
create table want as
  select
    t1.id, t1.dt, t1.amount,
    (
      select sum(amount)
      from what_i_want t2
      where t1.id = t2.id and t1.dt le t2.dt le t1.dt + 2
    ) as sum_of_next_2_days
  from what_i_want t1
;
quit;

If this turns out to be non-feasible because of bad performance (in case of a large input dataset), we would need to think of a data step solution that uses a date-based array. This data step would then read the data in one sequential pass.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Zatere · Posted 01-28-2022 06:02 PM

Thanks for the code.

It works but as you mentioned it is not efficient with big data.

I can split the dataset and run it in smaller data.

But wondered if there is any other way.

Kurt_Bremser · Posted 01-31-2022 10:12 AM

See the array-based approach here:

data have;
input ID $ DT:date9. Amount;
format DT date9.;
datalines;
A 09JUL2021 3600
A 03AUG2021 456
A 04AUG2021 33
A 06AUG2021 235
A 07AUG2021 100
A 09AUG2021 86
A 12AUG2021 456
A 24AUG2021 22
A 25AUG2021 987
A 26AUG2021 916
A 27AUG2021 81
B 07AUG2021 554
B 08AUG2021 193
B 09AUG2021 245
;

%let start = %sysfunc(inputn(19000101,yymmdd8.));
%let end = %sysfunc(inputn(20991231,yymmdd8.));

data want;
array __amount {&start.:&end.};
do until (last.id);
  set have;
  by id dt;
  if first.id then __start = dt;
  __amount{dt} = amount;
end;
do dt = __start to dt;
  if __amount{dt} ne .
  then do;
    amount = __amount{dt};
    sum_of_next_2_days = sum(__amount{dt},__amount{dt+1},__amount{dt+2});
    output;
  end;
end;
drop __:;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Zatere · Posted 02-01-2022 03:40 AM

Thank you it works and it is really fast.
I guess I would have to learn more about arrays.
Thanks again.

Kurt_Bremser · Posted 02-01-2022 03:56 AM

While it may not be obvious at first, an array indexed by dates is not really that large (if you set reasonable start and end dates) and can easily fit into available memory.

Even if you span over all possible dates, a numeric array would be about 24 MB in size:

data _null_;
x = 8 * ('31dec9999'd - '01jan1582'd);
put x=;
run;

(make it temporary so that no individual variable names need to be kept, which would massively increase the memory consumption)

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Zatere · Posted 07-18-2023 02:26 PM

Hi, hope you are well.

It is a quite old post, but I wondered if you could help on the below.

How can this approach be modified in order to consider only consecutive days, i.e., days happen one after each other without breaks.

For example, if we choose to consider only 3 consecutive days, the data will be as below:

data have;
input ID $ DT:date9. Amount Concecutive_Amount;
format DT date9.;
datalines;
A 09JUL2021 3600 .
A 03AUG2021 456 489
A 04AUG2021 33 .
A 06AUG2021 235 335
A 07AUG2021 100 .
A 09AUG2021 86 .
A 12AUG2021 456 .
A 24AUG2021 22 1925
A 25AUG2021 987 1984
A 26AUG2021 916 997
A 27AUG2021 81 .
B 07AUG2021 554 992
B 08AUG2021 193 438
B 09AUG2021 245
;

Any help would be much appreciated.

(in case it's better to create a new topic please let me know)

Kurt_Bremser · Posted 07-19-2023 02:47 AM

Do the summation only when the next day is non-missing.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Zatere · Posted 07-19-2023 03:45 AM

May I ask to help me on this? That was my initial thought too but I cannot make it..

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away