BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Zatere
Quartz | Level 8

Hello there,

 

I would like to calculate the cumulative amount based on sequential dates.

 

Please see below the data I have:

 

data have;
input ID $ DT:date9. Amount;
format DT date9.;
datalines;
A 09JUL2021 3600
A 03AUG2021 456
A 04AUG2021 33
A 06AUG2021 235
A 07AUG2021 100
A 09AUG2021 86
A 12AUG2021 456
A 24AUG2021 22
A 25AUG2021 987
A 26AUG2021 916
A 27AUG2021 81
;
run;

I want a variable that represents the cumulative amount of the sequential dates. E.g., 03AUG2021 & 04AUG2021

are sequential and therefore the cumulative amount is  489.

 

Please see below the data I want:

 

data want;
input ID $ DT:date9. Amount Variable_Wanted;
format DT date9.;
datalines;
A 09JUL2021 3600 0
A 03AUG2021 456 0
A 04AUG2021 33 489
A 06AUG2021 235 0
A 07AUG2021 100 335
A 09AUG2021 86 0
A 12AUG2021 456 0
A 24AUG2021 22 0
A 25AUG2021 987 1009
A 26AUG2021 916 1903
A 27AUG2021 81 997
;
run;

Ideally, the user should be able to select the number of sequential dates to include. 

 

Please see below the closest I can make:

 

data wanted;
	set have;
	new_Amount = 0;
	by ID DT;
	if dif(DT) <= 1 then /*how many sequential*/
		do;
			new_Amount = Amount;
		end;
run;

Any help would much be appreciated.

 

Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
japelin
Rhodochrosite | Level 12

how about this code.

proc sort data=have;
  by id dt;
run;

data want;
  set have;
  by id dt;
  retain sum amt dt0;
  if dt=sum(dt0,1) then sum=sum(amt,amount);
  else sum=0;
  amt=amount;
  dt0=dt;
  drop amt dt0;
run;

 

View solution in original post

16 REPLIES 16
japelin
Rhodochrosite | Level 12

how about this code.

proc sort data=have;
  by id dt;
run;

data want;
  set have;
  by id dt;
  retain sum amt dt0;
  if dt=sum(dt0,1) then sum=sum(amt,amount);
  else sum=0;
  amt=amount;
  dt0=dt;
  drop amt dt0;
run;

 

Zatere
Quartz | Level 8
Hi, I didnt consider everyting and I am wondering if I could ask some additinal help. As is, the program calculates the cumulative amount only for 1 sequential dates. For instance, 03AUG2021 & 04AUG2021 which is 489. I also wanted to be able to calculate the cumulative amount for more than 1 sequential dates. For example: - for two sequential dates: 04AUG2021 & 06AUG2021 = 268 - for three sequential dates: 25AUG2021 & 26AUG2021 & 27AUG2021 = 3909. I tried to change the dt=sum(dt0,2), but it doesnt work because it will not consider any dates between, meaning it will calculate the cumulative amount for only the pairs: 04AUG2021,06AUG2021 and 07AUG2021,09AUG2021. Any help please?
Kurt_Bremser
Super User

Your initial question (summing only two days) can be solved like this:

data want;
set have;
by id dt;
variable_wanted = ifn(
  first.id or lag(dt) ne dt - 1,
  0,
  amount + lag(amount)
);
run;

A running total over any arbitrary number of consecutive days goes like this:

data want;
set have;
by id dt;
if first.id or lag(dt) ne dt - 1
then do;
  variable_wanted = 0;
  output;
  variable_wanted = amount;
end;
else do;
  variable_wanted + amount;
  output;
end;
run;
Zatere
Quartz | Level 8
Hi, thank you so much for your reply. It is not exaclty what I wanted. How can I get the sum of the next, lets say, 2 days of each day? This is what I was looking for (when I want to find the sum of 2 days in a row): data what_I_want; input ID $ DT:date9. Amount Sum_of_Next_2_Days; format DT date9.; datalines; A 09JUL2021 3600 0 A 03AUG2021 456 0 A 04AUG2021 33 489 A 06AUG2021 235 268 A 07AUG2021 100 335 A 09AUG2021 86 186 A 12AUG2021 456 0 A 24AUG2021 22 0 A 25AUG2021 987 0 A 26AUG2021 916 1925 A 27AUG2021 81 1984 ; run; It should work similarly if I want to find the sum of 3 days in a row and so on. Thanks.
Kurt_Bremser
Super User

Is this to be based on the same example data you have in your initial post?

 

And please don't forget to use code boxes for your code, otherwise it gets scrambled, as you can see 😉

Zatere
Quartz | Level 8

Hello, 

 

Below my data:

 

data what_I_want; 
input ID $ DT:date9. Amount Sum_of_Next_2_Days; 
format DT date9.; 
datalines; 
A 09JUL2021 3600 3600
A 03AUG2021 456 489
A 04AUG2021 33 268
A 06AUG2021 235 335
A 07AUG2021 100 186
A 09AUG2021 86 86
A 12AUG2021 456 456
A 24AUG2021 22 1925
A 25AUG2021 987 1984
A 26AUG2021 916 1551
A 27AUG2021 81 828
B 07AUG2021 554 992
B 08AUG2021 193 438
B 09AUG2021 245 245
; 
run; 

Along the line I would like something like that. 

For each date, I want to find the sum of its next 2 days.					
Find for next 2 dates:					
For 09Jul, sum Amount where DT Between 09Jul and 11Jul					
For 03Aug, sum Amount where DT Between 03Aug and 05Aug					
For 04Aug, sum Amount where DT Between 04Aug and 06Aug					
For 06Aug, sum Amount where DT Between 06Aug and 08Aug					
For 07Aug, sum Amount where DT Between 07Aug and 09Aug					
For 09Aug, sum Amount where DT Between 09Aug and 11Aug					
For 12Aug, sum Amount where DT Between 12Aug and 14Aug					
For 24Aug, sum Amount where DT Between 24Aug and 26Aug					
For 25Aug, sum Amount where DT Between 25Aug and 27Aug					
For 26Aug, sum Amount where DT Between 26Aug and 28Aug					
For 27Aug, sum Amount where DT Between 27Aug and 29Aug					
For 07Aug, sum Amount where DT Between 07Aug and 09Aug					
For 08Aug, sum Amount where DT Between 08Aug and 10Aug					
For 09Aug, sum Amount where DT Between 09Aug and 11Aug					

 

 

Kurt_Bremser
Super User

Simple solution in SQL:

proc sql;
create table want as
  select
    t1.id, t1.dt, t1.amount,
    (
      select sum(amount)
      from what_i_want t2
      where t1.id = t2.id and t1.dt le t2.dt le t1.dt + 2
    ) as sum_of_next_2_days
  from what_i_want t1
;
quit;

If this turns out to be non-feasible because of bad performance (in case of a large input dataset), we would need to think of a data step solution that uses a date-based array. This data step would then read the data in one sequential pass.

Zatere
Quartz | Level 8

Thanks for the code.

It works but as you mentioned it is not efficient with big data.

I can split the dataset and run it in smaller data.

But wondered if there is any other way.

Kurt_Bremser
Super User

See the array-based approach here:

data have;
input ID $ DT:date9. Amount;
format DT date9.;
datalines;
A 09JUL2021 3600
A 03AUG2021 456
A 04AUG2021 33
A 06AUG2021 235
A 07AUG2021 100
A 09AUG2021 86
A 12AUG2021 456
A 24AUG2021 22
A 25AUG2021 987
A 26AUG2021 916
A 27AUG2021 81
B 07AUG2021 554
B 08AUG2021 193
B 09AUG2021 245
;

%let start = %sysfunc(inputn(19000101,yymmdd8.));
%let end = %sysfunc(inputn(20991231,yymmdd8.));

data want;
array __amount {&start.:&end.};
do until (last.id);
  set have;
  by id dt;
  if first.id then __start = dt;
  __amount{dt} = amount;
end;
do dt = __start to dt;
  if __amount{dt} ne .
  then do;
    amount = __amount{dt};
    sum_of_next_2_days = sum(__amount{dt},__amount{dt+1},__amount{dt+2});
    output;
  end;
end;
drop __:;
run;
Zatere
Quartz | Level 8
Thank you it works and it is really fast.
I guess I would have to learn more about arrays.
Thanks again.
Kurt_Bremser
Super User

While it may not be obvious at first, an array indexed by dates is not really that large (if you set reasonable start and end dates) and can easily fit into available memory.

Even if you span over all possible dates, a numeric array would be about 24 MB in size:

data _null_;
x = 8 * ('31dec9999'd - '01jan1582'd);
put x=;
run;

(make it temporary so that no individual variable names need to be kept, which would massively increase the memory consumption)

Zatere
Quartz | Level 8

Hi, hope you are well.

 

It is a quite old post, but I wondered if you could help on the below.

 

How can this approach be modified in order to consider only consecutive days, i.e., days happen one after each other without breaks.

 

For example, if we choose to consider only 3 consecutive days, the data will be as below:

 

data have;
input ID $ DT:date9. Amount Concecutive_Amount;
format DT date9.;
datalines;
A 09JUL2021 3600 .
A 03AUG2021 456 489
A 04AUG2021 33 .
A 06AUG2021 235 335
A 07AUG2021 100 .
A 09AUG2021 86 .
A 12AUG2021 456 .
A 24AUG2021 22 1925
A 25AUG2021 987 1984
A 26AUG2021 916 997
A 27AUG2021 81 .
B 07AUG2021 554 992
B 08AUG2021 193 438
B 09AUG2021 245
;

Any help would be much appreciated.

(in case it's better to create a new topic please let me know)

 

 

Zatere
Quartz | Level 8
May I ask to help me on this? That was my initial thought too but I cannot make it..

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 16 replies
  • 1841 views
  • 2 likes
  • 4 in conversation