Hello there,
I would like to calculate the cumulative amount based on sequential dates.
Please see below the data I have:
data have;
input ID $ DT:date9. Amount;
format DT date9.;
datalines;
A 09JUL2021 3600
A 03AUG2021 456
A 04AUG2021 33
A 06AUG2021 235
A 07AUG2021 100
A 09AUG2021 86
A 12AUG2021 456
A 24AUG2021 22
A 25AUG2021 987
A 26AUG2021 916
A 27AUG2021 81
;
run;
I want a variable that represents the cumulative amount of the sequential dates. E.g., 03AUG2021 & 04AUG2021
are sequential and therefore the cumulative amount is 489.
Please see below the data I want:
data want;
input ID $ DT:date9. Amount Variable_Wanted;
format DT date9.;
datalines;
A 09JUL2021 3600 0
A 03AUG2021 456 0
A 04AUG2021 33 489
A 06AUG2021 235 0
A 07AUG2021 100 335
A 09AUG2021 86 0
A 12AUG2021 456 0
A 24AUG2021 22 0
A 25AUG2021 987 1009
A 26AUG2021 916 1903
A 27AUG2021 81 997
;
run;
Ideally, the user should be able to select the number of sequential dates to include.
Please see below the closest I can make:
data wanted;
set have;
new_Amount = 0;
by ID DT;
if dif(DT) <= 1 then /*how many sequential*/
do;
new_Amount = Amount;
end;
run;
Any help would much be appreciated.
Thanks.
how about this code.
proc sort data=have;
by id dt;
run;
data want;
set have;
by id dt;
retain sum amt dt0;
if dt=sum(dt0,1) then sum=sum(amt,amount);
else sum=0;
amt=amount;
dt0=dt;
drop amt dt0;
run;
how about this code.
proc sort data=have;
by id dt;
run;
data want;
set have;
by id dt;
retain sum amt dt0;
if dt=sum(dt0,1) then sum=sum(amt,amount);
else sum=0;
amt=amount;
dt0=dt;
drop amt dt0;
run;
Your initial question (summing only two days) can be solved like this:
data want;
set have;
by id dt;
variable_wanted = ifn(
first.id or lag(dt) ne dt - 1,
0,
amount + lag(amount)
);
run;
A running total over any arbitrary number of consecutive days goes like this:
data want;
set have;
by id dt;
if first.id or lag(dt) ne dt - 1
then do;
variable_wanted = 0;
output;
variable_wanted = amount;
end;
else do;
variable_wanted + amount;
output;
end;
run;
Is this to be based on the same example data you have in your initial post?
And please don't forget to use code boxes for your code, otherwise it gets scrambled, as you can see 😉
Hello,
Below my data:
data what_I_want;
input ID $ DT:date9. Amount Sum_of_Next_2_Days;
format DT date9.;
datalines;
A 09JUL2021 3600 3600
A 03AUG2021 456 489
A 04AUG2021 33 268
A 06AUG2021 235 335
A 07AUG2021 100 186
A 09AUG2021 86 86
A 12AUG2021 456 456
A 24AUG2021 22 1925
A 25AUG2021 987 1984
A 26AUG2021 916 1551
A 27AUG2021 81 828
B 07AUG2021 554 992
B 08AUG2021 193 438
B 09AUG2021 245 245
;
run;
Along the line I would like something like that.
For each date, I want to find the sum of its next 2 days. Find for next 2 dates: For 09Jul, sum Amount where DT Between 09Jul and 11Jul For 03Aug, sum Amount where DT Between 03Aug and 05Aug For 04Aug, sum Amount where DT Between 04Aug and 06Aug For 06Aug, sum Amount where DT Between 06Aug and 08Aug For 07Aug, sum Amount where DT Between 07Aug and 09Aug For 09Aug, sum Amount where DT Between 09Aug and 11Aug For 12Aug, sum Amount where DT Between 12Aug and 14Aug For 24Aug, sum Amount where DT Between 24Aug and 26Aug For 25Aug, sum Amount where DT Between 25Aug and 27Aug For 26Aug, sum Amount where DT Between 26Aug and 28Aug For 27Aug, sum Amount where DT Between 27Aug and 29Aug For 07Aug, sum Amount where DT Between 07Aug and 09Aug For 08Aug, sum Amount where DT Between 08Aug and 10Aug For 09Aug, sum Amount where DT Between 09Aug and 11Aug
Simple solution in SQL:
proc sql;
create table want as
select
t1.id, t1.dt, t1.amount,
(
select sum(amount)
from what_i_want t2
where t1.id = t2.id and t1.dt le t2.dt le t1.dt + 2
) as sum_of_next_2_days
from what_i_want t1
;
quit;
If this turns out to be non-feasible because of bad performance (in case of a large input dataset), we would need to think of a data step solution that uses a date-based array. This data step would then read the data in one sequential pass.
Thanks for the code.
It works but as you mentioned it is not efficient with big data.
I can split the dataset and run it in smaller data.
But wondered if there is any other way.
See the array-based approach here:
data have;
input ID $ DT:date9. Amount;
format DT date9.;
datalines;
A 09JUL2021 3600
A 03AUG2021 456
A 04AUG2021 33
A 06AUG2021 235
A 07AUG2021 100
A 09AUG2021 86
A 12AUG2021 456
A 24AUG2021 22
A 25AUG2021 987
A 26AUG2021 916
A 27AUG2021 81
B 07AUG2021 554
B 08AUG2021 193
B 09AUG2021 245
;
%let start = %sysfunc(inputn(19000101,yymmdd8.));
%let end = %sysfunc(inputn(20991231,yymmdd8.));
data want;
array __amount {&start.:&end.};
do until (last.id);
set have;
by id dt;
if first.id then __start = dt;
__amount{dt} = amount;
end;
do dt = __start to dt;
if __amount{dt} ne .
then do;
amount = __amount{dt};
sum_of_next_2_days = sum(__amount{dt},__amount{dt+1},__amount{dt+2});
output;
end;
end;
drop __:;
run;
While it may not be obvious at first, an array indexed by dates is not really that large (if you set reasonable start and end dates) and can easily fit into available memory.
Even if you span over all possible dates, a numeric array would be about 24 MB in size:
data _null_;
x = 8 * ('31dec9999'd - '01jan1582'd);
put x=;
run;
(make it temporary so that no individual variable names need to be kept, which would massively increase the memory consumption)
Hi, hope you are well.
It is a quite old post, but I wondered if you could help on the below.
How can this approach be modified in order to consider only consecutive days, i.e., days happen one after each other without breaks.
For example, if we choose to consider only 3 consecutive days, the data will be as below:
data have;
input ID $ DT:date9. Amount Concecutive_Amount;
format DT date9.;
datalines;
A 09JUL2021 3600 .
A 03AUG2021 456 489
A 04AUG2021 33 .
A 06AUG2021 235 335
A 07AUG2021 100 .
A 09AUG2021 86 .
A 12AUG2021 456 .
A 24AUG2021 22 1925
A 25AUG2021 987 1984
A 26AUG2021 916 997
A 27AUG2021 81 .
B 07AUG2021 554 992
B 08AUG2021 193 438
B 09AUG2021 245
;
Any help would be much appreciated.
(in case it's better to create a new topic please let me know)
Do the summation only when the next day is non-missing.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.