The following is an example of the dataset 'amounts' I have:
startdate | expirydate | amount |
1/1/2010 | 2/2/2020 | 10 |
1/5/2011 | 2/3/2015 | 10 |
1/3/2012 | 2/2/2023 | 10 |
1/4/2013 | 2/2/2014 | 10 |
5/5/2015 | 2/2/2028 | 10 |
1/6/2016 | 2/2/2032 | 10 |
I want to calculate the sum of all existing amounts as of each start date so it should look like this:
startdate | amount |
1/1/2010 | 10 |
1/5/2011 | 20 |
1/3/2012 | 30 |
1/4/2013 | 40 |
5/5/2015 | 30 |
1/6/2016 | 40 |
How do I do this in SAS?
In the output data set, please explain how the AMOUNT is 40 on the fourth line and 30 on the fifth line.
The input of 1/5/2015 in the first table was actually an error which I've changed to 5/5/2015. Essentially what I want to do is for each of the start dates, calculate the cumulative sum of any amounts that haven't expired. So for the first four dates, it is just a running cumulative sum because none of the amounts have expired. But at 5/5/2015, two of the previous amounts have expired hence a cumulative sum of 30. Same for the last date, where the same two have previously expired and you have the additional amount as of 1/6/2016 therefore 40.
Here's a data step solution.
data want;
set have;
active_cum_sum = 0;
do _i = 1 to _n_;
set have (keep=expirydate amount rename=(expirydate=_expdt2 amount=_amt2)) point=_i;
if startdate < _expdt2 then active_cum_sum + _amt2;
end;
drop _:;
run;
Hi @analyst97725 A perfect use case for a student to learn SQL self joins -
data have ;
input (startdate expirydate) (:mmddyy10.) amount ;
format startdate expirydate date9.;
cards ;
1/1/2010 2/2/2020 10
1/5/2011 2/3/2015 10
1/3/2012 2/2/2023 10
1/4/2013 2/2/2014 10
5/5/2015 2/2/2028 10
1/6/2016 2/2/2032 10
;
proc sql ;
create table want as
select a.startdate, a.expirydate ,
sum((b.expirydate > a.startdate) * b.amount) as amount
from have a
left join have b
on a.startdate >= b.startdate
group by a.startdate, a.expirydate ;
quit ;
proc print noobs ;run ;
startdate | expirydate | amount |
---|---|---|
01JAN2010 | 02FEB2020 | 10 |
05JAN2011 | 03FEB2015 | 20 |
03JAN2012 | 02FEB2023 | 30 |
04JAN2013 | 02FEB2014 | 40 |
05MAY2015 | 02FEB2028 | 30 |
06JAN2016 | 02FEB2032 | 40 |
I usually try to avoid sorting datasets to reduce disk activity, but this might be a worthy exception:
data have;
input (startdate expirydate) (:mmddyy10.) amount ;
format startdate expirydate date9.;
cards ;
1/1/2010 2/2/2020 10
1/5/2011 2/3/2015 10
1/3/2012 2/2/2023 10
1/4/2013 2/2/2014 10
5/5/2015 2/2/2028 10
1/6/2016 2/2/2032 10
run;
proc sort data=have (keep=expirydate amount) out=expiration;
by expirydate;
run;
data want;
set expiration (rename=(expirydate=startdate) in=inexp) have (in=inhave);
by startdate;
cum_amount + ifn(inexp,-1*amount,amount);
if inhave;
run;
Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.