I am trying to get the date when a cumulative sum exceeds a threshold (30 days here) by group, and only that row. I know I can add a cumulative sum output all the rows with > 30 and then sort and keep only the first row. But I am trying see if there is a way to do it in a single step.
data have;
input id date days ;
cards;
1 02DEC2014 3
1 01JAN2015 15
1 15JAN2015 15
2 02JAN2015 30
2 02FEB2015 30
2 01MAR2015 30
2 02APR2015 30
3 16DEC2014 3
3 30DEC2014 3
3 01JAN2015 3
3 16JAN2015 3
3 30JAN2015 3
3 01FEB2015 3
;
WANT:
1 15JAN2015
2 02FEB2015
data want;
set have;
retain found;
by id date;
if first.id then do ;cum_days = days; found=0; end;
else cum_days + days;
if cum_days > 30 and found = 0 then do;
found=1;
output;
end;
run;
Add another flag variable to check if the value was already output and use RETAIN to keep it the same across rows.
@nmp wrote:
I am trying to get the date when a cumulative sum exceeds a threshold (30 days here) by group, and only that row. I know I can add a cumulative sum output all the rows with > 30 and then sort and keep only the first row. But I am trying see if there is a way to do it in a single step.
data have;
input id date days ;
cards;
1 02DEC2014 3
1 01JAN2015 15
1 15JAN2015 15
2 02JAN2015 30
2 02FEB2015 30
2 01MAR2015 30
2 02APR2015 30
3 16DEC2014 3
3 30DEC2014 3
3 01JAN2015 3
3 16JAN2015 3
3 30JAN2015 3
3 01FEB2015 3
;WANT:
1 15JAN2015
2 02FEB2015
FYI - please test the code you post runs correctly, your posted code did not run correctly for me.
Why does the WANT data set have only those two rows?
Sorry, I just realized that my have data set may not have been clear. The days in the data set are not cumulative, so if I create a cumulative total it would look like the one below. The want data set has the dates when each person first had > 30 days. 1 has 33 on 15Jan2015, person 2 has 60 on 02Feb2015 and person 3 never has > 30, so doesn't show up in want.
data have;
input id visitdt days totdays;
cards;
1 02DEC2014 3 3
1 01JAN2015 15 18
1 15JAN2015 15 33
2 02JAN2015 30 30
2 02FEB2015 30 60
2 01MAR2015 30 90
2 02APR2015 30 120
3 16DEC2014 3 3
3 30DEC2014 3 6
3 01JAN2015 3 9
3 16JAN2015 3 12
3 30JAN2015 3 15
3 01FEB2015 3 18
;
data want;
set have;
retain found;
by id date;
if first.id then do ;cum_days = days; found=0; end;
else cum_days + days;
if cum_days > 30 and found = 0 then do;
found=1;
output;
end;
run;
Add another flag variable to check if the value was already output and use RETAIN to keep it the same across rows.
@nmp wrote:
I am trying to get the date when a cumulative sum exceeds a threshold (30 days here) by group, and only that row. I know I can add a cumulative sum output all the rows with > 30 and then sort and keep only the first row. But I am trying see if there is a way to do it in a single step.
data have;
input id date days ;
cards;
1 02DEC2014 3
1 01JAN2015 15
1 15JAN2015 15
2 02JAN2015 30
2 02FEB2015 30
2 01MAR2015 30
2 02APR2015 30
3 16DEC2014 3
3 30DEC2014 3
3 01JAN2015 3
3 16JAN2015 3
3 30JAN2015 3
3 01FEB2015 3
;WANT:
1 15JAN2015
2 02FEB2015
FYI - please test the code you post runs correctly, your posted code did not run correctly for me.
Thank you for this simple, straightforward solution.
I will check my code in the future.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.