Solved: Re: Find date when cumulative sum is above threshold

nmp · Posted 03-01-2022 11:00 AM

I am trying to get the date when a cumulative sum exceeds a threshold (30 days here) by group, and only that row. I know I can add a cumulative sum output all the rows with > 30 and then sort and keep only the first row. But I am trying see if there is a way to do it in a single step.

data have;
input id date days ;
cards;
1 02DEC2014 3
1 01JAN2015 15
1 15JAN2015 15
2 02JAN2015 30
2 02FEB2015 30
2 01MAR2015 30
2 02APR2015 30
3 16DEC2014 3
3 30DEC2014 3
3 01JAN2015 3
3 16JAN2015 3
3 30JAN2015 3
3 01FEB2015 3
;

WANT:

1 15JAN2015
2 02FEB2015

Reeza · Posted 03-01-2022 11:28 AM

data want;
set have;
retain found;
by id date;
if first.id then do ;cum_days = days; found=0; end;
else cum_days + days;

if cum_days > 30 and found = 0 then do;
found=1;
output;
end;

run;

Add another flag variable to check if the value was already output and use RETAIN to keep it the same across rows.

@nmp wrote:

I am trying to get the date when a cumulative sum exceeds a threshold (30 days here) by group, and only that row. I know I can add a cumulative sum output all the rows with > 30 and then sort and keep only the first row. But I am trying see if there is a way to do it in a single step.

data have;
input id date days ;
cards;
1 02DEC2014 3
1 01JAN2015 15
1 15JAN2015 15
2 02JAN2015 30
2 02FEB2015 30
2 01MAR2015 30
2 02APR2015 30
3 16DEC2014 3
3 30DEC2014 3
3 01JAN2015 3
3 16JAN2015 3
3 30JAN2015 3
3 01FEB2015 3
;

WANT:

1 15JAN2015
2 02FEB2015

FYI - please test the code you post runs correctly, your posted code did not run correctly for me.

View solution in original post

PaigeMiller · Posted 03-01-2022 11:21 AM

Why does the WANT data set have only those two rows?

--
Paige Miller

nmp · Posted 03-01-2022 11:27 AM

Sorry, I just realized that my have data set may not have been clear. The days in the data set are not cumulative, so if I create a cumulative total it would look like the one below. The want data set has the dates when each person first had > 30 days. 1 has 33 on 15Jan2015, person 2 has 60 on 02Feb2015 and person 3 never has > 30, so doesn't show up in want.

data have;
input id visitdt days totdays;
cards;
1 02DEC2014 3 3
1 01JAN2015 15 18
1 15JAN2015 15 33
2 02JAN2015 30 30
2 02FEB2015 30 60
2 01MAR2015 30 90
2 02APR2015 30 120
3 16DEC2014 3 3
3 30DEC2014 3 6
3 01JAN2015 3 9
3 16JAN2015 3 12
3 30JAN2015 3 15
3 01FEB2015 3 18
;

Reeza · Posted 03-01-2022 11:28 AM

data want;
set have;
retain found;
by id date;
if first.id then do ;cum_days = days; found=0; end;
else cum_days + days;

if cum_days > 30 and found = 0 then do;
found=1;
output;
end;

run;

Add another flag variable to check if the value was already output and use RETAIN to keep it the same across rows.

@nmp wrote:

I am trying to get the date when a cumulative sum exceeds a threshold (30 days here) by group, and only that row. I know I can add a cumulative sum output all the rows with > 30 and then sort and keep only the first row. But I am trying see if there is a way to do it in a single step.

data have;
input id date days ;
cards;
1 02DEC2014 3
1 01JAN2015 15
1 15JAN2015 15
2 02JAN2015 30
2 02FEB2015 30
2 01MAR2015 30
2 02APR2015 30
3 16DEC2014 3
3 30DEC2014 3
3 01JAN2015 3
3 16JAN2015 3
3 30JAN2015 3
3 01FEB2015 3
;

WANT:

1 15JAN2015
2 02FEB2015

FYI - please test the code you post runs correctly, your posted code did not run correctly for me.

nmp · Posted 03-01-2022 11:49 AM

Thank you for this simple, straightforward solution.

I will check my code in the future.

Find date when cumulative sum is above threshold

Re: Find date when cumulative sum is above threshold

Re: Find date when cumulative sum is above threshold

Re: Find date when cumulative sum is above threshold

Re: Find date when cumulative sum is above threshold

Re: Find date when cumulative sum is above threshold

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away