I have a data set of patients that start their medication on a certain date and end on a certain date. I would like to calculate the cumulative number of days that they were on medications given that some of them took more than one medication.
data have;
input ID $ medication start_date :mmddyy10. end_date :mmddyy10.;
datalines;
A001 1 8/26/2015 9/3/2015
A001 1 9/1/2015 9/17/2017
A002 1 10/2/2015 10/2/2015
A003 1 10/30/2015 10/30/2015
A003 2 11/24/2015 11/24/2015
A004 1 3/17/2016 3/25/2016
A004 2 3/24/2016 3/27/2016
A004 3 3/31/2016 3/31/2016
;
In this case, patient A003 took two medications without overlapping but patient A004 took 3 overlapping medications. How can I create a 5th column that lists the number of cummulative days that they've taken medications? The only idea I have is to create a flag that the patient is taking multiple drugs and that I should subtract the last end date from the first start date, but I wouldn't know how to get rid of the days that the patients didn't take medication in between.
Hi @serena13lee,
If you just need the cumulative number of days on any drug, you can keep the code simple (at the expense of several seconds of run time for an input dataset with, say, 500,000 observations from 50,000 patients with a mean duration of 10 years on drug):
/* Determine earliest start date and latest end date */
proc sql noprint;
select min(start_date), max(end_date)
into :firstdate, :lastdate
from have;
quit;
/* Compute cumulative number of days on drug */
data want(drop=_:);
array _d[&firstdate:&lastdate];
do until(last.id);
set have;
by id;
do _i=start_date to end_date;
_d[_i]=1;
end;
days=sum(of _d[*]);
output;
end;
label days='Cumulative number of days on any drug';
run;
Please see the screenshot, if this is what you wanted. There is column for CumulativeDaysPerMed (number of days on each medicine) and CumulativeDaysPerPatient (number of days for each patient for all medicines). If this is what you wanted then the code follows the image.
DATA WANT (DROP=CurrDay);
SET have;
FORMAT start_date mmddyy10. end_date mmddyy10. CurrDay 8. CumulativeDayPerMed 8.;
BY ID medication;
RETAIN CumulativeDayPerPatient 0;
RETAIN CumulativeDayPerMed 0;
IF LAST.medication AND LAST.ID THEN DO;
CurrDay = INTCK('day',start_date,end_date);
CumulativeDayPerMed + CurrDay;
CumulativeDayPerPatient + CumulativeDayPerMed;
OUTPUT;
CumulativeDayPerMed = 0;
CumulativeDayPerPatient = 0;
END;
ELSE DO;
CurrDay = INTCK('day',start_date,end_date);
CumulativeDayPerMed + CurrDay;
CumulativeDayPerPatient + CumulativeDayPerMed;
OUTPUT;
END;
RUN;
PROC PRINT DATA=Want;
RUN;
Please let me know if I was able to read the problem requirement correctly.
Thank you.
@serena13lee I am sorry, my previous post was not correct, for two reasons. Please see the code and the output as below.
The output of the modified code is below
DATA WANT (DROP=CurrDay);
SET have;
FORMAT start_date mmddyy10. end_date mmddyy10. CurrDay 8. CumulativeDayPerMed 8. CumulativeDayPerPatient 8.;
BY ID medication;
RETAIN CumulativeDayPerPatient 0;
RETAIN CumulativeDayPerMed 0;
IF LAST.medication AND NOT LAST.ID THEN DO;
CurrDay = INTCK('day',start_date,end_date)+1;
CumulativeDayPerMed + CurrDay;
CumulativeDayPerPatient + CumulativeDayPerMed;
OUTPUT;
CumulativeDayPerMed = 0;
END;
ELSE IF LAST.medication AND LAST.ID THEN DO;
CurrDay = INTCK('day',start_date,end_date)+1;
CumulativeDayPerMed + CurrDay;
CumulativeDayPerPatient + CumulativeDayPerMed;
OUTPUT;
CumulativeDayPerMed = 0;
CumulativeDayPerPatient = 0;
END;
ELSE DO;
CurrDay = INTCK('day',start_date,end_date)+1;
CumulativeDayPerMed + CurrDay;
CumulativeDayPerPatient + CumulativeDayPerMed;
OUTPUT;
END;
RUN;
PROC PRINT DATA=Want;
RUN;
Hopefully this will suit what you wanted.
Hello ,
Hope so below code will work for your requirement. Kindly let's know how you want to handle missing values.
data have;
input ID $ medication start_date :mmddyy10. end_date :mmddyy10.;
datalines;
A001 1 8/26/2015 9/3/2015
A001 1 9/1/2015 9/17/2017
A002 1 10/2/2015 10/2/2015
A003 1 10/30/2015 10/30/2015
A003 2 11/24/2015 11/24/2015
A004 1 3/17/2016 3/25/2016
A004 2 3/24/2016 3/27/2016
A004 3 3/31/2016 3/31/2016
;
run;
proc sql;
create table want as
select a.id,a.medication,a.start_date format=date9.,a.end_date format=date9. from have as a , have as b
where (a.id=b.id and a.medication<>b.medication and a.start_date between b.start_date and b.end_date)
or
(a.id=b.id and a.medication <> b.medication and b.start_date between a.start_date and a.end_date)
or
(a.id=b.id and a.medication <> b.medication and a.end_date between b.end_date and b.start_date)
or
(a.id=b.id and a.medication <> b.medication and b.end_date between a.end_date and a.start_date)
or
(a.id=b.id and a.medication <> b.medication and a.end_date=b.end_date and a.start_date=b.start_date)
or
(a.id=b.id and a.medication <> b.medication and b.end_date=a.end_date and b.start_date=a.start_date);
quit;
Thanks
🙂
Below code for how I understood your requirement.
data have;
input ID $ medication start_date :mmddyy10. end_date :mmddyy10.;
format start_date end_date date9.;
datalines;
A001 1 8/26/2015 9/3/2015
A001 1 9/1/2015 9/17/2017
A002 1 10/2/2015 10/2/2015
A003 1 10/30/2015 10/30/2015
A003 2 11/24/2015 11/24/2015
A004 1 3/17/2016 3/25/2016
A004 2 3/24/2016 3/27/2016
A004 3 3/31/2016 3/31/2016
;
proc sort data=have;
by id start_date;
run;
data
inter(drop=cluster_start_date cluster_end_date)
clusters(keep=id cluster_:);
set have;
by id start_date;
format cluster_id 16. cluster_start_date cluster_end_date date9.;
retain cluster_start_date cluster_end_date;
if first.id then
do;
cluster_id+1;
cluster_start_date=start_date;
cluster_end_date=end_date;
end;
else
if start_date<=cluster_end_date then
do;
cluster_end_date=max(cluster_end_date, end_date);
end;
else
do;
output clusters;
cluster_id+1;
cluster_start_date=start_date;
cluster_end_date=end_date;
end;
if last.id then output clusters;
output inter;
run;
data want;
merge inter clusters;
by id cluster_id;
retain days_of_med_in_cluster days_of_med_cum;
if first.cluster_id then
do;
days_of_med_in_cluster= cluster_end_date-cluster_start_date+1;
if first.id then days_of_med_cum=days_of_med_in_cluster;
else days_of_med_cum=sum(days_of_med_cum,days_of_med_in_cluster);
end;
run;
proc print data=want;
run;
Hi @serena13lee,
If you just need the cumulative number of days on any drug, you can keep the code simple (at the expense of several seconds of run time for an input dataset with, say, 500,000 observations from 50,000 patients with a mean duration of 10 years on drug):
/* Determine earliest start date and latest end date */
proc sql noprint;
select min(start_date), max(end_date)
into :firstdate, :lastdate
from have;
quit;
/* Compute cumulative number of days on drug */
data want(drop=_:);
array _d[&firstdate:&lastdate];
do until(last.id);
set have;
by id;
do _i=start_date to end_date;
_d[_i]=1;
end;
days=sum(of _d[*]);
output;
end;
label days='Cumulative number of days on any drug';
run;
I more than second the approach (which I personally call "paint brushing" and routinely use in similar situations, particularly for calculating what is called "continuous enrollment" in the insurance industry). Some things I'd do differently, though:
Kind regards
Paul D.
Thank you very much, Paul, for your detailed comments.
@FreelanceReinh: Thank you for continuing the interesting conversation.
"The implied RETAIN of the sum statement (+1) would have required the explicit initialization of the array (for what I assumed was the intended result), which I wanted to avoid."
Very true. Though it can also be avoided by using _d[_i]=sum(_d[_i],1) instead - a usual subterfuge in this kind of situation.
On a different note, a temp array can be reinitialized extremely rapidly to any value using CALL FILLMATRIX compiled into a custom call routine via FCMP - with the limitations that (a) the array is 1-based and (b) numeric. I've posted a way of doing it somewhere in this space not a very long time ago (though, as if often happens with pros, given the hint, it would take less time for you to code it yourself than to find my code).
Kind regards
Paul D.
@hashman wrote:
@FreelanceReinh: Thank you for continuing the interesting conversation.
My pleasure!
... the explicit initialization of the array (...) can also be avoided by using _d[_i]=sum(_d[_i],1) instead - a usual subterfuge in this kind of situation.
Yes, sure. I think I've learned this technique about ten years ago from your (and Koen Vyverman's) inspiring 2009 paper "The DOW-Loop Unrolled" and I've used it many times.
@hashman wrote:On a different note, a temp array can be reinitialized extremely rapidly to any value using CALL FILLMATRIX ....
I've posted a way of doing it somewhere in this space not a very long time ago ....
Correct. This was when we discussed my CALL STDIZE approach in Set All Array Values to 0. I'm looking forward to using this and other non-IML matrix functions and CALL routines. Thanks again.
Ah, yeah. In 2008 Koen and I had reckoned that the DoW deserved its own dedicated paper after having been merely a section of "The Magnificent DO" since SESUG 2000; and so we conspired to write a paper and present it using the DATA step debugger rather than PP slides, with Koen at the keyboard and me talking and pointing at the screen. Perhaps it would've been better the other way around since Koen, fluent in 7 languages, speaks English better, not to mention that his Flemish accent is less intrusive than my Ukrainian. At any rate, I'm glad we did it to encapsulate the DoW's functionality in a paper devoted to it and nothing else.
Thank you for reminding of the CALL STDIZE thread. Your associative memory is obviously much better than mine. Another thing I must mention is that it was Roger DeAngelis who brought your method to my attention on SAS-L, which in turn spurned me to compare it with the APP and CALL FILLMATRIX. It's pretty incredible how much we can learn from each other here.
Kind regards
Paul D.
@hashman wrote:
we ... present it using the DATA step debugger rather than PP slides, with Koen at the keyboard and me talking and pointing at the screen.
Nice idea!
I must mention ... that it was Roger DeAngelis who brought your method to my attention on SAS-L, ...
I didn't know that. Thanks. A quick Google search ("De Angelis" "CALL STDIZE") brought up this 13 days old GitHub entry: https://github.com/rogerjdeangelis/utl-set-each-element-of-an-array-to-zero-or-any-constant. I'm amazed how quickly content from this forum is being spread.
Roger has been doing an amazing job over recent years compiling best solutions of data processing problems from both SAS community and SAS-L and comparing their SAS/R/Python solutions (and their hybrids thereof) on github. A truly wondrous wealth of info there, not to mention that for every problem, Roger goes to great lengths to reformulate it in strikingly lucid terms, replete with HAVE, WANT, and clearly stated processing rules in his own inimitable style.
Kind regards
Paul D.
Hi! Thank you for your solution. I am currently fitting you code to my data and it looks great. Unfortunately, I have a criteria which I'm not sure how to adapt your code to and was wondering if there might be a simple fix or if I should ask the community. So the criteria I had was performing the cumulative number of days on medication for patients that took medications for >=15 consecutive days . So in the case of patient A004, they took medication for 8 days, then 3 days, then 1 day (11 days total). Since they didn't take medication for >=15 consecutive days, I won't calculate how many days they took medication for. Sorry if this is confusing. Please let me know if you would like an example. Thanks.
Hi @serena13lee,
Thanks for accepting my solution.
@serena13lee wrote:
(...) So in the case of patient A004, they took medication for 8 days, then 3 days, then 1 day (11 days total).
These counts do not exactly match those produced by my solution: 9 days, then 4 (of which 2 are overlapping), then 1 day (12 days total). If your counting rule was days=end_date-start_date (which is not uncommon) rather than days=end_date-start_date+1, then the last record of your sample data would be inconsistent (as resulting in "0 days").
That said, my code creates an array _d whose elements correspond to the calendar days between the earliest start date and the latest end date. Assuming that the array is populated appropriately, the array element for the i-th date in this time period indicates that the patient took some medication on that day if _d[i]=1. Otherwise, the array element contains a numeric missing value: _d[i]=.. So, a period of 15 consecutive days on drug means a sequence of 15 consecutive 1s somewhere in the array. You may want to create a flag variable, say, consec15d which is set to 1 if such a 15-day sequence exists and which is set to 0 otherwise. To create this flag variable, insert the definition between days=...; and the output statement:
if ~consec15d then consec15d=find(cats(of _d[*]),repeat('1',14))>0;
(Note that repeat('1',14)='111111111111111' with 15 digits.) I would assign a label to this variable such as '0-1 flag for 15 consecutive days on drug'.
The flag will be set to 1 in the first record where the 15-days condition is met and then left unchanged (due to the IF condition, which I've included for performance reasons) because the maximum number of consecutive days on drug can never decrease within an ID, assuming that the records are sorted chronologically. Since CATS creates a character string of the length &lastdate-&firstdate+1, I would be hesitant to apply this technique if that length exceeded 32767 characters, but even a time span of 89 years wouldn't reach this value.
I leave it to you to alter the value of variable DAYS (if necessary) in cases where CONSEC15D=0.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.