drug start end
a 01/01/2001 02/15/2001
b 01/02/2001 02/10/2001
c 04/05/2002 06/07/2002
d 04/15/2002 07/01/2002
e 04/01/2002 07/18/2002
f 04/05/2002 06/01/2002
I want to report drug that overlapped by 30 days or more.
Output:
a-b
c-d-e-f
OK. So you also care about END-START ? data have; input id drug $ start : mmddyy10. end : mmddyy10.; format start end mmddyy10.; cards; 1 c 04/05/2002 06/07/2002 1 d 04/15/2002 04/25/2002 1 e 04/15/2002 04/18/2002 2 a 01/01/2002 02/15/2002 2 b 01/05/2002 03/19/2002 2 c 01/24/2002 02/17/2002 2 d 04/10/2002 07/01/2002 2 e 04/01/2002 07/28/2002 2 f 04/25/2002 06/30/2002 ; run; data temp; set have; by id; dif=lag(end)-start; if dif lt 30 or (end-start) lt 30 or first.id then group+1; run; data want; length want $ 200; do i=1 by 1 until(last.group); set Temp; by id group; want=catx('-',want,drug); end; if i ne 1 then output; keep id want; run;
There could be many interpretations to your sample data. To help clarify things, what would you expect for the data:
a 2002/01/01 2002/01/31 b 2002/01/01 2002/12/31 c 2002/12/01 2002/12/31 d 2002/06/01 2003/06/30
Ignoring patients ids, I am just trying to get aggregate numbers over a large data. I want to know the total number of prescriptions that overlap for 30 days or more.
If two medications overlap, then 2
if three medications overlap 3.
The output above shows the classes that overlaped. so for 2 a-b and so on
data have; input drug $ start : mmddyy10. end : mmddyy10.; format start end mmddyy10.; cards; a 01/01/2001 02/15/2001 b 01/02/2001 02/10/2001 c 04/05/2002 06/07/2002 d 04/15/2002 07/01/2002 e 04/01/2002 07/18/2002 f 04/05/2002 06/01/2002 ; run; data have; set have; dif=lag(end)-start; if dif lt 30 then group+1; run; data want; length want $ 200; do i=1 by 1 until(last.group); set have; by group; want=catx('-',want,drug); end; if i ne 1 then output; keep want; run;
KSHARP thank you! your code works perfect for the small dataset that has multiple medications for the same patient. Can one adjust the code if you have multiple ids? How can one modify the code if you have something like this with the first column be pt id?
1 a 01/01/2001 02/15/2001 1 b 01/02/2001 02/10/2001 1 c 04/05/2002 06/07/2002 1 d 04/15/2002 07/01/2002 1 e 04/01/2002 07/18/2002 1 f 04/05/2002 06/01/2002
2 a 01/01/2001 02/15/2001 2 b 01/02/2001 02/10/2001 2 c 04/05/2002 06/07/2002 2 d 04/15/2002 07/01/2002 2 e 04/01/2002 07/18/2002 2 f 04/05/2002 06/01/2002
Perhaps something like this if you want the groups within id, which makes more sense than across id.
data have;
input id drug $ start : mmddyy10. end : mmddyy10.;
format start end mmddyy10.;
cards;
1 a 01/01/2001 02/15/2001
1 b 01/02/2001 02/10/2001
1 c 04/05/2002 06/07/2002
1 d 04/15/2002 07/01/2002
1 e 04/01/2002 07/18/2002
1 f 04/05/2002 06/01/2002
2 a 01/01/2002 02/15/2002
2 b 01/05/2002 03/19/2002
2 c 01/24/2002 02/17/2002
2 d 04/10/2002 07/01/2002
2 e 04/01/2002 07/28/2002
2 f 04/25/2002 06/30/2002
;
run;
data temp;
set have;
by id;
dif=lag(end)-start;
if first.id then do;
group=.;
dif =.;
end;
if dif lt 30 then group+1;
run;
data want;
length want $ 200;
do i=1 by 1 until(last.group);
set Temp;
by id group;
want=catx('-',want,drug);
end;
if i ne 1 then output;
keep id want;
run;
I changed some date values for id=2 so that the results differed between Id. You need to decide if the rules have been applied correctly.
If your main data isn't sorted by ID and start date it should be before doing the step that creates temp.
Super! Thank you ballardw and ksharp!
I have one more questions.
Lets assume the output for id is the following:
1 a-b
1 a-b-c
1 a-b-c-d
1 a-b-c-e
2 a-b
2-c-d
How can I create table the keeps only the highest combination for each id. I mean here we have a combintion of two drugs, three drugs, and four drugs.
I want to create a table that tell me the highest combination for patient 1 was 4 and for patient 2 is 2 and so on. Here is the output
1 4
2 2
data x; input a b $20.; cards; 1 a-b 1 a-b-c 1 a-b-c-d 1 a-b-c-e 2 a-b 2 c-d ; run; proc sql; select a,max(countc(b,'-'))+1 as max from x group by a; quit;
Yeah. Try @ballardw 's code , or this one: data have; input id drug $ start : mmddyy10. end : mmddyy10.; format start end mmddyy10.; cards; 1 a 01/01/2001 02/15/2001 1 b 01/02/2001 02/10/2001 1 c 04/05/2002 06/07/2002 1 d 04/15/2002 07/01/2002 1 e 04/01/2002 07/18/2002 1 f 04/05/2002 06/01/2002 2 a 01/01/2002 02/15/2002 2 b 01/05/2002 03/19/2002 2 c 01/24/2002 02/17/2002 2 d 04/10/2002 07/01/2002 2 e 04/01/2002 07/28/2002 2 f 04/25/2002 06/30/2002 ; run; data temp; set have; by id; dif=lag(end)-start; if dif lt 30 or first.id then group+1; run; data want; length want $ 200; do i=1 by 1 until(last.group); set Temp; by id group; want=catx('-',want,drug); end; if i ne 1 then output; keep id want; run;
Thank you ksharp, I just realized that the output of the first code is incorrect . The 30 days overlap should be 30 days or more and should his card drugs that overlap in less than 30 days. For example, for the entries below, the code should not retain any drugs for this patient because there is no overlap of 30 days or more between the different drugs.
1 c 04/05/2002 06/07/2002 1 d 04/15/2002 04/25/2002 1 e 04/15/2002 04/18/2002
OK. So you also care about END-START ? data have; input id drug $ start : mmddyy10. end : mmddyy10.; format start end mmddyy10.; cards; 1 c 04/05/2002 06/07/2002 1 d 04/15/2002 04/25/2002 1 e 04/15/2002 04/18/2002 2 a 01/01/2002 02/15/2002 2 b 01/05/2002 03/19/2002 2 c 01/24/2002 02/17/2002 2 d 04/10/2002 07/01/2002 2 e 04/01/2002 07/28/2002 2 f 04/25/2002 06/30/2002 ; run; data temp; set have; by id; dif=lag(end)-start; if dif lt 30 or (end-start) lt 30 or first.id then group+1; run; data want; length want $ 200; do i=1 by 1 until(last.group); set Temp; by id group; want=catx('-',want,drug); end; if i ne 1 then output; keep id want; run;
Thank you so much Ksharp!! five starts!
one more question, after the first code, how can I exclude drugs that overlap for the same drug. For example if I have the following outpit:
a-b-c-a, I want to drop the duplicate a and keep it as a-b-c
the same for a-a, I dont want to count the overlap if it is the same drug . Thank you again for all the help!
No problem. data have; input id drug $ start : mmddyy10. end : mmddyy10.; format start end mmddyy10.; cards; 1 c 04/05/2002 06/07/2002 1 d 04/15/2002 04/25/2002 1 e 04/15/2002 04/18/2002 2 a 01/01/2002 02/15/2002 2 b 01/05/2002 03/19/2002 2 c 01/24/2002 02/17/2002 2 d 04/10/2002 07/01/2002 2 e 04/01/2002 07/28/2002 2 f 04/25/2002 06/30/2002 2 f 04/25/2002 07/30/2002 ; run; data temp; set have; by id; dif=lag(end)-start; if dif lt 30 or (end-start) lt 30 or first.id then group+1; run; data want; array x{99999} $ 100 _TEMPORARY_; n=0; call missing(of x{*}); length want $ 200; do i=1 by 1 until(last.group); set Temp; by id group; if drug not in x then do; n+1;x{n}=drug; want=catx('-',want,drug); end; end; if i ne 1 then output; keep id want; run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.