I have the following dataset, I need to count each id only once based on the highest order achieved into:
mono
dual
2 or more
(for example: if the same patient have drugs with no overlap, another two drugs overlapped= then count this patient one time under 2 drug overlap or dual) based on the presence of one of the two conditions:
overlap of 60 days or more between drugs or if the drugs overlap at two different time periods by 30 days then count them (for example one period they same two drugs overlap by 30 and in another period by 40 days count this id as dual)
id drug start end
1 a 1/1/2004 4/4/2004
1 b 2/2/2004 6/6/2004
1 d 1/4/2005 4/4/2005
2 a 3/1/2006 4/2/2006
2 b 2/2/2006 5/3/2006
2 c 2/2/2006 4/4/2006
2 d 2/3/2001 4/4/2001
3 a 3/3/2001 4/3/2001
3 b 3/2/2002 4/2/2002
4 a 6/1/2001 8/2/2001
4 b 6/1/2001 7/7/2001
4 a 2/2/2001 4/4/2001
4 b 2/5/2001 3/28/2001
the output would be
mono or one drug : 1 (patient 3 counted here)
dual or two drugs overlap : 2 (patient 2 and 4 would be counted here)
three or more (patient 1)
I don't need the actual drugs that overlap just a count of the frequency where each patient can be counted once only.
There are a totol of 6 drugs.
I don't have time right now to code anything, but a couple of pointers. You can retain an overlap counter by id, increment it based on your logic and start_date between lag(start_date) and lag(end_date). Then output only last.id. This would give you a list of ids and a count of overlaps which you could then count.
data have;
input id drug $ start :mmddyy10. end :mmddyy10.;
format start end mmddyy10.;
cards;
1 a 1/1/2004 4/4/2004
1 b 2/2/2004 6/6/2004
1 d 1/4/2005 4/4/2005
2 a 3/1/2006 4/2/2006
2 b 2/2/2006 5/3/2006
2 c 2/2/2006 4/4/2006
2 d 2/3/2001 4/4/2001
3 a 3/3/2001 4/3/2001
3 b 3/2/2002 4/2/2002
4 a 6/1/2001 8/2/2001
4 b 6/1/2001 7/7/2001
4 a 2/2/2001 4/4/2001
4 b 2/5/2001 3/28/2001
;
run;
data want1;
set have;
by id;
drugno = 1;
retain count;
if first.id then count = 1; else count + 1;
if lag(end) >= start and count > 1 then drugno = drugno + 1;
if lag2(end) >= start and count > 2 then drugno = drugno + 1;
drugno = min(drugno,3);
run;
proc sql;
create table want2 as select id, max(drugno) as drugno
from want1 group by id;
quit;
proc freq data=want2;
tables drugno;
run;
This detects simple overlaps. If you want the length of the overlaps factored in, you will have to create several drugcounts for different lengths and drugs, so you can then make your further calculations. This is just to show the use of the lag() functions.
Thank you KurtBremser, that is very helpful but can you provide a brief example on how to use it to account for the number of days? Thank you!
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.