Hi everyone, I've done quite a bit of work in SAS with dates and drug overlap, but this one has me stumped. I am looking at combinations of drugs. Each combination includes a main drug, M, and then there are three other drugs - A, E, and R - that can be combined. Each person will be on drug M the entire time, but the additional drugs can be combined in many different ways. Currently, the data is setup so that I have several rows per patient. Each row lists the drug regimen along with its start and end dates. However, some of them overlap (e.g. M+E will overlap partially with M+E+R, but there could be unique M+E time prior to/after the M+E+R time). What I want are the order and lengths of time that patients were on unique drug regimens. A picture is probably helpful here: (Note: I don't have the boundaries demarcated by the black dotted lines in the data, I just put them in that part of the diagram to make it easier to visualize). Initially, I tried putting the regimens in order of start date (and end date so that if they started on the same date, the one that ended first would come first) and used lag to look at when the previous segment started/ended. I then reversed it and used lag to find when the next regimen started/ended. However, that didn't give me all of the info I needed. For the example I posted above, if I'm just looking at previous and next drug, it would look like there is a unique period of M+E after the M+E+R period ends. However, I need to look a bit further down to discover that there is no unique M+E time because drug A is added to the mix. It seems like every time I look at another pattern in the data, another complicated situation comes up. For example, someone could take M+E alone for 2 months, then add A for 2 months, go back down to just M+E for another month, add R for 2 months, and then go back to M+E. They would have 5 distinct periods: M+E, M+E+A, M+E, M+E+R, and M+E. That's another complication - each patient could have anywhere from 1 period (e.g. they only took M+R for 3 months and that's it, easy) up to...I think the most I've seen looking through it is 7, but I haven't done the math to determine the absolute maximum. A coworker suggested creating an array to look at what drug is being used at any given time. However, this data spans 2006-2014, and the person I'm working with is interested in knowing how many days they were on each regimen. That would be over 3,000 days with over 7,000 patients in the dataset! I suppose if that's the best way to do it, I can, but it would be great to find another way. Also, there is a very wide variety in the amount of time people spend on these drugs. The minimum is 28 days and the maximum is several years. Here's a fake dataset I made to illustrate some of the trickier cases that I've come across: data drugs;
infile datalines;
input id:$1. regimen:$7. start:MMDDYY10. end:MMDDYY10.;
format start MMDDYY10. end MMDDYY10.;
datalines;
A M+E 6/1/2010 11/15/2010
A M+E+R 6/1/2010 9/1/2010
A M+E+R+A 8/1/2010 9/1/2010
A M+A 8/1/2010 12/15/2010
A M+E+A 8/1/2010 11/15/2010
B M+E 1/1/2012 9/1/2012
B M+E+R 3/1/2012 7/1/2012
C M+E+R 1/1/2011 4/1/2011
C M+E 1/1/2011 9/1/2011
C M+E+A 6/1/2011 7/1/2011
D M+R 8/1/2012 10/1/2012
D M+R+E 8/3/2012 10/1/2012
D M+E 8/3/2012 12/15/2012
D M+E+A 10/15/2012 12/1/2012
E M+E 5/1/2011 3/1/2012
E M+E+R 5/15/2011 3/1/2012
E M+R 5/15/2011 5/1/2012
E M+E+R+A 8/1/2011 9/1/2011
; If anyone has any advice, I would really appreciate it. Please let me know if there is anything I can clarify.
... View more