Hi,
This is what my data set looks like (with fictional values and data) -
Member ID Drug Days Supply Fill Date End date
1 APAP- Codeine 30 6/1/15 7/1/15
1 Amoxicillin 10 6/8/15 6/18/15
1 APAP-Codeine 5 8/8/15 8/13/15
2 Cephalexin 20 2/5/15 2/25/15
2 Ropinarole 10 4/2/15 4/12/15
3 Ibuprofen 5 7/2/15 7/7/15
.
.
.
n
What I essentially need to do is check if 2 medications were overlapping. If they were not overlapping, I need to sum the days supply for the necessary drugs. If it is not overlapping, leave the days supply as is.
The data is already sorted by member ID, fill date and end date.
This is what I need my output to look like -
Member ID Drug Days Supply Fill Date End Date Sum days supply
1 APAP- Codeine 30 6/1/15 7/1/15 30
1 Amoxicillin 10 6/8/15 6/18/15 10
1 APAP-Codeine 5 8/8/15 8/13/15 35
2 Cephalexin 20 2/5/15 2/25/15 20
2 Ropinorole 30 2/10/15 3/10/15 30
3 Ibuprofen 5 7/2/15 7/7/15 5
Thank you!
[1] Your sample data set shows some of the End_Dates are lesser than
Fill_Dates. From Days_Supply, I have corrected and is given in
HAVE.
[2] There are three possibilites to check namely:
a. Non-Overlapping Intervals, b. Partially Overlapping and
c. Intervals that falls fully inside the previous Interval.
Your input data set is missing the third possibility. I have
added one such interval in my second data set.
[3] The solution can be achieved without using arrays. We just
hold the necessary variables(fill_date, end_date, sum_days_supply)
with Prev_ prefix to compare with the current record.
data have;
informat  Fill_Date mmddyy10. End_Date mmddyy10.;
input MemberID @8 Drug :$1. @13 Days_Supply @18 Fill_Date @31 End_Date;
datalines;
1      a     4   01/02/2015   01/06/2015 
1      b     4   01/07/2015   01/11/2015 
1      b    15   01/15/2015   01/30/2015 
1      c    15   02/10/2015   02/25/2015 
2      a     4   12/02/2015   12/06/2015 
2      b     4   12/04/2015   12/08/2015 
2      b     5   12/09/2015   12/14/2015 
;
run;
The revised data set holding interval fully falling inside the previous interval.
data have;
informat  Fill_Date mmddyy10. End_Date mmddyy10.;
input MemberID @8 Drug :$1. @13 Days_Supply @18 Fill_Date @31 End_Date;
datalines;
1      a     4   01/02/2015   01/06/2015
1      b     4   01/07/2015   01/11/2015
1      b    15   01/15/2015   01/30/2015
1      b     4   01/16/2015   01/20/2015
1      c    15   02/10/2015   02/25/2015
2      a     4   12/02/2015   12/06/2015
2      b     4   12/04/2015   12/08/2015
2      b     5   12/09/2015   12/14/2015
;
run;
proc sort data = have;
by MemberID FillDate;
run;
data need;
   do until(last.MemberID);
      set have;
      by MemberID;
      if first.MemberID then do;
         output;
         Prev_fill_date = fill_date;
         Prev_end_date  = end_date;
         Prev_sum_days_supply = days_supply;
         sum_days_supply = .;
      end;
      else do;
         if Prev_end_date < fill_date then do;
            sum_days_supply = sum(Prev_sum_days_supply,days_supply);
            Prev_sum_days_supply = sum_days_supply;
            Prev_fill_date = fill_date;
            Prev_end_date  = end_date;
         end;
         else if Prev_end_date > fill_date & Prev_end_date < end_date  then do;
            sum_days_supply = sum(Prev_sum_days_supply + (end_date - Prev_end_date));
            Prev_sum_days_supply = sum_days_supply;
            Prev_fill_date = fill_date;
            Prev_end_date  = end_date;
         end;
         else if Prev_fill_Date <= fill_date and Prev_end_date >= end_date 
then sum_days_supply = days_supply;
         output;
      end;
   end;
drop Prev:;
run;
proc print data = need;
run;
What about combinations such as:
Member ID Drug Days Supply Fill Date End date
1 APAP- Codeine 30 6/1/15 7/1/15
1 Amoxicillin 10 6/28/15 7/8/15
1 APAP-Codeine 5 7/715 7/13/15
A overlaps with B, B overlaps with C, but A doesn't overlap with C.
That is a good question -
For overlaps that fall within a certain time for example - for member 1, drugs 1 and 2 - the 10 day days supply of drug 2 falls within the larger 30 day period of days supply for drug 1. In that case, I will sum days' supply only for 1 and 3.
I now see a problem arising if the days supply for drug 2 was 30 and the observations looked like this -
Member ID Drug Days Supply Fill Date End Date Sum days supply
1 APAP- Codeine 30 6/1/15 7/1/15 30
1 Amoxicillin 30 6/8/15 718/15 48
1 APAP-Codeine 5 8/8/15 8/13/15 53
What I essentially need to do is sum days' supply for observations that have no overlap - in the above example I will sum days' supply for observations 1 and 3 18 days for 2. Does this make sense?
I presume that your interest is on finding Overlapping date-intervals for a DRUG for a given MemberID. If so, consider the following solution. I assumed that in real situation, you may not have more than 100 intervals within a DRUG for a MemberID. If it is so, SIZE the arrays suitably. The input data set is sorted by DRUG, MemberID and FillDate.
data have;
informat  FillDate mmddyy10. EndDate mmddyy10.;
input MemberID @4 Drug :$12. @20 DaysSupply @27 FillDate @37 EndDate;
datalines;
1  APAP-Codeine    30     6/1/15    7/1/15 
1  Amoxicillin     10     6/8/15    6/18/15
1  APAP-Codeine     5     8/8/15    8/13/15
2  Cephalexin      20     2/5/15    2/25/15
2  Ropinorole      30     2/10/15   3/10/15
3  Ibuprofen        5     7/2/15    7/7/15 
;
run;
proc sort data = have;
by Drug MemberID FillDate;
run;
data need;
   array F[100] _temporary_;
   array E[100] _temporary_;
   array Days[100] _temporary_;
   do count = 1 by 1 until(last.Drug);
      set have;
      by Drug;
      F[count] = FillDate;
      E[count] = EndDate;
      Days[count] = DaysSupply;
   end;
   FillDate = F[1];
   EndDate = E[1];
   Sumdayssupply = Days[1];
   output;
   do i = 2 to count;
      flag = F[i-1] <= F[i] <= E[i-1];
      if flag = 0 then Days[i] + Days[i-1];
      FillDate = F[i];
      EndDate = E[i];
      Sumdayssupply = Days[i];
      output;
   end;
drop count i flag;
run;
proc print data = need;
run;
Hi datasp,
My interest is to sum all non-overlapping dates for all the drugs for a given Member ID.
Does this make more sense? 
Thank you!
Hi datasp,
I tried tweaking your code based on what I needed -
data want;
array F[200] _temporary_;
array E[200] _temporary_;
array days[200] _temporary_;
do count = 1 by 1 until (last.mbr);
set want;
by mbr;
F[count] = fill_date;
E[count] = end_date;
days[count] = days_supply;
end;
fill_date = F[1];
end_date = E[1];
days_supply = days[1];
output;
do i=2 to count;
flag = F[i-1] <= F[i] <= E[i-1];
if flag = 0 then sum_days_supply = days[i] + days[i-1];
fill_date = F[i];
end_date = E[i];
days_supply = days[i];
output;
end;
drop count i flag;
run;
This works for the most part except -
1. How do I account for non-overlapping days from 2 dates that have some overlap?
For example, when we sort by memberID, fill_date end_date and run the above code this is the output I get -
Member ID Drug days_supply fill_date end_date sum_days_supply
1 a 4 09/24/2015 12/06/2015 .
1 b 4 12/02/2015 12/06/2015 8
1 b 5 12/04/2015 12/09/2011 8
As you can see, in the 3rd line it does not sum the days supply because of the overlap. How should I go about this so that the difference in the overlapping days (12/07-12/09 i.e. 3 days) can also get accounted for?
2. Additionally, for patients that have multiple non-overlapping fills, it is only summing the days supply of the previous observation. How can I code this in a way that it sums the days supply for all the non-overlapping fills?
For example this is the output I want -
Member ID Drug days_supply fill_date end_date sum_days_supply
1 a 4 01/24/2015 01/06/2015 .
1 b 4 01/07/2015 01/11/2015 8
1 b 15 01/15/2015 01/30/2011 23
1 c 15 02/10/2015 02/25/2011 38
Thank you!
I expected the data set must be sorted by Drug, MemberID and fillDate in that order, so that summing for non-ovelapping intervals can be obtained by the current row and the immediate previous row. Thie assumes that endDate is greater than fillDate. But you seem to have not sorted in that order. In the following example(first row), endDate is EARLIER to fillDate. Is it right?
Prepare an example data set that have all your requirements. Show the desired output that comes out of the example data set. Give rules on how each output record is derived.
Member ID Drug days_supply fill_date end_date sum_days_supply
1 a 4 01/24/2015 01/06/2015 .
1 b 4 01/07/2015 01/11/2015 8
1 b 15 01/15/2015 01/30/2011 23
1 c 15 02/10/2015 02/25/2011 38
Sorry, that might have been a mistake. I need to the sum of non-overlapping days' supply for each patient and hence need to sort by member ID.
Member ID Drug days_supply fill_date end_date
1 a 4 01/02/2015 01/06/2015
1 a 4 01/07/2015 01/11/2015
1 b 15 01/15/2015 01/30/2011
1 c 15 02/10/2015 02/25/2011
2 a 4 12/02/2015 12/06/2015
2 b 4 12/04/2015 12/08/2015
2 b 5 12/09/2015 12/14/2015
Member ID Drug days_supply fill_date end_date sum_days_supply
1 a 4 01/02/2015 01/06/2015 .
1 a 4 01/07/2015 01/11/2015 8
1 b 15 01/15/2015 01/30/2011 23
1 c 15 02/10/2015 02/25/2011 38
2 a 4 12/02/2015 12/06/2015 .
2 b 4 12/04/2015 12/08/2015 6 (accounting for 2 non-overlap days)
2 b 5 12/09/2015 12/14/2015 11
[1] Your sample data set shows some of the End_Dates are lesser than
Fill_Dates. From Days_Supply, I have corrected and is given in
HAVE.
[2] There are three possibilites to check namely:
a. Non-Overlapping Intervals, b. Partially Overlapping and
c. Intervals that falls fully inside the previous Interval.
Your input data set is missing the third possibility. I have
added one such interval in my second data set.
[3] The solution can be achieved without using arrays. We just
hold the necessary variables(fill_date, end_date, sum_days_supply)
with Prev_ prefix to compare with the current record.
data have;
informat  Fill_Date mmddyy10. End_Date mmddyy10.;
input MemberID @8 Drug :$1. @13 Days_Supply @18 Fill_Date @31 End_Date;
datalines;
1      a     4   01/02/2015   01/06/2015 
1      b     4   01/07/2015   01/11/2015 
1      b    15   01/15/2015   01/30/2015 
1      c    15   02/10/2015   02/25/2015 
2      a     4   12/02/2015   12/06/2015 
2      b     4   12/04/2015   12/08/2015 
2      b     5   12/09/2015   12/14/2015 
;
run;
The revised data set holding interval fully falling inside the previous interval.
data have;
informat  Fill_Date mmddyy10. End_Date mmddyy10.;
input MemberID @8 Drug :$1. @13 Days_Supply @18 Fill_Date @31 End_Date;
datalines;
1      a     4   01/02/2015   01/06/2015
1      b     4   01/07/2015   01/11/2015
1      b    15   01/15/2015   01/30/2015
1      b     4   01/16/2015   01/20/2015
1      c    15   02/10/2015   02/25/2015
2      a     4   12/02/2015   12/06/2015
2      b     4   12/04/2015   12/08/2015
2      b     5   12/09/2015   12/14/2015
;
run;
proc sort data = have;
by MemberID FillDate;
run;
data need;
   do until(last.MemberID);
      set have;
      by MemberID;
      if first.MemberID then do;
         output;
         Prev_fill_date = fill_date;
         Prev_end_date  = end_date;
         Prev_sum_days_supply = days_supply;
         sum_days_supply = .;
      end;
      else do;
         if Prev_end_date < fill_date then do;
            sum_days_supply = sum(Prev_sum_days_supply,days_supply);
            Prev_sum_days_supply = sum_days_supply;
            Prev_fill_date = fill_date;
            Prev_end_date  = end_date;
         end;
         else if Prev_end_date > fill_date & Prev_end_date < end_date  then do;
            sum_days_supply = sum(Prev_sum_days_supply + (end_date - Prev_end_date));
            Prev_sum_days_supply = sum_days_supply;
            Prev_fill_date = fill_date;
            Prev_end_date  = end_date;
         end;
         else if Prev_fill_Date <= fill_date and Prev_end_date >= end_date 
then sum_days_supply = days_supply;
         output;
      end;
   end;
drop Prev:;
run;
proc print data = need;
run;
This works, thank you so much!
Just one more question - not sure if this is possible to incorporate it in the code (and if it is not, I can just make a separate dataset for this) - but for patients who have had only 1 prescription, it marks the sum_days_supply as missing since that is what we have assigned it to do.
Would it be easier if I just ouput all these patients to a separate dataset and then set the 2 data sets later?
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
