BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
BochengJing
Fluorite | Level 6

Hi SAS Users, I have been having an issues with medication use overlapping. My data looks like this: 

Patient_IDMedicationstart_dtEnd_dt
TomA8/29/20139/28/2013
TomA10/3/201311/2/2013
TomA11/4/201312/4/2013
TomA12/7/20131/6/2014
TomA1/22/20148/26/2014
TomB8/29/20139/28/2013
TomB10/1/201311/25/2013
TomB12/7/20131/6/2014
TomB1/12/20142/11/2014
TomB2/16/20143/18/2014
TomB3/30/20144/29/2014
TomB5/13/20147/7/2014
TomE9/6/201311/2/2013
TomE2/7/20142/14/2014
JerryC8/5/201312/27/2013
JerryC1/2/20145/23/2014
JerryC5/29/20147/27/2014
JerryD3/21/20146/14/2014
JerryD6/16/20148/8/2014
JerryE6/7/20146/19/2014
JoeA3/28/20165/27/2016
JoeA6/2/20167/28/2016
JoeA8/3/20169/2/2016
JoeA9/7/201611/5/2016
JoeA11/7/201612/7/2016
JoeA12/12/20161/11/2017
JoeA1/13/20172/12/2017
JoeA2/15/20174/15/2017
JoeC7/25/20168/24/2016
JoeC8/26/20169/25/2016
JoeC10/10/201612/7/2016
JoeC12/12/20161/11/2017
JoeC1/20/20173/22/2017
JoeD3/28/20165/28/2016
JoeD6/6/201611/5/2016
JoeD11/7/201612/7/2016
JoeD12/12/20164/22/2017
JoeE5/12/20165/15/2016

So each patient were prescribed medication at least three of the five Medication A, B, C, D, E. My goal is to identify patients with patients with at least three overlapping medications. Overlapping medication is defined if one medication's start_dt and end_dt interacts with the other medication's start_dt and end_dt.

 

For example, patient Tom's A prescription fill runs from 08/29/2013 - 9/28/2013; Tom's B prescription fill runs from 08/29/2013-9/28/2013; Tom's E prescription fill runs from 09/06/2013-11/02/2013. Hence, Tom's all three medication overlapped and should be flagged. 

The end product table should looks like this

Patient_IDflag_3_meds
Tom1
Jerry1
Joe1

 

Note, a patient must have at least three medication overlaps, if A overlaps B, B overlaps C, but A doesn't overlap C, then the flag_3_meds=0. 

 

So what is a functional algorithm to generate the flag_3_meds indicator? 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

If your data are already group by patient_id, then a two dimensional array (rows for medication, columns for dates) provides a neat way to record each medication history, and then search each date for 3 or more meds:

 

data have;
   input Patient_ID $   Medication $    start_dt :mmddyy10. End_dt :mmddyy10.;
   format start_dt end_dt mmddyy10.;
datalines;
Tom A   8/29/2013   9/28/2013
Tom A   10/3/2013   11/2/2013
Tom A   11/4/2013   12/4/2013
Tom A   12/7/2013   1/6/2014
Tom A   1/22/2014   8/26/2014
Tom B   8/29/2013   9/28/2013
Tom B   10/1/2013   11/25/2013
Tom B   12/7/2013   1/6/2014
Tom B   1/12/2014   2/11/2014
Tom B   2/16/2014   3/18/2014
Tom B   3/30/2014   4/29/2014
Tom B   5/13/2014   7/7/2014
Tom E   9/6/2013    11/2/2013
Tom E   2/7/2014    2/14/2014
Jerry   C   8/5/2013    12/27/2013
Jerry   C   1/2/2014    5/23/2014
Jerry   C   5/29/2014   7/27/2014
Jerry   D   3/21/2014   6/14/2014
Jerry   D   6/16/2014   8/8/2014
Jerry   E   6/7/2014    6/19/2014
Joe A   3/28/2016   5/27/2016
Joe A   6/2/2016    7/28/2016
Joe A   8/3/2016    9/2/2016
Joe A   9/7/2016    11/5/2016
Joe A   11/7/2016   12/7/2016
Joe A   12/12/2016  1/11/2017
Joe A   1/13/2017   2/12/2017
Joe A   2/15/2017   4/15/2017
Joe C   7/25/2016   8/24/2016
Joe C   8/26/2016   9/25/2016
Joe C   10/10/2016  12/7/2016
Joe C   12/12/2016  1/11/2017
Joe C   1/20/2017   3/22/2017
Joe D   3/28/2016   5/28/2016
Joe D   6/6/2016    11/5/2016
Joe D   11/7/2016   12/7/2016
Joe D   12/12/2016  4/22/2017
Joe E   5/12/2016   5/15/2016
Mary    D   12/12/2016  4/22/2017
Mary  A 5/12/2016   5/15/2016
;
%let mindate9=01jan2013;
%let maxdate9=31dec2017;

data want (keep=patient_id flag3_meds);
  set have;
  by patient_id notsorted;
  array drg_dates{1:5,%sysevalf("&mindate9"d):%sysevalf("&maxdate9"d)} _temporary_;
  retain min_date max_date;

  if first.patient_id then call missing(of drg_dates{*},min_date,max_date);

  row=indexc('ABCDE',medication);
  min_date=min(min_date,start_dt);
  max_date=max(max_date,end_dt);
  do col=start_dt to end_dt;
    drg_dates{row,col}=1;
  end;

  if last.patient_id;
  flag3_meds=0;

  do date=min_date to max_date until(flag3_meds=1);
    flag3_meds=sum(0,drg_dates{1,date},drg_dates{2,date},drg_dates{3,date},drg_dates{4,date},drg_dates{5,date}) >=3;
  end;
run;

Make sure to set the macrovars MINDATE9 and MAXDATE9 to cover the entire date range for your study.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

2 REPLIES 2
ballardw
Super User

What do you want for patients without any or fewer than 3 overlaps?

Your example data does not show any patient that does not have such and your desired result isn't very clear for that example.

 

First data should be in the form of a data step, as below. IF your date variables are not actually SAS dates you will need to create some additional variables that are for this approach to work. I also added a couple of records for a different patient that did not have an overlap of 3 to show the result one way.

This is a bit of brute force where we create a patient, drug and date data set. Then reshape that with transpose by patient and date. If multiple drugs appear on the same day the transposed data will show 2 or more populated output variables.

Then depending on what you may need, add the flag per record and summarize looking at the maximum value to see if the patient had any day with 3 or more.

Alternatively if you do not need to know about any of the patients with only 2 or fewer then transpose data can be used.

data have;
   input Patient_ID $	Medication $	start_dt :mmddyy10. End_dt :mmddyy10.;
   format start_dt end_dt mmddyy10.;
datalines;
Tom	A	8/29/2013	9/28/2013
Tom	A	10/3/2013	11/2/2013
Tom	A	11/4/2013	12/4/2013
Tom	A	12/7/2013	1/6/2014
Tom	A	1/22/2014	8/26/2014
Tom	B	8/29/2013	9/28/2013
Tom	B	10/1/2013	11/25/2013
Tom	B	12/7/2013	1/6/2014
Tom	B	1/12/2014	2/11/2014
Tom	B	2/16/2014	3/18/2014
Tom	B	3/30/2014	4/29/2014
Tom	B	5/13/2014	7/7/2014
Tom	E	9/6/2013	11/2/2013
Tom	E	2/7/2014	2/14/2014
Jerry	C	8/5/2013	12/27/2013
Jerry	C	1/2/2014	5/23/2014
Jerry	C	5/29/2014	7/27/2014
Jerry	D	3/21/2014	6/14/2014
Jerry	D	6/16/2014	8/8/2014
Jerry	E	6/7/2014	6/19/2014
Joe	A	3/28/2016	5/27/2016
Joe	A	6/2/2016	7/28/2016
Joe	A	8/3/2016	9/2/2016
Joe	A	9/7/2016	11/5/2016
Joe	A	11/7/2016	12/7/2016
Joe	A	12/12/2016	1/11/2017
Joe	A	1/13/2017	2/12/2017
Joe	A	2/15/2017	4/15/2017
Joe	C	7/25/2016	8/24/2016
Joe	C	8/26/2016	9/25/2016
Joe	C	10/10/2016	12/7/2016
Joe	C	12/12/2016	1/11/2017
Joe	C	1/20/2017	3/22/2017
Joe	D	3/28/2016	5/28/2016
Joe	D	6/6/2016	11/5/2016
Joe	D	11/7/2016	12/7/2016
Joe	D	12/12/2016	4/22/2017
Joe	E	5/12/2016	5/15/2016
Mary	D	12/12/2016	4/22/2017
Mary  A	5/12/2016	5/15/2016 
;

/* get drug per date */
data useful;
   set have;
   do date=start_dt to end_dt;
      output;
   end;
   format date mmddyy10.;
run;

proc sort data=useful;
   by patient_id date;
run;
/* get a variable for each drug per day
   caution: if your data is large this 
   may take awhile
*/
Proc transpose data=useful 
     out=trans 
   ;
   by patient_id date;
   var Medication;
run;
/* count by date*/
data daycount;
   set trans;
   Numdrugs= countw(catx(' ',of col:));
   Flag_3_meds= (NumDrugs ge 3);
run;
/*  summarize whether EACH patient ever had
    3 or more
*/
proc summary data=daycount nway;
   class patient_id;
   var flag_3_meds;
   output out=want (drop=_:) max=;
run;


/* if you only want patients with 3 or more
   a slightly simpler problem
*/
Proc sql;
   create table wantalt as 
   select distinct patient_id, 1 as Flag_3
   from trans
   where not missing(col3)
   ;
run;

Note that the Daycount data set can be used to determine

1) how many days were overlapped with 3 or more drugs

2) easily get 2, 4 or 5 overlaps following the similar logic

3) with a little extra work the specific combination (s) of overlap

mkeintz
PROC Star

If your data are already group by patient_id, then a two dimensional array (rows for medication, columns for dates) provides a neat way to record each medication history, and then search each date for 3 or more meds:

 

data have;
   input Patient_ID $   Medication $    start_dt :mmddyy10. End_dt :mmddyy10.;
   format start_dt end_dt mmddyy10.;
datalines;
Tom A   8/29/2013   9/28/2013
Tom A   10/3/2013   11/2/2013
Tom A   11/4/2013   12/4/2013
Tom A   12/7/2013   1/6/2014
Tom A   1/22/2014   8/26/2014
Tom B   8/29/2013   9/28/2013
Tom B   10/1/2013   11/25/2013
Tom B   12/7/2013   1/6/2014
Tom B   1/12/2014   2/11/2014
Tom B   2/16/2014   3/18/2014
Tom B   3/30/2014   4/29/2014
Tom B   5/13/2014   7/7/2014
Tom E   9/6/2013    11/2/2013
Tom E   2/7/2014    2/14/2014
Jerry   C   8/5/2013    12/27/2013
Jerry   C   1/2/2014    5/23/2014
Jerry   C   5/29/2014   7/27/2014
Jerry   D   3/21/2014   6/14/2014
Jerry   D   6/16/2014   8/8/2014
Jerry   E   6/7/2014    6/19/2014
Joe A   3/28/2016   5/27/2016
Joe A   6/2/2016    7/28/2016
Joe A   8/3/2016    9/2/2016
Joe A   9/7/2016    11/5/2016
Joe A   11/7/2016   12/7/2016
Joe A   12/12/2016  1/11/2017
Joe A   1/13/2017   2/12/2017
Joe A   2/15/2017   4/15/2017
Joe C   7/25/2016   8/24/2016
Joe C   8/26/2016   9/25/2016
Joe C   10/10/2016  12/7/2016
Joe C   12/12/2016  1/11/2017
Joe C   1/20/2017   3/22/2017
Joe D   3/28/2016   5/28/2016
Joe D   6/6/2016    11/5/2016
Joe D   11/7/2016   12/7/2016
Joe D   12/12/2016  4/22/2017
Joe E   5/12/2016   5/15/2016
Mary    D   12/12/2016  4/22/2017
Mary  A 5/12/2016   5/15/2016
;
%let mindate9=01jan2013;
%let maxdate9=31dec2017;

data want (keep=patient_id flag3_meds);
  set have;
  by patient_id notsorted;
  array drg_dates{1:5,%sysevalf("&mindate9"d):%sysevalf("&maxdate9"d)} _temporary_;
  retain min_date max_date;

  if first.patient_id then call missing(of drg_dates{*},min_date,max_date);

  row=indexc('ABCDE',medication);
  min_date=min(min_date,start_dt);
  max_date=max(max_date,end_dt);
  do col=start_dt to end_dt;
    drg_dates{row,col}=1;
  end;

  if last.patient_id;
  flag3_meds=0;

  do date=min_date to max_date until(flag3_meds=1);
    flag3_meds=sum(0,drg_dates{1,date},drg_dates{2,date},drg_dates{3,date},drg_dates{4,date},drg_dates{5,date}) >=3;
  end;
run;

Make sure to set the macrovars MINDATE9 and MAXDATE9 to cover the entire date range for your study.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 328 views
  • 0 likes
  • 3 in conversation