BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
michellel
Calcite | Level 5

Hi, I have a question about checking days overlap. I have a data set listed as below. For the same id, there might be more than one drug prescription (variable: drug). For each drug prescription, there may be refill (variable: refill, 0: first time, 1: first refill, 2: second refill.....n: nth refill). There are two variables for beginning date and ending date for prescription. There are more than 10K different id.

id     drug     refill     begin_date     end_date

1     drug1     0          03/01/14     03/10/14

1     drug1     1          03/11/14     03/20/14

1     drug1     2          03/25/14     04/05/14

1     drug2     0          03/02/14     03/09/14

1     drug2     1          03/17/14     03/24/14

1     drug2     2          03/27/14     04/04/14

1     drug3     0          09/02/14     09/09/14

2

2

2

3

3

3

.

.

.

.

n

What I need to do is to check if the overlap days for different drug for the same person are more than 5 days. In another word, I defined overlap as two different drugs for more than 5 days together.

What I plan to do is listed as below.

(step 1) to check the begin_date and end_date for the same drug. If the begin_date of refill is just the next day to the end_date of last time, then combine the date into one row. Actually I do not care the refill variable by the end. so I want my original data to be changed like below. For example, for drug1 of id=1, the first two rows will be combined into one row, because 03/11/14 is just next day to 03/10/14. I am not sure how to do this.

id     drug     refill     begin_date     end_date

1     drug1     0          03/01/14     03/20/14

1     drug1     2          03/25/14     04/05/14

1     drug2     0          03/02/14     03/09/14

1     drug2     1          03/17/14     03/24/14

1     drug2     2          03/27/14     04/04/14

1     drug3     0          03/02/14     03/09/14

2

2

2

.

.

.

n

n

(step2) after step 1, I am not sure how to compare the day period with each other for different drugs within the same id. The first difficulty is I do not want to compare the day period for the same drug (like row 1 and row 2 for drug1). The second difficulty is how to do pairwise comparison among different drugs with each other within the same id by iteration. Finally, I need create dummy variable (overlap) indicating if there are two different drugs overlap more than 5 days for each id, like below.

id     overlap

1          1

2          1

3          0

.

.

.

Many thanks in advanced for your answer!

1 ACCEPTED SOLUTION

Accepted Solutions
Jim_G
Pyrite | Level 9

Michelle       Try this code.   It doesn't concentrate on how many drug overlaps per day but which type of overlap occurred.  It there were 5 overlaps of drug type A It shows an overlap of type A.  If there was 5 drug A overlaps on a certain day and 1 B drug on the same day, it would report a type A overlap and a type A&B overlap.  Jim

** expand each date range to one record per day;

data one;

input id  type $ drug $  (begin_date   end_date) (: mmddyy12.);

format begin_date  end_date ondate mmddyy10.;

*   put out a record for each day on the drug by type;

ondate=begin_date;

do until (ondate=end_date);

  output;  ondate+1;

end;

output;  drop begin_date  end_date;

cards;

1 a drug1  03/01/14 03/10/14

1 a drug1  03/11/14 03/20/14

1 a drug1  03/25/14 04/05/14

1 a drug2  03/02/14 03/09/14

1 a drug2  03/17/14 03/24/14

1 a drug2  03/27/14 04/04/14

1 a drug3  09/02/14 09/09/14

1 b drug4  03/01/14 03/10/14

1 b drug4  03/11/14 03/20/14

1 b drug4  03/25/14 04/05/14

1 b drug5  09/02/14 09/09/14

2 a drug1  03/01/14 03/10/14

2 a drug1  03/11/14 03/20/14

2 a drug1  03/25/14 04/05/14

2 a drug2  03/02/14 03/09/14

2 a drug2  03/17/14 03/24/14

2 a drug2  03/27/14 04/04/14

2 a drug3  09/02/14 09/09/14

;

**  sort by day to see how many drugs taken each day ;

proc sort;  by id ondate type drug ;

proc print; by id ondate; id id ondate ;

title 'how many drugs taken each day';  run;

* count number of drugs taken each day by id type;

*  collapse to one record per string of days;

data;  set;  by id ondate; * type drug ; retain typa typb;

if first.ondate then do; count=0; typa=0; typb=0; end;

if type='a' then typa+1;

if type='b' then typb+1;

count+1;

if last.ondate and count gt 1 then do;

   if typa and     typb then do;  olap='A&B'; output; end;

   if typa gt 1         then do;  olap='A';   output; end;

   if typb gt 1         then do;  olap='B';   output; end;

end;  

drop drug type typa typb;

proc sort;    by id olap ondate;

proc print;   id id olap  ondate; 

title 'Collapse by date and type where multiple drugs taken'; run;

  ** step 3 identify the string of contiguous days (episode);

data;   set;   by id olap;

dif=ondate-lag(ondate);

retain flag  strtdate; 

format strtdate mmddyy10.;

               

if first.olap then do;

   contdys=0; flag=0; dif=1; strtdate=ondate;

end;

if dif=1 then contdys+1;

else do;   ** end of continuous string;

    dif=1; strtdate=ondate;

    contdys=1;

end;   

drop   dif flag count;

proc print; id id olap strtdate ondate contdys olap ;

    by id olap strtdate;

title 'See episode data  overlap of drugs and type'; run;

***step 4  see if the continuous string of days was gt 5;

data;   set;   by id olap strtdate;

     drop ondate ;

  if last.strtdate then do;    

        if contdys gt 5 then output;

  end;   

proc print split='*';

  label id='id*'  strtdate='episode*start date'

    contdys='continous* days' olap='type'; 

     id id strtdate contdys olap;  by id; run;  

*/;

View solution in original post

18 REPLIES 18
AskoLötjönen
Quartz | Level 8

You need to do several steps. I added proc sort to remove original observation when "combined". There may be smarter solutions. In the end results overlapping cases are twice (both ways) but you can modify it smarter.

proc sql;

  create table want as

  select h1.id,

         h1.drug,

         h1.refill,

         h1.begin_date,

         coalesce(h2.end_date,h1.end_date) format=date9. as end_date

  from   have1 h1

         left join have1 h2

               on  h1.id = h2.id

              and  h1.drug = h2.drug

              and  h1.refill lt h2.refill

              and  h1.end_date + 1 eq h2.begin_date

;

quit;

proc sort data=want out=want1 nodupkey;

   by id drug end_date;

run;

proc sql;

  create table overlaps as

  select w1.id,

         w1.drug as Drug1,

         w1.refill as refill1,

         w1.begin_date format=date9. as begin_date1,

         w1.end_date   format=date9. as end_date1,

         w2.drug                     as drug2,

         w2.refill as refill2,

         w2.begin_date format=date9. as begin_date2,

         w2.end_date   format=date9. as end_date2

  from   want1 w1

         left join want1 w2

               on  w1.id = w2.id

               and w1.drug  ^= w2.drug

               and (w1.begin_date between w2.begin_date and w2.end_date

                 or w1.end_date between w2.begin_date and w2.end_date

                 or w2.begin_date between w1.begin_date and w1.end_date

                 or w2.end_date between w1.begin_date and w1.end_date

                   )

;

quit;

Jim_G
Pyrite | Level 9

I did this before by this method.

Process each record in your input data set.

output (Write) an observation for each day the drug was taken. Start at the start date and output a record for each day until the end date.

Now you have a dataset with Id  drug and date.  sort by id date drug.   Now read the records.in by id date drug.  If first.drug count=0  If last.drug and count gt 5 then output.

   Jim

Ksharp
Super User

I did the same exact thing as Jim did. So search your questions at this forum , that would not waste both of us time.

Code: Program

data have;
input id   drug $   refill   (begin_date   end_date) (: mmddyy12.);
format begin_date   end_date mmddyy10.;
cards;
1 drug1 0 03/01/14 03/10/14
1 drug1 1 03/11/14 03/20/14
1 drug1 2 03/25/14 04/05/14
1 drug2 0 03/02/14 03/09/14
1 drug2 1 03/17/14 03/24/14
1 drug2 2 03/27/14 04/04/14
1 drug3 0 09/02/14 09/09/14
;
run;

data temp;
set have;
n+1;
date=begin_date;output;
date=end_date;output;
format date mmddyy10.;
drop begin_date   end_date;
run;
data x;
set temp;
by n;
if first.n and dif(date) ne 1 then group+1;
drop n;
run;
data want;
set x;
by group;
retain ref begin_date;
if first.group then do;ref=refill;begin_date=date;end;
if last.group then do; end_date=date; output;end;
drop refill group date;
format begin_date   end_date mmddyy10.;
run;
michellel
Calcite | Level 5

Thanks Xia Ke for your answer! It works for one person. How to get it done for different people? There are more than 10K people in my dataset. Sorry I should make it clear for a lot of different id.

Thanks so much for your time!

Ksharp
Super User

My code also can work on different Person, if your table has been sorted just like your sample data.

Jim_G
Pyrite | Level 9

This code should find where there is an overlap in at least 2 drugs for 5 days per id. It also indicates the total days overlapping and high water number of drugs overlapped;

data one;

input id   drug $   refill   (begin_date   end_date) (: mmddyy12.);

format begin_date   end_date ondate mmddyy10.;

*   put out a record for each day on the drug;

ondate=begin_date; 

do until (ondate=end_date);

  output;  ondate+1;

end;   

output;  drop begin_date  end_date refill;

cards;

1 drug1 0 03/01/14 03/10/14

1 drug1 1 03/11/14 03/20/14

1 drug1 2 03/25/14 04/05/14

1 drug2 0 03/02/14 03/09/14

1 drug2 1 03/17/14 03/24/14

1 drug2 2 03/27/14 04/04/14

1 drug3 0 09/02/14 09/09/14

2 drug1 0 03/01/14 03/10/14

2 drug1 1 03/11/14 03/20/14

2 drug1 2 03/25/14 04/05/14

2 drug2 0 03/02/14 03/09/14

2 drug2 1 03/17/14 03/24/14

2 drug2 2 03/27/14 04/04/14

2 drug3 0 09/02/14 09/09/14

;

proc sort;  by id ondate drug;

proc print; id id ondate drug; run;

* count number of drugs taken each day by id;

data;  set;  by id ondate drug; drop drug;

if first.ondate then count=1;

if last.ondate and not first.ondate then count+1;

if last.ondate and count gt 1 then output;

proc print; run; title 'count of days where multiple drugs taken';

* ids with 5 days overlapped;

data;   set;   by id;

if first.id then dysovrlap=0;

dysovrlap+1;

if last.id and dysovrlap gt 5 then output;

proc print;  title 'days overlap and number of drugs';run;

michellel
Calcite | Level 5

Thanks Jim_G! You are super!

Just one question. At the last step of your code, variable [dysovrlap] shows total days of overlap. How to check if there are 5 continuous days for overlap?

I also have a further question. Sorry I forgot to mention there is one more variable indicating the drug type. There are two drug types, A and B, for all these different drugs. I need to know if it is type_A and type_A overlap, or type_A and type_B overlap, or type_B and type_B overlap.

Thanks so much!

PhilC
Rhodochrosite | Level 12

This turned out to be something that was more crazy than I first thought...  I don't see STEP1 being solved with out a doubly-nested loops and multiple passes through the algorithm until all time periods are concatenated.

data STEP1 (Keep=id   drug  refill begin_date   end_date over_Lap);
  array b_date [&MAX_ARRAY_SIZE] b_date1-b_date&MAX_ARRAY_SIZE;
  array e_date [&MAX_ARRAY_SIZE] e_date1-e_date&MAX_ARRAY_SIZE;
  array ovrLap [&MAX_ARRAY_SIZE] ovrLap1-ovrLap&MAX_ARRAY_SIZE;
  do until (eof);
    do until (last.id);
      /*Read Input*/
      do N=1 by 1 until (last.drug);
        set have end=eof ;by id drug;
        b_date[N]=begin_date;
        e_date[N]=end_date;
        ovrLap[N]=0;
      end;
      NUM_ITEMS =N;
      /*STEP-1 Concatenate Allowed Periods*/
      NOT_FINISHED=1;
      do while (NOT_FINISHED);NOT_FINISHED=0;
        do M=1 to NUM_ITEMS ;
          if not missing(b_date) then
          do P=M+1 to NUM_ITEMS ;
            if not missing(b_date

) then
            do;
              if    -1 <= e_date-b_date

<=5
                 OR -1 <= e_date

-b_date <=5 then
              do;
                b_date=min(b_date, b_date

);
                e_date=max(e_date, e_date

);
                b_date

=.;e_date

=.;
                NOT_FINISHED=1;
                M=NUM_ITEMS;P=NUM_ITEMS;
              end;
            end;
          end;
        end;
      end;
      /*STEP-2 (prep) Find and flag Overlaps*/
      do M=1 to NUM_ITEMS ;
        if not missing(b_date) then
        do P=M+1 to NUM_ITEMS ;
          if not missing(b_date

) then
          do;
            if    e_date-b_date

>5
               AND e_date

-b_date >5 then
            do;
              ovrLap=1;
              ovrLap

=1;
            end;
          end;
        end;
      end;
      /*Output*/
      do N=1 to NUM_ITEMS ;
        begin_date=b_date[N];
        end_date  =e_date[N];
        over_Lap  =ovrLap[N];
        refill=N;
        if not missing(b_date[N])
          then output;
      end;
    end;/*do until (last.id);*/
  end;/*do until (eof);*/
  stop;
run;

data STEP2 (Keep=id overlap);
  id=0;
  overlap=0;
  do until (last.id);
    set STEP1; by ID;
    if over_Lap then overlap=1;
  end;
  output;
run;

Jim_G
Pyrite | Level 9

Michell if this is helpful please mark helpful or correct.

continuous can be calculated by comparing the date in this record to the last record. if there is only 1 day different then it is continuous;

I added the type variable.

data one;

input id  type $ drug $  (begin_date   end_date) (: mmddyy12.);

format begin_date   end_date ondate mmddyy10.;

*   put out a record for each day on the drug by type;

ondate=begin_date;

do until (ondate=end_date);

  output;  ondate+1;

end;

output;  drop begin_date  end_date;

cards;

1 a drug1  03/01/14 03/10/14

1 a drug1  03/11/14 03/20/14

1 a drug1  03/25/14 04/05/14

1 a drug2  03/02/14 03/09/14

1 a drug2  03/17/14 03/24/14

1 a drug2  03/27/14 04/04/14

1 a drug3  09/02/14 09/09/14

1 b drug1  03/01/14 03/10/14

1 b drug1  03/11/14 03/20/14

1 b drug1  03/25/14 04/05/14

1 b drug2  03/02/14 03/09/14

1 b drug2  03/17/14 03/24/14

1 b drug2  03/27/14 04/04/14

1 b drug3  09/02/14 09/09/14

2 a drug1  03/01/14 03/10/14

2 a drug1  03/11/14 03/20/14

2 a drug1  03/25/14 04/05/14

2 a drug2  03/02/14 03/09/14

2 a drug2  03/17/14 03/24/14

2 a drug2  03/27/14 04/04/14

2 a drug3  09/02/14 09/09/14

;

proc sort;  by id type ondate drug ;

*proc print; *id id type ondate drug ; run;

* count number of drugs taken each day by id type;

data;  set;  by id type ondate drug ; *drop drug;

if first.ondate then count=1;

if last.ondate and not first.ondate then count+1;

if last.ondate and count gt 1 then output;

proc print; run; title 'count of days where multiple drugs taken';

* ids with 5 continuous days overlapped;

data;   set;   by id type;

dif=ondate-lag(ondate); retain flag;

if first.type then do; contdys=0; flag=0; dif=1; end;

if dif=1 then contdys+1; else contdys=1;

if contdys gt 5 then flag=1;

if last.type and flag=1 then output;

drop drug ondate dif flag contdys;

proc print; id id type ;

title 'Overlap by type and number of drugs  by id and type'; run;

michellel
Calcite | Level 5

Thanks Jim_G for your answer!

There is still one thing needed to figure out. Based on the result from your code, it can be shown overlap within each drug type, but how about compare one drug from type a and another drug from type b within one patient? I changed input information as below. The result is supposed to show there is overlap for type a for patient 1, overlap between type a and b for patient 1, and overlap for drug a for patient 2, but the result did not show the overlap between type a and type b for patient 1. Could you help me figure out this? Thanks so.......much!

data one;

input id  type $ drug $  (begin_date   end_date) (: mmddyy12.);

format begin_date   end_date ondate mmddyy10.;

*   put out a record for each day on the drug by type;

ondate=begin_date;

do until (ondate=end_date);

  output;  ondate+1;

end;

output;  drop begin_date  end_date;

cards;

1 a drug1  03/01/14 03/10/14

1 a drug1  03/11/14 03/20/14

1 a drug1  03/25/14 04/05/14

1 a drug2  03/02/14 03/09/14

1 a drug2  03/17/14 03/24/14

1 a drug2  03/27/14 04/04/14

1 a drug3  09/02/14 09/09/14

1 b drug4  03/01/14 03/10/14

1 b drug4  03/11/14 03/20/14

1 b drug4  03/25/14 04/05/14

1 b drug5  09/02/14 09/09/14

2 a drug1  03/01/14 03/10/14

2 a drug1  03/11/14 03/20/14

2 a drug1  03/25/14 04/05/14

2 a drug2  03/02/14 03/09/14

2 a drug2  03/17/14 03/24/14

2 a drug2  03/27/14 04/04/14

2 a drug3  09/02/14 09/09/14

;

proc sort;  by id type ondate drug ;

*proc print; *id id type ondate drug ; run;

* count number of drugs taken each day by id type;

data;  set;  by id type ondate drug ; *drop drug;

if first.ondate then count=1;

if last.ondate and not first.ondate then count+1;

if last.ondate and count gt 1 then output;

proc print; run; title 'count of days where multiple drugs taken';

* ids with 5 continuous days overlapped;

data;   set;   by id type;

dif=ondate-lag(ondate); retain flag;

if first.type then do; contdys=0; flag=0; dif=1; end;

if dif=1 then contdys+1; else contdys=1;

if contdys gt 5 then flag=1;

if last.type and flag=1 then output;

drop drug ondate dif flag contdys;

proc print; id id type ;

title 'Overlap by type and number of drugs  by id and type'; run;

Jim_G
Pyrite | Level 9

If we have a case where one id has a string of 12 continuous days. For 6 days he overlapped ‘A’ drugs and for 6 days he overlapped ‘B’ drugs.  Is it more important to report 12 days overlapped A and B drugs  or to report 6 days overlap drug a and 6 days overlapped B ?    Do you want to report  6 days overlaps within type or  type within overlaps ? 

  Do you want report

         Id,  type, number contiuous days (gt 5)

          1    A                     6

          1    B                     6

Or     Id,   continuous days,  types overlapped

           1         12                  A,B

Jim

michellel
Calcite | Level 5

Thanks Jim!!

I want to report type (A,A or A,B, or B,B) within overlaps.

     Id,   continuous days,  types overlapped

      1         12                  A,B

My job is to check if (1) there are two different drugs overlap >5 days within type a, (2) if it happens between type a and type b, and/or (3) if it happens within type b.

Specifically, if the data is the following (please use the following data). I need get the result showing there are (1) drugs overlap for type a and (2) drugs overlap between type a and b for id=1, and drugs overlap for type a for id=2, like the following.

result:

     Id,   continuous days,  types overlapped

      1         XXX                  a,a

      1         XXX                  a,b

      2         XXX                  a,a

data:

id type drug begin_date end_date

1 a drug1  03/01/14 03/10/14

1 a drug1  03/11/14 03/20/14

1 a drug1  03/25/14 04/05/14

1 a drug2  03/02/14 03/09/14

1 a drug2  03/17/14 03/24/14

1 a drug2  03/27/14 04/04/14

1 a drug3  09/02/14 09/09/14

1 b drug4  03/01/14 03/10/14

1 b drug4  03/11/14 03/20/14

1 b drug4  03/25/14 04/05/14

1 b drug5  09/02/14 09/09/14

2 a drug1  03/01/14 03/10/14

2 a drug1  03/11/14 03/20/14

2 a drug1  03/25/14 04/05/14

2 a drug2  03/02/14 03/09/14

2 a drug2  03/17/14 03/24/14

2 a drug2  03/27/14 04/04/14

2 a drug3  09/02/14 09/09/14

Thanks so much in advance for your time and your answer!!!

Jim_G
Pyrite | Level 9

Michelle       try this code.   The output looks right.  

All the proc prints are for debugging       Jim

** step 1 expand each date range to one record per day;

data one;

input id  type $ drug $  (begin_date   end_date) (: mmddyy12.);

format begin_date  end_date ondate mmddyy10.;

*   put out a record for each day on the drug by type;

ondate=begin_date;

do until (ondate=end_date);

  output;  ondate+1;

end;

output;  drop begin_date  end_date;

cards;

1 a drug1  03/01/14 03/10/14

1 a drug1  03/11/14 03/20/14

1 a drug1  03/25/14 04/05/14

1 a drug2  03/02/14 03/09/14

1 a drug2  03/17/14 03/24/14

1 a drug2  03/27/14 04/04/14

1 a drug3  09/02/14 09/09/14

1 b drug4  03/01/14 03/10/14

1 b drug4  03/11/14 03/20/14

1 b drug4  03/25/14 04/05/14

1 b drug5  09/02/14 09/09/14

2 a drug1  03/01/14 03/10/14

2 a drug1  03/11/14 03/20/14

2 a drug1  03/25/14 04/05/14

2 a drug2  03/02/14 03/09/14

2 a drug2  03/17/14 03/24/14

2 a drug2  03/27/14 04/04/14

2 a drug3  09/02/14 09/09/14

;

**  sort by day to see how many drugs taken each day ;

proc sort;  by id ondate type drug ;

proc print; by id ondate; id id ondate ;

title 'how many drugs taken each day';  run;

* step 2  count number of drugs taken each day by id type;

*  collapse to one record per day;

data;  set;  by id ondate; * type drug ; retain typa typb;

if first.ondate then do; count=0; typa=0; typb=0; end;

if type='a' then typa=1;

if type='b' then typb=1;

count+1;

if last.ondate and count gt 1 then do;

   if typa and     typb then olap='A&B';

   if typa and not typb then olap='A';

   if typb and not typa then olap='B';

   output;

end;  

drop drug type typa typb;

proc print;   id id ondate olap;

title 'Collapse by date and type where multiple drugs taken'; run;

** step 3 identify the string of contiguous days (episode);

data;   set;   by id ;

dif=ondate-lag(ondate);

retain flag  strtdate typa typb typab;

format strtdate mmddyy10.;

               

if first.id then do;

   contdys=0; flag=0; dif=1; strtdate=ondate;

   typa=0; typb=0;  typab=0;

end;

**** store type ----------------------;

if olap='A'   then typa=1;

if olap='B'   then typb=1;

if olap='A&B' then typab=1;

if dif=1 then contdys+1;

else do;   ** end of continuous string;

    dif=1; strtdate=ondate;

    contdys=1; typa=0; typb=0;  typab=0;

end;   

drop   dif flag count;

proc print; id id strtdate ondate contdys olap ; by id strtdate;

title 'See episode data  overlap of drugs and type'; run;

*****  step 4  see if the continuous string of days was gt 5;

data;   set;   by id strtdate; drop ondate olap typa typb typab;

if last.strtdate then do;   retain overlap  '        '; 

    if typa then substr(overlap,1,1)='A';

    if typb then substr(overlap,3,1)='B';

    if typab then substr(overlap,5,3)='A&B';

    if contdys gt 5 then output;

    overlap=' ';

end;   

proc print split='*';

  label id='id*'  strtdate='episode*start date'

    contdys='continous* days' overlap='type'; 

     id id strtdate contdys ;  by id; run;  

*/; 

michellel
Calcite | Level 5

Thanks Jim for your update! I tried your code and the result listed as below. I am wondering if the result can also show overlap 'A' for id=1 besides the case of A&B overlap, as there are also drug 1 and drug 2 overlap within type a of id 1. Thanks so much for your answer!

idstrtdatecontdysoverlap
13/1/201420A&B
13/25/201412A&B
19/2/20148A&B
23/2/20148A
23/27/20149A

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 18 replies
  • 11664 views
  • 11 likes
  • 5 in conversation