Solved: Re: how to check days overlap

michellel · Posted 07-14-2015 09:55 PM

Hi, I have a question about checking days overlap. I have a data set listed as below. For the same id, there might be more than one drug prescription (variable: drug). For each drug prescription, there may be refill (variable: refill, 0: first time, 1: first refill, 2: second refill.....n: nth refill). There are two variables for beginning date and ending date for prescription. There are more than 10K different id.

id drug refill begin_date end_date

1 drug1 0 03/01/14 03/10/14

1 drug1 1 03/11/14 03/20/14

1 drug1 2 03/25/14 04/05/14

1 drug2 0 03/02/14 03/09/14

1 drug2 1 03/17/14 03/24/14

1 drug2 2 03/27/14 04/04/14

1 drug3 0 09/02/14 09/09/14

2

3

.

n

What I need to do is to check if the overlap days for different drug for the same person are more than 5 days. In another word, I defined overlap as two different drugs for more than 5 days together.

What I plan to do is listed as below.

(step 1) to check the begin_date and end_date for the same drug. If the begin_date of refill is just the next day to the end_date of last time, then combine the date into one row. Actually I do not care the refill variable by the end. so I want my original data to be changed like below. For example, for drug1 of id=1, the first two rows will be combined into one row, because 03/11/14 is just next day to 03/10/14. I am not sure how to do this.

id drug refill begin_date end_date

1 drug1 0 03/01/14 03/20/14

1 drug1 2 03/25/14 04/05/14

1 drug2 0 03/02/14 03/09/14

1 drug2 1 03/17/14 03/24/14

1 drug2 2 03/27/14 04/04/14

1 drug3 0 03/02/14 03/09/14

2

.

n

(step2) after step 1, I am not sure how to compare the day period with each other for different drugs within the same id. The first difficulty is I do not want to compare the day period for the same drug (like row 1 and row 2 for drug1). The second difficulty is how to do pairwise comparison among different drugs with each other within the same id by iteration. Finally, I need create dummy variable (overlap) indicating if there are two different drugs overlap more than 5 days for each id, like below.

id overlap

1 1

2 1

3 0

.

Many thanks in advanced for your answer!

Jim_G · Posted 07-21-2015 04:46 PM

Michelle Try this code. It doesn't concentrate on how many drug overlaps per day but which type of overlap occurred. It there were 5 overlaps of drug type A It shows an overlap of type A. If there was 5 drug A overlaps on a certain day and 1 B drug on the same day, it would report a type A overlap and a type A&B overlap. Jim

** expand each date range to one record per day;

data one;

input id type $ drug $ (begin_date end_date) (: mmddyy12.);

format begin_date end_date ondate mmddyy10.;

* put out a record for each day on the drug by type;

ondate=begin_date;

do until (ondate=end_date);

output; ondate+1;

end;

output; drop begin_date end_date;

cards;

1 a drug1 03/01/14 03/10/14

1 a drug1 03/11/14 03/20/14

1 a drug1 03/25/14 04/05/14

1 a drug2 03/02/14 03/09/14

1 a drug2 03/17/14 03/24/14

1 a drug2 03/27/14 04/04/14

1 a drug3 09/02/14 09/09/14

1 b drug4 03/01/14 03/10/14

1 b drug4 03/11/14 03/20/14

1 b drug4 03/25/14 04/05/14

1 b drug5 09/02/14 09/09/14

2 a drug1 03/01/14 03/10/14

2 a drug1 03/11/14 03/20/14

2 a drug1 03/25/14 04/05/14

2 a drug2 03/02/14 03/09/14

2 a drug2 03/17/14 03/24/14

2 a drug2 03/27/14 04/04/14

2 a drug3 09/02/14 09/09/14

;

** sort by day to see how many drugs taken each day ;

proc sort; by id ondate type drug ;

proc print; by id ondate; id id ondate ;

title 'how many drugs taken each day'; run;

* count number of drugs taken each day by id type;

* collapse to one record per string of days;

data; set; by id ondate; * type drug ; retain typa typb;

if first.ondate then do; count=0; typa=0; typb=0; end;

if type='a' then typa+1;

if type='b' then typb+1;

count+1;

if last.ondate and count gt 1 then do;

if typa and typb then do; olap='A&B'; output; end;

if typa gt 1 then do; olap='A'; output; end;

if typb gt 1 then do; olap='B'; output; end;

end;

drop drug type typa typb;

proc sort; by id olap ondate;

proc print; id id olap ondate;

title 'Collapse by date and type where multiple drugs taken'; run;

** step 3 identify the string of contiguous days (episode);

data; set; by id olap;

dif=ondate-lag(ondate);

retain flag strtdate;

format strtdate mmddyy10.;

if first.olap then do;

contdys=0; flag=0; dif=1; strtdate=ondate;

end;

if dif=1 then contdys+1;

else do; ** end of continuous string;

dif=1; strtdate=ondate;

contdys=1;

end;

drop dif flag count;

proc print; id id olap strtdate ondate contdys olap ;

by id olap strtdate;

title 'See episode data overlap of drugs and type'; run;

***step 4 see if the continuous string of days was gt 5;

data; set; by id olap strtdate;

drop ondate ;

if last.strtdate then do;

if contdys gt 5 then output;

end;

proc print split='*';

label id='id*' strtdate='episode*start date'

contdys='continous* days' olap='type';

id id strtdate contdys olap; by id; run;

*/;

View solution in original post

AskoLötjönen · Posted 07-15-2015 06:22 AM

You need to do several steps. I added proc sort to remove original observation when "combined". There may be smarter solutions. In the end results overlapping cases are twice (both ways) but you can modify it smarter.

proc sql;

create table want as

select h1.id,

h1.drug,

h1.refill,

h1.begin_date,

coalesce(h2.end_date,h1.end_date) format=date9. as end_date

from have1 h1

left join have1 h2

on h1.id = h2.id

and h1.drug = h2.drug

and h1.refill lt h2.refill

and h1.end_date + 1 eq h2.begin_date

;

quit;

proc sort data=want out=want1 nodupkey;

by id drug end_date;

run;

proc sql;

create table overlaps as

select w1.id,

w1.drug as Drug1,

w1.refill as refill1,

w1.begin_date format=date9. as begin_date1,

w1.end_date format=date9. as end_date1,

w2.drug as drug2,

w2.refill as refill2,

w2.begin_date format=date9. as begin_date2,

w2.end_date format=date9. as end_date2

from want1 w1

left join want1 w2

on w1.id = w2.id

and w1.drug ^= w2.drug

and (w1.begin_date between w2.begin_date and w2.end_date

or w1.end_date between w2.begin_date and w2.end_date

or w2.begin_date between w1.begin_date and w1.end_date

or w2.end_date between w1.begin_date and w1.end_date

)

;

quit;

Jim_G · Posted 07-15-2015 07:51 AM

I did this before by this method.

Process each record in your input data set.

output (Write) an observation for each day the drug was taken. Start at the start date and output a record for each day until the end date.

Now you have a dataset with Id drug and date. sort by id date drug. Now read the records.in by id date drug. If first.drug count=0 If last.drug and count gt 5 then output.

Jim

Ksharp · Posted 07-15-2015 09:16 AM

I did the same exact thing as Jim did. So search your questions at this forum , that would not waste both of us time.

Code: Program

data have;
input id   drug $   refill   (begin_date   end_date) (: mmddyy12.);
format begin_date   end_date mmddyy10.;
cards;
1 drug1 0 03/01/14 03/10/14
1 drug1 1 03/11/14 03/20/14
1 drug1 2 03/25/14 04/05/14
1 drug2 0 03/02/14 03/09/14
1 drug2 1 03/17/14 03/24/14
1 drug2 2 03/27/14 04/04/14
1 drug3 0 09/02/14 09/09/14
;
run;

data temp;
 set have;
 n+1;
 date=begin_date;output;
 date=end_date;output;
 format date mmddyy10.;
 drop begin_date   end_date;
run;
data x;
 set temp;
 by n;
 if first.n and dif(date) ne 1 then group+1;
 drop n;
run;
data want;
 set x;
 by group;
 retain ref begin_date;
 if first.group then do;ref=refill;begin_date=date;end;
 if last.group then do; end_date=date; output;end;
 drop refill group date;
 format begin_date   end_date mmddyy10.;
run;

michellel · Posted 07-15-2015 09:37 AM

Thanks Xia Ke for your answer! It works for one person. How to get it done for different people? There are more than 10K people in my dataset. Sorry I should make it clear for a lot of different id.

Thanks so much for your time!

Ksharp · Posted 07-15-2015 09:44 AM

My code also can work on different Person, if your table has been sorted just like your sample data.

Jim_G · Posted 07-15-2015 12:53 PM

This code should find where there is an overlap in at least 2 drugs for 5 days per id. It also indicates the total days overlapping and high water number of drugs overlapped;

data one;

input id drug $ refill (begin_date end_date) (: mmddyy12.);

format begin_date end_date ondate mmddyy10.;

* put out a record for each day on the drug;

ondate=begin_date;

do until (ondate=end_date);

output; ondate+1;

end;

output; drop begin_date end_date refill;

cards;

1 drug1 0 03/01/14 03/10/14

1 drug1 1 03/11/14 03/20/14

1 drug1 2 03/25/14 04/05/14

1 drug2 0 03/02/14 03/09/14

1 drug2 1 03/17/14 03/24/14

1 drug2 2 03/27/14 04/04/14

1 drug3 0 09/02/14 09/09/14

2 drug1 0 03/01/14 03/10/14

2 drug1 1 03/11/14 03/20/14

2 drug1 2 03/25/14 04/05/14

2 drug2 0 03/02/14 03/09/14

2 drug2 1 03/17/14 03/24/14

2 drug2 2 03/27/14 04/04/14

2 drug3 0 09/02/14 09/09/14

;

proc sort; by id ondate drug;

proc print; id id ondate drug; run;

* count number of drugs taken each day by id;

data; set; by id ondate drug; drop drug;

if first.ondate then count=1;

if last.ondate and not first.ondate then count+1;

if last.ondate and count gt 1 then output;

proc print; run; title 'count of days where multiple drugs taken';

* ids with 5 days overlapped;

data; set; by id;

if first.id then dysovrlap=0;

dysovrlap+1;

if last.id and dysovrlap gt 5 then output;

proc print; title 'days overlap and number of drugs';run;

michellel · Posted 07-16-2015 05:47 PM

Thanks Jim_G! You are super!

Just one question. At the last step of your code, variable [dysovrlap] shows total days of overlap. How to check if there are 5 continuous days for overlap?

I also have a further question. Sorry I forgot to mention there is one more variable indicating the drug type. There are two drug types, A and B, for all these different drugs. I need to know if it is type_A and type_A overlap, or type_A and type_B overlap, or type_B and type_B overlap.

Thanks so much!

PhilC · Posted 07-15-2015 03:02 PM

This turned out to be something that was more crazy than I first thought... I don't see STEP1 being solved with out a doubly-nested loops and multiple passes through the algorithm until all time periods are concatenated.

data STEP1 (Keep=id   drug refill begin_date   end_date over_Lap);
array b_date [&MAX_ARRAY_SIZE] b_date1-b_date&MAX_ARRAY_SIZE;
array e_date [&MAX_ARRAY_SIZE] e_date1-e_date&MAX_ARRAY_SIZE;
array ovrLap [&MAX_ARRAY_SIZE] ovrLap1-ovrLap&MAX_ARRAY_SIZE;
do until (eof);
    do until (last.id);
    /*Read Input*/
      do N=1 by 1 until (last.drug);
        set have end=eof ;by id drug;
        b_date[N]=begin_date;
        e_date[N]=end_date;
        ovrLap[N]=0;
      end;
      NUM_ITEMS =N;
    /*STEP-1 Concatenate Allowed Periods*/
      NOT_FINISHED=1;
      do while (NOT_FINISHED);NOT_FINISHED=0;
        do M=1 to NUM_ITEMS ;
          if not missing(b_date) then
          do P=M+1 to NUM_ITEMS ;
            if not missing(b_date

) then
do;
if -1 <= e_date-b_date

<=5
OR -1 <= e_date

-b_date <=5 then
do;
b_date=min(b_date, b_date

);
e_date=max(e_date, e_date

);
b_date

=.;e_date

=.;
                NOT_FINISHED=1;
                M=NUM_ITEMS;P=NUM_ITEMS;
              end;
            end;
          end;
        end;
      end;
    /*STEP-2 (prep) Find and flag Overlaps*/
      do M=1 to NUM_ITEMS ;
        if not missing(b_date) then
        do P=M+1 to NUM_ITEMS ;
          if not missing(b_date

) then
do;
if e_date-b_date

>5
AND e_date

-b_date >5 then
            do;
              ovrLap=1;
              ovrLap

=1;
            end;
          end;
        end;
      end;
    /*Output*/
      do N=1 to NUM_ITEMS ;
        begin_date=b_date[N];
        end_date =e_date[N];
        over_Lap =ovrLap[N];
        refill=N;
        if not missing(b_date[N])
          then output;
      end;
    end;/*do until (last.id);*/
end;/*do until (eof);*/
stop;
run;

data STEP2 (Keep=id overlap);
id=0;
overlap=0;
do until (last.id);
set STEP1; by ID;
if over_Lap then overlap=1;
end;
output;
run;

Jim_G · Posted 07-17-2015 07:26 AM

Michell if this is helpful please mark helpful or correct.

continuous can be calculated by comparing the date in this record to the last record. if there is only 1 day different then it is continuous;

I added the type variable.

data one;

input id type $ drug $ (begin_date end_date) (: mmddyy12.);

format begin_date end_date ondate mmddyy10.;

* put out a record for each day on the drug by type;

ondate=begin_date;

do until (ondate=end_date);

output; ondate+1;

end;

output; drop begin_date end_date;

cards;

1 a drug1 03/01/14 03/10/14

1 a drug1 03/11/14 03/20/14

1 a drug1 03/25/14 04/05/14

1 a drug2 03/02/14 03/09/14

1 a drug2 03/17/14 03/24/14

1 a drug2 03/27/14 04/04/14

1 a drug3 09/02/14 09/09/14

1 b drug1 03/01/14 03/10/14

1 b drug1 03/11/14 03/20/14

1 b drug1 03/25/14 04/05/14

1 b drug2 03/02/14 03/09/14

1 b drug2 03/17/14 03/24/14

1 b drug2 03/27/14 04/04/14

1 b drug3 09/02/14 09/09/14

2 a drug1 03/01/14 03/10/14

2 a drug1 03/11/14 03/20/14

2 a drug1 03/25/14 04/05/14

2 a drug2 03/02/14 03/09/14

2 a drug2 03/17/14 03/24/14

2 a drug2 03/27/14 04/04/14

2 a drug3 09/02/14 09/09/14

;

proc sort; by id type ondate drug ;

*proc print; *id id type ondate drug ; run;

* count number of drugs taken each day by id type;

data; set; by id type ondate drug ; *drop drug;

if first.ondate then count=1;

if last.ondate and not first.ondate then count+1;

if last.ondate and count gt 1 then output;

proc print; run; title 'count of days where multiple drugs taken';

* ids with 5 continuous days overlapped;

data; set; by id type;

dif=ondate-lag(ondate); retain flag;

if first.type then do; contdys=0; flag=0; dif=1; end;

if dif=1 then contdys+1; else contdys=1;

if contdys gt 5 then flag=1;

if last.type and flag=1 then output;

drop drug ondate dif flag contdys;

proc print; id id type ;

title 'Overlap by type and number of drugs by id and type'; run;

michellel · Posted 07-17-2015 02:58 PM

Thanks Jim_G for your answer!

There is still one thing needed to figure out. Based on the result from your code, it can be shown overlap within each drug type, but how about compare one drug from type a and another drug from type b within one patient? I changed input information as below. The result is supposed to show there is overlap for type a for patient 1, overlap between type a and b for patient 1, and overlap for drug a for patient 2, but the result did not show the overlap between type a and type b for patient 1. Could you help me figure out this? Thanks so.......much!

data one;

input id type $ drug $ (begin_date end_date) (: mmddyy12.);

format begin_date end_date ondate mmddyy10.;

* put out a record for each day on the drug by type;

ondate=begin_date;

do until (ondate=end_date);

output; ondate+1;

end;

output; drop begin_date end_date;

cards;

1 a drug1 03/01/14 03/10/14

1 a drug1 03/11/14 03/20/14

1 a drug1 03/25/14 04/05/14

1 a drug2 03/02/14 03/09/14

1 a drug2 03/17/14 03/24/14

1 a drug2 03/27/14 04/04/14

1 a drug3 09/02/14 09/09/14

1 b drug4 03/01/14 03/10/14

1 b drug4 03/11/14 03/20/14

1 b drug4 03/25/14 04/05/14

1 b drug5 09/02/14 09/09/14

2 a drug1 03/01/14 03/10/14

2 a drug1 03/11/14 03/20/14

2 a drug1 03/25/14 04/05/14

2 a drug2 03/02/14 03/09/14

2 a drug2 03/17/14 03/24/14

2 a drug2 03/27/14 04/04/14

2 a drug3 09/02/14 09/09/14

;

proc sort; by id type ondate drug ;

*proc print; *id id type ondate drug ; run;

* count number of drugs taken each day by id type;

data; set; by id type ondate drug ; *drop drug;

if first.ondate then count=1;

if last.ondate and not first.ondate then count+1;

if last.ondate and count gt 1 then output;

proc print; run; title 'count of days where multiple drugs taken';

* ids with 5 continuous days overlapped;

data; set; by id type;

dif=ondate-lag(ondate); retain flag;

if first.type then do; contdys=0; flag=0; dif=1; end;

if dif=1 then contdys+1; else contdys=1;

if contdys gt 5 then flag=1;

if last.type and flag=1 then output;

drop drug ondate dif flag contdys;

proc print; id id type ;

title 'Overlap by type and number of drugs by id and type'; run;

Jim_G · Posted 07-19-2015 08:51 AM

If we have a case where one id has a string of 12 continuous days. For 6 days he overlapped ‘A’ drugs and for 6 days he overlapped ‘B’ drugs. Is it more important to report 12 days overlapped A and B drugs or to report 6 days overlap drug a and 6 days overlapped B ? Do you want to report 6 days overlaps within type or type within overlaps ?

Do you want report

Id, type, number contiuous days (gt 5)

1 A 6

1 B 6

Or Id, continuous days, types overlapped

1 12 A,B

Jim

michellel · Posted 07-19-2015 09:39 AM

Thanks Jim!!

I want to report type (A,A or A,B, or B,B) within overlaps.

Id, continuous days, types overlapped

1 12 A,B

My job is to check if (1) there are two different drugs overlap >5 days within type a, (2) if it happens between type a and type b, and/or (3) if it happens within type b.

Specifically, if the data is the following (please use the following data). I need get the result showing there are (1) drugs overlap for type a and (2) drugs overlap between type a and b for id=1, and drugs overlap for type a for id=2, like the following.

result:

Id, continuous days, types overlapped

1 XXX a,a

1 XXX a,b

2 XXX a,a

data:

id type drug begin_date end_date

1 a drug1 03/01/14 03/10/14

1 a drug1 03/11/14 03/20/14

1 a drug1 03/25/14 04/05/14

1 a drug2 03/02/14 03/09/14

1 a drug2 03/17/14 03/24/14

1 a drug2 03/27/14 04/04/14

1 a drug3 09/02/14 09/09/14

1 b drug4 03/01/14 03/10/14

1 b drug4 03/11/14 03/20/14

1 b drug4 03/25/14 04/05/14

1 b drug5 09/02/14 09/09/14

2 a drug1 03/01/14 03/10/14

2 a drug1 03/11/14 03/20/14

2 a drug1 03/25/14 04/05/14

2 a drug2 03/02/14 03/09/14

2 a drug2 03/17/14 03/24/14

2 a drug2 03/27/14 04/04/14

2 a drug3 09/02/14 09/09/14

Thanks so much in advance for your time and your answer!!!

Jim_G · Posted 07-19-2015 03:35 PM

Michelle try this code. The output looks right.

All the proc prints are for debugging Jim

** step 1 expand each date range to one record per day;

data one;

input id type $ drug $ (begin_date end_date) (: mmddyy12.);

format begin_date end_date ondate mmddyy10.;

* put out a record for each day on the drug by type;

ondate=begin_date;

do until (ondate=end_date);

output; ondate+1;

end;

output; drop begin_date end_date;

cards;

1 a drug1 03/01/14 03/10/14

1 a drug1 03/11/14 03/20/14

1 a drug1 03/25/14 04/05/14

1 a drug2 03/02/14 03/09/14

1 a drug2 03/17/14 03/24/14

1 a drug2 03/27/14 04/04/14

1 a drug3 09/02/14 09/09/14

1 b drug4 03/01/14 03/10/14

1 b drug4 03/11/14 03/20/14

1 b drug4 03/25/14 04/05/14

1 b drug5 09/02/14 09/09/14

2 a drug1 03/01/14 03/10/14

2 a drug1 03/11/14 03/20/14

2 a drug1 03/25/14 04/05/14

2 a drug2 03/02/14 03/09/14

2 a drug2 03/17/14 03/24/14

2 a drug2 03/27/14 04/04/14

2 a drug3 09/02/14 09/09/14

;

** sort by day to see how many drugs taken each day ;

proc sort; by id ondate type drug ;

proc print; by id ondate; id id ondate ;

title 'how many drugs taken each day'; run;

* step 2 count number of drugs taken each day by id type;

* collapse to one record per day;

data; set; by id ondate; * type drug ; retain typa typb;

if first.ondate then do; count=0; typa=0; typb=0; end;

if type='a' then typa=1;

if type='b' then typb=1;

count+1;

if last.ondate and count gt 1 then do;

if typa and typb then olap='A&B';

if typa and not typb then olap='A';

if typb and not typa then olap='B';

output;

end;

drop drug type typa typb;

proc print; id id ondate olap;

title 'Collapse by date and type where multiple drugs taken'; run;

** step 3 identify the string of contiguous days (episode);

data; set; by id ;

dif=ondate-lag(ondate);

retain flag strtdate typa typb typab;

format strtdate mmddyy10.;

if first.id then do;

contdys=0; flag=0; dif=1; strtdate=ondate;

typa=0; typb=0; typab=0;

end;

**** store type ----------------------;

if olap='A' then typa=1;

if olap='B' then typb=1;

if olap='A&B' then typab=1;

if dif=1 then contdys+1;

else do; ** end of continuous string;

dif=1; strtdate=ondate;

contdys=1; typa=0; typb=0; typab=0;

end;

drop dif flag count;

proc print; id id strtdate ondate contdys olap ; by id strtdate;

title 'See episode data overlap of drugs and type'; run;

***** step 4 see if the continuous string of days was gt 5;

data; set; by id strtdate; drop ondate olap typa typb typab;

if last.strtdate then do; retain overlap ' ';

if typa then substr(overlap,1,1)='A';

if typb then substr(overlap,3,1)='B';

if typab then substr(overlap,5,3)='A&B';

if contdys gt 5 then output;

overlap=' ';

end;

proc print split='*';

label id='id*' strtdate='episode*start date'

contdys='continous* days' overlap='type';

id id strtdate contdys ; by id; run;

*/;

michellel · Posted 07-20-2015 05:45 PM

Thanks Jim for your update! I tried your code and the result listed as below. I am wondering if the result can also show overlap 'A' for id=1 besides the case of A&B overlap, as there are also drug 1 and drug 2 overlap within type a of id 1. Thanks so much for your answer!

id	strtdate	contdys	overlap
1	3/1/2014	20	A&B
1	3/25/2014	12	A&B
1	9/2/2014	8	A&B
2	3/2/2014	8	A
2	3/27/2014	9	A

Code: Program

The 2025 SAS Hackathon has begun!

SAS Training: Just a Click Away