Hi, I have a question about checking days overlap. I have a data set listed as below. For the same id, there might be more than one drug prescription (variable: drug). For each drug prescription, there may be refill (variable: refill, 0: first time, 1: first refill, 2: second refill.....n: nth refill). There are two variables for beginning date and ending date for prescription. There are more than 10K different id.
id drug refill begin_date end_date
1 drug1 0 03/01/14 03/10/14
1 drug1 1 03/11/14 03/20/14
1 drug1 2 03/25/14 04/05/14
1 drug2 0 03/02/14 03/09/14
1 drug2 1 03/17/14 03/24/14
1 drug2 2 03/27/14 04/04/14
1 drug3 0 09/02/14 09/09/14
2
2
2
3
3
3
.
.
.
.
n
What I need to do is to check if the overlap days for different drug for the same person are more than 5 days. In another word, I defined overlap as two different drugs for more than 5 days together.
What I plan to do is listed as below.
(step 1) to check the begin_date and end_date for the same drug. If the begin_date of refill is just the next day to the end_date of last time, then combine the date into one row. Actually I do not care the refill variable by the end. so I want my original data to be changed like below. For example, for drug1 of id=1, the first two rows will be combined into one row, because 03/11/14 is just next day to 03/10/14. I am not sure how to do this.
id drug refill begin_date end_date
1 drug1 0 03/01/14 03/20/14
1 drug1 2 03/25/14 04/05/14
1 drug2 0 03/02/14 03/09/14
1 drug2 1 03/17/14 03/24/14
1 drug2 2 03/27/14 04/04/14
1 drug3 0 03/02/14 03/09/14
2
2
2
.
.
.
n
n
(step2) after step 1, I am not sure how to compare the day period with each other for different drugs within the same id. The first difficulty is I do not want to compare the day period for the same drug (like row 1 and row 2 for drug1). The second difficulty is how to do pairwise comparison among different drugs with each other within the same id by iteration. Finally, I need create dummy variable (overlap) indicating if there are two different drugs overlap more than 5 days for each id, like below.
id overlap
1 1
2 1
3 0
.
.
.
Many thanks in advanced for your answer!
Michelle Try this code. It doesn't concentrate on how many drug overlaps per day but which type of overlap occurred. It there were 5 overlaps of drug type A It shows an overlap of type A. If there was 5 drug A overlaps on a certain day and 1 B drug on the same day, it would report a type A overlap and a type A&B overlap. Jim
** expand each date range to one record per day;
data one;
input id type $ drug $ (begin_date end_date) (: mmddyy12.);
format begin_date end_date ondate mmddyy10.;
* put out a record for each day on the drug by type;
ondate=begin_date;
do until (ondate=end_date);
output; ondate+1;
end;
output; drop begin_date end_date;
cards;
1 a drug1 03/01/14 03/10/14
1 a drug1 03/11/14 03/20/14
1 a drug1 03/25/14 04/05/14
1 a drug2 03/02/14 03/09/14
1 a drug2 03/17/14 03/24/14
1 a drug2 03/27/14 04/04/14
1 a drug3 09/02/14 09/09/14
1 b drug4 03/01/14 03/10/14
1 b drug4 03/11/14 03/20/14
1 b drug4 03/25/14 04/05/14
1 b drug5 09/02/14 09/09/14
2 a drug1 03/01/14 03/10/14
2 a drug1 03/11/14 03/20/14
2 a drug1 03/25/14 04/05/14
2 a drug2 03/02/14 03/09/14
2 a drug2 03/17/14 03/24/14
2 a drug2 03/27/14 04/04/14
2 a drug3 09/02/14 09/09/14
;
** sort by day to see how many drugs taken each day ;
proc sort; by id ondate type drug ;
proc print; by id ondate; id id ondate ;
title 'how many drugs taken each day'; run;
* count number of drugs taken each day by id type;
* collapse to one record per string of days;
data; set; by id ondate; * type drug ; retain typa typb;
if first.ondate then do; count=0; typa=0; typb=0; end;
if type='a' then typa+1;
if type='b' then typb+1;
count+1;
if last.ondate and count gt 1 then do;
if typa and typb then do; olap='A&B'; output; end;
if typa gt 1 then do; olap='A'; output; end;
if typb gt 1 then do; olap='B'; output; end;
end;
drop drug type typa typb;
proc sort; by id olap ondate;
proc print; id id olap ondate;
title 'Collapse by date and type where multiple drugs taken'; run;
** step 3 identify the string of contiguous days (episode);
data; set; by id olap;
dif=ondate-lag(ondate);
retain flag strtdate;
format strtdate mmddyy10.;
if first.olap then do;
contdys=0; flag=0; dif=1; strtdate=ondate;
end;
if dif=1 then contdys+1;
else do; ** end of continuous string;
dif=1; strtdate=ondate;
contdys=1;
end;
drop dif flag count;
proc print; id id olap strtdate ondate contdys olap ;
by id olap strtdate;
title 'See episode data overlap of drugs and type'; run;
***step 4 see if the continuous string of days was gt 5;
data; set; by id olap strtdate;
drop ondate ;
if last.strtdate then do;
if contdys gt 5 then output;
end;
proc print split='*';
label id='id*' strtdate='episode*start date'
contdys='continous* days' olap='type';
id id strtdate contdys olap; by id; run;
*/;
You need to do several steps. I added proc sort to remove original observation when "combined". There may be smarter solutions. In the end results overlapping cases are twice (both ways) but you can modify it smarter.
proc sql;
create table want as
select h1.id,
h1.drug,
h1.refill,
h1.begin_date,
coalesce(h2.end_date,h1.end_date) format=date9. as end_date
from have1 h1
left join have1 h2
on h1.id = h2.id
and h1.drug = h2.drug
and h1.refill lt h2.refill
and h1.end_date + 1 eq h2.begin_date
;
quit;
proc sort data=want out=want1 nodupkey;
by id drug end_date;
run;
proc sql;
create table overlaps as
select w1.id,
w1.drug as Drug1,
w1.refill as refill1,
w1.begin_date format=date9. as begin_date1,
w1.end_date format=date9. as end_date1,
w2.drug as drug2,
w2.refill as refill2,
w2.begin_date format=date9. as begin_date2,
w2.end_date format=date9. as end_date2
from want1 w1
left join want1 w2
on w1.id = w2.id
and w1.drug ^= w2.drug
and (w1.begin_date between w2.begin_date and w2.end_date
or w1.end_date between w2.begin_date and w2.end_date
or w2.begin_date between w1.begin_date and w1.end_date
or w2.end_date between w1.begin_date and w1.end_date
)
;
quit;
I did this before by this method.
Process each record in your input data set.
output (Write) an observation for each day the drug was taken. Start at the start date and output a record for each day until the end date.
Now you have a dataset with Id drug and date. sort by id date drug. Now read the records.in by id date drug. If first.drug count=0 If last.drug and count gt 5 then output.
Jim
I did the same exact thing as Jim did. So search your questions at this forum , that would not waste both of us time.
data have;
input id drug $ refill (begin_date end_date) (: mmddyy12.);
format begin_date end_date mmddyy10.;
cards;
1 drug1 0 03/01/14 03/10/14
1 drug1 1 03/11/14 03/20/14
1 drug1 2 03/25/14 04/05/14
1 drug2 0 03/02/14 03/09/14
1 drug2 1 03/17/14 03/24/14
1 drug2 2 03/27/14 04/04/14
1 drug3 0 09/02/14 09/09/14
;
run;
data temp;
set have;
n+1;
date=begin_date;output;
date=end_date;output;
format date mmddyy10.;
drop begin_date end_date;
run;
data x;
set temp;
by n;
if first.n and dif(date) ne 1 then group+1;
drop n;
run;
data want;
set x;
by group;
retain ref begin_date;
if first.group then do;ref=refill;begin_date=date;end;
if last.group then do; end_date=date; output;end;
drop refill group date;
format begin_date end_date mmddyy10.;
run;
Thanks Xia Ke for your answer! It works for one person. How to get it done for different people? There are more than 10K people in my dataset. Sorry I should make it clear for a lot of different id.
Thanks so much for your time!
My code also can work on different Person, if your table has been sorted just like your sample data.
This code should find where there is an overlap in at least 2 drugs for 5 days per id. It also indicates the total days overlapping and high water number of drugs overlapped;
data one;
input id drug $ refill (begin_date end_date) (: mmddyy12.);
format begin_date end_date ondate mmddyy10.;
* put out a record for each day on the drug;
ondate=begin_date;
do until (ondate=end_date);
output; ondate+1;
end;
output; drop begin_date end_date refill;
cards;
1 drug1 0 03/01/14 03/10/14
1 drug1 1 03/11/14 03/20/14
1 drug1 2 03/25/14 04/05/14
1 drug2 0 03/02/14 03/09/14
1 drug2 1 03/17/14 03/24/14
1 drug2 2 03/27/14 04/04/14
1 drug3 0 09/02/14 09/09/14
2 drug1 0 03/01/14 03/10/14
2 drug1 1 03/11/14 03/20/14
2 drug1 2 03/25/14 04/05/14
2 drug2 0 03/02/14 03/09/14
2 drug2 1 03/17/14 03/24/14
2 drug2 2 03/27/14 04/04/14
2 drug3 0 09/02/14 09/09/14
;
proc sort; by id ondate drug;
proc print; id id ondate drug; run;
* count number of drugs taken each day by id;
data; set; by id ondate drug; drop drug;
if first.ondate then count=1;
if last.ondate and not first.ondate then count+1;
if last.ondate and count gt 1 then output;
proc print; run; title 'count of days where multiple drugs taken';
* ids with 5 days overlapped;
data; set; by id;
if first.id then dysovrlap=0;
dysovrlap+1;
if last.id and dysovrlap gt 5 then output;
proc print; title 'days overlap and number of drugs';run;
Thanks Jim_G! You are super!
Just one question. At the last step of your code, variable [dysovrlap] shows total days of overlap. How to check if there are 5 continuous days for overlap?
I also have a further question. Sorry I forgot to mention there is one more variable indicating the drug type. There are two drug types, A and B, for all these different drugs. I need to know if it is type_A and type_A overlap, or type_A and type_B overlap, or type_B and type_B overlap.
Thanks so much!
This turned out to be something that was more crazy than I first thought... I don't see STEP1 being solved with out a doubly-nested loops and multiple passes through the algorithm until all time periods are concatenated.
data STEP1 (Keep=id drug refill begin_date end_date over_Lap);
array b_date [&MAX_ARRAY_SIZE] b_date1-b_date&MAX_ARRAY_SIZE;
array e_date [&MAX_ARRAY_SIZE] e_date1-e_date&MAX_ARRAY_SIZE;
array ovrLap [&MAX_ARRAY_SIZE] ovrLap1-ovrLap&MAX_ARRAY_SIZE;
do until (eof);
do until (last.id);
/*Read Input*/
do N=1 by 1 until (last.drug);
set have end=eof ;by id drug;
b_date[N]=begin_date;
e_date[N]=end_date;
ovrLap[N]=0;
end;
NUM_ITEMS =N;
/*STEP-1 Concatenate Allowed Periods*/
NOT_FINISHED=1;
do while (NOT_FINISHED);NOT_FINISHED=0;
do M=1 to NUM_ITEMS ;
if not missing(b_date
do P=M+1 to NUM_ITEMS ;
if not missing(b_date
) then
do;
if -1 <= e_date
<=5
OR -1 <= e_date
-b_date
do;
b_date
);
e_date
);
b_date
=.;e_date
=.;
NOT_FINISHED=1;
M=NUM_ITEMS;P=NUM_ITEMS;
end;
end;
end;
end;
end;
/*STEP-2 (prep) Find and flag Overlaps*/
do M=1 to NUM_ITEMS ;
if not missing(b_date
do P=M+1 to NUM_ITEMS ;
if not missing(b_date
) then
do;
if e_date
>5
AND e_date
-b_date
do;
ovrLap
ovrLap
=1;
end;
end;
end;
end;
/*Output*/
do N=1 to NUM_ITEMS ;
begin_date=b_date[N];
end_date =e_date[N];
over_Lap =ovrLap[N];
refill=N;
if not missing(b_date[N])
then output;
end;
end;/*do until (last.id);*/
end;/*do until (eof);*/
stop;
run;
data STEP2 (Keep=id overlap);
id=0;
overlap=0;
do until (last.id);
set STEP1; by ID;
if over_Lap then overlap=1;
end;
output;
run;
Michell if this is helpful please mark helpful or correct.
continuous can be calculated by comparing the date in this record to the last record. if there is only 1 day different then it is continuous;
I added the type variable.
data one;
input id type $ drug $ (begin_date end_date) (: mmddyy12.);
format begin_date end_date ondate mmddyy10.;
* put out a record for each day on the drug by type;
ondate=begin_date;
do until (ondate=end_date);
output; ondate+1;
end;
output; drop begin_date end_date;
cards;
1 a drug1 03/01/14 03/10/14
1 a drug1 03/11/14 03/20/14
1 a drug1 03/25/14 04/05/14
1 a drug2 03/02/14 03/09/14
1 a drug2 03/17/14 03/24/14
1 a drug2 03/27/14 04/04/14
1 a drug3 09/02/14 09/09/14
1 b drug1 03/01/14 03/10/14
1 b drug1 03/11/14 03/20/14
1 b drug1 03/25/14 04/05/14
1 b drug2 03/02/14 03/09/14
1 b drug2 03/17/14 03/24/14
1 b drug2 03/27/14 04/04/14
1 b drug3 09/02/14 09/09/14
2 a drug1 03/01/14 03/10/14
2 a drug1 03/11/14 03/20/14
2 a drug1 03/25/14 04/05/14
2 a drug2 03/02/14 03/09/14
2 a drug2 03/17/14 03/24/14
2 a drug2 03/27/14 04/04/14
2 a drug3 09/02/14 09/09/14
;
proc sort; by id type ondate drug ;
*proc print; *id id type ondate drug ; run;
* count number of drugs taken each day by id type;
data; set; by id type ondate drug ; *drop drug;
if first.ondate then count=1;
if last.ondate and not first.ondate then count+1;
if last.ondate and count gt 1 then output;
proc print; run; title 'count of days where multiple drugs taken';
* ids with 5 continuous days overlapped;
data; set; by id type;
dif=ondate-lag(ondate); retain flag;
if first.type then do; contdys=0; flag=0; dif=1; end;
if dif=1 then contdys+1; else contdys=1;
if contdys gt 5 then flag=1;
if last.type and flag=1 then output;
drop drug ondate dif flag contdys;
proc print; id id type ;
title 'Overlap by type and number of drugs by id and type'; run;
Thanks Jim_G for your answer!
There is still one thing needed to figure out. Based on the result from your code, it can be shown overlap within each drug type, but how about compare one drug from type a and another drug from type b within one patient? I changed input information as below. The result is supposed to show there is overlap for type a for patient 1, overlap between type a and b for patient 1, and overlap for drug a for patient 2, but the result did not show the overlap between type a and type b for patient 1. Could you help me figure out this? Thanks so.......much!
data one;
input id type $ drug $ (begin_date end_date) (: mmddyy12.);
format begin_date end_date ondate mmddyy10.;
* put out a record for each day on the drug by type;
ondate=begin_date;
do until (ondate=end_date);
output; ondate+1;
end;
output; drop begin_date end_date;
cards;
1 a drug1 03/01/14 03/10/14
1 a drug1 03/11/14 03/20/14
1 a drug1 03/25/14 04/05/14
1 a drug2 03/02/14 03/09/14
1 a drug2 03/17/14 03/24/14
1 a drug2 03/27/14 04/04/14
1 a drug3 09/02/14 09/09/14
1 b drug4 03/01/14 03/10/14
1 b drug4 03/11/14 03/20/14
1 b drug4 03/25/14 04/05/14
1 b drug5 09/02/14 09/09/14
2 a drug1 03/01/14 03/10/14
2 a drug1 03/11/14 03/20/14
2 a drug1 03/25/14 04/05/14
2 a drug2 03/02/14 03/09/14
2 a drug2 03/17/14 03/24/14
2 a drug2 03/27/14 04/04/14
2 a drug3 09/02/14 09/09/14
;
proc sort; by id type ondate drug ;
*proc print; *id id type ondate drug ; run;
* count number of drugs taken each day by id type;
data; set; by id type ondate drug ; *drop drug;
if first.ondate then count=1;
if last.ondate and not first.ondate then count+1;
if last.ondate and count gt 1 then output;
proc print; run; title 'count of days where multiple drugs taken';
* ids with 5 continuous days overlapped;
data; set; by id type;
dif=ondate-lag(ondate); retain flag;
if first.type then do; contdys=0; flag=0; dif=1; end;
if dif=1 then contdys+1; else contdys=1;
if contdys gt 5 then flag=1;
if last.type and flag=1 then output;
drop drug ondate dif flag contdys;
proc print; id id type ;
title 'Overlap by type and number of drugs by id and type'; run;
If we have a case where one id has a string of 12 continuous days. For 6 days he overlapped ‘A’ drugs and for 6 days he overlapped ‘B’ drugs. Is it more important to report 12 days overlapped A and B drugs or to report 6 days overlap drug a and 6 days overlapped B ? Do you want to report 6 days overlaps within type or type within overlaps ?
Do you want report
Id, type, number contiuous days (gt 5)
1 A 6
1 B 6
Or Id, continuous days, types overlapped
1 12 A,B
Jim
Thanks Jim!!
I want to report type (A,A or A,B, or B,B) within overlaps.
Id, continuous days, types overlapped
1 12 A,B
My job is to check if (1) there are two different drugs overlap >5 days within type a, (2) if it happens between type a and type b, and/or (3) if it happens within type b.
Specifically, if the data is the following (please use the following data). I need get the result showing there are (1) drugs overlap for type a and (2) drugs overlap between type a and b for id=1, and drugs overlap for type a for id=2, like the following.
result:
Id, continuous days, types overlapped
1 XXX a,a
1 XXX a,b
2 XXX a,a
data:
id type drug begin_date end_date
1 a drug1 03/01/14 03/10/14
1 a drug1 03/11/14 03/20/14
1 a drug1 03/25/14 04/05/14
1 a drug2 03/02/14 03/09/14
1 a drug2 03/17/14 03/24/14
1 a drug2 03/27/14 04/04/14
1 a drug3 09/02/14 09/09/14
1 b drug4 03/01/14 03/10/14
1 b drug4 03/11/14 03/20/14
1 b drug4 03/25/14 04/05/14
1 b drug5 09/02/14 09/09/14
2 a drug1 03/01/14 03/10/14
2 a drug1 03/11/14 03/20/14
2 a drug1 03/25/14 04/05/14
2 a drug2 03/02/14 03/09/14
2 a drug2 03/17/14 03/24/14
2 a drug2 03/27/14 04/04/14
2 a drug3 09/02/14 09/09/14
Thanks so much in advance for your time and your answer!!!
Michelle try this code. The output looks right.
All the proc prints are for debugging Jim
** step 1 expand each date range to one record per day;
data one;
input id type $ drug $ (begin_date end_date) (: mmddyy12.);
format begin_date end_date ondate mmddyy10.;
* put out a record for each day on the drug by type;
ondate=begin_date;
do until (ondate=end_date);
output; ondate+1;
end;
output; drop begin_date end_date;
cards;
1 a drug1 03/01/14 03/10/14
1 a drug1 03/11/14 03/20/14
1 a drug1 03/25/14 04/05/14
1 a drug2 03/02/14 03/09/14
1 a drug2 03/17/14 03/24/14
1 a drug2 03/27/14 04/04/14
1 a drug3 09/02/14 09/09/14
1 b drug4 03/01/14 03/10/14
1 b drug4 03/11/14 03/20/14
1 b drug4 03/25/14 04/05/14
1 b drug5 09/02/14 09/09/14
2 a drug1 03/01/14 03/10/14
2 a drug1 03/11/14 03/20/14
2 a drug1 03/25/14 04/05/14
2 a drug2 03/02/14 03/09/14
2 a drug2 03/17/14 03/24/14
2 a drug2 03/27/14 04/04/14
2 a drug3 09/02/14 09/09/14
;
** sort by day to see how many drugs taken each day ;
proc sort; by id ondate type drug ;
proc print; by id ondate; id id ondate ;
title 'how many drugs taken each day'; run;
* step 2 count number of drugs taken each day by id type;
* collapse to one record per day;
data; set; by id ondate; * type drug ; retain typa typb;
if first.ondate then do; count=0; typa=0; typb=0; end;
if type='a' then typa=1;
if type='b' then typb=1;
count+1;
if last.ondate and count gt 1 then do;
if typa and typb then olap='A&B';
if typa and not typb then olap='A';
if typb and not typa then olap='B';
output;
end;
drop drug type typa typb;
proc print; id id ondate olap;
title 'Collapse by date and type where multiple drugs taken'; run;
** step 3 identify the string of contiguous days (episode);
data; set; by id ;
dif=ondate-lag(ondate);
retain flag strtdate typa typb typab;
format strtdate mmddyy10.;
if first.id then do;
contdys=0; flag=0; dif=1; strtdate=ondate;
typa=0; typb=0; typab=0;
end;
**** store type ----------------------;
if olap='A' then typa=1;
if olap='B' then typb=1;
if olap='A&B' then typab=1;
if dif=1 then contdys+1;
else do; ** end of continuous string;
dif=1; strtdate=ondate;
contdys=1; typa=0; typb=0; typab=0;
end;
drop dif flag count;
proc print; id id strtdate ondate contdys olap ; by id strtdate;
title 'See episode data overlap of drugs and type'; run;
***** step 4 see if the continuous string of days was gt 5;
data; set; by id strtdate; drop ondate olap typa typb typab;
if last.strtdate then do; retain overlap ' ';
if typa then substr(overlap,1,1)='A';
if typb then substr(overlap,3,1)='B';
if typab then substr(overlap,5,3)='A&B';
if contdys gt 5 then output;
overlap=' ';
end;
proc print split='*';
label id='id*' strtdate='episode*start date'
contdys='continous* days' overlap='type';
id id strtdate contdys ; by id; run;
*/;
Thanks Jim for your update! I tried your code and the result listed as below. I am wondering if the result can also show overlap 'A' for id=1 besides the case of A&B overlap, as there are also drug 1 and drug 2 overlap within type a of id 1. Thanks so much for your answer!
id | strtdate | contdys | overlap |
1 | 3/1/2014 | 20 | A&B |
1 | 3/25/2014 | 12 | A&B |
1 | 9/2/2014 | 8 | A&B |
2 | 3/2/2014 | 8 | A |
2 | 3/27/2014 | 9 | A |
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.