Solved: dealing with time periods with start and end date and overlap

lillymaginta · Posted 12-01-2016 12:20 PM

drug   start                    end
a        01/01/2001    02/15/2001
b        01/02/2001    02/10/2001
c        04/05/2002     06/07/2002
d        04/15/2002     07/01/2002
e       04/01/2002      07/18/2002
f       04/05/2002      06/01/2002

I want to report drug that overlapped by 30 days or more.

Output:

a-b

c-d-e-f

Ksharp · Posted 12-05-2016 05:10 AM

OK. So you also care about END-START ?




data have;
   input id drug  $ start : mmddyy10.    end : mmddyy10.;
   format start end mmddyy10.;
cards;
1 c        04/05/2002     06/07/2002
1 d        04/15/2002     04/25/2002
1 e       04/15/2002      04/18/2002
2 a        01/01/2002    02/15/2002
2 b        01/05/2002    03/19/2002
2 c        01/24/2002     02/17/2002
2 d        04/10/2002     07/01/2002
2 e       04/01/2002      07/28/2002
2 f       04/25/2002      06/30/2002
;
run;

data temp;
   set have;
   by id;
   dif=lag(end)-start;
   if dif lt 30 or (end-start) lt 30 or first.id then group+1;
run;

data want;
   length want $ 200;
   do i=1 by 1 until(last.group);
      set Temp;
      by id group;
      want=catx('-',want,drug);
   end;
   if i ne 1 then output;
   keep id want;
run;

View solution in original post

PGStats · Posted 12-01-2016 02:45 PM

There could be many interpretations to your sample data. To help clarify things, what would you expect for the data:

a 2002/01/01 2002/01/31
b 2002/01/01 2002/12/31
c 2002/12/01 2002/12/31
d 2002/06/01 2003/06/30

PG

lillymaginta · Posted 12-02-2016 09:44 AM

Ignoring patients ids, I am just trying to get aggregate numbers over a large data. I want to know the total number of prescriptions that overlap for 30 days or more.

If two medications overlap, then 2

if three medications overlap 3.

The output above shows the classes that overlaped. so for 2 a-b and so on

Ksharp · Posted 12-02-2016 06:53 AM


data have;
input drug  $ start : mmddyy10.    end : mmddyy10.;
format start end mmddyy10.;
cards;
a        01/01/2001    02/15/2001
b        01/02/2001    02/10/2001
c        04/05/2002     06/07/2002
d        04/15/2002     07/01/2002
e       04/01/2002      07/18/2002
f       04/05/2002      06/01/2002 
;
run;

data have;
 set have;
 dif=lag(end)-start;
 if dif lt 30 then group+1;
run;
data want;
length want $ 200;
 do i=1 by 1 until(last.group);
  set have;
  by group;
  want=catx('-',want,drug);
 end;
if i ne 1 then output;
keep want;
run;

lillymaginta · Posted 12-02-2016 10:01 AM

KSHARP thank you! your code works perfect for the small dataset that has multiple medications for the same patient. Can one adjust the code if you have multiple ids? How can one modify the code if you have something like this with the first column be pt id?

1 a        01/01/2001    02/15/2001
1 b        01/02/2001    02/10/2001
1 c        04/05/2002     06/07/2002
1 d        04/15/2002     07/01/2002
1 e       04/01/2002      07/18/2002
1 f       04/05/2002      06/01/2002

2 a        01/01/2001    02/15/2001
2 b        01/02/2001    02/10/2001
2 c        04/05/2002     06/07/2002
2 d        04/15/2002     07/01/2002
2 e       04/01/2002      07/18/2002
2 f       04/05/2002      06/01/2002

ballardw · Posted 12-02-2016 06:47 PM

Perhaps something like this if you want the groups within id, which makes more sense than across id.

data have;
   input id drug  $ start : mmddyy10.    end : mmddyy10.;
   format start end mmddyy10.;
cards;
1 a        01/01/2001    02/15/2001
1 b        01/02/2001    02/10/2001
1 c        04/05/2002     06/07/2002
1 d        04/15/2002     07/01/2002
1 e       04/01/2002      07/18/2002
1 f       04/05/2002      06/01/2002
2 a        01/01/2002    02/15/2002
2 b        01/05/2002    03/19/2002
2 c        01/24/2002     02/17/2002
2 d        04/10/2002     07/01/2002
2 e       04/01/2002      07/28/2002
2 f       04/25/2002      06/30/2002
;
run;

data temp;
   set have;
   by id;
   dif=lag(end)-start;

   if first.id then do;
      group=.;
      dif =.;
   end;
   if dif lt 30 then group+1;
run;

data want;
   length want $ 200;
   do i=1 by 1 until(last.group);
      set Temp;
      by id group;
      want=catx('-',want,drug);
   end;
   if i ne 1 then output;
   keep id want;
run;

I changed some date values for id=2 so that the results differed between Id. You need to decide if the rules have been applied correctly.

If your main data isn't sorted by ID and start date it should be before doing the step that creates temp.

lillymaginta · Posted 12-03-2016 11:28 AM

Super! Thank you ballardw and ksharp!

lillymaginta · Posted 12-03-2016 11:36 AM

I have one more questions.

Lets assume the output for id is the following:

1 a-b

1 a-b-c

1 a-b-c-d

1 a-b-c-e

2 a-b

2-c-d

How can I create table the keeps only the highest combination for each id. I mean here we have a combintion of two drugs, three drugs, and four drugs.

I want to create a table that tell me the highest combination for patient 1 was 4 and for patient 2 is 2 and so on. Here is the output

1 4

2 2

Ksharp · Posted 12-03-2016 09:47 PM

data x;
input a b $20.;
cards;
1 a-b
1 a-b-c
1 a-b-c-d 
1 a-b-c-e
2 a-b
2 c-d 
;
run;
proc sql;
select a,max(countc(b,'-'))+1 as max
 from x
  group by a;
quit;

Ksharp · Posted 12-03-2016 03:17 AM

Yeah. Try @ballardw 's code , or this one:


data have;
   input id drug  $ start : mmddyy10.    end : mmddyy10.;
   format start end mmddyy10.;
cards;
1 a        01/01/2001    02/15/2001
1 b        01/02/2001    02/10/2001
1 c        04/05/2002     06/07/2002
1 d        04/15/2002     07/01/2002
1 e       04/01/2002      07/18/2002
1 f       04/05/2002      06/01/2002
2 a        01/01/2002    02/15/2002
2 b        01/05/2002    03/19/2002
2 c        01/24/2002     02/17/2002
2 d        04/10/2002     07/01/2002
2 e       04/01/2002      07/28/2002
2 f       04/25/2002      06/30/2002
;
run;

data temp;
   set have;
   by id;
   dif=lag(end)-start;
   if dif lt 30 or first.id then group+1;
run;

data want;
   length want $ 200;
   do i=1 by 1 until(last.group);
      set Temp;
      by id group;
      want=catx('-',want,drug);
   end;
   if i ne 1 then output;
   keep id want;
run;

lillymaginta · Posted 12-04-2016 05:57 AM

Thank you ksharp, I just realized that the output of the first code is incorrect . The 30 days overlap should be 30 days or more and should his card drugs that overlap in less than 30 days. For example, for the entries below, the code should not retain any drugs for this patient because there is no overlap of 30 days or more between the different drugs.

1 c        04/05/2002     06/07/2002
1 d        04/15/2002     04/25/2002
1 e       04/15/2002      04/18/2002

Ksharp · Posted 12-05-2016 05:10 AM

OK. So you also care about END-START ?




data have;
   input id drug  $ start : mmddyy10.    end : mmddyy10.;
   format start end mmddyy10.;
cards;
1 c        04/05/2002     06/07/2002
1 d        04/15/2002     04/25/2002
1 e       04/15/2002      04/18/2002
2 a        01/01/2002    02/15/2002
2 b        01/05/2002    03/19/2002
2 c        01/24/2002     02/17/2002
2 d        04/10/2002     07/01/2002
2 e       04/01/2002      07/28/2002
2 f       04/25/2002      06/30/2002
;
run;

data temp;
   set have;
   by id;
   dif=lag(end)-start;
   if dif lt 30 or (end-start) lt 30 or first.id then group+1;
run;

data want;
   length want $ 200;
   do i=1 by 1 until(last.group);
      set Temp;
      by id group;
      want=catx('-',want,drug);
   end;
   if i ne 1 then output;
   keep id want;
run;

lillymaginta · Posted 12-05-2016 08:38 AM

Thank you so much Ksharp!! five starts!

lillymaginta · Posted 12-05-2016 11:38 AM

one more question, after the first code, how can I exclude drugs that overlap for the same drug. For example if I have the following outpit:

a-b-c-a, I want to drop the duplicate a and keep it as a-b-c

the same for a-a, I dont want to count the overlap if it is the same drug . Thank you again for all the help!

Ksharp · Posted 12-05-2016 10:23 PM

No problem.




data have;
   input id drug  $ start : mmddyy10.    end : mmddyy10.;
   format start end mmddyy10.;
cards;
1 c        04/05/2002     06/07/2002
1 d        04/15/2002     04/25/2002
1 e       04/15/2002      04/18/2002
2 a        01/01/2002    02/15/2002
2 b        01/05/2002    03/19/2002
2 c        01/24/2002     02/17/2002
2 d        04/10/2002     07/01/2002
2 e       04/01/2002      07/28/2002
2 f       04/25/2002      06/30/2002
2 f       04/25/2002      07/30/2002
;
run;

data temp;
   set have;
   by id;
   dif=lag(end)-start;
   if dif lt 30 or (end-start) lt 30 or first.id then group+1;
run;

data want;
array x{99999} $ 100 _TEMPORARY_;
n=0;
call missing(of x{*});
length want $ 200;

   do i=1 by 1 until(last.group);
      set Temp;
      by id group;
      if drug not in x then do;
       n+1;x{n}=drug;
       want=catx('-',want,drug);
      end;
   end;
   if i ne 1 then output;
   keep id want;
run;

dealing with time periods with start and end date and overlap

Re: dealing with time periods with start and end date and overlap

Re: dealing with time periods with start and end date and overlap

Re: dealing with time periods with start and end date and overlap

Re: dealing with time periods with start and end date and overlap

Re: dealing with time periods with start and end date and overlap

Re: dealing with time periods with start and end date and overlap

Re: dealing with time periods with start and end date and overlap

Re: dealing with time periods with start and end date and overlap

Re: dealing with time periods with start and end date and overlap

Re: dealing with time periods with start and end date and overlap

Re: dealing with time periods with start and end date and overlap

Re: dealing with time periods with start and end date and overlap

Re: dealing with time periods with start and end date and overlap

Re: dealing with time periods with start and end date and overlap

Re: dealing with time periods with start and end date and overlap

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!