BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lillymaginta
Lapis Lazuli | Level 10
drug   start                    end
a        01/01/2001    02/15/2001
b        01/02/2001    02/10/2001
c        04/05/2002     06/07/2002
d        04/15/2002     07/01/2002
e       04/01/2002      07/18/2002
f       04/05/2002      06/01/2002 

I want to report drug that overlapped by 30 days or more. 

Output: 

a-b

c-d-e-f 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
OK. So you also care about END-START ?




data have;
   input id drug  $ start : mmddyy10.    end : mmddyy10.;
   format start end mmddyy10.;
cards;
1 c        04/05/2002     06/07/2002
1 d        04/15/2002     04/25/2002
1 e       04/15/2002      04/18/2002
2 a        01/01/2002    02/15/2002
2 b        01/05/2002    03/19/2002
2 c        01/24/2002     02/17/2002
2 d        04/10/2002     07/01/2002
2 e       04/01/2002      07/28/2002
2 f       04/25/2002      06/30/2002
;
run;

data temp;
   set have;
   by id;
   dif=lag(end)-start;
   if dif lt 30 or (end-start) lt 30 or first.id then group+1;
run;

data want;
   length want $ 200;
   do i=1 by 1 until(last.group);
      set Temp;
      by id group;
      want=catx('-',want,drug);
   end;
   if i ne 1 then output;
   keep id want;
run;

View solution in original post

15 REPLIES 15
PGStats
Opal | Level 21

There could be many interpretations to your sample data. To help clarify things, what would you expect for the data:

 

a 2002/01/01 2002/01/31
b 2002/01/01 2002/12/31
c 2002/12/01 2002/12/31
d 2002/06/01 2003/06/30
PG
lillymaginta
Lapis Lazuli | Level 10

Ignoring patients ids, I am just trying to get aggregate numbers over a large data. I want to know the total number of prescriptions that overlap for 30 days or more. 

If two medications overlap, then 2 

if three medications overlap 3. 

The output above shows the classes that overlaped. so for 2 a-b and so on 

Ksharp
Super User

data have;
input drug  $ start : mmddyy10.    end : mmddyy10.;
format start end mmddyy10.;
cards;
a        01/01/2001    02/15/2001
b        01/02/2001    02/10/2001
c        04/05/2002     06/07/2002
d        04/15/2002     07/01/2002
e       04/01/2002      07/18/2002
f       04/05/2002      06/01/2002 
;
run;

data have;
 set have;
 dif=lag(end)-start;
 if dif lt 30 then group+1;
run;
data want;
length want $ 200;
 do i=1 by 1 until(last.group);
  set have;
  by group;
  want=catx('-',want,drug);
 end;
if i ne 1 then output;
keep want;
run;


lillymaginta
Lapis Lazuli | Level 10

KSHARP thank you! your code works perfect for the small dataset that has multiple medications for the same patient. Can one adjust the code if you have multiple ids? How can one modify the code if you have something like this with the first column be pt id? 

 

1 a        01/01/2001    02/15/2001
1 b        01/02/2001    02/10/2001
1 c        04/05/2002     06/07/2002
1 d        04/15/2002     07/01/2002
1 e       04/01/2002      07/18/2002
1 f       04/05/2002      06/01/2002
2 a        01/01/2001    02/15/2001
2 b        01/02/2001    02/10/2001
2 c        04/05/2002     06/07/2002
2 d        04/15/2002     07/01/2002
2 e       04/01/2002      07/18/2002
2 f       04/05/2002      06/01/2002

 

ballardw
Super User

Perhaps something like this if you want the groups within id, which makes more sense than across id.

data have;
   input id drug  $ start : mmddyy10.    end : mmddyy10.;
   format start end mmddyy10.;
cards;
1 a        01/01/2001    02/15/2001
1 b        01/02/2001    02/10/2001
1 c        04/05/2002     06/07/2002
1 d        04/15/2002     07/01/2002
1 e       04/01/2002      07/18/2002
1 f       04/05/2002      06/01/2002
2 a        01/01/2002    02/15/2002
2 b        01/05/2002    03/19/2002
2 c        01/24/2002     02/17/2002
2 d        04/10/2002     07/01/2002
2 e       04/01/2002      07/28/2002
2 f       04/25/2002      06/30/2002
;
run;

data temp;
   set have;
   by id;
   dif=lag(end)-start;

   if first.id then do;
      group=.;
      dif =.;
   end;
   if dif lt 30 then group+1;
run;

data want;
   length want $ 200;
   do i=1 by 1 until(last.group);
      set Temp;
      by id group;
      want=catx('-',want,drug);
   end;
   if i ne 1 then output;
   keep id want;
run;

I changed some date values for id=2 so that the results differed between Id. You need to decide if the rules have been applied correctly.

 

 

If your main data isn't sorted by ID and start date it should be before doing the step that creates temp.

lillymaginta
Lapis Lazuli | Level 10

Super! Thank you 

lillymaginta
Lapis Lazuli | Level 10

I have one more questions. 

Lets assume the output for id is the following:

1 a-b

1 a-b-c

1 a-b-c-d 

1 a-b-c-e

2 a-b

2-c-d 

How can I create table the keeps only the highest combination for each id. I mean here we have a combintion of two drugs, three drugs, and four drugs. 

I want to create a table that tell me the highest combination for patient 1 was 4 and for patient 2 is 2 and so on. Here is the output 

1  4

2 2 

 

Ksharp
Super User
data x;
input a b $20.;
cards;
1 a-b
1 a-b-c
1 a-b-c-d 
1 a-b-c-e
2 a-b
2 c-d 
;
run;
proc sql;
select a,max(countc(b,'-'))+1 as max
 from x
  group by a;
quit;

Ksharp
Super User
Yeah. Try @ballardw 's code , or this one:


data have;
   input id drug  $ start : mmddyy10.    end : mmddyy10.;
   format start end mmddyy10.;
cards;
1 a        01/01/2001    02/15/2001
1 b        01/02/2001    02/10/2001
1 c        04/05/2002     06/07/2002
1 d        04/15/2002     07/01/2002
1 e       04/01/2002      07/18/2002
1 f       04/05/2002      06/01/2002
2 a        01/01/2002    02/15/2002
2 b        01/05/2002    03/19/2002
2 c        01/24/2002     02/17/2002
2 d        04/10/2002     07/01/2002
2 e       04/01/2002      07/28/2002
2 f       04/25/2002      06/30/2002
;
run;

data temp;
   set have;
   by id;
   dif=lag(end)-start;
   if dif lt 30 or first.id then group+1;
run;

data want;
   length want $ 200;
   do i=1 by 1 until(last.group);
      set Temp;
      by id group;
      want=catx('-',want,drug);
   end;
   if i ne 1 then output;
   keep id want;
run;

lillymaginta
Lapis Lazuli | Level 10

Thank you ksharp, I just realized that the output of the first code is incorrect . The 30 days overlap should be 30 days or more and should his card drugs that overlap in less than 30 days. For example, for the entries below, the code should not retain any drugs for this patient because there is no overlap of 30 days or more between the different drugs. 

1 c        04/05/2002     06/07/2002
1 d        04/15/2002     04/25/2002
1 e       04/15/2002      04/18/2002

 

Ksharp
Super User
OK. So you also care about END-START ?




data have;
   input id drug  $ start : mmddyy10.    end : mmddyy10.;
   format start end mmddyy10.;
cards;
1 c        04/05/2002     06/07/2002
1 d        04/15/2002     04/25/2002
1 e       04/15/2002      04/18/2002
2 a        01/01/2002    02/15/2002
2 b        01/05/2002    03/19/2002
2 c        01/24/2002     02/17/2002
2 d        04/10/2002     07/01/2002
2 e       04/01/2002      07/28/2002
2 f       04/25/2002      06/30/2002
;
run;

data temp;
   set have;
   by id;
   dif=lag(end)-start;
   if dif lt 30 or (end-start) lt 30 or first.id then group+1;
run;

data want;
   length want $ 200;
   do i=1 by 1 until(last.group);
      set Temp;
      by id group;
      want=catx('-',want,drug);
   end;
   if i ne 1 then output;
   keep id want;
run;

lillymaginta
Lapis Lazuli | Level 10

Thank you so much Ksharp!! five starts! 

lillymaginta
Lapis Lazuli | Level 10

one more question, after the first code, how can I exclude drugs that overlap for the same drug. For example if I have the following outpit:

a-b-c-a, I want to drop the duplicate a and keep it as a-b-c

the same for a-a, I dont want to count the overlap if it is the same drug . Thank you again for all the help! 

Ksharp
Super User
No problem.




data have;
   input id drug  $ start : mmddyy10.    end : mmddyy10.;
   format start end mmddyy10.;
cards;
1 c        04/05/2002     06/07/2002
1 d        04/15/2002     04/25/2002
1 e       04/15/2002      04/18/2002
2 a        01/01/2002    02/15/2002
2 b        01/05/2002    03/19/2002
2 c        01/24/2002     02/17/2002
2 d        04/10/2002     07/01/2002
2 e       04/01/2002      07/28/2002
2 f       04/25/2002      06/30/2002
2 f       04/25/2002      07/30/2002
;
run;

data temp;
   set have;
   by id;
   dif=lag(end)-start;
   if dif lt 30 or (end-start) lt 30 or first.id then group+1;
run;

data want;
array x{99999} $ 100 _TEMPORARY_;
n=0;
call missing(of x{*});
length want $ 200;

   do i=1 by 1 until(last.group);
      set Temp;
      by id group;
      if drug not in x then do;
       n+1;x{n}=drug;
       want=catx('-',want,drug);
      end;
   end;
   if i ne 1 then output;
   keep id want;
run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 2939 views
  • 5 likes
  • 4 in conversation