BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SR11
Obsidian | Level 7

Hi,

I want to flag those group of observations(within one unique ID) which have overlaps(at least for one day)between two groups of drugs. Group 1 drugs are Drug A, B and C and Group 2 consists of Drug XX, XY, YY and Drug YZ.

So, for example, in one unique ID if there is at least one day overlap between drug A and drug XX then Flag=1 in both observations. If no overlap, then Flag=0. If there is any overlap between Drug A and Drug B then still Flag=0, because I want to see the overlaps between two groups of drugs.  

Thanks

Data- I have

ID

Med

Start_date

End_date

1

A

05/25/2017

10/30/2017

1

B

05/26/2017

06/03/2017

1

C

10/30/2017

10/30/2019

1

XX

01/02/2020

02/03/2020

2

A

03/04/2015

04/25/2015

2

XY

04/23/2015

04/22/2016

2

YY

04/22/2016

04/21/2017

3

XY

02/22/2019

03/21/2019

3

A

03/18/2019

05/15/2019

3

B

05/16/2019

02/01/2020

3

YY

01/20/2017

06/30/2017

3

YZ

01/01/2020

05/05/2020

 

Data- I want

ID

Med

Start_date

End_date

Flag

1

A

05/25/2017

10/30/2017

0

1

B

05/26/2017

06/03/2017

0

1

C

10/30/2017

10/30/2019

0

1

XX

01/02/2020

02/03/2020

0

2

A

03/04/2015

04/25/2015

1

2

XY

04/23/2015

04/22/2016

1

2

YY

04/22/2016

04/21/2017

0

3

XY

02/22/2019

03/27/2019

1

3

A

03/18/2019

03/25/2019

1

3

B

05/16/2019

02/01/2020

1

3

YY

01/20/2017

06/30/2017

0

3

YZ

01/01/2020

05/05/2020

1

 

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Try this:

 

data have;
input ID    Med $ (Start_date  End_date) (:mmddyy10.);
format Start_date  End_date yymmdd10.;
datalines;
1 A   05/25/2017  10/30/2017
1 B   05/26/2017  06/03/2017
1 C   10/30/2017  10/30/2019
1 XX  01/02/2020  02/03/2020
2 A   03/04/2015  04/25/2015
2 XY  04/23/2015  04/22/2016
2 YY  04/22/2016  04/21/2017
3 XY  02/22/2019  03/21/2019
3 A   03/18/2019  05/15/2019
3 B   05/16/2019  02/01/2020
3 YY  01/20/2017  06/30/2017
3 YZ  01/01/2020  05/05/2020
;

proc sql;
/* create table want as */ select *, case when med in ("A", "B", "C") then exists (select * from have where ID=a.ID and med in ("XX", "XY", "YY", "YZ") and End_date >= a.Start_Date and Start_Date <= a.End_date ) when med in ("XX", "XY", "YY", "YZ") then exists (select * from have where ID=a.ID and med in ("A", "B", "C") and End_date >= a.Start_Date and Start_Date <= a.End_date ) else 0 end as Flag from have as a; quit;
PG

View solution in original post

5 REPLIES 5
PGStats
Opal | Level 21

Try this:

 

data have;
input ID    Med $ (Start_date  End_date) (:mmddyy10.);
format Start_date  End_date yymmdd10.;
datalines;
1 A   05/25/2017  10/30/2017
1 B   05/26/2017  06/03/2017
1 C   10/30/2017  10/30/2019
1 XX  01/02/2020  02/03/2020
2 A   03/04/2015  04/25/2015
2 XY  04/23/2015  04/22/2016
2 YY  04/22/2016  04/21/2017
3 XY  02/22/2019  03/21/2019
3 A   03/18/2019  05/15/2019
3 B   05/16/2019  02/01/2020
3 YY  01/20/2017  06/30/2017
3 YZ  01/01/2020  05/05/2020
;

proc sql;
/* create table want as */ select *, case when med in ("A", "B", "C") then exists (select * from have where ID=a.ID and med in ("XX", "XY", "YY", "YZ") and End_date >= a.Start_Date and Start_Date <= a.End_date ) when med in ("XX", "XY", "YY", "YZ") then exists (select * from have where ID=a.ID and med in ("A", "B", "C") and End_date >= a.Start_Date and Start_Date <= a.End_date ) else 0 end as Flag from have as a; quit;
PG
SR11
Obsidian | Level 7
Thank you very much for this solution.
SASKiwi
PROC Star

Here is another SQL variation:

data have;
format Start_date End_date mmddyy10.;
input ID $ Med $ MedGroup $ Start_date : mmddyy10. End_date : mmddyy10.;
datalines;
1 A  1 05/25/2017 10/30/2017
1 B  1 05/26/2017 06/03/2017
1 C  1 10/30/2017 10/30/2019
1 XX 2 01/02/2020 02/03/2020
2 A  1 03/04/2015 04/25/2015
2 XY 2 04/23/2015 04/22/2016
2 YY 2 04/22/2016 04/21/2017
3 XY 2 02/22/2019 03/21/2019
3 A  1 03/18/2019 05/15/2019
3 B  1 05/16/2019 02/01/2020
3 YY 2 01/20/2017 06/30/2017
3 YZ 2 01/01/2020 05/05/2020
;
run;

proc sql;
  create table want as
  select A.*
         ,B1.Min_Start_date1
         ,B1.Max_End_date1
         ,B2.Min_Start_date2
         ,B2.Max_End_date2
         ,case
            when (MedGroup = '1' and (A.Start_Date <= Max_End_date2 and A.End_date >= Min_Start_date2)) then 1 
            when (MedGroup = '2' and (A.Start_Date <= Max_End_date1 and A.End_date >= Min_Start_date1)) then 1 
            else 0
          end as Flag 

  from have as A
  left join
  (select ID 
         ,min(Start_date) as Min_Start_date1 format = mmddyy10.
         ,max(End_date ) as Max_End_date1 format = mmddyy10.
  from have
  where MedGroup = '1'
  group by ID
  ) as B1
  on A.ID = B1.ID

  left join
  (select ID
         ,min(Start_date) as Min_Start_date2 format = mmddyy10.
         ,max(End_date ) as Max_End_date2 format = mmddyy10.
  from have
  where MedGroup = '2'
  group by ID
  ) as B2
  on A.ID = B2.ID
  ;
quit;
SR11
Obsidian | Level 7
Thank you very much for this solution!
mkeintz
PROC Star

Give that the data are sorted by ID, there may be a performance advantage in using a data step with a BY ID construct:

 

data have;
input ID    Med $ (Start_date  End_date) (:mmddyy10.);
format Start_date  End_date yymmdd10.;
datalines;
1 A   05/25/2017  10/30/2017
1 B   05/26/2017  06/03/2017
1 C   10/30/2017  10/30/2019
1 XX  01/02/2020  02/03/2020
2 A   03/04/2015  04/25/2015
2 XY  04/23/2015  04/22/2016
2 YY  04/22/2016  04/21/2017
3 XY  02/22/2019  03/21/2019
3 A   03/18/2019  05/15/2019
3 B   05/16/2019  02/01/2020
3 YY  01/20/2017  06/30/2017
3 YZ  01/01/2020  05/05/2020
;

%let begdate=01jan2015;
%let enddate=31dec2020;
data want (drop=_:);
  array med_range {1:2,%sysevalf("&begdate"d):%sysevalf("&enddate"d)} _temporary_ ;

  set have (in=first_pass)   have (in=final_pass);
  by id;
  if first.id then call missing(of med_range{*});
  if med in ('A','B','C') then _grp=1;
  else _grp=2;

  if first_pass then do _d=start_date to end_date;
    med_range{_grp,_d}=1; 
  end;

  if final_pass;
  flag=0;
  _xgrp=3-_grp;  /*_grp=2 maps to _xgrp=1 and 1 maps to 2*/
  do _d=start_date to end_date until (flag=1);
    if med_range{_xgrp,_d}=1 then flag=1;
  end; 
run;

Each by group is processed twice.  The first pass sets up a 2-row matrix corresponding to the date range universe of your data.  Row 1 (_GRP=1) gets assigned 1's for each date group 1 is encountered,  row 2 has the same for group 2.

 

The second pass compares the date range of the record-in-hand with the corresponding dates in the matrix corresponding to the complimentary drug group (_XGRP) and sets a flag if there is an overlap.  

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1794 views
  • 0 likes
  • 4 in conversation