BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SR11
Obsidian | Level 7

Hi,

I want to flag those group of observations(within one unique ID) which have overlaps(at least for one day)between two groups of drugs. Group 1 drugs are Drug A, B and C and Group 2 consists of Drug XX, XY, YY and Drug YZ.

So, for example, in one unique ID if there is at least one day overlap between drug A and drug XX then Flag=1 in both observations. If no overlap, then Flag=0. If there is any overlap between Drug A and Drug B then still Flag=0, because I want to see the overlaps between two groups of drugs.  

Thanks

Data- I have

ID

Med

Start_date

End_date

1

A

05/25/2017

10/30/2017

1

B

05/26/2017

06/03/2017

1

C

10/30/2017

10/30/2019

1

XX

01/02/2020

02/03/2020

2

A

03/04/2015

04/25/2015

2

XY

04/23/2015

04/22/2016

2

YY

04/22/2016

04/21/2017

3

XY

02/22/2019

03/21/2019

3

A

03/18/2019

05/15/2019

3

B

05/16/2019

02/01/2020

3

YY

01/20/2017

06/30/2017

3

YZ

01/01/2020

05/05/2020

 

Data- I want

ID

Med

Start_date

End_date

Flag

1

A

05/25/2017

10/30/2017

0

1

B

05/26/2017

06/03/2017

0

1

C

10/30/2017

10/30/2019

0

1

XX

01/02/2020

02/03/2020

0

2

A

03/04/2015

04/25/2015

1

2

XY

04/23/2015

04/22/2016

1

2

YY

04/22/2016

04/21/2017

0

3

XY

02/22/2019

03/27/2019

1

3

A

03/18/2019

03/25/2019

1

3

B

05/16/2019

02/01/2020

1

3

YY

01/20/2017

06/30/2017

0

3

YZ

01/01/2020

05/05/2020

1

 

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Try this:

 

data have;
input ID    Med $ (Start_date  End_date) (:mmddyy10.);
format Start_date  End_date yymmdd10.;
datalines;
1 A   05/25/2017  10/30/2017
1 B   05/26/2017  06/03/2017
1 C   10/30/2017  10/30/2019
1 XX  01/02/2020  02/03/2020
2 A   03/04/2015  04/25/2015
2 XY  04/23/2015  04/22/2016
2 YY  04/22/2016  04/21/2017
3 XY  02/22/2019  03/21/2019
3 A   03/18/2019  05/15/2019
3 B   05/16/2019  02/01/2020
3 YY  01/20/2017  06/30/2017
3 YZ  01/01/2020  05/05/2020
;

proc sql;
/* create table want as */ select *, case when med in ("A", "B", "C") then exists (select * from have where ID=a.ID and med in ("XX", "XY", "YY", "YZ") and End_date >= a.Start_Date and Start_Date <= a.End_date ) when med in ("XX", "XY", "YY", "YZ") then exists (select * from have where ID=a.ID and med in ("A", "B", "C") and End_date >= a.Start_Date and Start_Date <= a.End_date ) else 0 end as Flag from have as a; quit;
PG

View solution in original post

5 REPLIES 5
PGStats
Opal | Level 21

Try this:

 

data have;
input ID    Med $ (Start_date  End_date) (:mmddyy10.);
format Start_date  End_date yymmdd10.;
datalines;
1 A   05/25/2017  10/30/2017
1 B   05/26/2017  06/03/2017
1 C   10/30/2017  10/30/2019
1 XX  01/02/2020  02/03/2020
2 A   03/04/2015  04/25/2015
2 XY  04/23/2015  04/22/2016
2 YY  04/22/2016  04/21/2017
3 XY  02/22/2019  03/21/2019
3 A   03/18/2019  05/15/2019
3 B   05/16/2019  02/01/2020
3 YY  01/20/2017  06/30/2017
3 YZ  01/01/2020  05/05/2020
;

proc sql;
/* create table want as */ select *, case when med in ("A", "B", "C") then exists (select * from have where ID=a.ID and med in ("XX", "XY", "YY", "YZ") and End_date >= a.Start_Date and Start_Date <= a.End_date ) when med in ("XX", "XY", "YY", "YZ") then exists (select * from have where ID=a.ID and med in ("A", "B", "C") and End_date >= a.Start_Date and Start_Date <= a.End_date ) else 0 end as Flag from have as a; quit;
PG
SR11
Obsidian | Level 7
Thank you very much for this solution.
SASKiwi
PROC Star

Here is another SQL variation:

data have;
format Start_date End_date mmddyy10.;
input ID $ Med $ MedGroup $ Start_date : mmddyy10. End_date : mmddyy10.;
datalines;
1 A  1 05/25/2017 10/30/2017
1 B  1 05/26/2017 06/03/2017
1 C  1 10/30/2017 10/30/2019
1 XX 2 01/02/2020 02/03/2020
2 A  1 03/04/2015 04/25/2015
2 XY 2 04/23/2015 04/22/2016
2 YY 2 04/22/2016 04/21/2017
3 XY 2 02/22/2019 03/21/2019
3 A  1 03/18/2019 05/15/2019
3 B  1 05/16/2019 02/01/2020
3 YY 2 01/20/2017 06/30/2017
3 YZ 2 01/01/2020 05/05/2020
;
run;

proc sql;
  create table want as
  select A.*
         ,B1.Min_Start_date1
         ,B1.Max_End_date1
         ,B2.Min_Start_date2
         ,B2.Max_End_date2
         ,case
            when (MedGroup = '1' and (A.Start_Date <= Max_End_date2 and A.End_date >= Min_Start_date2)) then 1 
            when (MedGroup = '2' and (A.Start_Date <= Max_End_date1 and A.End_date >= Min_Start_date1)) then 1 
            else 0
          end as Flag 

  from have as A
  left join
  (select ID 
         ,min(Start_date) as Min_Start_date1 format = mmddyy10.
         ,max(End_date ) as Max_End_date1 format = mmddyy10.
  from have
  where MedGroup = '1'
  group by ID
  ) as B1
  on A.ID = B1.ID

  left join
  (select ID
         ,min(Start_date) as Min_Start_date2 format = mmddyy10.
         ,max(End_date ) as Max_End_date2 format = mmddyy10.
  from have
  where MedGroup = '2'
  group by ID
  ) as B2
  on A.ID = B2.ID
  ;
quit;
SR11
Obsidian | Level 7
Thank you very much for this solution!
mkeintz
PROC Star

Give that the data are sorted by ID, there may be a performance advantage in using a data step with a BY ID construct:

 

data have;
input ID    Med $ (Start_date  End_date) (:mmddyy10.);
format Start_date  End_date yymmdd10.;
datalines;
1 A   05/25/2017  10/30/2017
1 B   05/26/2017  06/03/2017
1 C   10/30/2017  10/30/2019
1 XX  01/02/2020  02/03/2020
2 A   03/04/2015  04/25/2015
2 XY  04/23/2015  04/22/2016
2 YY  04/22/2016  04/21/2017
3 XY  02/22/2019  03/21/2019
3 A   03/18/2019  05/15/2019
3 B   05/16/2019  02/01/2020
3 YY  01/20/2017  06/30/2017
3 YZ  01/01/2020  05/05/2020
;

%let begdate=01jan2015;
%let enddate=31dec2020;
data want (drop=_:);
  array med_range {1:2,%sysevalf("&begdate"d):%sysevalf("&enddate"d)} _temporary_ ;

  set have (in=first_pass)   have (in=final_pass);
  by id;
  if first.id then call missing(of med_range{*});
  if med in ('A','B','C') then _grp=1;
  else _grp=2;

  if first_pass then do _d=start_date to end_date;
    med_range{_grp,_d}=1; 
  end;

  if final_pass;
  flag=0;
  _xgrp=3-_grp;  /*_grp=2 maps to _xgrp=1 and 1 maps to 2*/
  do _d=start_date to end_date until (flag=1);
    if med_range{_xgrp,_d}=1 then flag=1;
  end; 
run;

Each by group is processed twice.  The first pass sets up a 2-row matrix corresponding to the date range universe of your data.  Row 1 (_GRP=1) gets assigned 1's for each date group 1 is encountered,  row 2 has the same for group 2.

 

The second pass compares the date range of the record-in-hand with the corresponding dates in the matrix corresponding to the complimentary drug group (_XGRP) and sets a flag if there is an overlap.  

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1278 views
  • 0 likes
  • 4 in conversation