Hi,
I want to flag those group of observations(within one unique ID) which have overlaps(at least for one day)between two groups of drugs. Group 1 drugs are Drug A, B and C and Group 2 consists of Drug XX, XY, YY and Drug YZ.
So, for example, in one unique ID if there is at least one day overlap between drug A and drug XX then Flag=1 in both observations. If no overlap, then Flag=0. If there is any overlap between Drug A and Drug B then still Flag=0, because I want to see the overlaps between two groups of drugs.
Thanks
Data- I have
ID | Med | Start_date | End_date |
1 | A | 05/25/2017 | 10/30/2017 |
1 | B | 05/26/2017 | 06/03/2017 |
1 | C | 10/30/2017 | 10/30/2019 |
1 | XX | 01/02/2020 | 02/03/2020 |
2 | A | 03/04/2015 | 04/25/2015 |
2 | XY | 04/23/2015 | 04/22/2016 |
2 | YY | 04/22/2016 | 04/21/2017 |
3 | XY | 02/22/2019 | 03/21/2019 |
3 | A | 03/18/2019 | 05/15/2019 |
3 | B | 05/16/2019 | 02/01/2020 |
3 | YY | 01/20/2017 | 06/30/2017 |
3 | YZ | 01/01/2020 | 05/05/2020 |
Data- I want
ID | Med | Start_date | End_date | Flag |
1 | A | 05/25/2017 | 10/30/2017 | 0 |
1 | B | 05/26/2017 | 06/03/2017 | 0 |
1 | C | 10/30/2017 | 10/30/2019 | 0 |
1 | XX | 01/02/2020 | 02/03/2020 | 0 |
2 | A | 03/04/2015 | 04/25/2015 | 1 |
2 | XY | 04/23/2015 | 04/22/2016 | 1 |
2 | YY | 04/22/2016 | 04/21/2017 | 0 |
3 | XY | 02/22/2019 | 03/27/2019 | 1 |
3 | A | 03/18/2019 | 03/25/2019 | 1 |
3 | B | 05/16/2019 | 02/01/2020 | 1 |
3 | YY | 01/20/2017 | 06/30/2017 | 0 |
3 | YZ | 01/01/2020 | 05/05/2020 | 1 |
Try this:
data have;
input ID Med $ (Start_date End_date) (:mmddyy10.);
format Start_date End_date yymmdd10.;
datalines;
1 A 05/25/2017 10/30/2017
1 B 05/26/2017 06/03/2017
1 C 10/30/2017 10/30/2019
1 XX 01/02/2020 02/03/2020
2 A 03/04/2015 04/25/2015
2 XY 04/23/2015 04/22/2016
2 YY 04/22/2016 04/21/2017
3 XY 02/22/2019 03/21/2019
3 A 03/18/2019 05/15/2019
3 B 05/16/2019 02/01/2020
3 YY 01/20/2017 06/30/2017
3 YZ 01/01/2020 05/05/2020
;
proc sql;
/* create table want as */
select *,
case
when med in ("A", "B", "C") then
exists (select * from have where ID=a.ID and med in ("XX", "XY", "YY", "YZ") and
End_date >= a.Start_Date and Start_Date <= a.End_date )
when med in ("XX", "XY", "YY", "YZ") then
exists (select * from have where ID=a.ID and med in ("A", "B", "C") and
End_date >= a.Start_Date and Start_Date <= a.End_date )
else 0 end as Flag
from have as a;
quit;
Try this:
data have;
input ID Med $ (Start_date End_date) (:mmddyy10.);
format Start_date End_date yymmdd10.;
datalines;
1 A 05/25/2017 10/30/2017
1 B 05/26/2017 06/03/2017
1 C 10/30/2017 10/30/2019
1 XX 01/02/2020 02/03/2020
2 A 03/04/2015 04/25/2015
2 XY 04/23/2015 04/22/2016
2 YY 04/22/2016 04/21/2017
3 XY 02/22/2019 03/21/2019
3 A 03/18/2019 05/15/2019
3 B 05/16/2019 02/01/2020
3 YY 01/20/2017 06/30/2017
3 YZ 01/01/2020 05/05/2020
;
proc sql;
/* create table want as */
select *,
case
when med in ("A", "B", "C") then
exists (select * from have where ID=a.ID and med in ("XX", "XY", "YY", "YZ") and
End_date >= a.Start_Date and Start_Date <= a.End_date )
when med in ("XX", "XY", "YY", "YZ") then
exists (select * from have where ID=a.ID and med in ("A", "B", "C") and
End_date >= a.Start_Date and Start_Date <= a.End_date )
else 0 end as Flag
from have as a;
quit;
Here is another SQL variation:
data have;
format Start_date End_date mmddyy10.;
input ID $ Med $ MedGroup $ Start_date : mmddyy10. End_date : mmddyy10.;
datalines;
1 A 1 05/25/2017 10/30/2017
1 B 1 05/26/2017 06/03/2017
1 C 1 10/30/2017 10/30/2019
1 XX 2 01/02/2020 02/03/2020
2 A 1 03/04/2015 04/25/2015
2 XY 2 04/23/2015 04/22/2016
2 YY 2 04/22/2016 04/21/2017
3 XY 2 02/22/2019 03/21/2019
3 A 1 03/18/2019 05/15/2019
3 B 1 05/16/2019 02/01/2020
3 YY 2 01/20/2017 06/30/2017
3 YZ 2 01/01/2020 05/05/2020
;
run;
proc sql;
create table want as
select A.*
,B1.Min_Start_date1
,B1.Max_End_date1
,B2.Min_Start_date2
,B2.Max_End_date2
,case
when (MedGroup = '1' and (A.Start_Date <= Max_End_date2 and A.End_date >= Min_Start_date2)) then 1
when (MedGroup = '2' and (A.Start_Date <= Max_End_date1 and A.End_date >= Min_Start_date1)) then 1
else 0
end as Flag
from have as A
left join
(select ID
,min(Start_date) as Min_Start_date1 format = mmddyy10.
,max(End_date ) as Max_End_date1 format = mmddyy10.
from have
where MedGroup = '1'
group by ID
) as B1
on A.ID = B1.ID
left join
(select ID
,min(Start_date) as Min_Start_date2 format = mmddyy10.
,max(End_date ) as Max_End_date2 format = mmddyy10.
from have
where MedGroup = '2'
group by ID
) as B2
on A.ID = B2.ID
;
quit;
Give that the data are sorted by ID, there may be a performance advantage in using a data step with a BY ID construct:
data have;
input ID Med $ (Start_date End_date) (:mmddyy10.);
format Start_date End_date yymmdd10.;
datalines;
1 A 05/25/2017 10/30/2017
1 B 05/26/2017 06/03/2017
1 C 10/30/2017 10/30/2019
1 XX 01/02/2020 02/03/2020
2 A 03/04/2015 04/25/2015
2 XY 04/23/2015 04/22/2016
2 YY 04/22/2016 04/21/2017
3 XY 02/22/2019 03/21/2019
3 A 03/18/2019 05/15/2019
3 B 05/16/2019 02/01/2020
3 YY 01/20/2017 06/30/2017
3 YZ 01/01/2020 05/05/2020
;
%let begdate=01jan2015;
%let enddate=31dec2020;
data want (drop=_:);
array med_range {1:2,%sysevalf("&begdate"d):%sysevalf("&enddate"d)} _temporary_ ;
set have (in=first_pass) have (in=final_pass);
by id;
if first.id then call missing(of med_range{*});
if med in ('A','B','C') then _grp=1;
else _grp=2;
if first_pass then do _d=start_date to end_date;
med_range{_grp,_d}=1;
end;
if final_pass;
flag=0;
_xgrp=3-_grp; /*_grp=2 maps to _xgrp=1 and 1 maps to 2*/
do _d=start_date to end_date until (flag=1);
if med_range{_xgrp,_d}=1 then flag=1;
end;
run;
Each by group is processed twice. The first pass sets up a 2-row matrix corresponding to the date range universe of your data. Row 1 (_GRP=1) gets assigned 1's for each date group 1 is encountered, row 2 has the same for group 2.
The second pass compares the date range of the record-in-hand with the corresponding dates in the matrix corresponding to the complimentary drug group (_XGRP) and sets a flag if there is an overlap.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.