Hi,
I want to flag those group of observations(within one unique ID) which have overlaps(at least for one day)between two groups of drugs. Group 1 drugs are Drug A, B and C and Group 2 consists of Drug XX, XY, YY and Drug YZ.
So, for example, in one unique ID if there is at least one day overlap between drug A and drug XX then Flag=1 in both observations. If no overlap, then Flag=0. If there is any overlap between Drug A and Drug B then still Flag=0, because I want to see the overlaps between two groups of drugs.
Thanks
Data- I have
ID | Med | Start_date | End_date |
1 | A | 05/25/2017 | 10/30/2017 |
1 | B | 05/26/2017 | 06/03/2017 |
1 | C | 10/30/2017 | 10/30/2019 |
1 | XX | 01/02/2020 | 02/03/2020 |
2 | A | 03/04/2015 | 04/25/2015 |
2 | XY | 04/23/2015 | 04/22/2016 |
2 | YY | 04/22/2016 | 04/21/2017 |
3 | XY | 02/22/2019 | 03/21/2019 |
3 | A | 03/18/2019 | 05/15/2019 |
3 | B | 05/16/2019 | 02/01/2020 |
3 | YY | 01/20/2017 | 06/30/2017 |
3 | YZ | 01/01/2020 | 05/05/2020 |
Data- I want
ID | Med | Start_date | End_date | Flag |
1 | A | 05/25/2017 | 10/30/2017 | 0 |
1 | B | 05/26/2017 | 06/03/2017 | 0 |
1 | C | 10/30/2017 | 10/30/2019 | 0 |
1 | XX | 01/02/2020 | 02/03/2020 | 0 |
2 | A | 03/04/2015 | 04/25/2015 | 1 |
2 | XY | 04/23/2015 | 04/22/2016 | 1 |
2 | YY | 04/22/2016 | 04/21/2017 | 0 |
3 | XY | 02/22/2019 | 03/27/2019 | 1 |
3 | A | 03/18/2019 | 03/25/2019 | 1 |
3 | B | 05/16/2019 | 02/01/2020 | 1 |
3 | YY | 01/20/2017 | 06/30/2017 | 0 |
3 | YZ | 01/01/2020 | 05/05/2020 | 1 |
Try this:
data have;
input ID Med $ (Start_date End_date) (:mmddyy10.);
format Start_date End_date yymmdd10.;
datalines;
1 A 05/25/2017 10/30/2017
1 B 05/26/2017 06/03/2017
1 C 10/30/2017 10/30/2019
1 XX 01/02/2020 02/03/2020
2 A 03/04/2015 04/25/2015
2 XY 04/23/2015 04/22/2016
2 YY 04/22/2016 04/21/2017
3 XY 02/22/2019 03/21/2019
3 A 03/18/2019 05/15/2019
3 B 05/16/2019 02/01/2020
3 YY 01/20/2017 06/30/2017
3 YZ 01/01/2020 05/05/2020
;
proc sql;
/* create table want as */
select *,
case
when med in ("A", "B", "C") then
exists (select * from have where ID=a.ID and med in ("XX", "XY", "YY", "YZ") and
End_date >= a.Start_Date and Start_Date <= a.End_date )
when med in ("XX", "XY", "YY", "YZ") then
exists (select * from have where ID=a.ID and med in ("A", "B", "C") and
End_date >= a.Start_Date and Start_Date <= a.End_date )
else 0 end as Flag
from have as a;
quit;
Try this:
data have;
input ID Med $ (Start_date End_date) (:mmddyy10.);
format Start_date End_date yymmdd10.;
datalines;
1 A 05/25/2017 10/30/2017
1 B 05/26/2017 06/03/2017
1 C 10/30/2017 10/30/2019
1 XX 01/02/2020 02/03/2020
2 A 03/04/2015 04/25/2015
2 XY 04/23/2015 04/22/2016
2 YY 04/22/2016 04/21/2017
3 XY 02/22/2019 03/21/2019
3 A 03/18/2019 05/15/2019
3 B 05/16/2019 02/01/2020
3 YY 01/20/2017 06/30/2017
3 YZ 01/01/2020 05/05/2020
;
proc sql;
/* create table want as */
select *,
case
when med in ("A", "B", "C") then
exists (select * from have where ID=a.ID and med in ("XX", "XY", "YY", "YZ") and
End_date >= a.Start_Date and Start_Date <= a.End_date )
when med in ("XX", "XY", "YY", "YZ") then
exists (select * from have where ID=a.ID and med in ("A", "B", "C") and
End_date >= a.Start_Date and Start_Date <= a.End_date )
else 0 end as Flag
from have as a;
quit;
Here is another SQL variation:
data have;
format Start_date End_date mmddyy10.;
input ID $ Med $ MedGroup $ Start_date : mmddyy10. End_date : mmddyy10.;
datalines;
1 A 1 05/25/2017 10/30/2017
1 B 1 05/26/2017 06/03/2017
1 C 1 10/30/2017 10/30/2019
1 XX 2 01/02/2020 02/03/2020
2 A 1 03/04/2015 04/25/2015
2 XY 2 04/23/2015 04/22/2016
2 YY 2 04/22/2016 04/21/2017
3 XY 2 02/22/2019 03/21/2019
3 A 1 03/18/2019 05/15/2019
3 B 1 05/16/2019 02/01/2020
3 YY 2 01/20/2017 06/30/2017
3 YZ 2 01/01/2020 05/05/2020
;
run;
proc sql;
create table want as
select A.*
,B1.Min_Start_date1
,B1.Max_End_date1
,B2.Min_Start_date2
,B2.Max_End_date2
,case
when (MedGroup = '1' and (A.Start_Date <= Max_End_date2 and A.End_date >= Min_Start_date2)) then 1
when (MedGroup = '2' and (A.Start_Date <= Max_End_date1 and A.End_date >= Min_Start_date1)) then 1
else 0
end as Flag
from have as A
left join
(select ID
,min(Start_date) as Min_Start_date1 format = mmddyy10.
,max(End_date ) as Max_End_date1 format = mmddyy10.
from have
where MedGroup = '1'
group by ID
) as B1
on A.ID = B1.ID
left join
(select ID
,min(Start_date) as Min_Start_date2 format = mmddyy10.
,max(End_date ) as Max_End_date2 format = mmddyy10.
from have
where MedGroup = '2'
group by ID
) as B2
on A.ID = B2.ID
;
quit;
Give that the data are sorted by ID, there may be a performance advantage in using a data step with a BY ID construct:
data have;
input ID Med $ (Start_date End_date) (:mmddyy10.);
format Start_date End_date yymmdd10.;
datalines;
1 A 05/25/2017 10/30/2017
1 B 05/26/2017 06/03/2017
1 C 10/30/2017 10/30/2019
1 XX 01/02/2020 02/03/2020
2 A 03/04/2015 04/25/2015
2 XY 04/23/2015 04/22/2016
2 YY 04/22/2016 04/21/2017
3 XY 02/22/2019 03/21/2019
3 A 03/18/2019 05/15/2019
3 B 05/16/2019 02/01/2020
3 YY 01/20/2017 06/30/2017
3 YZ 01/01/2020 05/05/2020
;
%let begdate=01jan2015;
%let enddate=31dec2020;
data want (drop=_:);
array med_range {1:2,%sysevalf("&begdate"d):%sysevalf("&enddate"d)} _temporary_ ;
set have (in=first_pass) have (in=final_pass);
by id;
if first.id then call missing(of med_range{*});
if med in ('A','B','C') then _grp=1;
else _grp=2;
if first_pass then do _d=start_date to end_date;
med_range{_grp,_d}=1;
end;
if final_pass;
flag=0;
_xgrp=3-_grp; /*_grp=2 maps to _xgrp=1 and 1 maps to 2*/
do _d=start_date to end_date until (flag=1);
if med_range{_xgrp,_d}=1 then flag=1;
end;
run;
Each by group is processed twice. The first pass sets up a 2-row matrix corresponding to the date range universe of your data. Row 1 (_GRP=1) gets assigned 1's for each date group 1 is encountered, row 2 has the same for group 2.
The second pass compares the date range of the record-in-hand with the corresponding dates in the matrix corresponding to the complimentary drug group (_XGRP) and sets a flag if there is an overlap.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.