Solved: Flag those group of observations which have overlaps between two group...

SR11 · Posted 05-31-2020 07:24 PM

Hi,

I want to flag those group of observations(within one unique ID) which have overlaps(at least for one day)between two groups of drugs. Group 1 drugs are Drug A, B and C and Group 2 consists of Drug XX, XY, YY and Drug YZ.

So, for example, in one unique ID if there is at least one day overlap between drug A and drug XX then Flag=1 in both observations. If no overlap, then Flag=0. If there is any overlap between Drug A and Drug B then still Flag=0, because I want to see the overlaps between two groups of drugs.

Thanks

Data- I have

ID	Med	Start_date	End_date
1	A	05/25/2017	10/30/2017
1	B	05/26/2017	06/03/2017
1	C	10/30/2017	10/30/2019
1	XX	01/02/2020	02/03/2020
2	A	03/04/2015	04/25/2015
2	XY	04/23/2015	04/22/2016
2	YY	04/22/2016	04/21/2017
3	XY	02/22/2019	03/21/2019
3	A	03/18/2019	05/15/2019
3	B	05/16/2019	02/01/2020
3	YY	01/20/2017	06/30/2017
3	YZ	01/01/2020	05/05/2020

Data- I want

ID	Med	Start_date	End_date	Flag
1	A	05/25/2017	10/30/2017	0
1	B	05/26/2017	06/03/2017	0
1	C	10/30/2017	10/30/2019	0
1	XX	01/02/2020	02/03/2020	0
2	A	03/04/2015	04/25/2015	1
2	XY	04/23/2015	04/22/2016	1
2	YY	04/22/2016	04/21/2017	0
3	XY	02/22/2019	03/27/2019	1
3	A	03/18/2019	03/25/2019	1
3	B	05/16/2019	02/01/2020	1
3	YY	01/20/2017	06/30/2017	0
3	YZ	01/01/2020	05/05/2020	1

PGStats · Posted 05-31-2020 11:53 PM

Try this:

data have;
input ID    Med $ (Start_date  End_date) (:mmddyy10.);
format Start_date  End_date yymmdd10.;
datalines;
1 A   05/25/2017  10/30/2017
1 B   05/26/2017  06/03/2017
1 C   10/30/2017  10/30/2019
1 XX  01/02/2020  02/03/2020
2 A   03/04/2015  04/25/2015
2 XY  04/23/2015  04/22/2016
2 YY  04/22/2016  04/21/2017
3 XY  02/22/2019  03/21/2019
3 A   03/18/2019  05/15/2019
3 B   05/16/2019  02/01/2020
3 YY  01/20/2017  06/30/2017
3 YZ  01/01/2020  05/05/2020
;

proc sql;
/* create table want as */
select *,
    case 
        when med in ("A", "B", "C") then 
            exists (select * from have where ID=a.ID and med in ("XX", "XY", "YY", "YZ") and 
                End_date >= a.Start_Date and Start_Date <= a.End_date )
        when med in ("XX", "XY", "YY", "YZ") then 
            exists (select * from have where ID=a.ID and med in ("A", "B", "C") and 
                End_date >= a.Start_Date and Start_Date <= a.End_date )
        else 0 end as Flag
from have as a;
quit;

PG

View solution in original post

PGStats · Posted 05-31-2020 11:53 PM

Try this:

data have;
input ID    Med $ (Start_date  End_date) (:mmddyy10.);
format Start_date  End_date yymmdd10.;
datalines;
1 A   05/25/2017  10/30/2017
1 B   05/26/2017  06/03/2017
1 C   10/30/2017  10/30/2019
1 XX  01/02/2020  02/03/2020
2 A   03/04/2015  04/25/2015
2 XY  04/23/2015  04/22/2016
2 YY  04/22/2016  04/21/2017
3 XY  02/22/2019  03/21/2019
3 A   03/18/2019  05/15/2019
3 B   05/16/2019  02/01/2020
3 YY  01/20/2017  06/30/2017
3 YZ  01/01/2020  05/05/2020
;

proc sql;
/* create table want as */
select *,
    case 
        when med in ("A", "B", "C") then 
            exists (select * from have where ID=a.ID and med in ("XX", "XY", "YY", "YZ") and 
                End_date >= a.Start_Date and Start_Date <= a.End_date )
        when med in ("XX", "XY", "YY", "YZ") then 
            exists (select * from have where ID=a.ID and med in ("A", "B", "C") and 
                End_date >= a.Start_Date and Start_Date <= a.End_date )
        else 0 end as Flag
from have as a;
quit;

PG

SR11 · Posted 06-02-2020 04:37 PM

Thank you very much for this solution.

SASKiwi · Posted 06-01-2020 12:54 AM

Here is another SQL variation:

data have;
format Start_date End_date mmddyy10.;
input ID $ Med $ MedGroup $ Start_date : mmddyy10. End_date : mmddyy10.;
datalines;
1 A  1 05/25/2017 10/30/2017
1 B  1 05/26/2017 06/03/2017
1 C  1 10/30/2017 10/30/2019
1 XX 2 01/02/2020 02/03/2020
2 A  1 03/04/2015 04/25/2015
2 XY 2 04/23/2015 04/22/2016
2 YY 2 04/22/2016 04/21/2017
3 XY 2 02/22/2019 03/21/2019
3 A  1 03/18/2019 05/15/2019
3 B  1 05/16/2019 02/01/2020
3 YY 2 01/20/2017 06/30/2017
3 YZ 2 01/01/2020 05/05/2020
;
run;

proc sql;
  create table want as
  select A.*
         ,B1.Min_Start_date1
         ,B1.Max_End_date1
         ,B2.Min_Start_date2
         ,B2.Max_End_date2
         ,case
            when (MedGroup = '1' and (A.Start_Date <= Max_End_date2 and A.End_date >= Min_Start_date2)) then 1 
            when (MedGroup = '2' and (A.Start_Date <= Max_End_date1 and A.End_date >= Min_Start_date1)) then 1 
            else 0
          end as Flag 

  from have as A
  left join
  (select ID 
         ,min(Start_date) as Min_Start_date1 format = mmddyy10.
         ,max(End_date ) as Max_End_date1 format = mmddyy10.
  from have
  where MedGroup = '1'
  group by ID
  ) as B1
  on A.ID = B1.ID

  left join
  (select ID
         ,min(Start_date) as Min_Start_date2 format = mmddyy10.
         ,max(End_date ) as Max_End_date2 format = mmddyy10.
  from have
  where MedGroup = '2'
  group by ID
  ) as B2
  on A.ID = B2.ID
  ;
quit;

SR11 · Posted 06-02-2020 04:38 PM

Thank you very much for this solution!

mkeintz · Posted 06-02-2020 08:21 PM

Give that the data are sorted by ID, there may be a performance advantage in using a data step with a BY ID construct:

data have;
input ID    Med $ (Start_date  End_date) (:mmddyy10.);
format Start_date  End_date yymmdd10.;
datalines;
1 A   05/25/2017  10/30/2017
1 B   05/26/2017  06/03/2017
1 C   10/30/2017  10/30/2019
1 XX  01/02/2020  02/03/2020
2 A   03/04/2015  04/25/2015
2 XY  04/23/2015  04/22/2016
2 YY  04/22/2016  04/21/2017
3 XY  02/22/2019  03/21/2019
3 A   03/18/2019  05/15/2019
3 B   05/16/2019  02/01/2020
3 YY  01/20/2017  06/30/2017
3 YZ  01/01/2020  05/05/2020
;

%let begdate=01jan2015;
%let enddate=31dec2020;
data want (drop=_:);
  array med_range {1:2,%sysevalf("&begdate"d):%sysevalf("&enddate"d)} _temporary_ ;

  set have (in=first_pass)   have (in=final_pass);
  by id;
  if first.id then call missing(of med_range{*});
  if med in ('A','B','C') then _grp=1;
  else _grp=2;

  if first_pass then do _d=start_date to end_date;
    med_range{_grp,_d}=1; 
  end;

  if final_pass;
  flag=0;
  _xgrp=3-_grp;  /*_grp=2 maps to _xgrp=1 and 1 maps to 2*/
  do _d=start_date to end_date until (flag=1);
    if med_range{_xgrp,_d}=1 then flag=1;
  end; 
run;

Each by group is processed twice. The first pass sets up a 2-row matrix corresponding to the date range universe of your data. Row 1 (_GRP=1) gets assigned 1's for each date group 1 is encountered, row 2 has the same for group 2.

The second pass compares the date range of the record-in-hand with the corresponding dates in the matrix corresponding to the complimentary drug group (_XGRP) and sets a flag if there is an overlap.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Flag those group of observations which have overlaps between two groups

Re: Flag those group of observations which have overlaps between two groups

Re: Flag those group of observations which have overlaps between two groups

Re: Flag those group of observations which have overlaps between two groups

Re: Flag those group of observations which have overlaps between two groups

Re: Flag those group of observations which have overlaps between two groups

Re: Flag those group of observations which have overlaps between two groups

Flag those group of observations which have overlaps between two groups

Re: Flag those group of observations which have overlaps between two groups

Re: Flag those group of observations which have overlaps between two groups

Re: Flag those group of observations which have overlaps between two groups

Re: Flag those group of observations which have overlaps between two groups

Re: Flag those group of observations which have overlaps between two groups

Re: Flag those group of observations which have overlaps between two groups

Registration is open

SAS Training: Just a Click Away