Flag based on specific criteria

SR11 · Posted 04-20-2021 06:29 AM

There are two groups of medications. Group 1 medications are Medication A, B and C and Group 2 consists of Medication XX, XY, YY and YZ.

I want to flag those group of observations (within one unique ID i.e. for each patient) for which Group 1 medications’ Start_date falls between Group 2 medications’ Start_date and End_date. If these criteria meet, then flag 1(in the Group 1 medication observation).

In other words: I want to flag those group of observations (within one unique ID i.e. each patient) for which Group 2 medications’ Start_date occurred before Group 1 medications’ Start_date and Group 2 medications End_date occurred after the Group 1 medications’ Start_date. If these criteria meet, then flag 1(in the Group 1 medication observation).

There may be some other medications as well but we need not to consider those drugs.

Data I have

ID	Med	Start_date	End_date
1	A	04/25/2015	11/20/2015
1	B	03/03/2017	05/10/2017
1	A	03/02/2019	04/02/2019
1	A	10/02/2019	11/20/2019
1	XX	03/10/2015	10/20/2015
1	XY	10/20/2018	11/21/2018
1	FF	12/02/2017	12/15/2017
2	B	03/03/2016	05/03/2016
2	C	06/05/2016	08/04/2016
2	C	09/10/2016	10/11/2016
2	YY	04/10/2016	05/19/2016
2	GG	07/11/2016	07/11/2016
2	YZ	03/01/2016	05/13/2016
2	YZ	07/10/2016	10/10/2016
3	YY	12/11/2017	12/17/2017
3	YZ	01/22/2018	07/12/2018
3	A	02/02/2019	05/20/2020

Data I want:

ID	Med	Start_date	End_date	Flag
1	A	04/25/2015	11/20/2015	1
1	B	03/03/2017	05/10/2017
1	A	03/02/2019	04/02/2019
1	A	10/02/2019	11/20/2019
1	XX	03/10/2015	10/20/2015
1	XY	10/20/2018	11/21/2018
1	FF	12/02/2017	12/15/2017
2	B	03/03/2016	05/03/2016	1
2	C	06/05/2016	08/04/2016
2	C	09/10/2016	10/11/2016	1
2	YY	04/10/2016	05/19/2016
2	GG	07/11/2016	07/11/2016
2	YZ	03/01/2016	05/13/2016
2	YZ	07/10/2016	10/10/2016
3	YY	12/11/2017	12/17/2017
3	YZ	01/22/2018	07/12/2018
3	A	02/02/2019	05/20/2020

Thank you very much for your support.

PeterClemmensen · Posted 04-20-2021 06:50 AM

So if a group1 start date is after some group2 start date and the same group1 end date is before some group2 end date, we should flag that obs, correct? Does it have to apply to the same medication or can it be a start date for say XX and end date for say YY?

The DATA to DATA Step Macro
Blog: SASnrd

SR11 · Posted 04-20-2021 11:18 AM

Hi PeterClemmensen,
No, I correctected the line below-
So if a group1 start date is after some group2 start date and that group2 medication end date is after that group1 start date, we should flag that obs.
It must apply to the same medication.
For example in the data:
(the first number denotes the ID and letter(s) denote the medication)
1A(flagged) start date falls in the 1XX medication start and end date.
2B(flagged) start date falls in the 2YZ medication start and end date.
2C(flagged) start date falls in the 2YZ(the second one) medication start and end date.
Does it make sense?

maguiremq · Posted 04-20-2021 08:25 AM

@SR11 wrote:

There are two groups of medications. Group 1 medications are Medication A, B and C and Group 2 consists of Medication XX, XY, YY and YZ.

I want to flag those group of observations (within one unique ID i.e. for each patient) for which Group 1 medications’ Start_date falls between Group 2 medications’ Start_date and End_date. If these criteria meet, then flag 1(in the Group 1 medication observation).

In other words: I want to flag those group of observations (within one unique ID i.e. each patient) for which Group 2 medications’ Start_date occurred before Group 1 medications’ Start_date and Group 2 medications End_date occurred after the Group 1 medications’ Start_date. If these criteria meet, then flag 1(in the Group 1 medication observation).

There may be some other medications as well but we need not to consider those drugs.

Data I have

ID

Med

Start_date

End_date

1

A

04/25/2015

11/20/2015

1

B

03/03/2017

05/10/2017

1

A

03/02/2019

04/02/2019

1

A

10/02/2019

11/20/2019

1

XX

03/10/2015

10/20/2015

1

XY

10/20/2018

11/21/2018

1

FF

12/02/2017

15/02/2017

2

B

03/03/2016

05/03/2016

2

C

06/05/2016

08/04/2016

2

C

09/10/2016

10/11/2016

2

YY

04/10/2016

05/19/2016

2

GG

07/11/2016

07/11/2016

2

YZ

03/01/2016

05/13/2016

2

YZ

07/10/2016

10/10/2016

3

YY

12/11/2017

12/17/2017

3

YZ

01/22/2018

07/12/2018

3

A

02/02/2019

05/20/2020

Data I want:

ID

Med

Start_date

End_date

Flag

1

A

04/25/2015

11/20/2015

1

1

B

03/03/2017

05/10/2017

1

A

03/02/2019

04/02/2019

1

A

10/02/2019

11/20/2019

1

XX

03/10/2015

10/20/2015

1

XY

10/20/2018

11/21/2018

1

FF

12/02/2017

15/02/2017

2

B

03/03/2016

05/03/2016

1

2

C

06/05/2016

08/04/2016

2

C

09/10/2016

10/11/2016

1

2

YY

04/10/2016

05/19/2016

2

GG

07/11/2016

07/11/2016

2

YZ

03/01/2016

05/13/2016

2

YZ

07/10/2016

10/10/2016

3

YY

12/11/2017

12/17/2017

3

YZ

01/22/2018

07/12/2018

3

A

02/02/2019

05/20/2020

Thank you very much for your support.

What's going on with the bolded rows above? The end date is a different format when compared to the rest.

SR11 · Posted 04-20-2021 11:20 AM

Hi Maguiremq
These rows were bolded automatically while posting. I didn't do it for any reason. I am sorry for that.

Astounding · Posted 04-20-2021 08:52 AM

The basic questions:

Are your dates true SAS dates (numeric values with a format applied), or are they character strings? If you're not sure, run a PROC CONTENTS on your data.
Does "falls between" include or exclude the Group A start and end dates?

Programming is not that difficult but it's important to be solving the actual problem without making assumptions.

SR11 · Posted 04-20-2021 11:28 AM

Hi Astounding,
1. In my real dataset, yes the dates were formatted to dates(MMDDYY10.). The dataset in the post was just made in word doc.
2. The question is very logical. Thank you for the question. Yes 'falls between' includes Group 1 or 2 start and end dates. i.e. if the Group 1 start date is equal to Group 2 start or end date then Flag the observation.

mkeintz · Posted 04-20-2021 01:18 PM

@SR11 wrote (bold italics mine)

@SR11 wrote:
Hi Astounding,
1. In my real dataset, yes the dates were formatted to dates(MMDDYY10.). The dataset in the post was just made in word doc.
2. The question is very logical. Thank you for the question. Yes 'falls between' includes Group 1 or 2 start and end dates. i.e. if the Group 1 start date is equal to Group 2 start or end date then Flag the observation.

"Group 1 or 2 start and end dates" suggests to me that you are look for an overlap or any group 1 date with any group 2 date. But your original post and the following phrase suggests you care about whether a group 1 start_date only overlaps with a group 2 date range.

So the program below assumes your earliest description - i.e. look only for the intersection of group 1 start_date with any group 2 range.

The program passes through each ID twice - the first pass reads only group2 records and updates an array of dates (group2_dates) with a dummy set to one for each group 2 date. The second pass re-reads the ID again (this time all records), and compares the start_date with the group2_dates array.


data want (drop=_:);
  set have (where=(not (med in ('A','B','C'))) in=firstpass)
      have (in=secondpass);
  by id ;
  array group2_dates {%sysevalf("01jan2015"d):%sysevalf("31dec2020"d)}  _temporary_;
  if first.id then call missing(of group2_dates{*});

  if firstpass then do _d=start_date to end_date;
    group2_dates{_d}=1;
  end;
  if secondpass;
  if med in ('A','B','C') and group2_dates{start_date}=1 then flag=1;
run;

I offer no program test, as the data is not provide in the form of a working data step, which presumably would make a note about the apparently invalid date value for end_date (=15/02/2017) in the 7th data record.

This program assumes the data are sorted by ID (but can be any order within a given ID).

The primary feature here is the array. Instead of the array being indexed from (say) 1 to 1000, it is indexed on the underlying values of "01jan2015"d (20089) through "31dec2020"d (22280). Of course, you can change the lower bound and upper bound dates to fit your data.

The other point is that the array is _temporary_, meaning its values are retained from observation to over all records for a given id. So the firstpass populates the array, which is still available for the secondpass.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Astounding · Posted 04-20-2021 03:12 PM

@mkeintz . this is at least as good as what I would have done ... creating an informat based on the Group 2 observations that concatenates ID + date as the input to the informat. Nice!

SR11 · Posted 04-21-2021 05:56 PM

@mkeintz
Hi mkeintz,
Thank you for your reply. You did not define my group 2 drugs. In the data, there are some more drugs other than group 1 and group 2 drugs.
I changed the date in the post. I am sorry for the typo. So, I think you can run and test your code now.

Flag based on specific criteria

Re: Flag based on specific criteria

Re: Flag based on specific criteria

Re: Flag based on specific criteria

Re: Flag based on specific criteria

Re: Flag based on specific criteria

Re: Flag based on specific criteria

Re: Flag based on specific criteria

Re: Flag based on specific criteria

Re: Flag based on specific criteria

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away