BookmarkSubscribeRSS Feed
SR11
Obsidian | Level 7

There are two groups of medications. Group 1 medications are Medication A, B and C and Group 2 consists of Medication XX, XY, YY and YZ.

 

I want to flag those group of observations (within one unique ID i.e. for each patient) for which Group 1 medications’ Start_date falls between Group 2 medications’ Start_date and End_date. If these criteria meet, then flag 1(in the Group 1 medication observation).

In other words: I want to flag those group of observations (within one unique ID i.e. each patient) for which Group 2 medications’ Start_date occurred before Group 1 medications’ Start_date and Group 2 medications End_date occurred after the Group 1 medications’ Start_date. If these criteria meet, then flag 1(in the Group 1 medication observation).  

There may be some other medications as well but we need not to consider those drugs.

Data I have

ID

Med

Start_date

End_date

1

A

04/25/2015

11/20/2015

1

B

03/03/2017

05/10/2017

1

A

03/02/2019

04/02/2019

1

A

10/02/2019

11/20/2019

1

XX

03/10/2015

10/20/2015

1

XY

10/20/2018

11/21/2018

1

FF

12/02/2017

12/15/2017

2

B

03/03/2016

05/03/2016

2

C

06/05/2016

08/04/2016

2

C

09/10/2016

10/11/2016

2

YY

04/10/2016

05/19/2016

2

GG

07/11/2016

07/11/2016

2

YZ

03/01/2016

05/13/2016

2

YZ

07/10/2016

10/10/2016

3

YY

12/11/2017

12/17/2017

3

YZ

01/22/2018

07/12/2018

3

A

02/02/2019

05/20/2020

 

Data I want:

ID

Med

Start_date

End_date

Flag

1

A

04/25/2015

11/20/2015

1

1

B

03/03/2017

05/10/2017

 

1

A

03/02/2019

04/02/2019

 

1

A

10/02/2019

11/20/2019

 

1

XX

03/10/2015

10/20/2015

 

1

XY

10/20/2018

11/21/2018

 

1

FF

12/02/2017

12/15/2017

 

2

B

03/03/2016

05/03/2016

1

2

C

06/05/2016

08/04/2016

 

2

C

09/10/2016

10/11/2016

1

2

YY

04/10/2016

05/19/2016

 

2

GG

07/11/2016

07/11/2016

 

2

YZ

03/01/2016

05/13/2016

 

2

YZ

07/10/2016

10/10/2016

 

3

YY

12/11/2017

12/17/2017

 

3

YZ

01/22/2018

07/12/2018

 

3

A

02/02/2019

05/20/2020

 

 

Thank you very much for your support. 

9 REPLIES 9
PeterClemmensen
Tourmaline | Level 20

So if a group1 start date is after some group2 start date and the same group1 end date is before some group2 end date, we should flag that obs, correct? Does it have to apply to the same medication or can it be a start date for say XX and end date for say YY?

SR11
Obsidian | Level 7
Hi PeterClemmensen,
No, I correctected the line below-
So if a group1 start date is after some group2 start date and that group2 medication end date is after that group1 start date, we should flag that obs.
It must apply to the same medication.
For example in the data:
(the first number denotes the ID and letter(s) denote the medication)
1A(flagged) start date falls in the 1XX medication start and end date.
2B(flagged) start date falls in the 2YZ medication start and end date.
2C(flagged) start date falls in the 2YZ(the second one) medication start and end date.
Does it make sense?

maguiremq
SAS Super FREQ

@SR11 wrote:

There are two groups of medications. Group 1 medications are Medication A, B and C and Group 2 consists of Medication XX, XY, YY and YZ.

 

I want to flag those group of observations (within one unique ID i.e. for each patient) for which Group 1 medications’ Start_date falls between Group 2 medications’ Start_date and End_date. If these criteria meet, then flag 1(in the Group 1 medication observation).

In other words: I want to flag those group of observations (within one unique ID i.e. each patient) for which Group 2 medications’ Start_date occurred before Group 1 medications’ Start_date and Group 2 medications End_date occurred after the Group 1 medications’ Start_date. If these criteria meet, then flag 1(in the Group 1 medication observation).  

There may be some other medications as well but we need not to consider those drugs.

Data I have

ID

Med

Start_date

End_date

1

A

04/25/2015

11/20/2015

1

B

03/03/2017

05/10/2017

1

A

03/02/2019

04/02/2019

1

A

10/02/2019

11/20/2019

1

XX

03/10/2015

10/20/2015

1

XY

10/20/2018

11/21/2018

1

FF

12/02/2017

15/02/2017

2

B

03/03/2016

05/03/2016

2

C

06/05/2016

08/04/2016

2

C

09/10/2016

10/11/2016

2

YY

04/10/2016

05/19/2016

2

GG

07/11/2016

07/11/2016

2

YZ

03/01/2016

05/13/2016

2

YZ

07/10/2016

10/10/2016

3

YY

12/11/2017

12/17/2017

3

YZ

01/22/2018

07/12/2018

3

A

02/02/2019

05/20/2020

 

Data I want:

ID

Med

Start_date

End_date

Flag

1

A

04/25/2015

11/20/2015

1

1

B

03/03/2017

05/10/2017

 

1

A

03/02/2019

04/02/2019

 

1

A

10/02/2019

11/20/2019

 

1

XX

03/10/2015

10/20/2015

 

1

XY

10/20/2018

11/21/2018

 

1

FF

12/02/2017

15/02/2017

 

2

B

03/03/2016

05/03/2016

1

2

C

06/05/2016

08/04/2016

 

2

C

09/10/2016

10/11/2016

1

2

YY

04/10/2016

05/19/2016

 

2

GG

07/11/2016

07/11/2016

 

2

YZ

03/01/2016

05/13/2016

 

2

YZ

07/10/2016

10/10/2016

 

3

YY

12/11/2017

12/17/2017

 

3

YZ

01/22/2018

07/12/2018

 

3

A

02/02/2019

05/20/2020

 

 

Thank you very much for your support. 


What's going on with the bolded rows above? The end date is a different format when compared to the rest.

SR11
Obsidian | Level 7
Hi Maguiremq
These rows were bolded automatically while posting. I didn't do it for any reason. I am sorry for that.
Astounding
PROC Star

The basic questions:

 

  1. Are your dates true SAS dates (numeric values with a format applied), or are they character strings?  If you're not sure, run a PROC CONTENTS on  your data.
  2. Does "falls between" include or exclude the Group A start and end dates?

Programming is not that difficult but it's important to be solving the actual problem without making assumptions.

SR11
Obsidian | Level 7
Hi Astounding,
1. In my real dataset, yes the dates were formatted to dates(MMDDYY10.). The dataset in the post was just made in word doc.
2. The question is very logical. Thank you for the question. Yes 'falls between' includes Group 1 or 2 start and end dates. i.e. if the Group 1 start date is equal to Group 2 start or end date then Flag the observation.


mkeintz
PROC Star

@SR11 wrote   (bold italics mine)

 


@SR11 wrote:
Hi Astounding,
1. In my real dataset, yes the dates were formatted to dates(MMDDYY10.). The dataset in the post was just made in word doc.
2. The question is very logical. Thank you for the question. Yes 'falls between' includes Group 1 or 2 start and end dates. i.e. if the Group 1 start date is equal to Group 2 start or end date then Flag the observation.

"Group 1 or 2 start and end dates" suggests to me that you are look for an overlap or any group 1 date with any group 2 date.  But your original post and the following phrase suggests you care about whether a group 1 start_date only overlaps with a group 2 date range.

 

So the program below assumes your earliest description -  i.e. look only for the intersection of group 1 start_date  with any group 2 range.

 

The program passes through each ID twice - the first pass reads only group2 records and updates an array of dates (group2_dates) with a dummy set to one for each group 2 date.  The second pass re-reads the ID again (this time all records), and compares the start_date with the group2_dates array.

 


data want (drop=_:);
  set have (where=(not (med in ('A','B','C'))) in=firstpass)
      have (in=secondpass);
  by id ;
  array group2_dates {%sysevalf("01jan2015"d):%sysevalf("31dec2020"d)}  _temporary_;
  if first.id then call missing(of group2_dates{*});

  if firstpass then do _d=start_date to end_date;
    group2_dates{_d}=1;
  end;
  if secondpass;
  if med in ('A','B','C') and group2_dates{start_date}=1 then flag=1;
run;

I offer no program test, as the data is not provide in the form of a working data step, which presumably would make a note about the apparently invalid date value for end_date (=15/02/2017) in the 7th data record.

 

This program assumes the data are sorted by ID (but can be any order within a given ID).


The primary feature here is the array.  Instead of the array being indexed from (say) 1 to 1000, it is indexed on the underlying values of "01jan2015"d (20089) through "31dec2020"d (22280).  Of course, you can change the lower bound and upper bound dates to fit your data.

 

The other point is that the array is _temporary_,  meaning its values are retained from observation to over all records for a given id.  So the firstpass populates the array, which is still available for the secondpass.


--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Astounding
PROC Star

@mkeintz . this is at least as good as what I would have done ... creating an informat based on the Group 2 observations that concatenates ID + date as the input to the informat.  Nice!

SR11
Obsidian | Level 7
@mkeintz
Hi mkeintz,
Thank you for your reply. You did not define my group 2 drugs. In the data, there are some more drugs other than group 1 and group 2 drugs.
I changed the date in the post. I am sorry for the typo. So, I think you can run and test your code now.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1107 views
  • 1 like
  • 5 in conversation