SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

How do I combine rows based on whether the dates are within a certain time period from each other?

Reply
Contributor
Posts: 51

How do I combine rows based on whether the dates are within a certain time period from each other?

I have a database of subjects, medications that were prescribed, and the dates they were prescribed. I would like to assign medication regimens to each subject by looking at which of these medications were used over the same time period. A subject can have more than one regimen; what determines the regimen are the types of medications and when they were being taken (Script_EnteredDate = date the medication was prescribed).

 

I was able to combine rows that had the same dates, look at their medication combinations, and assign regimens accordingly (my code for this is very long, so I have put it at the very bottom of this post for reference). The problem is that a lot of regimens are still missing.

 

Example of the data is below:

 

Obs Script_EnteredDate SUBJECTID sum_INT  sum_nNRTI  sum_PI  sum_Ritonavir  sum_Other  sum_NRTI  regimen

1 07/10/2007 454 0 0 2 0 0 1 .

2 04/05/2008 454 0 0 2 0 0 1 .

3 07/25/2012 455 0 1 0 0 0 1 .

4 12/12/2006 455 0 1 0 0 0 0 .

5 01/26/2007 455 0 1 0 0 0 0 .

6 01/27/2007 455 0 0 0 0 0 1 NRTI-based

7 05/06/2010 455 0 0 0 0 0 3 NRTI-based

8 05/12/2010 455 0 1 0 0 0 1 .

9 05/13/2010 455 0 0 0 0 0 1 NRTI-based

10 08/04/2010 460 0 1 0 0 0 0 .

11 08/11/2010 460 0 1 0 0 0 1 .

12 08/12/2010 460 0 0 0 0 0 1 NRTI-based

13 01/10/2007 470 0 0 1 0 0 0 .

14 10/11/2007 470 0 0 1 0 0 0 .

15 10/16/2007 471 0 0 1 0 0 1 .

16 10/26/2007 471 0 0 2 0 0 1 .

17 05/28/2008 471 0 0 2 0 0 1 .

18 11/28/2006 472 0 0 1 0 0 0 .

19 11/29/2006 472 0 0 0 0 0 1 NRTI-based

 

It is likely that there are a few subjects whose medication combination does not fit one of my regimen criteria. HOWEVER, my theory is that in some cases, the medications were prescribed close together but not necessarily on the same day. This would mean that there are a lot of single medications that are not being assigned to a regimen (because my code was written to only count medications as part of a row/regimen if they were prescribed on exactly the same day), even though those medications were taken during the same time period (give or take a few weeks, or sometimes months). I would like to be able to combine (specifically, sum) some of the rows together if their dates are close to each other, for example only two weeks apart. For example, the last two observations (subjectid 472) would below in the same row because their prescription dates are only one day apart:

18 11/28/2006 472 0 0 1 0 0 0 .

19 11/29/2006 472 0 0 0 0 0 1 NRTI-based

 

I would like these two rows to turn into something like this:

18 11/29/2006 472 0 0 1 0 0 1 NRTI-based*  

(*Note: This "NRTI-based" regimen assignment could change to another regimen depending on what medications end up being combined into this row. But I already have a code to do this assignment for each row, assuming the rows are produced correctly.)

 

Any help would be much appreciated! Thank you in advance.

 

CODE I USED PREVIOUSLY TO COMBINE ROWS WITH SAME DATES:

 

proc sql;

    create table WANT1 as select

    Script_EnteredDate,

    SUBJECTID,

    sum(INT) as sum_INT

    from clean.pharm_arv

    group by SUBJECTID, Script_EnteredDate;

quit;

 

proc sql;

    create table WANT2 as select

    Script_EnteredDate,

   SUBJECTID,

sum(mNRTI) as sum_mNRTI

    from clean.pharm_arv

    group by SUBJECTID, Script_EnteredDate;

quit;

proc sql;

    create table WANT3 as select

    Script_EnteredDate,

   SUBJECTID,

sum(NRTI) as sum_NRTI

    from clean.pharm_arv

    group by SUBJECTID, Script_EnteredDate;

quit;

proc sql;

    create table WANT4 as select

    Script_EnteredDate,

   SUBJECTID,

sum(nNRTI) as sum_nNRTI

    from clean.pharm_arv

    group by SUBJECTID, Script_EnteredDate;

quit;

proc sql;

    create table WANT5 as select

    Script_EnteredDate,

   SUBJECTID,

sum(PI) as sum_PI

    from clean.pharm_arv

    group by SUBJECTID, Script_EnteredDate;

quit;

proc sql;

    create table WANT6 as select

    Script_EnteredDate,

   SUBJECTID,

sum(Ritonavir) as sum_Ritonavir

    from clean.pharm_arv

    group by SUBJECTID, Script_EnteredDate;

quit;

proc sql;

    create table WANT7 as select

    Script_EnteredDate,

   SUBJECTID,

sum(Other) as sum_Other

    from clean.pharm_arv

    group by SUBJECTID, Script_EnteredDate;

quit;

 

DATA clean.pharm_ARVSUM;

MERGE WANT1 WANT2 WANT3 WANT4 WANT5 WANT6 WANT7;

BY SUBJECTID SCRIPT_ENTEREDDATE;

RUN;

Super User
Posts: 19,772

Re: How do I combine rows based on whether the dates are within a certain time period from each othe

For prescriptions its often handy to break it down to days and then check for drug overlaps with a criteria that they're taken over the same period of at least X days. 

Trusted Advisor
Posts: 1,018

Re: How do I combine rows based on whether the dates are within a certain time period from each othe

Here's a program that

 

  1. retains the lagged value of each drug, the lagged value of regimen, and the lagged valuel of DATE
  2. if the lagged date is exactly one day prior to current date (and the current rec is not the start of the subject), then consolidate the drugs

If you want to stretch the allowable number of days between two consecutive from 1 to 3, then change

if (first.subjectid=0) and (prior_date = script_entereddate-1)" to

"if (first.subjectid=0) and (prior_date >= script_entereddate-3)"

 

dm 'clear log';
data have;
 input Script_EnteredDate  mmddyy10. SUBJECTID sum_INT  sum_nNRTI  sum_PI  sum_Ritonavir  sum_Other  sum_NRTI  regimen $10. ;
 format script_entereddate yymmddd10.;
datalines;
07/10/2007 454 0 0 2 0 0 1 .
04/05/2008 454 0 0 2 0 0 1 .
07/25/2012 455 0 1 0 0 0 1 .
12/12/2006 455 0 1 0 0 0 0 .
01/26/2007 455 0 1 0 0 0 0 .
01/27/2007 455 0 0 0 0 0 1 NRTI-based
05/06/2010 455 0 0 0 0 0 3 NRTI-based
05/12/2010 455 0 1 0 0 0 1 .
05/13/2010 455 0 0 0 0 0 1 NRTI-based
08/04/2010 460 0 1 0 0 0 0 .
08/11/2010 460 0 1 0 0 0 1 .
08/12/2010 460 0 0 0 0 0 1 NRTI-based
01/10/2007 470 0 0 1 0 0 0 .
10/11/2007 470 0 0 1 0 0 0 .
10/16/2007 471 0 0 1 0 0 1 .
10/26/2007 471 0 0 2 0 0 1 .
05/28/2008 471 0 0 2 0 0 1 .
11/28/2006 472 0 0 1 0 0 0 .
11/29/2006 472 0 0 0 0 0 1 NRTI-based
run;

proc sort data=have;
  by subjectid script_entereddate;
run;

data want (drop=m prior:);
  set have;
  by subjectid;

  array current {6} sum_INT sum_nNRTI sum_PI sum_Ritonavir sum_Other sum_NRTI ;
  array prior   {6} ;   /*prior record*/

   
  do m=1 to 6; prior{m}=lag(current{m});  end;
  prior_regimen=lag(regimen);
  prior_date=lag(script_entereddate);   

  if (first.subjectid=0) and (prior_date = script_entereddate-1) then do;
    regimen=coalescec(regimen,prior_regimen);
    do m=1 to 6;  current{m}=max(current{m},prior{m});end;
  end;
run;
Contributor
Posts: 51

Re: How do I combine rows based on whether the dates are within a certain time period from each othe

Thank you for sending this! The code sounds great, but I tried running it for my data and then also just on the sample data, and it doesn't appear to do anything (at least, not that I can detect). Is there any chance I am missing something obvious? Thanks.

Super User
Posts: 19,772

Re: How do I combine rows based on whether the dates are within a certain time period from each othe

The want dataset isn't generated? 

Contributor
Posts: 51

Re: How do I combine rows based on whether the dates are within a certain time period from each othe

Clarification: The "want" dataset is generated, but it appears to be identical to the "have" dataset. 

Super User
Posts: 19,772

Re: How do I combine rows based on whether the dates are within a certain time period from each othe

Run a proc compare between the two and post the results. 

Its the same for both sample and actual data? 

Contributor
Posts: 51

Re: How do I combine rows based on whether the dates are within a certain time period from each othe

After fiddling around with my dataset and restarting SAS, it looks the provided code is now making visible changes (sorry for any earlier confusion!). Thank you so much to everyone who has responded to this post.

 

I just have one follow-up question regarding modifications to the code. Here is an excerpt of data for one subject first from the original dataset and then after applying two different versions of the code to lag the medication column values based on date:

ORIGINAL DATA FOR SUBJECT #118

Obs SubjectID Script_EnteredDate sum_INT sum_NRTI sum_nNRTI sum_PI sum_Ritonavir sum_Other

4 118 12/12/2006 0 0 1 0 0 0

5 118 01/26/2007 0 0 1 0 0 0

6 118 01/27/2007 0 2 0 0 0 0 

7 118 05/06/2010 0 4 0 0 0 0 

8 118 05/12/2010 0 1 1 0 0 0

9 118 05/13/2010 0 1 0 0 0 0 

10 118 08/04/2010 0 0 1 0 0 0

11 118 08/11/2010 0 1 1 0 0 0

12 118 08/12/2010 0 1 0 0 0 0 

 

AFTER APPLYING CODE TO COMBINE MEDICATIONS FOR DATES 1 DAY APART:

if (first.INMATENO=0) and (prior_date = MEDSTARTDATE-1) then do;

 

Obs SubjectID Script_EnteredDate sum_INT sum_NRTI sum_nNRTI sum_PI sum_Ritonavir sum_Other

4 118 12/12/2006 0 0 1 0 0 0

5 118 01/26/2007 0 0 1 0 0 0

6 118 01/27/2007 0 2 1 0 0 0

7 118 05/06/2010 0 4 0 0 0 0

8 118 05/12/2010 0 1 1 0 0 0

9 118 05/13/2010 0 1 1 0 0 0

10 118 08/04/2010 0 0 1 0 0 0

11 118 08/11/2010 0 1 1 0 0 0

12 118 08/12/2010 0 1 1 0 0 0

 

AFTER APPLYING CODE TO COMBINE MEDICATIONS FOR DATES 14 DAYS APART:

if (first.INMATENO=0) and (prior_date >= MEDSTARTDATE-14) then do;

 

Obs SubjectID Script_EnteredDate sum_INT sum_NRTI sum_nNRTI sum_PI sum_Ritonavir sum_Other

4 118 12/12/2006 0 0 1 0 0 0

5 118 01/26/2007 0 0 1 0 0 0

6 118 01/27/2007 0 2 1 0 0 0

7 118 05/06/2010 0 4 0 0 0 0

8 118 05/12/2010 0 4 1 0 0 0

9 118 05/13/2010 0 1 1 0 0 0

10 118 08/04/2010 0 0 1 0 0 0

11 118 08/11/2010 0 1 1 0 0 0

12 118 08/12/2010 0 1 1 0 0 0

 

As you can see by looking at observations 7, 8, and 9 for this subject, the -1 day code is able to properly lag the medications. However, the -14 day code does not lag observations for the whole 14-day lag period (by this I mean, for example, within 14 days of date 05/06/2010 for observation #7). Specifically, I was hoping observation #9 would look like:

7 118 05/06/2010 0 4 0 0 0 0

8 118 05/12/2010 0 4 1 0 0 0

9 118 05/13/2010 0 4 1 0 0 0

 

Is there a way to easily remedy this?

 

Thank you again.

Super User
Posts: 11,343

Re: How do I combine rows based on whether the dates are within a certain time period from each othe

You may want to look at the instructions here to create datastep code to duplicate your dataset for us to use in testing: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

 

If we have to write our own code to read your data there is a possibility that we treat some of your variables differently and end up with numeric where your variables are text or vice versa. A big issue is assumptions about dates for instance. You show a value but is it a SAS date value or text? That will behave very differently for sort and possibly fail completely for other functions.

Ask a Question
Discussion stats
  • 8 replies
  • 325 views
  • 0 likes
  • 4 in conversation