BookmarkSubscribeRSS Feed
Kels123
Quartz | Level 8

I have a database of subjects, medications that were prescribed, and the dates they were prescribed. I would like to assign medication regimens to each subject by looking at which of these medications were used over the same time period. A subject can have more than one regimen; what determines the regimen are the types of medications and when they were being taken (Script_EnteredDate = date the medication was prescribed).

 

I was able to combine rows that had the same dates, look at their medication combinations, and assign regimens accordingly (my code for this is very long, so I have put it at the very bottom of this post for reference). The problem is that a lot of regimens are still missing.

 

Example of the data is below:

 

Obs Script_EnteredDate SUBJECTID sum_INT  sum_nNRTI  sum_PI  sum_Ritonavir  sum_Other  sum_NRTI  regimen

1 07/10/2007 454 0 0 2 0 0 1 .

2 04/05/2008 454 0 0 2 0 0 1 .

3 07/25/2012 455 0 1 0 0 0 1 .

4 12/12/2006 455 0 1 0 0 0 0 .

5 01/26/2007 455 0 1 0 0 0 0 .

6 01/27/2007 455 0 0 0 0 0 1 NRTI-based

7 05/06/2010 455 0 0 0 0 0 3 NRTI-based

8 05/12/2010 455 0 1 0 0 0 1 .

9 05/13/2010 455 0 0 0 0 0 1 NRTI-based

10 08/04/2010 460 0 1 0 0 0 0 .

11 08/11/2010 460 0 1 0 0 0 1 .

12 08/12/2010 460 0 0 0 0 0 1 NRTI-based

13 01/10/2007 470 0 0 1 0 0 0 .

14 10/11/2007 470 0 0 1 0 0 0 .

15 10/16/2007 471 0 0 1 0 0 1 .

16 10/26/2007 471 0 0 2 0 0 1 .

17 05/28/2008 471 0 0 2 0 0 1 .

18 11/28/2006 472 0 0 1 0 0 0 .

19 11/29/2006 472 0 0 0 0 0 1 NRTI-based

 

It is likely that there are a few subjects whose medication combination does not fit one of my regimen criteria. HOWEVER, my theory is that in some cases, the medications were prescribed close together but not necessarily on the same day. This would mean that there are a lot of single medications that are not being assigned to a regimen (because my code was written to only count medications as part of a row/regimen if they were prescribed on exactly the same day), even though those medications were taken during the same time period (give or take a few weeks, or sometimes months). I would like to be able to combine (specifically, sum) some of the rows together if their dates are close to each other, for example only two weeks apart. For example, the last two observations (subjectid 472) would below in the same row because their prescription dates are only one day apart:

18 11/28/2006 472 0 0 1 0 0 0 .

19 11/29/2006 472 0 0 0 0 0 1 NRTI-based

 

I would like these two rows to turn into something like this:

18 11/29/2006 472 0 0 1 0 0 1 NRTI-based*  

(*Note: This "NRTI-based" regimen assignment could change to another regimen depending on what medications end up being combined into this row. But I already have a code to do this assignment for each row, assuming the rows are produced correctly.)

 

Any help would be much appreciated! Thank you in advance.

 

CODE I USED PREVIOUSLY TO COMBINE ROWS WITH SAME DATES:

 

proc sql;

    create table WANT1 as select

    Script_EnteredDate,

    SUBJECTID,

    sum(INT) as sum_INT

    from clean.pharm_arv

    group by SUBJECTID, Script_EnteredDate;

quit;

 

proc sql;

    create table WANT2 as select

    Script_EnteredDate,

   SUBJECTID,

sum(mNRTI) as sum_mNRTI

    from clean.pharm_arv

    group by SUBJECTID, Script_EnteredDate;

quit;

proc sql;

    create table WANT3 as select

    Script_EnteredDate,

   SUBJECTID,

sum(NRTI) as sum_NRTI

    from clean.pharm_arv

    group by SUBJECTID, Script_EnteredDate;

quit;

proc sql;

    create table WANT4 as select

    Script_EnteredDate,

   SUBJECTID,

sum(nNRTI) as sum_nNRTI

    from clean.pharm_arv

    group by SUBJECTID, Script_EnteredDate;

quit;

proc sql;

    create table WANT5 as select

    Script_EnteredDate,

   SUBJECTID,

sum(PI) as sum_PI

    from clean.pharm_arv

    group by SUBJECTID, Script_EnteredDate;

quit;

proc sql;

    create table WANT6 as select

    Script_EnteredDate,

   SUBJECTID,

sum(Ritonavir) as sum_Ritonavir

    from clean.pharm_arv

    group by SUBJECTID, Script_EnteredDate;

quit;

proc sql;

    create table WANT7 as select

    Script_EnteredDate,

   SUBJECTID,

sum(Other) as sum_Other

    from clean.pharm_arv

    group by SUBJECTID, Script_EnteredDate;

quit;

 

DATA clean.pharm_ARVSUM;

MERGE WANT1 WANT2 WANT3 WANT4 WANT5 WANT6 WANT7;

BY SUBJECTID SCRIPT_ENTEREDDATE;

RUN;

8 REPLIES 8
Reeza
Super User

For prescriptions its often handy to break it down to days and then check for drug overlaps with a criteria that they're taken over the same period of at least X days. 

mkeintz
PROC Star

Here's a program that

 

  1. retains the lagged value of each drug, the lagged value of regimen, and the lagged valuel of DATE
  2. if the lagged date is exactly one day prior to current date (and the current rec is not the start of the subject), then consolidate the drugs

If you want to stretch the allowable number of days between two consecutive from 1 to 3, then change

if (first.subjectid=0) and (prior_date = script_entereddate-1)" to

"if (first.subjectid=0) and (prior_date >= script_entereddate-3)"

 

dm 'clear log';
data have;
 input Script_EnteredDate  mmddyy10. SUBJECTID sum_INT  sum_nNRTI  sum_PI  sum_Ritonavir  sum_Other  sum_NRTI  regimen $10. ;
 format script_entereddate yymmddd10.;
datalines;
07/10/2007 454 0 0 2 0 0 1 .
04/05/2008 454 0 0 2 0 0 1 .
07/25/2012 455 0 1 0 0 0 1 .
12/12/2006 455 0 1 0 0 0 0 .
01/26/2007 455 0 1 0 0 0 0 .
01/27/2007 455 0 0 0 0 0 1 NRTI-based
05/06/2010 455 0 0 0 0 0 3 NRTI-based
05/12/2010 455 0 1 0 0 0 1 .
05/13/2010 455 0 0 0 0 0 1 NRTI-based
08/04/2010 460 0 1 0 0 0 0 .
08/11/2010 460 0 1 0 0 0 1 .
08/12/2010 460 0 0 0 0 0 1 NRTI-based
01/10/2007 470 0 0 1 0 0 0 .
10/11/2007 470 0 0 1 0 0 0 .
10/16/2007 471 0 0 1 0 0 1 .
10/26/2007 471 0 0 2 0 0 1 .
05/28/2008 471 0 0 2 0 0 1 .
11/28/2006 472 0 0 1 0 0 0 .
11/29/2006 472 0 0 0 0 0 1 NRTI-based
run;

proc sort data=have;
  by subjectid script_entereddate;
run;

data want (drop=m prior:);
  set have;
  by subjectid;

  array current {6} sum_INT sum_nNRTI sum_PI sum_Ritonavir sum_Other sum_NRTI ;
  array prior   {6} ;   /*prior record*/

   
  do m=1 to 6; prior{m}=lag(current{m});  end;
  prior_regimen=lag(regimen);
  prior_date=lag(script_entereddate);   

  if (first.subjectid=0) and (prior_date = script_entereddate-1) then do;
    regimen=coalescec(regimen,prior_regimen);
    do m=1 to 6;  current{m}=max(current{m},prior{m});end;
  end;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Kels123
Quartz | Level 8

Thank you for sending this! The code sounds great, but I tried running it for my data and then also just on the sample data, and it doesn't appear to do anything (at least, not that I can detect). Is there any chance I am missing something obvious? Thanks.

Kels123
Quartz | Level 8

Clarification: The "want" dataset is generated, but it appears to be identical to the "have" dataset. 

Reeza
Super User

Run a proc compare between the two and post the results. 

Its the same for both sample and actual data? 

Kels123
Quartz | Level 8

After fiddling around with my dataset and restarting SAS, it looks the provided code is now making visible changes (sorry for any earlier confusion!). Thank you so much to everyone who has responded to this post.

 

I just have one follow-up question regarding modifications to the code. Here is an excerpt of data for one subject first from the original dataset and then after applying two different versions of the code to lag the medication column values based on date:

ORIGINAL DATA FOR SUBJECT #118

Obs SubjectID Script_EnteredDate sum_INT sum_NRTI sum_nNRTI sum_PI sum_Ritonavir sum_Other

4 118 12/12/2006 0 0 1 0 0 0

5 118 01/26/2007 0 0 1 0 0 0

6 118 01/27/2007 0 2 0 0 0 0 

7 118 05/06/2010 0 4 0 0 0 0 

8 118 05/12/2010 0 1 1 0 0 0

9 118 05/13/2010 0 1 0 0 0 0 

10 118 08/04/2010 0 0 1 0 0 0

11 118 08/11/2010 0 1 1 0 0 0

12 118 08/12/2010 0 1 0 0 0 0 

 

AFTER APPLYING CODE TO COMBINE MEDICATIONS FOR DATES 1 DAY APART:

if (first.INMATENO=0) and (prior_date = MEDSTARTDATE-1) then do;

 

Obs SubjectID Script_EnteredDate sum_INT sum_NRTI sum_nNRTI sum_PI sum_Ritonavir sum_Other

4 118 12/12/2006 0 0 1 0 0 0

5 118 01/26/2007 0 0 1 0 0 0

6 118 01/27/2007 0 2 1 0 0 0

7 118 05/06/2010 0 4 0 0 0 0

8 118 05/12/2010 0 1 1 0 0 0

9 118 05/13/2010 0 1 1 0 0 0

10 118 08/04/2010 0 0 1 0 0 0

11 118 08/11/2010 0 1 1 0 0 0

12 118 08/12/2010 0 1 1 0 0 0

 

AFTER APPLYING CODE TO COMBINE MEDICATIONS FOR DATES 14 DAYS APART:

if (first.INMATENO=0) and (prior_date >= MEDSTARTDATE-14) then do;

 

Obs SubjectID Script_EnteredDate sum_INT sum_NRTI sum_nNRTI sum_PI sum_Ritonavir sum_Other

4 118 12/12/2006 0 0 1 0 0 0

5 118 01/26/2007 0 0 1 0 0 0

6 118 01/27/2007 0 2 1 0 0 0

7 118 05/06/2010 0 4 0 0 0 0

8 118 05/12/2010 0 4 1 0 0 0

9 118 05/13/2010 0 1 1 0 0 0

10 118 08/04/2010 0 0 1 0 0 0

11 118 08/11/2010 0 1 1 0 0 0

12 118 08/12/2010 0 1 1 0 0 0

 

As you can see by looking at observations 7, 8, and 9 for this subject, the -1 day code is able to properly lag the medications. However, the -14 day code does not lag observations for the whole 14-day lag period (by this I mean, for example, within 14 days of date 05/06/2010 for observation #7). Specifically, I was hoping observation #9 would look like:

7 118 05/06/2010 0 4 0 0 0 0

8 118 05/12/2010 0 4 1 0 0 0

9 118 05/13/2010 0 4 1 0 0 0

 

Is there a way to easily remedy this?

 

Thank you again.

ballardw
Super User

You may want to look at the instructions here to create datastep code to duplicate your dataset for us to use in testing: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

 

If we have to write our own code to read your data there is a possibility that we treat some of your variables differently and end up with numeric where your variables are text or vice versa. A big issue is assumptions about dates for instance. You show a value but is it a SAS date value or text? That will behave very differently for sort and possibly fail completely for other functions.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 8 replies
  • 2451 views
  • 0 likes
  • 4 in conversation