I have a database of subjects, medications that were prescribed, and the dates they were prescribed. I would like to assign medication regimens to each subject by looking at which of these medications were used over the same time period. A subject can have more than one regimen; what determines the regimen are the types of medications and when they were being taken (Script_EnteredDate = date the medication was prescribed).
I was able to combine rows that had the same dates, look at their medication combinations, and assign regimens accordingly (my code for this is very long, so I have put it at the very bottom of this post for reference). The problem is that a lot of regimens are still missing.
Example of the data is below:
Obs Script_EnteredDate SUBJECTID sum_INT sum_nNRTI sum_PI sum_Ritonavir sum_Other sum_NRTI regimen
1 07/10/2007 454 0 0 2 0 0 1 .
2 04/05/2008 454 0 0 2 0 0 1 .
3 07/25/2012 455 0 1 0 0 0 1 .
4 12/12/2006 455 0 1 0 0 0 0 .
5 01/26/2007 455 0 1 0 0 0 0 .
6 01/27/2007 455 0 0 0 0 0 1 NRTI-based
7 05/06/2010 455 0 0 0 0 0 3 NRTI-based
8 05/12/2010 455 0 1 0 0 0 1 .
9 05/13/2010 455 0 0 0 0 0 1 NRTI-based
10 08/04/2010 460 0 1 0 0 0 0 .
11 08/11/2010 460 0 1 0 0 0 1 .
12 08/12/2010 460 0 0 0 0 0 1 NRTI-based
13 01/10/2007 470 0 0 1 0 0 0 .
14 10/11/2007 470 0 0 1 0 0 0 .
15 10/16/2007 471 0 0 1 0 0 1 .
16 10/26/2007 471 0 0 2 0 0 1 .
17 05/28/2008 471 0 0 2 0 0 1 .
18 11/28/2006 472 0 0 1 0 0 0 .
19 11/29/2006 472 0 0 0 0 0 1 NRTI-based
It is likely that there are a few subjects whose medication combination does not fit one of my regimen criteria. HOWEVER, my theory is that in some cases, the medications were prescribed close together but not necessarily on the same day. This would mean that there are a lot of single medications that are not being assigned to a regimen (because my code was written to only count medications as part of a row/regimen if they were prescribed on exactly the same day), even though those medications were taken during the same time period (give or take a few weeks, or sometimes months). I would like to be able to combine (specifically, sum) some of the rows together if their dates are close to each other, for example only two weeks apart. For example, the last two observations (subjectid 472) would below in the same row because their prescription dates are only one day apart:
18 11/28/2006 472 0 0 1 0 0 0 .
19 11/29/2006 472 0 0 0 0 0 1 NRTI-based
I would like these two rows to turn into something like this:
18 11/29/2006 472 0 0 1 0 0 1 NRTI-based*
(*Note: This "NRTI-based" regimen assignment could change to another regimen depending on what medications end up being combined into this row. But I already have a code to do this assignment for each row, assuming the rows are produced correctly.)
Any help would be much appreciated! Thank you in advance.
CODE I USED PREVIOUSLY TO COMBINE ROWS WITH SAME DATES:
proc sql;
create table WANT1 as select
Script_EnteredDate,
SUBJECTID,
sum(INT) as sum_INT
from clean.pharm_arv
group by SUBJECTID, Script_EnteredDate;
quit;
proc sql;
create table WANT2 as select
Script_EnteredDate,
SUBJECTID,
sum(mNRTI) as sum_mNRTI
from clean.pharm_arv
group by SUBJECTID, Script_EnteredDate;
quit;
proc sql;
create table WANT3 as select
Script_EnteredDate,
SUBJECTID,
sum(NRTI) as sum_NRTI
from clean.pharm_arv
group by SUBJECTID, Script_EnteredDate;
quit;
proc sql;
create table WANT4 as select
Script_EnteredDate,
SUBJECTID,
sum(nNRTI) as sum_nNRTI
from clean.pharm_arv
group by SUBJECTID, Script_EnteredDate;
quit;
proc sql;
create table WANT5 as select
Script_EnteredDate,
SUBJECTID,
sum(PI) as sum_PI
from clean.pharm_arv
group by SUBJECTID, Script_EnteredDate;
quit;
proc sql;
create table WANT6 as select
Script_EnteredDate,
SUBJECTID,
sum(Ritonavir) as sum_Ritonavir
from clean.pharm_arv
group by SUBJECTID, Script_EnteredDate;
quit;
proc sql;
create table WANT7 as select
Script_EnteredDate,
SUBJECTID,
sum(Other) as sum_Other
from clean.pharm_arv
group by SUBJECTID, Script_EnteredDate;
quit;
DATA clean.pharm_ARVSUM;
MERGE WANT1 WANT2 WANT3 WANT4 WANT5 WANT6 WANT7;
BY SUBJECTID SCRIPT_ENTEREDDATE;
RUN;
For prescriptions its often handy to break it down to days and then check for drug overlaps with a criteria that they're taken over the same period of at least X days.
Here's a program that
If you want to stretch the allowable number of days between two consecutive from 1 to 3, then change
if (first.subjectid=0) and (prior_date = script_entereddate-1)" to
"if (first.subjectid=0) and (prior_date >= script_entereddate-3)"
dm 'clear log';
data have;
 input Script_EnteredDate  mmddyy10. SUBJECTID sum_INT  sum_nNRTI  sum_PI  sum_Ritonavir  sum_Other  sum_NRTI  regimen $10. ;
 format script_entereddate yymmddd10.;
datalines;
07/10/2007 454 0 0 2 0 0 1 .
04/05/2008 454 0 0 2 0 0 1 .
07/25/2012 455 0 1 0 0 0 1 .
12/12/2006 455 0 1 0 0 0 0 .
01/26/2007 455 0 1 0 0 0 0 .
01/27/2007 455 0 0 0 0 0 1 NRTI-based
05/06/2010 455 0 0 0 0 0 3 NRTI-based
05/12/2010 455 0 1 0 0 0 1 .
05/13/2010 455 0 0 0 0 0 1 NRTI-based
08/04/2010 460 0 1 0 0 0 0 .
08/11/2010 460 0 1 0 0 0 1 .
08/12/2010 460 0 0 0 0 0 1 NRTI-based
01/10/2007 470 0 0 1 0 0 0 .
10/11/2007 470 0 0 1 0 0 0 .
10/16/2007 471 0 0 1 0 0 1 .
10/26/2007 471 0 0 2 0 0 1 .
05/28/2008 471 0 0 2 0 0 1 .
11/28/2006 472 0 0 1 0 0 0 .
11/29/2006 472 0 0 0 0 0 1 NRTI-based
run;
proc sort data=have;
  by subjectid script_entereddate;
run;
data want (drop=m prior:);
  set have;
  by subjectid;
  array current {6} sum_INT sum_nNRTI sum_PI sum_Ritonavir sum_Other sum_NRTI ;
  array prior   {6} ;   /*prior record*/
   
  do m=1 to 6; prior{m}=lag(current{m});  end;
  prior_regimen=lag(regimen);
  prior_date=lag(script_entereddate);   
  if (first.subjectid=0) and (prior_date = script_entereddate-1) then do;
    regimen=coalescec(regimen,prior_regimen);
    do m=1 to 6;  current{m}=max(current{m},prior{m});end;
  end;
run;
Thank you for sending this! The code sounds great, but I tried running it for my data and then also just on the sample data, and it doesn't appear to do anything (at least, not that I can detect). Is there any chance I am missing something obvious? Thanks.
The want dataset isn't generated?
Clarification: The "want" dataset is generated, but it appears to be identical to the "have" dataset.
Run a proc compare between the two and post the results.
Its the same for both sample and actual data?
After fiddling around with my dataset and restarting SAS, it looks the provided code is now making visible changes (sorry for any earlier confusion!). Thank you so much to everyone who has responded to this post.
I just have one follow-up question regarding modifications to the code. Here is an excerpt of data for one subject first from the original dataset and then after applying two different versions of the code to lag the medication column values based on date:
ORIGINAL DATA FOR SUBJECT #118
Obs SubjectID Script_EnteredDate sum_INT sum_NRTI sum_nNRTI sum_PI sum_Ritonavir sum_Other
4 118 12/12/2006 0 0 1 0 0 0
5 118 01/26/2007 0 0 1 0 0 0
6 118 01/27/2007 0 2 0 0 0 0
7 118 05/06/2010 0 4 0 0 0 0
8 118 05/12/2010 0 1 1 0 0 0
9 118 05/13/2010 0 1 0 0 0 0
10 118 08/04/2010 0 0 1 0 0 0
11 118 08/11/2010 0 1 1 0 0 0
12 118 08/12/2010 0 1 0 0 0 0
AFTER APPLYING CODE TO COMBINE MEDICATIONS FOR DATES 1 DAY APART:
if (first.INMATENO=0) and (prior_date = MEDSTARTDATE-1) then do;
Obs SubjectID Script_EnteredDate sum_INT sum_NRTI sum_nNRTI sum_PI sum_Ritonavir sum_Other
4 118 12/12/2006 0 0 1 0 0 0
5 118 01/26/2007 0 0 1 0 0 0
6 118 01/27/2007 0 2 1 0 0 0
7 118 05/06/2010 0 4 0 0 0 0
8 118 05/12/2010 0 1 1 0 0 0
9 118 05/13/2010 0 1 1 0 0 0
10 118 08/04/2010 0 0 1 0 0 0
11 118 08/11/2010 0 1 1 0 0 0
12 118 08/12/2010 0 1 1 0 0 0
AFTER APPLYING CODE TO COMBINE MEDICATIONS FOR DATES 14 DAYS APART:
if (first.INMATENO=0) and (prior_date >= MEDSTARTDATE-14) then do;
Obs SubjectID Script_EnteredDate sum_INT sum_NRTI sum_nNRTI sum_PI sum_Ritonavir sum_Other
4 118 12/12/2006 0 0 1 0 0 0
5 118 01/26/2007 0 0 1 0 0 0
6 118 01/27/2007 0 2 1 0 0 0
7 118 05/06/2010 0 4 0 0 0 0
8 118 05/12/2010 0 4 1 0 0 0
9 118 05/13/2010 0 1 1 0 0 0
10 118 08/04/2010 0 0 1 0 0 0
11 118 08/11/2010 0 1 1 0 0 0
12 118 08/12/2010 0 1 1 0 0 0
As you can see by looking at observations 7, 8, and 9 for this subject, the -1 day code is able to properly lag the medications. However, the -14 day code does not lag observations for the whole 14-day lag period (by this I mean, for example, within 14 days of date 05/06/2010 for observation #7). Specifically, I was hoping observation #9 would look like:
7 118 05/06/2010 0 4 0 0 0 0
8 118 05/12/2010 0 4 1 0 0 0
9 118 05/13/2010 0 4 1 0 0 0
Is there a way to easily remedy this?
Thank you again.
You may want to look at the instructions here to create datastep code to duplicate your dataset for us to use in testing: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...
If we have to write our own code to read your data there is a possibility that we treat some of your variables differently and end up with numeric where your variables are text or vice versa. A big issue is assumptions about dates for instance. You show a value but is it a SAS date value or text? That will behave very differently for sort and possibly fail completely for other functions.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.
