BookmarkSubscribeRSS Feed
CynthiaWei
Obsidian | Level 7

Hi SAS Pros,

 

I have a project that needs to define whether there are any overlap of taking drug A and B happened and calculate the overlap period once it happened.

 

Below is what I have:

ID Drug starting_taking_date Days_of_supply
1 A 10/7/2017 5
1 A 10/20/2017 5
1 A 11/1/2017 5
1 B 10/27/2017 10
1 B 11/15/2017 10
2 A 10/2/2017 7
2 A 10/6/2017 60
2 B 10/14/2017 8
2 B 11/3/2017 8

 

What I need is to create another 4 new variables as follows:

1. last_taking_date: person 1 had a prescription of drug A and started taking A from 10/7/2017, and this prescription had 5 days supply, so the last taking date for this prescription is 10/11/2017

2. interval: (I have sorted by ID, then drug, and then start_taking_date) Person 1 had a second prescription of drug A on 10/20/2018, which was 9 days after the first drug A prescription. The third drug A prescription happened 8 days after the last taking date for the second drug A prescription. Person 1 had drug B 9 days earlier than the last day of the third drug A prescription.

3. Whether_take_AandB_together: person 1 took both drug A and B together from 11/1/2017-11/5/2017 

4. How_long: so the overlap period for person 1 is 5 days

ID Drug start_taking_date Days_of_supply last_taking_date interval Whether_take_AandB_together how_long
1 A 10/7/2017 5 10/11/2017   Y 5
1 A 10/20/2017 5 10/24/2017 9
1 A 11/1/2017 5 11/5/2017 8
1 B 10/27/2017 10 11/5/2017 -9
1 B 11/15/2017 10 11/24/2017 10  
2 A 10/2/2017 7 10/8/2017   Y 16
2 A 10/6/2017 60 12/7/2017 -2
2 B 10/14/2017 8 10/21/2017 -54
2 B 11/3/2017 8 11/10/2017 13

Please pay attention to person 2. Person 2's start taking time for the second prescription of drug A was earlier than it was supposed to be (10/6/2017 is earlier than 10/8/2017), so I still want to calculate as 'NO-early start taking', so the last taking date for the second drug A prescription is 12/7/2017 instead of 12/5/2017. Because the second prescription of drug A covered the drug B's two prescription periods, the overlap period of taking drug A and B 10/14/2017-10/21/2017 and 11/3/2017-11/10/2017, which were 16 days.

 

I think this is a really complicated project and am having no idea to start framing these time periods and overlap calculation, so any help would be really appreciated!!!

 

Best regards,

C

13 REPLIES 13
SteveNZ
Obsidian | Level 7

Hiya, this will get you close but my interval differs from yours. I also put in code to only count the overlap duration once but it's a bit messy:

 

data have;
	set have;
	end_taking_date = sum(starting_taking_date,Days_of_supply,-1);
run;

proc sort data = have;
	by id drug starting_taking_date end_taking_date;
run;

data have;
	set have;
	lag_id = lag(id);
	lag_start = lag(starting_taking_date);
run;

data have (drop = lag_id lag_start);
	set have;
	nomid = _n_;
	if id = lag_id then
		interval = sum(starting_taking_date,-lag_start);
run;

*get overlap;
proc sql;
	create table want_tmp as
		select  m.nomid, max(p.starting_taking_date,m.starting_taking_date) as share_start_tmp,
			min(p.end_taking_date,m.end_taking_date) as share_end_tmp, p.id as sharer_id
		from have as M, have as P
			where m.drug ne p.drug 
				and m.id = p.id 
				and (P.starting_taking_date <= M.end_taking_date and P.end_taking_date >= M.starting_taking_date);
quit;

proc sql;
	create table want (drop = nomid) as
		select  l.*,coalesce(r.share_start_tmp,l.starting_taking_date) as share_start_tmp   ,
			coalesce(r.share_end_tmp,l.end_taking_date) as share_end_tmp, r.sharer_id
		from have as l left join want_tmp as r
			on l.nomid=r.nomid;
quit;

data want;
	set want;
	format share_start_tmp share_end_tmp ddmmyy10.;
	if sharer_id ne '' then
		overlap_tmp = sum(share_end_tmp,-share_start_tmp,1);
run;

proc sort data =  want;
	by id share_start_tmp share_end_tmp;
run;

data want (drop =  overlap_tmp sharer_id share_start_tmp share_end_tmp);
	set want;
	format share_start share_end end_taking_date mmddyy10.;
	by id share_start_tmp share_end_tmp;
	if sharer_id ne '' and first.share_start_tmp then
		overlap = overlap_tmp;
	if sharer_id ne '' then do;
		share_start = share_start_tmp;
		share_end = share_end_tmp;
	end;
run;

proc sort data =  want;
	by id drug share_start share_end;
run;

 

cheers

Steve

CynthiaWei
Obsidian | Level 7

Hi Steve,

 

I really appreciate your prompt reply a lot! This is absolutely a comprehensive code to solve my problem. Hats off to the genius!

 

Thank you so much!

 

Best regards,

 

C

 

 

CynthiaWei
Obsidian | Level 7

Hi Steve,

 

I am looking at your code. Could you please explain a little bit for the two proc sql statements? I am quite sure I understand m.varname and p.varname mean in the syntax.

 

Thank you so much!

 

Best regards,

C

CynthiaWei
Obsidian | Level 7

Oh, Hi, Steve, actually my calculation for the interval should be 

interval=starting_taking_date - lag_last (not lag_start), 🙂 and, lag_last=lag(starting_taking_date)

Does this affect any subsequent syntax?

 

I really appreciate your advice!

 

Best,

Reeza
Super User
Prescription drug monitoring? Usually you allow 3 days of overlap for prescription refills and don't count those as an overlap....
ballardw
Super User

@Reeza wrote:
Prescription drug monitoring? Usually you allow 3 days of overlap for prescription refills and don't count those as an overlap....

Some of my prescriptions vary up to 45 day overlaps (on 90 day scripts). Sometimes there may be a discount and they get filled early to save the insurance company $$, others something esoteric between the pharmacy reschedule requests and my prescribing physician response and may be changing rules about 30 day vs 90 day and ?!? insurance rules.

CynthiaWei
Obsidian | Level 7

Hi,

Thanks for discussing my case. I can make the decision for the cut-off point based on the range of overlap I am going to get.

Best,

C

CynthiaWei
Obsidian | Level 7

Hi, thank you for your kind remind. It is true that there may a short window interval. It is tricky to decide the cut-off point, i.e. should have a few days in between or not. So, what I am going to do is just calculate the overlap (>=1) and the range. I need to make a clinical-meaning decision later for the cut-off point. 

 

Best,

C

mkeintz
PROC Star

I think this is one of those tasks where it makes the most sense to maintain an array indexed by the date range of your study.  Let's say all your prescription supplies are meant to be completely used between 01jan2016 through 31dec2018.  Then for drug A populate the elements of the array corresponding to the prescription dates with 1's.  After A is exhausted then scan the dates for B in the same array, counting the number of 1's (from A) that are found.  That would be total overlap count:

 

data have;
 input ID 	Drug :$1. 	start_taking_date :mmddyy8.	Days_of_supply;
 format start_taking_date mmddyy8.;
datalines;
1 	A 	10/7/2017 	5
1 	A 	10/20/2017 	5
1 	A 	11/1/2017 	5
1 	B 	10/27/2017 	10
1 	B 	11/15/2017 	10
2 	A 	10/2/2017 	7
2 	A 	10/6/2017 	60
2 	B 	10/14/2017 	8
2 	B 	11/3/2017 	8
run;

%let history_beg=01jan2016;
%let history_end=31dec2018;

data want (drop=_:);

  /* Read all drug for each ID, just to calculate OVERLAP days */
  do until (last.id);
    set have;
	by id;
	if first.id then overlap=0;

	/* Here's an array indexed from 01jan2016 to 31dec2018 */
	array _calendar {%sysevalf("&history_beg"d):%sysevalf("&history_end"d)}  ;

	/* Loop over the dates for each prescription.  If it's drug A, put a 1 in
	/* the corresponding element in the array.  But if it's a B, count the number
	/* of times the B prescription encounters a 1 in the array */
	do _d=start_taking_date to start_taking_date+days_of_supply-1;
	  if drug='A' then _calendar{_d}=1;
	  else if drug='B' and _calendar{_d}=1 then overlap=overlap+1;;
	end;
  end;

  /* Now that total overlap has been calculated, reread the same id, calculate 
  /* the other vars and output the record */

  do until (last.id);
    set have;
    by id ;
    last_taking_date=start_taking_date+days_of_supply-1;
    format last_taking_date mmddyy8.;
    interval=start_taking_date-lag(last_taking_date);
    if first.id then interval=.;
    output;
  end;
run;

This depends on the data being sorted by id/drug/starting_date.  Note in the program each ID is read through twice - the first time to calculate overlap, a constant for the id.  The second time to calculate the other variables, which are generate within each record, or between each record and its predecessor.

 

BTW, you have interval=-54 for the next-to-last record, but I get -51.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
CynthiaWei
Obsidian | Level 7

Hi,

 

I really appreciate your code! It worked for the test codes. But my real data is much larger than the test dataset. I run the code and got and error report from log said that 

 

ERROR: Array subscript out of range at line 216 column 34.

 

As a result, there are only parts observations were calculated.

 

Could you please help me figure out what is going on here?

 

Thank you very much!

 

Best regards,

C

mkeintz
PROC Star

This is NOT a data size issue.  It is a date-range issue.  That's why the message says "array subscript out of range".

 

The out-of-range means that you have dates before 01jan2016  or after 31dec2018, which the defined array can't accommodate.  That's why I said " Let's say all your prescription supplies are meant to be completely used between 01jan2016 through 31dec2018."

 

So just change the two %LET statements to accomodate then entire possible history of your data.  

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
CynthiaWei
Obsidian | Level 7

Hi,

Thanks for the explanation. I only run the range for start_taking_date. It turns out that my data range is out of range. I have change the %LET statements.

 

But, when I run the new code, the SAS log said like this:

 

Invalid DO loop control information, either the INITIAL or TO expression is missing or the BY
expression is missing, zero, or invalid.

 

I am familiar with loop statement, so I have no idea what does this log message mean. Is is because my dataset has some variables with missing values? I don't think there are any variables with any missing. 

 

I really appreciate your advice a lot!

 

Best regards,

C

mkeintz
PROC Star

Please show the log.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 3897 views
  • 0 likes
  • 5 in conversation