Re: How to identify overlap and calculate overlap time period

CynthiaWei · Posted 09-01-2020 07:05 PM

Hi SAS Pros,

I have a project that needs to define whether there are any overlap of taking drug A and B happened and calculate the overlap period once it happened.

Below is what I have:

ID	Drug	starting_taking_date	Days_of_supply
1	A	10/7/2017	5
1	A	10/20/2017	5
1	A	11/1/2017	5
1	B	10/27/2017	10
1	B	11/15/2017	10
2	A	10/2/2017	7
2	A	10/6/2017	60
2	B	10/14/2017	8
2	B	11/3/2017	8

What I need is to create another 4 new variables as follows:

1. last_taking_date: person 1 had a prescription of drug A and started taking A from 10/7/2017, and this prescription had 5 days supply, so the last taking date for this prescription is 10/11/2017

2. interval: (I have sorted by ID, then drug, and then start_taking_date) Person 1 had a second prescription of drug A on 10/20/2018, which was 9 days after the first drug A prescription. The third drug A prescription happened 8 days after the last taking date for the second drug A prescription. Person 1 had drug B 9 days earlier than the last day of the third drug A prescription.

3. Whether_take_AandB_together: person 1 took both drug A and B together from 11/1/2017-11/5/2017

4. How_long: so the overlap period for person 1 is 5 days

ID	Drug	start_taking_date	Days_of_supply	last_taking_date	interval	Whether_take_AandB_together	how_long
1	A	10/7/2017	5	10/11/2017		Y	5
1	A	10/20/2017	5	10/24/2017	9
1	A	11/1/2017	5	11/5/2017	8
1	B	10/27/2017	10	11/5/2017	-9
1	B	11/15/2017	10	11/24/2017	10
2	A	10/2/2017	7	10/8/2017		Y	16
2	A	10/6/2017	60	12/7/2017	-2
2	B	10/14/2017	8	10/21/2017	-54
2	B	11/3/2017	8	11/10/2017	13

Please pay attention to person 2. Person 2's start taking time for the second prescription of drug A was earlier than it was supposed to be (10/6/2017 is earlier than 10/8/2017), so I still want to calculate as 'NO-early start taking', so the last taking date for the second drug A prescription is 12/7/2017 instead of 12/5/2017. Because the second prescription of drug A covered the drug B's two prescription periods, the overlap period of taking drug A and B 10/14/2017-10/21/2017 and 11/3/2017-11/10/2017, which were 16 days.

I think this is a really complicated project and am having no idea to start framing these time periods and overlap calculation, so any help would be really appreciated!!!

Best regards,

C

SteveNZ · Posted 09-01-2020 08:08 PM

Hiya, this will get you close but my interval differs from yours. I also put in code to only count the overlap duration once but it's a bit messy:

data have;
	set have;
	end_taking_date = sum(starting_taking_date,Days_of_supply,-1);
run;

proc sort data = have;
	by id drug starting_taking_date end_taking_date;
run;

data have;
	set have;
	lag_id = lag(id);
	lag_start = lag(starting_taking_date);
run;

data have (drop = lag_id lag_start);
	set have;
	nomid = _n_;
	if id = lag_id then
		interval = sum(starting_taking_date,-lag_start);
run;

*get overlap;
proc sql;
	create table want_tmp as
		select  m.nomid, max(p.starting_taking_date,m.starting_taking_date) as share_start_tmp,
			min(p.end_taking_date,m.end_taking_date) as share_end_tmp, p.id as sharer_id
		from have as M, have as P
			where m.drug ne p.drug 
				and m.id = p.id 
				and (P.starting_taking_date <= M.end_taking_date and P.end_taking_date >= M.starting_taking_date);
quit;

proc sql;
	create table want (drop = nomid) as
		select  l.*,coalesce(r.share_start_tmp,l.starting_taking_date) as share_start_tmp   ,
			coalesce(r.share_end_tmp,l.end_taking_date) as share_end_tmp, r.sharer_id
		from have as l left join want_tmp as r
			on l.nomid=r.nomid;
quit;

data want;
	set want;
	format share_start_tmp share_end_tmp ddmmyy10.;
	if sharer_id ne '' then
		overlap_tmp = sum(share_end_tmp,-share_start_tmp,1);
run;

proc sort data =  want;
	by id share_start_tmp share_end_tmp;
run;

data want (drop =  overlap_tmp sharer_id share_start_tmp share_end_tmp);
	set want;
	format share_start share_end end_taking_date mmddyy10.;
	by id share_start_tmp share_end_tmp;
	if sharer_id ne '' and first.share_start_tmp then
		overlap = overlap_tmp;
	if sharer_id ne '' then do;
		share_start = share_start_tmp;
		share_end = share_end_tmp;
	end;
run;

proc sort data =  want;
	by id drug share_start share_end;
run;

cheers

Steve

CynthiaWei · Posted 09-04-2020 04:39 AM

Hi Steve,

I really appreciate your prompt reply a lot! This is absolutely a comprehensive code to solve my problem. Hats off to the genius!

Thank you so much!

Best regards,

C

CynthiaWei · Posted 09-05-2020 02:46 PM

Hi Steve,

I am looking at your code. Could you please explain a little bit for the two proc sql statements? I am quite sure I understand m.varname and p.varname mean in the syntax.

Thank you so much!

Best regards,

C

CynthiaWei · Posted 09-05-2020 02:54 PM

Oh, Hi, Steve, actually my calculation for the interval should be

interval=starting_taking_date - lag_last (not lag_start), 🙂 and, lag_last=lag(starting_taking_date)

Does this affect any subsequent syntax?

I really appreciate your advice!

Best,

Reeza · Posted 09-01-2020 11:14 PM

Prescription drug monitoring? Usually you allow 3 days of overlap for prescription refills and don't count those as an overlap....

ballardw · Posted 09-02-2020 05:04 PM

@Reeza wrote:
Prescription drug monitoring? Usually you allow 3 days of overlap for prescription refills and don't count those as an overlap....

Some of my prescriptions vary up to 45 day overlaps (on 90 day scripts). Sometimes there may be a discount and they get filled early to save the insurance company $$, others something esoteric between the pharmacy reschedule requests and my prescribing physician response and may be changing rules about 30 day vs 90 day and ?!? insurance rules.

CynthiaWei · Posted 09-04-2020 04:52 AM

Hi,

Thanks for discussing my case. I can make the decision for the cut-off point based on the range of overlap I am going to get.

Best,

C

CynthiaWei · Posted 09-04-2020 04:50 AM

Hi, thank you for your kind remind. It is true that there may a short window interval. It is tricky to decide the cut-off point, i.e. should have a few days in between or not. So, what I am going to do is just calculate the overlap (>=1) and the range. I need to make a clinical-meaning decision later for the cut-off point.

Best,

C

mkeintz · Posted 09-03-2020 02:10 AM

I think this is one of those tasks where it makes the most sense to maintain an array indexed by the date range of your study. Let's say all your prescription supplies are meant to be completely used between 01jan2016 through 31dec2018. Then for drug A populate the elements of the array corresponding to the prescription dates with 1's. After A is exhausted then scan the dates for B in the same array, counting the number of 1's (from A) that are found. That would be total overlap count:

data have;
 input ID 	Drug :$1. 	start_taking_date :mmddyy8.	Days_of_supply;
 format start_taking_date mmddyy8.;
datalines;
1 	A 	10/7/2017 	5
1 	A 	10/20/2017 	5
1 	A 	11/1/2017 	5
1 	B 	10/27/2017 	10
1 	B 	11/15/2017 	10
2 	A 	10/2/2017 	7
2 	A 	10/6/2017 	60
2 	B 	10/14/2017 	8
2 	B 	11/3/2017 	8
run;

%let history_beg=01jan2016;
%let history_end=31dec2018;

data want (drop=_:);

  /* Read all drug for each ID, just to calculate OVERLAP days */
  do until (last.id);
    set have;
	by id;
	if first.id then overlap=0;

	/* Here's an array indexed from 01jan2016 to 31dec2018 */
	array _calendar {%sysevalf("&history_beg"d):%sysevalf("&history_end"d)}  ;

	/* Loop over the dates for each prescription.  If it's drug A, put a 1 in
	/* the corresponding element in the array.  But if it's a B, count the number
	/* of times the B prescription encounters a 1 in the array */
	do _d=start_taking_date to start_taking_date+days_of_supply-1;
	  if drug='A' then _calendar{_d}=1;
	  else if drug='B' and _calendar{_d}=1 then overlap=overlap+1;;
	end;
  end;

  /* Now that total overlap has been calculated, reread the same id, calculate 
  /* the other vars and output the record */

  do until (last.id);
    set have;
    by id ;
    last_taking_date=start_taking_date+days_of_supply-1;
    format last_taking_date mmddyy8.;
    interval=start_taking_date-lag(last_taking_date);
    if first.id then interval=.;
    output;
  end;
run;

This depends on the data being sorted by id/drug/starting_date. Note in the program each ID is read through twice - the first time to calculate overlap, a constant for the id. The second time to calculate the other variables, which are generate within each record, or between each record and its predecessor.

BTW, you have interval=-54 for the next-to-last record, but I get -51.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

CynthiaWei · Posted 09-04-2020 04:46 AM

Hi,

I really appreciate your code! It worked for the test codes. But my real data is much larger than the test dataset. I run the code and got and error report from log said that

ERROR: Array subscript out of range at line 216 column 34.

As a result, there are only parts observations were calculated.

Could you please help me figure out what is going on here?

Thank you very much!

Best regards,

C

mkeintz · Posted 09-04-2020 12:01 PM

This is NOT a data size issue. It is a date-range issue. That's why the message says "array subscript out of range".

The out-of-range means that you have dates before 01jan2016 or after 31dec2018, which the defined array can't accommodate. That's why I said " Let's say all your prescription supplies are meant to be completely used between 01jan2016 through 31dec2018."

So just change the two %LET statements to accomodate then entire possible history of your data.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

CynthiaWei · Posted 09-04-2020 07:43 PM

Hi,

Thanks for the explanation. I only run the range for start_taking_date. It turns out that my data range is out of range. I have change the %LET statements.

But, when I run the new code, the SAS log said like this:

Invalid DO loop control information, either the INITIAL or TO expression is missing or the BY
expression is missing, zero, or invalid.

I am familiar with loop statement, so I have no idea what does this log message mean. Is is because my dataset has some variables with missing values? I don't think there are any variables with any missing.

I really appreciate your advice a lot!

Best regards,

C

mkeintz · Posted 09-04-2020 10:57 PM

Please show the log.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away