Solved: Re: Adding overlapping dates to the end of a date range

chuakp · Posted 10-21-2022 01:07 PM

I have a database of prescription claims that contains the dispensing date and end date of each prescription, as defined by days supplied. The goal is to calculate how many whether the patient has continuous coverage without more than a 7-day gap. The issue is that there are times in which a patient may have 2 ore more prescriptions in which the dispensing and end dates overlap. For example, the data might look like this:

PATIENT_ID RX_ID DISPENSE_DATE END_DATE

1 A 1/1/2022 1/8/2022

1 B 1/7/2022 1/9/2022

1 C 1/8/2022 1/11/2022

1 D 1/19/2022 1/25/2022

1 E 1/26/2022 1/30/2022

If you only consider the dispense_date and end_date, it looks like consider there is a 8-day gap between C and D. However, RX_ID B overlaps with A for 2 days (1/7/2022-1/8/2022), and C overlaps with A for 1 day (1/8/2022), and I would like to consider the patient as having coverage from 1/1/2022 through 1/14/2022 (i.e., 1/11/2022 plus the number of overlapping dates), in which case there is only a 5-day gap before D. Basically, I want to turn the above database into something like the following (the observations for D and E would stay the same because there is no overlap).

PATIENT_ID RX_ID DISPENSE_DATE END_DATE_COMBINED

1 A 1/1/2022 1/14/2022

1 D 1/19/2022 1/25/2022

1 E 1/26/2022 1/30/2022

Note: while not in this example data, I'm aware that there could be situations in which prescription D could have started on 1/13/2022, in which case it overlaps with the new range of A after stitching A/B/C together. However, I don't care about that for the purposes of this analysis.

I'd appreciate any advice. Thanks.

Tom · Posted 10-24-2022 09:07 AM

You don't really have to look ahead (as long as the data is sorted by id and dispense_date).

But you might want to remerge the final result to get the whole overlapping period onto each observation in the group.

data want;
  set have ;
  by patient_id dispense_date;
  retain first_date last_date group;
  if first.patient_id then call missing(first_date,last_date,group);
  if dispense_date > last_date then do;
     group+1;
     first_date=dispense_date; last_date=end_date; 
  end;
  else do;
     first_date=min(first_date,dispense_date);
     last_date=max(last_date,end_date);
  end;
  format first_date last_date yymmdd10.;
run;

data want;
  do _n_=1 by 1 until(last.group);
    merge want groups;
    by patient_id group;
  end;
  do _n_=1 to _n_;
    set want(drop=first_date last_date);
    output;
  end;
run;

View solution in original post

Astounding · Posted 10-22-2022 08:59 AM

Just to point you in a good direction, here is what I would suggest.

Treat this as a two-step problem. First assign a new variable GROUPNO based on whether the current observation belongs in the same group as a previous observation or not. Then group them. For example, start along these lines:

data want;
   set have;
   by patient_id;
   if firstpatient_id then groupno=1;
   prior_end = lag(end_date);
   if ... whatever conditions you would like to indicate a new group should start ...
   then groupno + 1;
run;

This step should give you a result aong the lines of:

PATIENT_ID       RX_ID   DISPENSE_DATE     END_DATE    GROUP_NO      

1                  A      1/1/2022         1/8/2022        1      
1                  B      1/7/2022         1/9/2022        1
1                  C      1/8/2022         1/11/2022       1
1                  D      1/19/2022        1/25/2022       2
1                  E      1/26/2022        1/30/2022       3

From there, calculating the maxium END_DATE for each PATIENT_ID / GROUP_NO should be straightforward.

chuakp · Posted 10-22-2022 04:22 PM

I appreciate the reply. To clarify, I am not trying to calculate the maximum end_date for each PATIENT_ID / GROUP_NO. I am trying to determine the number of days of overlap within GROUP_NO's, then add that number to the end of the the maximum end_date for each PATIENT/GROUP_NO. Part of my difficulty is determining the number of dates of overlap.

Tom · Posted 10-22-2022 05:15 PM

Isn't the number of days of overlap just the difference between the sum of each individual record's duration and the duration of the overall period? Which you can calculate if you know the first and last date for the period.

data have;
 input PATIENT_ID RX_ID $ DISPENSE_DATE :yymmdd. END_DATE :yymmdd. GROUP_NO ;
 format DISPENSE_DATE END_DATE yymmdd10.;
cards;
1 A 2022-01-01 2022-01-08 1
1 B 2022-01-07 2022-01-09 1
1 C 2022-01-08 2022-01-11 1
1 D 2022-01-19 2022-01-25 2
1 E 2022-01-26 2022-01-30 3
;

data want;
  set have ;
  by patient_id group_no;
  days_supp = end_date - dispense_date + 1;
  retain first_date last_date;
  if first.group_no then first_date = dispense_date ;
  if first.group_no then total_supp = days_supp;
  else total_supp + days_supp;
  last_date=max(last_date,end_date);
  total_days = end_date - first_date +1;
  overlap = total_supp - total_days;
  format first_date last_date yymmdd10.;
run;

proc print;
run;

Result

chuakp · Posted 10-24-2022 07:51 AM

Thanks Tom. You've pointed me in the right direction for this problem.

One remaining issue is the creation of the groups. In the example data, Rx B overlaps with Rx A and Rx C overlaps with Rx B, so you can just compare the current observation to the previous one to group prescriptions together.

data have;
 input PATIENT_ID RX_ID $ DISPENSE_DATE :yymmdd. END_DATE :yymmdd. GROUP_NO ;
 format DISPENSE_DATE END_DATE yymmdd10.;
cards;
1 A 2022-01-01 2022-01-08 1
1 B 2022-01-07 2022-01-09 1
1 C 2022-01-08 2022-01-11 1
1 D 2022-01-19 2022-01-25 2
1 E 2022-01-26 2022-01-30 3
;

But imagine you instead had data like like this, in which C doesn't overlap with A but C does overlap with B. We still want to group C together with A and B, but you can't do that by comparing the current observation to the previous observation.

data have;

input PATIENT_ID RX_ID $ DISPENSE_DATE :yymmd. END_DATE :yymmdd.;

format DISPENSE_DATE END_DATE yymmdd10.;

cards;

1 A 2022-01-01 2022-01-08

1 A 2022-01-04 2022-01-06

1 C 2022-01-07 2022-01-11

1 D 2022-01-19 2022-01-25

1 E 2022-01-26 2022-01-30

;

run;

Tom · Posted 10-24-2022 09:07 AM

You don't really have to look ahead (as long as the data is sorted by id and dispense_date).

But you might want to remerge the final result to get the whole overlapping period onto each observation in the group.

data want;
  set have ;
  by patient_id dispense_date;
  retain first_date last_date group;
  if first.patient_id then call missing(first_date,last_date,group);
  if dispense_date > last_date then do;
     group+1;
     first_date=dispense_date; last_date=end_date; 
  end;
  else do;
     first_date=min(first_date,dispense_date);
     last_date=max(last_date,end_date);
  end;
  format first_date last_date yymmdd10.;
run;

data want;
  do _n_=1 by 1 until(last.group);
    merge want groups;
    by patient_id group;
  end;
  do _n_=1 to _n_;
    set want(drop=first_date last_date);
    output;
  end;
run;

chuakp · Posted 10-26-2022 03:59 PM

Tom, thanks for your reply. I was able to figure it out using your code for grouping prescriptions together.

Classroom Training Available!