I have a database of prescription claims that contains the dispensing date and end date of each prescription, as defined by days supplied. The goal is to calculate how many whether the patient has continuous coverage without more than a 7-day gap. The issue is that there are times in which a patient may have 2 ore more prescriptions in which the dispensing and end dates overlap. For example, the data might look like this:
PATIENT_ID RX_ID DISPENSE_DATE END_DATE
1 A 1/1/2022 1/8/2022
1 B 1/7/2022 1/9/2022
1 C 1/8/2022 1/11/2022
1 D 1/19/2022 1/25/2022
1 E 1/26/2022 1/30/2022
If you only consider the dispense_date and end_date, it looks like consider there is a 8-day gap between C and D. However, RX_ID B overlaps with A for 2 days (1/7/2022-1/8/2022), and C overlaps with A for 1 day (1/8/2022), and I would like to consider the patient as having coverage from 1/1/2022 through 1/14/2022 (i.e., 1/11/2022 plus the number of overlapping dates), in which case there is only a 5-day gap before D. Basically, I want to turn the above database into something like the following (the observations for D and E would stay the same because there is no overlap).
PATIENT_ID RX_ID DISPENSE_DATE END_DATE_COMBINED
1 A 1/1/2022 1/14/2022
1 D 1/19/2022 1/25/2022
1 E 1/26/2022 1/30/2022
Note: while not in this example data, I'm aware that there could be situations in which prescription D could have started on 1/13/2022, in which case it overlaps with the new range of A after stitching A/B/C together. However, I don't care about that for the purposes of this analysis.
I'd appreciate any advice. Thanks.
You don't really have to look ahead (as long as the data is sorted by id and dispense_date).
But you might want to remerge the final result to get the whole overlapping period onto each observation in the group.
data want;
set have ;
by patient_id dispense_date;
retain first_date last_date group;
if first.patient_id then call missing(first_date,last_date,group);
if dispense_date > last_date then do;
group+1;
first_date=dispense_date; last_date=end_date;
end;
else do;
first_date=min(first_date,dispense_date);
last_date=max(last_date,end_date);
end;
format first_date last_date yymmdd10.;
run;
data want;
do _n_=1 by 1 until(last.group);
merge want groups;
by patient_id group;
end;
do _n_=1 to _n_;
set want(drop=first_date last_date);
output;
end;
run;
Just to point you in a good direction, here is what I would suggest.
Treat this as a two-step problem. First assign a new variable GROUPNO based on whether the current observation belongs in the same group as a previous observation or not. Then group them. For example, start along these lines:
data want;
set have;
by patient_id;
if firstpatient_id then groupno=1;
prior_end = lag(end_date);
if ... whatever conditions you would like to indicate a new group should start ...
then groupno + 1;
run;
This step should give you a result aong the lines of:
PATIENT_ID RX_ID DISPENSE_DATE END_DATE GROUP_NO
1 A 1/1/2022 1/8/2022 1
1 B 1/7/2022 1/9/2022 1
1 C 1/8/2022 1/11/2022 1
1 D 1/19/2022 1/25/2022 2
1 E 1/26/2022 1/30/2022 3
From there, calculating the maxium END_DATE for each PATIENT_ID / GROUP_NO should be straightforward.
I appreciate the reply. To clarify, I am not trying to calculate the maximum end_date for each PATIENT_ID / GROUP_NO. I am trying to determine the number of days of overlap within GROUP_NO's, then add that number to the end of the the maximum end_date for each PATIENT/GROUP_NO. Part of my difficulty is determining the number of dates of overlap.
Isn't the number of days of overlap just the difference between the sum of each individual record's duration and the duration of the overall period? Which you can calculate if you know the first and last date for the period.
data have;
input PATIENT_ID RX_ID $ DISPENSE_DATE :yymmdd. END_DATE :yymmdd. GROUP_NO ;
format DISPENSE_DATE END_DATE yymmdd10.;
cards;
1 A 2022-01-01 2022-01-08 1
1 B 2022-01-07 2022-01-09 1
1 C 2022-01-08 2022-01-11 1
1 D 2022-01-19 2022-01-25 2
1 E 2022-01-26 2022-01-30 3
;
data want;
set have ;
by patient_id group_no;
days_supp = end_date - dispense_date + 1;
retain first_date last_date;
if first.group_no then first_date = dispense_date ;
if first.group_no then total_supp = days_supp;
else total_supp + days_supp;
last_date=max(last_date,end_date);
total_days = end_date - first_date +1;
overlap = total_supp - total_days;
format first_date last_date yymmdd10.;
run;
proc print;
run;
Result
Thanks Tom. You've pointed me in the right direction for this problem.
One remaining issue is the creation of the groups. In the example data, Rx B overlaps with Rx A and Rx C overlaps with Rx B, so you can just compare the current observation to the previous one to group prescriptions together.
data have;
input PATIENT_ID RX_ID $ DISPENSE_DATE :yymmdd. END_DATE :yymmdd. GROUP_NO ;
format DISPENSE_DATE END_DATE yymmdd10.;
cards;
1 A 2022-01-01 2022-01-08 1
1 B 2022-01-07 2022-01-09 1
1 C 2022-01-08 2022-01-11 1
1 D 2022-01-19 2022-01-25 2
1 E 2022-01-26 2022-01-30 3
;
But imagine you instead had data like like this, in which C doesn't overlap with A but C does overlap with B. We still want to group C together with A and B, but you can't do that by comparing the current observation to the previous observation.
data have;
input PATIENT_ID RX_ID $ DISPENSE_DATE :yymmd. END_DATE :yymmdd.;
format DISPENSE_DATE END_DATE yymmdd10.;
cards;
1 A 2022-01-01 2022-01-08
1 A 2022-01-04 2022-01-06
1 C 2022-01-07 2022-01-11
1 D 2022-01-19 2022-01-25
1 E 2022-01-26 2022-01-30
;
run;
You don't really have to look ahead (as long as the data is sorted by id and dispense_date).
But you might want to remerge the final result to get the whole overlapping period onto each observation in the group.
data want;
set have ;
by patient_id dispense_date;
retain first_date last_date group;
if first.patient_id then call missing(first_date,last_date,group);
if dispense_date > last_date then do;
group+1;
first_date=dispense_date; last_date=end_date;
end;
else do;
first_date=min(first_date,dispense_date);
last_date=max(last_date,end_date);
end;
format first_date last_date yymmdd10.;
run;
data want;
do _n_=1 by 1 until(last.group);
merge want groups;
by patient_id group;
end;
do _n_=1 to _n_;
set want(drop=first_date last_date);
output;
end;
run;
Tom, thanks for your reply. I was able to figure it out using your code for grouping prescriptions together.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.