Howdy,
I've spent days trying to figure out the solution to this problem I've been having. I've tried to employ pretty much all of the solutions to similar questions I've found online to my code, but none of them give me what I need! To be clear, I'm not looking for PDC or MPR. I'm simply looking to find the mean number of days between patients' refills!
I need to calculate the mean (SD) number of days between each patient's pharmacy claims for insulin. I'm using a massive commercial claims dataset. Say a patient has an index fill on 2/2/21 and refills their script on 5/1/21 and 7/1/21. Another patient fills an initial script on 3/4/21 and six refills thereafter... and on and on. The number of refills differs for each patient, as do the number of days supplied and dates of fills and refills. Our goal is to assess the temporal component of insulin filling. I.e., do patients do all of their filling in the first three months of the year? Is it staggered? If yes, how so? Etc. I'd ideally like to be able to correct for overlap and gaps between fills and prescription coverage, respectively.
The data structure is something like this:
data temp;
input patient_id svc_dt days_supply_cnt;
format svc_dt mmddyy10.;
datalines;
1 10/21/19 60
1 11/10/19 15
1 1/30/20 30
2 5/1/19 20
2 10/4/19 10
3 1/1/19 70
3 4/11/19 4
3 12/2/19 100
3 5/19/20 10
3 10/1/20 100
4 10/29/19 100
4 10/29/19 100
;
run;
My supervisor believes one can simply sum the number days between each fill and divide that by the total number of claims the patient makes during the observation period, but I'm having trouble even getting that. Thanks so much for your help!
PROC MEANS will do this easily, once you compute the actual number of days between refills for each patient.
So for patient 1, you have to compute the number of days between 10/21/19 and 11/10/19. (Let's say that's 21 days). And then you have to compute the number of days between 11/10/19 and 1/30/20, let's say that is 80 days. So the mean is then (80+21)/2. But that's really (max date – min date)/(purchases-1)
So
data temp;
input patient_id svc_dt days_supply_cnt;
format svc_dt mmddyy10.;
datalines;
1 10/21/19 60
1 11/10/19 15
1 1/30/20 30
2 5/1/19 20
2 10/4/19 10
3 1/1/19 70
3 4/11/19 4
3 12/2/19 100
3 5/19/20 10
3 10/1/20 100
4 10/29/19 100
4 10/29/19 100
;
proc means data=temp nway noprint;
class patient_id;
var svc_dt;
output out=want min=min_svc_dt max=max_svc_dt n=n_purchases;
run;
Then you can take the results in data set WANT and do the final calculation in another DATA step:
(max date – min date)/(n_purchases-1)
PROC MEANS will do this easily, once you compute the actual number of days between refills for each patient.
So for patient 1, you have to compute the number of days between 10/21/19 and 11/10/19. (Let's say that's 21 days). And then you have to compute the number of days between 11/10/19 and 1/30/20, let's say that is 80 days. So the mean is then (80+21)/2. But that's really (max date – min date)/(purchases-1)
So
data temp;
input patient_id svc_dt days_supply_cnt;
format svc_dt mmddyy10.;
datalines;
1 10/21/19 60
1 11/10/19 15
1 1/30/20 30
2 5/1/19 20
2 10/4/19 10
3 1/1/19 70
3 4/11/19 4
3 12/2/19 100
3 5/19/20 10
3 10/1/20 100
4 10/29/19 100
4 10/29/19 100
;
proc means data=temp nway noprint;
class patient_id;
var svc_dt;
output out=want min=min_svc_dt max=max_svc_dt n=n_purchases;
run;
Then you can take the results in data set WANT and do the final calculation in another DATA step:
(max date – min date)/(n_purchases-1)
Thanks, Paige, but how do I do this for a dataset with nearly 100,000 patients with >1 observation each? I.e., a class statement for patient_id is impossible.
No need to change anything. The code I gave works for 5 patients or 100000 patients. That is the power of PROC MEANS.
Honestly not sure how the mean and standard deviation data would answer your stated question.
Our goal is to assess the temporal component of insulin filling. I.e., do patients do all of their filling in the first three months of the year?
Some other things you may want to consider:
I would create a graph of the days prescription filled weighted by the amount of drugs but it also depends on what you're looking to solve - a drug supply problem or looking at when your insurance claims might peak. May want to weight by costs and/or # of days supply.
title 'Monthly demand for insulin drugs';
proc freq data=temp;
table svc_dt;
weight days_supply_cnt;
format svc_dt yymmn6.;
run;
title 'Quarterly demand for insulin drugs';
proc freq data=temp;
table svc_dt;
weight days_supply_cnt;
format svc_dt yyq.;
run;
My nickel of thought as an insurance analytics manager.
Main concern I would have with this approach would be cases were for some reason the "interval" really does not reflect some sort of "continuity". Since you do not have anything related to diagnosis, diagnosis date or even the drug(s) involved what happens if your patient has not had anything prescribed for a while because of changes in medical status (no longer prescribed for some reason)? Or patient just stopped coming in for medications for some reason?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.