Hello All, In the attached excel sheet, I present data on two examples of two different patients. I have data on patient ID, medication name, and medication fill date. A person might be prescribed “B” medication, either at a pharmacy or at a clinic (in no particular order), or another medication entirely. For cases when “B” medication is prescribed, there is a risk of double counting if the medication is prescribed at both a pharmacy and clinic within a 14-day period. Therefore, I want to delete either a pharmacy fill or clinic fill of “B” medication if they occur within 14-days of one another. In such cases, we want to keep the first row/medication fill date of either the pharmacy or clinic prescription of the same medication (and delete what we have defined as duplicates). The risk of double counting occurs only when the same medication is prescribed at both the pharmacy and clinic within this time. If we see just pharmacy prescriptions or just clinic prescriptions, our logic does not hold.
What would be the most efficient way in either SAS or SQL to handle this? Thank you so much for any guidance that you might be able to provide!
Thank you for fixing the DATA steps. For the sample data your provided, this code will work:
data have(label='Prescription Data');
/*length id $7;*/
infile datalines truncover;
input patient_id:$4. med_name:$10. med_fill_date:MMDDYY8.;
format med_fill_date: mmddyy8.;
datalines;
155 B_pharmacy 4/4/19
155 B_clinic 4/11/19
155 B_pharmacy 4/29/19
155 B_pharmacy 5/16/19
155 B_clinic 5/16/19
155 B_pharmacy 7/8/19
155 B_pharmacy 9/3/19
155 B_clinic 9/6/19
run;
proc sort; by patient_id med_fill_date;
run;
data want (drop=_:);
set have ;
by patient_id med_fill_date;
retain _last_med_fill_date . _last_med_name ' ';
if first.patient_id=1 or med_name=_last_med_name or med_fill_date-14>_last_med_fill_date; /*Subsetting IF*/
_last_med_fill_date=med_fill_date;
_last_med_name=med_name;
run;
The code above works correctly only if a patient has just one medication from multiple sites. If a given patient has multiple meds, then you will need to take the two-component variable med_name and divide it into two variables: the actual name of the medicine, and the site the prescription was filled, as below. I modified your sample dataset below to have a single instance of medicine A_.
data have(label='Prescription Data');
/*length id $7;*/
infile datalines truncover;
input patient_id:$4. med_name:$10. med_fill_date:MMDDYY8.;
format med_fill_date: mmddyy8.;
datalines;
155 B_pharmacy 4/4/19
155 A_clinic 4/11/19
155 B_pharmacy 4/29/19
155 B_pharmacy 5/16/19
155 B_clinic 5/16/19
155 B_pharmacy 7/8/19
155 B_pharmacy 9/3/19
155 B_clinic 9/6/19
run;
data need;
set have;
medicine_name=scan(med_name,1,'_');
medicine_site=scan(med_name,2,'_');
run;
proc sort;
by patient_id medicine_name med_fill_date;
run;
data want (drop=_:);
set need ;
by patient_id medicine_name med_fill_date;
retain _last_med_fill_date . _last_medicine_site ' ';
if first.medicine_name=1 or medicine_site=_last_medicine_site or med_fill_date-14>_last_med_fill_date; /*Subsetting IF*/
_last_med_fill_date=med_fill_date;
_last_medicine_site=medicine_site;
run;
I will not download Excel files and then rely on the vagaries of PROC IMPORT.
Please post usable example data as a data step with datalines, as shown here.
Here is what the data looks like:
Example 1:
Original data
patient_id | med_name | med_fill_date |
123 | B_pharmacy | 1/9/19 |
123 | B_clinic | 1/11/19 |
123 | B_clinic | 1/11/19 |
123 | B_clinic | 1/11/19 |
123 | Other | 1/17/19 |
123 | B_pharmacy | 2/7/19 |
123 | B_clinic | 2/8/19 |
What the data should look like:
patient_id | med_name | med_fill_date |
123 | B_pharmacy | 1/9/19 |
123 | B_pharmacy | 2/7/19 |
Example 2:
Original data:
patient_id | med_name | med_fill_date |
155 | B_pharmacy | 4/4/19 |
155 | B_clinic | 4/11/19 |
155 | B_pharmacy | 4/29/19 |
155 | B_pharmacy | 5/16/19 |
155 | B_clinic | 5/16/19 |
155 | B_pharmacy | 7/8/19 |
155 | B_pharmacy | 9/3/19 |
155 | B_clinic | 9/6/19 |
What data should look like:
patient_id | med_name | med_fill_date |
155 | B_pharmacy | 4/4/19 |
155 | B_pharmacy | 4/29/19 |
155 | B_pharmacy | 5/16/19 |
155 | B_pharmacy | 7/8/19 |
155 | B_pharmacy | 9/3/19 |
Above data as data lines:
Original data in Example 1:
data have(label='Prescription Data');
length id $7;
infile datalines truncover;
input patient_id:4 med_name:$10. med_fill_date:MMDDYY8. ;
Datalines;
123 B_pharmacy 1/9/19
123 B_clinic 1/11/19
123 B_clinic 1/11/19
123 B_clinic 1/17/19
123 Other 1/11/19
123 B_pharmacy 2/7/19
123 B_clinic 2/8/19
;;;;
proc sort;
by id;
run;
Original data in Example 2:
data have(label='Prescription Data');
length id $7;
infile datalines truncover;
input patient_id:4 med_name:$10. med_fill_date:MMDDYY8. ;
Datalines;
155 B_pharmacy 4/4/19
155 B_clinic 4/11/19
155 B_pharmacy 4/29/19
155 B_pharmacy 5/16/19
155 B_clinic 5/16/19
155 B_pharmacy 7/8/19
155 B_pharmacy 9/3/19
155 B_clinic 9/6/19
;;;;
proc sort;
by id;
run;
Thank you for providing your sample data in the form of a data step.
Now please make sure the data step actually works. The code provided does not read in any of the variables correctly.
But, in addition to the fact that the given DATA step generates errors, which I guess we could correct, I have these questions:
See below for corrected datalines.
Example 1, Original data
data have(label='Prescription Data');
/*length id $7;*/
infile datalines truncover;
input patient_id:$4. med_name:$10. med_fill_date:MMDDYY8.;
format med_fill_date: mmddyy8.;
Datalines;
123 B_pharmacy 1/9/19
123 B_clinic 1/11/19
123 B_clinic 1/11/19
123 B_clinic 1/11/19
123 B_other 1/17/19
123 B_pharmacy 2/7/19
123 B_clinic 2/8/19
;
proc sort;
by patient_id;
run;
What the data should look like for Example 1:
patient_id | med_name | med_fill_date |
123 | B_pharmacy | 1/9/19 |
123 | B_pharmacy | 2/7/19 |
Example 2, Original data
data have(label='Prescription Data');
/*length id $7;*/
infile datalines truncover;
input patient_id:$4. med_name:$10. med_fill_date:MMDDYY8.;
format med_fill_date: mmddyy8.;
Datalines;
155 B_pharmacy 4/4/19
155 B_clinic 4/11/19
155 B_pharmacy 4/29/19
155 B_pharmacy 5/16/19
155 B_clinic 5/16/19
155 B_pharmacy 7/8/19
155 B_pharmacy 9/3/19
155 B_clinic 9/6/19
;
proc sort;
by patient_id;
run;
What the data should look like for Example 2:
patient_id | med_name | med_fill_date |
155 | B_pharmacy | 4/4/19 |
155 | B_pharmacy | 4/29/19 |
155 | B_pharmacy | 5/16/19 |
155 | B_pharmacy | 7/8/19 |
155 | B_pharmacy | 9/3/19 |
To answer your questions @mkeintz :
A person might be prescribed “B” medication, either at a pharmacy or at a clinic (in no particular order), or another medication entirely. For cases when “B” medication is prescribed, there is a risk of double counting if the medication is prescribed at both a pharmacy and clinic within a 14-day period. Therefore, I want to delete either a pharmacy fill or clinic fill of “B” medication if they occur within 14-days of one another. In such cases, we want to keep the first row/medication fill date of either the pharmacy or clinic prescription of the same medication (and delete what we have defined as duplicates). The risk of double counting occurs only when the same medication is prescribed at both the pharmacy and clinic within this time. If we see just pharmacy prescriptions or just clinic prescriptions, this requirement does not hold.
Thank you for fixing the DATA steps. For the sample data your provided, this code will work:
data have(label='Prescription Data');
/*length id $7;*/
infile datalines truncover;
input patient_id:$4. med_name:$10. med_fill_date:MMDDYY8.;
format med_fill_date: mmddyy8.;
datalines;
155 B_pharmacy 4/4/19
155 B_clinic 4/11/19
155 B_pharmacy 4/29/19
155 B_pharmacy 5/16/19
155 B_clinic 5/16/19
155 B_pharmacy 7/8/19
155 B_pharmacy 9/3/19
155 B_clinic 9/6/19
run;
proc sort; by patient_id med_fill_date;
run;
data want (drop=_:);
set have ;
by patient_id med_fill_date;
retain _last_med_fill_date . _last_med_name ' ';
if first.patient_id=1 or med_name=_last_med_name or med_fill_date-14>_last_med_fill_date; /*Subsetting IF*/
_last_med_fill_date=med_fill_date;
_last_med_name=med_name;
run;
The code above works correctly only if a patient has just one medication from multiple sites. If a given patient has multiple meds, then you will need to take the two-component variable med_name and divide it into two variables: the actual name of the medicine, and the site the prescription was filled, as below. I modified your sample dataset below to have a single instance of medicine A_.
data have(label='Prescription Data');
/*length id $7;*/
infile datalines truncover;
input patient_id:$4. med_name:$10. med_fill_date:MMDDYY8.;
format med_fill_date: mmddyy8.;
datalines;
155 B_pharmacy 4/4/19
155 A_clinic 4/11/19
155 B_pharmacy 4/29/19
155 B_pharmacy 5/16/19
155 B_clinic 5/16/19
155 B_pharmacy 7/8/19
155 B_pharmacy 9/3/19
155 B_clinic 9/6/19
run;
data need;
set have;
medicine_name=scan(med_name,1,'_');
medicine_site=scan(med_name,2,'_');
run;
proc sort;
by patient_id medicine_name med_fill_date;
run;
data want (drop=_:);
set need ;
by patient_id medicine_name med_fill_date;
retain _last_med_fill_date . _last_medicine_site ' ';
if first.medicine_name=1 or medicine_site=_last_medicine_site or med_fill_date-14>_last_med_fill_date; /*Subsetting IF*/
_last_med_fill_date=med_fill_date;
_last_medicine_site=medicine_site;
run;
Thank you!! Very helpful to see how you coded this. I had to make one slight tweak in the following:
if first.patient_id=1 or med_name=_last_med_name or med_fill_date-14>_last_med_fill_date;
where the first "or" should be "and" but otherwise, it worked well.
Thanks again
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.