BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
alaxman
Obsidian | Level 7

Hello All, In the attached excel sheet, I present data on two examples of two different patients. I have data on patient ID, medication name, and medication fill date. A person might be prescribed “B” medication, either at a pharmacy or at a clinic (in no particular order), or another medication entirely. For cases when “B” medication is prescribed, there is a risk of double counting if the medication is prescribed at both a pharmacy and clinic within a 14-day period. Therefore, I want to delete either a pharmacy fill or clinic fill of “B” medication if they occur within 14-days of one another. In such cases, we want to keep the first row/medication fill date of either the pharmacy or clinic prescription of the same medication (and delete what we have defined as duplicates). The risk of double counting occurs only when the same medication is prescribed at both the pharmacy and clinic within this time. If we see just pharmacy prescriptions or just clinic prescriptions, our logic does not hold.

 

What would be the most efficient way in either SAS or SQL to handle this?  Thank you so much for any guidance that you might be able to provide!

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Thank you for fixing the DATA steps.  For the sample data your provided, this code will work:

 

data have(label='Prescription Data');
   /*length id $7;*/
   infile datalines truncover;
   input patient_id:$4. med_name:$10. med_fill_date:MMDDYY8.;
   format med_fill_date: mmddyy8.;
datalines;
155 B_pharmacy 4/4/19
155 B_clinic 4/11/19
155 B_pharmacy 4/29/19
155 B_pharmacy 5/16/19
155 B_clinic 5/16/19
155 B_pharmacy 7/8/19
155 B_pharmacy 9/3/19
155 B_clinic 9/6/19
run;

proc sort; by patient_id med_fill_date;
run;

data want (drop=_:);
  set have ;
  by patient_id med_fill_date;
  retain _last_med_fill_date .  _last_med_name '          ';
  if first.patient_id=1 or med_name=_last_med_name or med_fill_date-14>_last_med_fill_date;  /*Subsetting IF*/
  _last_med_fill_date=med_fill_date;
  _last_med_name=med_name;
run;

The code above works correctly only if a patient has just one medication from multiple sites.  If a given patient has multiple meds, then you will need to take the two-component variable med_name and divide it into two variables: the actual name of the medicine, and the site the prescription was filled, as below.  I modified your sample dataset below to have a single instance of medicine A_.

 

data have(label='Prescription Data');
   /*length id $7;*/
   infile datalines truncover;
   input patient_id:$4. med_name:$10. med_fill_date:MMDDYY8.;
   format med_fill_date: mmddyy8.;
datalines;
155 B_pharmacy 4/4/19
155 A_clinic 4/11/19
155 B_pharmacy 4/29/19
155 B_pharmacy 5/16/19
155 B_clinic 5/16/19
155 B_pharmacy 7/8/19
155 B_pharmacy 9/3/19
155 B_clinic 9/6/19
run;

data need;
  set have;
  medicine_name=scan(med_name,1,'_');
  medicine_site=scan(med_name,2,'_');
run;

proc sort;
  by patient_id medicine_name med_fill_date;
run;

data want (drop=_:);
  set need ;
  by patient_id medicine_name med_fill_date;
  retain _last_med_fill_date .  _last_medicine_site '          ';
  if first.medicine_name=1 or medicine_site=_last_medicine_site or med_fill_date-14>_last_med_fill_date;  /*Subsetting IF*/
  _last_med_fill_date=med_fill_date;
  _last_medicine_site=medicine_site;
run;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

6 REPLIES 6
Kurt_Bremser
Super User

I will not download Excel files and then rely on the vagaries of PROC IMPORT.

Please post usable example data as a data step with datalines, as shown here.

alaxman
Obsidian | Level 7

Here is what the data looks like:

Example 1:
Original data

patient_id med_name med_fill_date
123 B_pharmacy 1/9/19
123 B_clinic 1/11/19
123 B_clinic 1/11/19
123 B_clinic 1/11/19
123 Other 1/17/19
123 B_pharmacy 2/7/19
123 B_clinic 2/8/19

 

What the data should look like:

patient_id med_name med_fill_date
123 B_pharmacy 1/9/19
123 B_pharmacy 2/7/19

 

Example 2:

Original data:

patient_id med_name med_fill_date
155 B_pharmacy 4/4/19
155 B_clinic 4/11/19
155 B_pharmacy 4/29/19
155 B_pharmacy 5/16/19
155 B_clinic 5/16/19
155 B_pharmacy 7/8/19
155 B_pharmacy 9/3/19
155 B_clinic 9/6/19

 

What data should look like:

patient_id med_name med_fill_date
155 B_pharmacy 4/4/19
155 B_pharmacy 4/29/19
155 B_pharmacy 5/16/19
155 B_pharmacy 7/8/19
155 B_pharmacy 9/3/19

 

Above data as data lines:

Original data in Example 1:

 

 

data have(label='Prescription Data');
   length id $7;
   infile datalines  truncover;
   input patient_id:4 med_name:$10. med_fill_date:MMDDYY8. ;
Datalines;
123 B_pharmacy 1/9/19
123 B_clinic 1/11/19
123 B_clinic 1/11/19
123 B_clinic 1/17/19
123 Other 1/11/19
123 B_pharmacy 2/7/19
123 B_clinic 2/8/19
;;;;
proc sort;
 by id;
run;

 

 

Original data in Example 2:

 

 

data have(label='Prescription Data');
   length id $7;
   infile datalines  truncover;
   input patient_id:4 med_name:$10. med_fill_date:MMDDYY8. ;
Datalines;
155 B_pharmacy 4/4/19
155 B_clinic 4/11/19
155 B_pharmacy 4/29/19
155 B_pharmacy 5/16/19
155 B_clinic 5/16/19
155 B_pharmacy 7/8/19
155 B_pharmacy 9/3/19
155 B_clinic 9/6/19
;;;;
proc sort;
 by id;
run;

 

 

 

 

mkeintz
PROC Star

Thank you for providing your sample data in the form of a data step.

 

Now please make sure the data step actually works.   The code provided does not read in any of the variables correctly.

 

But, in addition to the fact that the given DATA step generates errors, which I guess we could correct, I have these questions:

 

  1. Do you actually have a single variable (med_name in your example) that contains both the medicine name and the site where the prescription is filled?  Very poor data design if so, and it makes performing the specific task you request messier that it should be.

  2. What if the first prescription is followed by a second prescription for the same medication and filled at the same site?  Is that also subject to the 14-day minimum gap requirement, the same as having different sites?

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
alaxman
Obsidian | Level 7

@mkeintz 

@Kurt_Bremser 

 

See below for corrected datalines. 

 

Example 1, Original data

 

data have(label='Prescription Data');
   /*length id $7;*/
   infile datalines truncover;
   input patient_id:$4. med_name:$10. med_fill_date:MMDDYY8.;
   format med_fill_date: mmddyy8.;
Datalines;
123 B_pharmacy 1/9/19
123 B_clinic 1/11/19
123 B_clinic 1/11/19
123 B_clinic 1/11/19
123 B_other 1/17/19
123 B_pharmacy 2/7/19
123 B_clinic 2/8/19
;
proc sort;
 by patient_id;
run;

 

What the data should look like for Example 1: 

patient_id med_name med_fill_date
123 B_pharmacy 1/9/19
123 B_pharmacy 2/7/19

 

Example 2, Original data

 

data have(label='Prescription Data');
   /*length id $7;*/
   infile datalines truncover;
   input patient_id:$4. med_name:$10. med_fill_date:MMDDYY8.;
   format med_fill_date: mmddyy8.;
Datalines;
155 B_pharmacy 4/4/19
155 B_clinic 4/11/19
155 B_pharmacy 4/29/19
155 B_pharmacy 5/16/19
155 B_clinic 5/16/19
155 B_pharmacy 7/8/19
155 B_pharmacy 9/3/19
155 B_clinic 9/6/19
;
proc sort;
 by patient_id;
run;

 

 

What the data should look like for Example 2: 

 

patient_id med_name med_fill_date
155 B_pharmacy 4/4/19
155 B_pharmacy 4/29/19
155 B_pharmacy 5/16/19
155 B_pharmacy 7/8/19
155 B_pharmacy 9/3/19

 

To answer your questions @mkeintz :

 

  1. Do you actually have a single variable (med_name in your example) that contains both the medicine name and the site where the prescription is filled?  Very poor data design if so, and it makes performing the specific task you request messier that it should be. We have a single variable that has just the medication name, which we are able to determine whether it's from the pharmacy or clinic; so yes, it is encapsulated into one variable.
  2. What if the first prescription is followed by a second prescription for the same medication and filled at the same site?  Is that also subject to the 14-day minimum gap requirement, the same as having different sites? If it's filled at the same site,  it is not subject to the 14-day minimum gap requirement. Here is my original description of the problem: 

    A person might be prescribed “B” medication, either at a pharmacy or at a clinic (in no particular order), or another medication entirely. For cases when “B” medication is prescribed, there is a risk of double counting if the medication is prescribed at both a pharmacy and clinic within a 14-day period. Therefore, I want to delete either a pharmacy fill or clinic fill of “B” medication if they occur within 14-days of one another. In such cases, we want to keep the first row/medication fill date of either the pharmacy or clinic prescription of the same medication (and delete what we have defined as duplicates). The risk of double counting occurs only when the same medication is prescribed at both the pharmacy and clinic within this time. If we see just pharmacy prescriptions or just clinic prescriptions, this requirement does not hold.

mkeintz
PROC Star

Thank you for fixing the DATA steps.  For the sample data your provided, this code will work:

 

data have(label='Prescription Data');
   /*length id $7;*/
   infile datalines truncover;
   input patient_id:$4. med_name:$10. med_fill_date:MMDDYY8.;
   format med_fill_date: mmddyy8.;
datalines;
155 B_pharmacy 4/4/19
155 B_clinic 4/11/19
155 B_pharmacy 4/29/19
155 B_pharmacy 5/16/19
155 B_clinic 5/16/19
155 B_pharmacy 7/8/19
155 B_pharmacy 9/3/19
155 B_clinic 9/6/19
run;

proc sort; by patient_id med_fill_date;
run;

data want (drop=_:);
  set have ;
  by patient_id med_fill_date;
  retain _last_med_fill_date .  _last_med_name '          ';
  if first.patient_id=1 or med_name=_last_med_name or med_fill_date-14>_last_med_fill_date;  /*Subsetting IF*/
  _last_med_fill_date=med_fill_date;
  _last_med_name=med_name;
run;

The code above works correctly only if a patient has just one medication from multiple sites.  If a given patient has multiple meds, then you will need to take the two-component variable med_name and divide it into two variables: the actual name of the medicine, and the site the prescription was filled, as below.  I modified your sample dataset below to have a single instance of medicine A_.

 

data have(label='Prescription Data');
   /*length id $7;*/
   infile datalines truncover;
   input patient_id:$4. med_name:$10. med_fill_date:MMDDYY8.;
   format med_fill_date: mmddyy8.;
datalines;
155 B_pharmacy 4/4/19
155 A_clinic 4/11/19
155 B_pharmacy 4/29/19
155 B_pharmacy 5/16/19
155 B_clinic 5/16/19
155 B_pharmacy 7/8/19
155 B_pharmacy 9/3/19
155 B_clinic 9/6/19
run;

data need;
  set have;
  medicine_name=scan(med_name,1,'_');
  medicine_site=scan(med_name,2,'_');
run;

proc sort;
  by patient_id medicine_name med_fill_date;
run;

data want (drop=_:);
  set need ;
  by patient_id medicine_name med_fill_date;
  retain _last_med_fill_date .  _last_medicine_site '          ';
  if first.medicine_name=1 or medicine_site=_last_medicine_site or med_fill_date-14>_last_med_fill_date;  /*Subsetting IF*/
  _last_med_fill_date=med_fill_date;
  _last_medicine_site=medicine_site;
run;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
alaxman
Obsidian | Level 7

@mkeintz 

Thank you!! Very helpful to see how you coded this. I had to make one slight tweak in the following:

if first.patient_id=1 or med_name=_last_med_name or med_fill_date-14>_last_med_fill_date;

 where the first "or" should be "and" but otherwise, it worked well. 

 

Thanks again

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1111 views
  • 0 likes
  • 3 in conversation