Solved: Re: Medication fill code conditions

alaxman · Posted 08-06-2022 02:52 PM

Hello All, In the attached excel sheet, I present data on two examples of two different patients. I have data on patient ID, medication name, and medication fill date. A person might be prescribed “B” medication, either at a pharmacy or at a clinic (in no particular order), or another medication entirely. For cases when “B” medication is prescribed, there is a risk of double counting if the medication is prescribed at both a pharmacy and clinic within a 14-day period. Therefore, I want to delete either a pharmacy fill or clinic fill of “B” medication if they occur within 14-days of one another. In such cases, we want to keep the first row/medication fill date of either the pharmacy or clinic prescription of the same medication (and delete what we have defined as duplicates). The risk of double counting occurs only when the same medication is prescribed at both the pharmacy and clinic within this time. If we see just pharmacy prescriptions or just clinic prescriptions, our logic does not hold.

What would be the most efficient way in either SAS or SQL to handle this? Thank you so much for any guidance that you might be able to provide!

mkeintz · Posted 08-07-2022 12:57 AM

Thank you for fixing the DATA steps. For the sample data your provided, this code will work:

data have(label='Prescription Data');
   /*length id $7;*/
   infile datalines truncover;
   input patient_id:$4. med_name:$10. med_fill_date:MMDDYY8.;
   format med_fill_date: mmddyy8.;
datalines;
155 B_pharmacy 4/4/19
155 B_clinic 4/11/19
155 B_pharmacy 4/29/19
155 B_pharmacy 5/16/19
155 B_clinic 5/16/19
155 B_pharmacy 7/8/19
155 B_pharmacy 9/3/19
155 B_clinic 9/6/19
run;

proc sort; by patient_id med_fill_date;
run;

data want (drop=_:);
  set have ;
  by patient_id med_fill_date;
  retain _last_med_fill_date .  _last_med_name '          ';
  if first.patient_id=1 or med_name=_last_med_name or med_fill_date-14>_last_med_fill_date;  /*Subsetting IF*/
  _last_med_fill_date=med_fill_date;
  _last_med_name=med_name;
run;

The code above works correctly only if a patient has just one medication from multiple sites. If a given patient has multiple meds, then you will need to take the two-component variable med_name and divide it into two variables: the actual name of the medicine, and the site the prescription was filled, as below. I modified your sample dataset below to have a single instance of medicine A_.

data have(label='Prescription Data');
   /*length id $7;*/
   infile datalines truncover;
   input patient_id:$4. med_name:$10. med_fill_date:MMDDYY8.;
   format med_fill_date: mmddyy8.;
datalines;
155 B_pharmacy 4/4/19
155 A_clinic 4/11/19
155 B_pharmacy 4/29/19
155 B_pharmacy 5/16/19
155 B_clinic 5/16/19
155 B_pharmacy 7/8/19
155 B_pharmacy 9/3/19
155 B_clinic 9/6/19
run;

data need;
  set have;
  medicine_name=scan(med_name,1,'_');
  medicine_site=scan(med_name,2,'_');
run;

proc sort;
  by patient_id medicine_name med_fill_date;
run;

data want (drop=_:);
  set need ;
  by patient_id medicine_name med_fill_date;
  retain _last_med_fill_date .  _last_medicine_site '          ';
  if first.medicine_name=1 or medicine_site=_last_medicine_site or med_fill_date-14>_last_med_fill_date;  /*Subsetting IF*/
  _last_med_fill_date=med_fill_date;
  _last_medicine_site=medicine_site;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

Kurt_Bremser · Posted 08-06-2022 03:22 PM

I will not download Excel files and then rely on the vagaries of PROC IMPORT.

Please post usable example data as a data step with datalines, as shown here.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

alaxman · Posted 08-06-2022 05:03 PM

Here is what the data looks like:

Example 1:
Original data

patient_id	med_name	med_fill_date
123	B_pharmacy	1/9/19
123	B_clinic	1/11/19
123	B_clinic	1/11/19
123	B_clinic	1/11/19
123	Other	1/17/19
123	B_pharmacy	2/7/19
123	B_clinic	2/8/19

What the data should look like:

patient_id	med_name	med_fill_date
123	B_pharmacy	1/9/19
123	B_pharmacy	2/7/19

Example 2:

Original data:

patient_id	med_name	med_fill_date
155	B_pharmacy	4/4/19
155	B_clinic	4/11/19
155	B_pharmacy	4/29/19
155	B_pharmacy	5/16/19
155	B_clinic	5/16/19
155	B_pharmacy	7/8/19
155	B_pharmacy	9/3/19
155	B_clinic	9/6/19

What data should look like:

patient_id	med_name	med_fill_date
155	B_pharmacy	4/4/19
155	B_pharmacy	4/29/19
155	B_pharmacy	5/16/19
155	B_pharmacy	7/8/19
155	B_pharmacy	9/3/19

Above data as data lines:

Original data in Example 1:

data have(label='Prescription Data');
   length id $7;
   infile datalines  truncover;
   input patient_id:4 med_name:$10. med_fill_date:MMDDYY8. ;
Datalines;
123 B_pharmacy 1/9/19
123 B_clinic 1/11/19
123 B_clinic 1/11/19
123 B_clinic 1/17/19
123 Other 1/11/19
123 B_pharmacy 2/7/19
123 B_clinic 2/8/19
;;;;
proc sort;
 by id;
run;

Original data in Example 2:

data have(label='Prescription Data');
   length id $7;
   infile datalines  truncover;
   input patient_id:4 med_name:$10. med_fill_date:MMDDYY8. ;
Datalines;
155 B_pharmacy 4/4/19
155 B_clinic 4/11/19
155 B_pharmacy 4/29/19
155 B_pharmacy 5/16/19
155 B_clinic 5/16/19
155 B_pharmacy 7/8/19
155 B_pharmacy 9/3/19
155 B_clinic 9/6/19
;;;;
proc sort;
 by id;
run;

mkeintz · Posted 08-06-2022 07:05 PM

Thank you for providing your sample data in the form of a data step.

Now please make sure the data step actually works. The code provided does not read in any of the variables correctly.

But, in addition to the fact that the given DATA step generates errors, which I guess we could correct, I have these questions:

Do you actually have a single variable (med_name in your example) that contains both the medicine name and the site where the prescription is filled? Very poor data design if so, and it makes performing the specific task you request messier that it should be.
What if the first prescription is followed by a second prescription for the same medication and filled at the same site? Is that also subject to the 14-day minimum gap requirement, the same as having different sites?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

alaxman · Posted 08-06-2022 07:49 PM

@mkeintz

@Kurt_Bremser

See below for corrected datalines.

Example 1, Original data

data have(label='Prescription Data');
   /*length id $7;*/
   infile datalines truncover;
   input patient_id:$4. med_name:$10. med_fill_date:MMDDYY8.;
   format med_fill_date: mmddyy8.;
Datalines;
123 B_pharmacy 1/9/19
123 B_clinic 1/11/19
123 B_clinic 1/11/19
123 B_clinic 1/11/19
123 B_other 1/17/19
123 B_pharmacy 2/7/19
123 B_clinic 2/8/19
;
proc sort;
 by patient_id;
run;

What the data should look like for Example 1:

patient_id	med_name	med_fill_date
123	B_pharmacy	1/9/19
123	B_pharmacy	2/7/19

Example 2, Original data

data have(label='Prescription Data');
   /*length id $7;*/
   infile datalines truncover;
   input patient_id:$4. med_name:$10. med_fill_date:MMDDYY8.;
   format med_fill_date: mmddyy8.;
Datalines;
155 B_pharmacy 4/4/19
155 B_clinic 4/11/19
155 B_pharmacy 4/29/19
155 B_pharmacy 5/16/19
155 B_clinic 5/16/19
155 B_pharmacy 7/8/19
155 B_pharmacy 9/3/19
155 B_clinic 9/6/19
;
proc sort;
 by patient_id;
run;

What the data should look like for Example 2:

patient_id	med_name	med_fill_date
155	B_pharmacy	4/4/19
155	B_pharmacy	4/29/19
155	B_pharmacy	5/16/19
155	B_pharmacy	7/8/19
155	B_pharmacy	9/3/19

To answer your questions @mkeintz :

Do you actually have a single variable (med_name in your example) that contains both the medicine name and the site where the prescription is filled? Very poor data design if so, and it makes performing the specific task you request messier that it should be. We have a single variable that has just the medication name, which we are able to determine whether it's from the pharmacy or clinic; so yes, it is encapsulated into one variable.
What if the first prescription is followed by a second prescription for the same medication and filled at the same site? Is that also subject to the 14-day minimum gap requirement, the same as having different sites? If it's filled at the same site, it is not subject to the 14-day minimum gap requirement. Here is my original description of the problem:
A person might be prescribed “B” medication, either at a pharmacy or at a clinic (in no particular order), or another medication entirely. For cases when “B” medication is prescribed, there is a risk of double counting if the medication is prescribed at both a pharmacy and clinic within a 14-day period. Therefore, I want to delete either a pharmacy fill or clinic fill of “B” medication if they occur within 14-days of one another. In such cases, we want to keep the first row/medication fill date of either the pharmacy or clinic prescription of the same medication (and delete what we have defined as duplicates). The risk of double counting occurs only when the same medication is prescribed at both the pharmacy and clinic within this time. If we see just pharmacy prescriptions or just clinic prescriptions, this requirement does not hold.

mkeintz · Posted 08-07-2022 12:57 AM

Thank you for fixing the DATA steps. For the sample data your provided, this code will work:

data have(label='Prescription Data');
   /*length id $7;*/
   infile datalines truncover;
   input patient_id:$4. med_name:$10. med_fill_date:MMDDYY8.;
   format med_fill_date: mmddyy8.;
datalines;
155 B_pharmacy 4/4/19
155 B_clinic 4/11/19
155 B_pharmacy 4/29/19
155 B_pharmacy 5/16/19
155 B_clinic 5/16/19
155 B_pharmacy 7/8/19
155 B_pharmacy 9/3/19
155 B_clinic 9/6/19
run;

proc sort; by patient_id med_fill_date;
run;

data want (drop=_:);
  set have ;
  by patient_id med_fill_date;
  retain _last_med_fill_date .  _last_med_name '          ';
  if first.patient_id=1 or med_name=_last_med_name or med_fill_date-14>_last_med_fill_date;  /*Subsetting IF*/
  _last_med_fill_date=med_fill_date;
  _last_med_name=med_name;
run;

The code above works correctly only if a patient has just one medication from multiple sites. If a given patient has multiple meds, then you will need to take the two-component variable med_name and divide it into two variables: the actual name of the medicine, and the site the prescription was filled, as below. I modified your sample dataset below to have a single instance of medicine A_.

data have(label='Prescription Data');
   /*length id $7;*/
   infile datalines truncover;
   input patient_id:$4. med_name:$10. med_fill_date:MMDDYY8.;
   format med_fill_date: mmddyy8.;
datalines;
155 B_pharmacy 4/4/19
155 A_clinic 4/11/19
155 B_pharmacy 4/29/19
155 B_pharmacy 5/16/19
155 B_clinic 5/16/19
155 B_pharmacy 7/8/19
155 B_pharmacy 9/3/19
155 B_clinic 9/6/19
run;

data need;
  set have;
  medicine_name=scan(med_name,1,'_');
  medicine_site=scan(med_name,2,'_');
run;

proc sort;
  by patient_id medicine_name med_fill_date;
run;

data want (drop=_:);
  set need ;
  by patient_id medicine_name med_fill_date;
  retain _last_med_fill_date .  _last_medicine_site '          ';
  if first.medicine_name=1 or medicine_site=_last_medicine_site or med_fill_date-14>_last_med_fill_date;  /*Subsetting IF*/
  _last_med_fill_date=med_fill_date;
  _last_medicine_site=medicine_site;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

alaxman · Posted 08-08-2022 12:56 AM

@mkeintz

Thank you!! Very helpful to see how you coded this. I had to make one slight tweak in the following:

if first.patient_id=1 or med_name=_last_med_name or med_fill_date-14>_last_med_fill_date;

where the first "or" should be "and" but otherwise, it worked well.

Thanks again

Classroom Training Available!