Deleting rows from a dataset that have the same value in a particular ...

alaxman · Posted 09-16-2021 10:27 PM

Hello,

I'm having trouble figuring out how to delete complete rows from a dataset when the first prescription date for the patient for more than one drug is the same. Here is an example dataset:

data WORK.CLASS(label='Prescription Data');
   infile datalines dsd truncover;
   input Patient_ID $2  Prescription_Date:MMDDYY8. Drug:$1 ;
 Datalines;
XX 01/01/21  A
XX 01/01/21 B
XX 03/15/21  A
XX 04/15/21  C
YY 01/01/21  A
YY 02/10/21 B
YY 03/15/21  A
YY 04/15/21  C
;;;;

The table looks something like this:

Patient_ID	Prescription Date	Drug
XX	01/01/21	A
XX	01/01/21	B
XX	03/15/21	B
XX	04/15/21	B
YY	01/01/21	B
YY	02/10/21	B
YY	04/15/21	A
YY	04/15/21	B

Now for patient XX - they've been prescribed drugs A and B on the exact same date the first time they've ever been prescribed a drug and hence I want all 4 rows for patient XX deleted.

However, for patient YY, there is only one 'first prescription date' which is 01/01/21. They do have a subsequent prescription for drug A and B on 04/15/21. I don't want any rows for patient YY deleted because on their 'first prescription date', they only had one drug prescribed.

Can someone help?

Reeza · Posted 09-16-2021 11:17 PM

Probably a much easier way but it's late, my suggestion would be a double DoW loop but they can be complicated to code or understand. You may also want to just use multiple steps instead if that's easier.

1. Find the number unique of drugs per day

2. Identify the first day and if it has more than 2 drugs write that ID to a table

3. Filter based on ID's in table2

proc sql;
create table drugsPerDay as
select patient_id, prescription_date, count(distinct drug) as num_drugs
from have
group by patient_id, prescription_date;
quit;

data singleDrugs;
set drugsPerDay;
by patient_ID prescription_date;
if first.patient_id and num_drugs = 1;
run;

data want_data;
merge have (in=t1) singleDrugs (in=t2);
by patient_id;
if t2;
run;

Or squishing it all together in SQL (SQL is the easiest method to get distinct counts)

proc sql;

create table want as
select * from have where patient_id NOT in (
select patient_id from(
select patient_id, prescription_date, count(distinct drug) as num_drugs
from have
group by patient_id, prescription_date)
where num_drugs > 1
group by patient_id

having Prescription_Date=min(Prescription_Date));

quit;

@alaxman wrote:

Hello,

I'm having trouble figuring out how to delete complete rows from a dataset when the first prescription date for the patient for more than one drug is the same. Here is an example dataset:
data WORK.CLASS(label='Prescription Data');
   infile datalines dsd truncover;
   input Patient_ID $2  Prescription_Date:MMDDYY8. Drug:$1 ;
 Datalines;
XX 01/01/21  A
XX 01/01/21 B
XX 03/15/21  A
XX 04/15/21  C
YY 01/01/21  A
YY 02/10/21 B
YY 03/15/21  A
YY 04/15/21  C
;;;;
The table looks something like this:

Patient_ID Prescription Date Drug

XX
01/01/21 A

XX 01/01/21 B

XX 03/15/21 B

XX 04/15/21 B

YY 01/01/21 B

YY 02/10/21 B

YY 04/15/21 A

YY 04/15/21 B

Now for patient XX - they've been prescribed drugs A and B on the exact same date the first time they've ever been prescribed a drug and hence I want all 4 rows for patient XX deleted.

However, for patient YY, there is only one 'first prescription date' which is 01/01/21. They do have a subsequent prescription for drug A and B on 04/15/21. I don't want any rows for patient YY deleted because on their 'first prescription date', they only had one drug prescribed.

Can someone help?

ChrisNZ · Posted 09-17-2021 02:57 AM

1. The code you provided does not run. Please always vet it by pasting back to SAS and running it.

2. Here is step by step, easy to follow (if not at all efficient) method:

proc sql;
  create table MIN as 
  select PATIENT_ID, min(PRESCRIPTION_DATE) as MIN 
  from CLASS 
  group by 1 ;

  create table COUNT as 
  select CLASS.PATIENT_ID, count(distinct DRUG) as COUNT
  from CLASS, MIN  
  where CLASS.PATIENT_ID=MIN.PATIENT_ID and PRESCRIPTION_DATE=MIN
  group by 1 ;

  delete from CLASS where PATIENT_ID in (select PATIENT_ID from COUNT where COUNT>1) ;
quit;

High-Performance SAS Coding - Third Edition

Deleting rows from a dataset that have the same value in a particular column

Re: Deleting rows from a dataset that have the same value in a particular column

Re: Deleting rows from a dataset that have the same value in a particular column

Deleting rows from a dataset that have the same value in a particular column

Re: Deleting rows from a dataset that have the same value in a particular column

Re: Deleting rows from a dataset that have the same value in a particular column

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away