How to delete duplicate observations by another variable

mherrick · Posted 11-01-2022 09:16 AM

Hello,

I am trying to figure out how to delete duplicate observations in my data set based on another variable.

I have multiple 'studyID's' per participant but I only want to keep the most recent one based on the 'caredate'. Here is my code but my results are still sowing multiple obs per ID if the 'caredate' as different. Please advise.

proc sort data= PPD;
by CAREDATE;
Run;

proc sort data= PPD
Dupout=PPD_Dupobs
Nodupkey;
by STUDYID;
Run;

PROC PRINT DATA=PPD (Obs=100);

Thank you for your help!

PeterClemmensen · Posted 11-01-2022 09:21 AM

Here is a classic way to do this with Proc SQL

data have;
input studyID $ caredate :date9.;
format caredate date9.;
datalines;
1 01jan2014 
1 01jan2015 
1 01jan2016 
2 01jan2017 
2 01jan2018 
2 01jan2019 
3 01jan2020 
3 01jan2021 
3 01jan2022 
;

proc sql;
   create table want as
   select * from have
   group by studyID
   having max(caredate) = caredate
   ;
quit;

Result:

studyID caredate
1       01JAN2016
2       01JAN2019
3       01JAN2022

The DATA to DATA Step Macro
Blog: SASnrd

LinusH · Posted 11-01-2022 09:22 AM

If your input dataset is unsorted, first sort BY StudyID and CareDate (descending).

Then another sort, with nodupkey with only BY StudyID.

Data never sleeps

PeterClemmensen · Posted 11-01-2022 09:23 AM

Or you can do this if you want to make sure that you only have 1 obs per studyID, regardless of ties.

proc sort data = have;
   by studyID caredate;
run;

data want;
   set have;
   by studyID caredate;
   if last.studyID;
run;

The DATA to DATA Step Macro
Blog: SASnrd

Quentin · Posted 11-01-2022 09:53 AM

I'm confused. You mention 'per participant' but your code doesn't reference a ParticipantID.

Does your data have ParticipantID, StudyID, CareDate?

Can you show, say, 10 sample records of the data you have (with duplicates) and the data you want after de-duplicating?

It's not clear to me if you want the output to be one record per StudyID or one record per ParticipantID.

The Boston Area SAS Users Group is hosting free webinars!

Register now at https://www.basug.org/events.

How to delete duplicate observations by another variable

Re: How to delete duplicate observations by another variable

Re: How to delete duplicate observations by another variable

Re: How to delete duplicate observations by another variable

Re: How to delete duplicate observations by another variable

Catch up on SAS Innovate 2026

How to delete duplicate observations by another variable

Re: How to delete duplicate observations by another variable

Re: How to delete duplicate observations by another variable

Re: How to delete duplicate observations by another variable

Re: How to delete duplicate observations by another variable

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away