BookmarkSubscribeRSS Feed
mherrick
Calcite | Level 5

Hello,

I am trying to figure out how to delete duplicate observations in my data set based on another variable.

I have multiple 'studyID's' per participant but I only want to keep the most recent one based on the 'caredate'.  Here is my code but my results are still sowing multiple obs per ID if the 'caredate' as different.  Please advise.

 

proc sort data= PPD;
by CAREDATE;
Run;

proc sort data= PPD
Dupout=PPD_Dupobs
Nodupkey;
by STUDYID;
Run;

PROC PRINT DATA=PPD (Obs=100);

 

Thank you for your help!

4 REPLIES 4
PeterClemmensen
Tourmaline | Level 20

Here is a classic way to do this with Proc SQL

 

data have;
input studyID $ caredate :date9.;
format caredate date9.;
datalines;
1 01jan2014 
1 01jan2015 
1 01jan2016 
2 01jan2017 
2 01jan2018 
2 01jan2019 
3 01jan2020 
3 01jan2021 
3 01jan2022 
;

proc sql;
   create table want as
   select * from have
   group by studyID
   having max(caredate) = caredate
   ;
quit;

 

Result:

 

studyID caredate
1       01JAN2016
2       01JAN2019
3       01JAN2022
LinusH
Tourmaline | Level 20

If your input dataset is unsorted, first sort BY StudyID and CareDate (descending).

Then another sort, with nodupkey with only BY StudyID.

Data never sleeps
PeterClemmensen
Tourmaline | Level 20

Or you can do this if you want to make sure that you only have 1 obs per studyID, regardless of ties.

 

proc sort data = have;
   by studyID caredate;
run;

data want;
   set have;
   by studyID caredate;
   if last.studyID;
run;
Quentin
Super User

I'm confused.  You mention 'per participant' but your code doesn't reference a ParticipantID.

 

Does your data have ParticipantID, StudyID, CareDate? 

 

Can you show, say, 10 sample records of the data you have (with duplicates) and the data you want after de-duplicating?

 

It's not clear to me if you want the output to be one record per StudyID or one record per ParticipantID.

The Boston Area SAS Users Group is hosting free webinars!
Next up: Bart Jablonski and I present 53 (+3) ways to do a table lookup on Wednesday Sep 18.
Register now at https://www.basug.org/events.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 425 views
  • 0 likes
  • 4 in conversation