BookmarkSubscribeRSS Feed
mherrick
Calcite | Level 5

Hello,

I am trying to figure out how to delete duplicate observations in my data set based on another variable.

I have multiple 'studyID's' per participant but I only want to keep the most recent one based on the 'caredate'.  Here is my code but my results are still sowing multiple obs per ID if the 'caredate' as different.  Please advise.

 

proc sort data= PPD;
by CAREDATE;
Run;

proc sort data= PPD
Dupout=PPD_Dupobs
Nodupkey;
by STUDYID;
Run;

PROC PRINT DATA=PPD (Obs=100);

 

Thank you for your help!

4 REPLIES 4
PeterClemmensen
Tourmaline | Level 20

Here is a classic way to do this with Proc SQL

 

data have;
input studyID $ caredate :date9.;
format caredate date9.;
datalines;
1 01jan2014 
1 01jan2015 
1 01jan2016 
2 01jan2017 
2 01jan2018 
2 01jan2019 
3 01jan2020 
3 01jan2021 
3 01jan2022 
;

proc sql;
   create table want as
   select * from have
   group by studyID
   having max(caredate) = caredate
   ;
quit;

 

Result:

 

studyID caredate
1       01JAN2016
2       01JAN2019
3       01JAN2022
LinusH
Tourmaline | Level 20

If your input dataset is unsorted, first sort BY StudyID and CareDate (descending).

Then another sort, with nodupkey with only BY StudyID.

Data never sleeps
PeterClemmensen
Tourmaline | Level 20

Or you can do this if you want to make sure that you only have 1 obs per studyID, regardless of ties.

 

proc sort data = have;
   by studyID caredate;
run;

data want;
   set have;
   by studyID caredate;
   if last.studyID;
run;
Quentin
Super User

I'm confused.  You mention 'per participant' but your code doesn't reference a ParticipantID.

 

Does your data have ParticipantID, StudyID, CareDate? 

 

Can you show, say, 10 sample records of the data you have (with duplicates) and the data you want after de-duplicating?

 

It's not clear to me if you want the output to be one record per StudyID or one record per ParticipantID.

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 986 views
  • 0 likes
  • 4 in conversation