I have the following three variables:
Person Registration Registration_Date
I would like to remove any complete duplicates and, where several rows share the same Person and Registration, keep only the row with the latest Registration_Date. Example below:
Person Registration Registration_Date
Pete A 2019
Marco A 1993
Sam B 2002
Sam B 2003
Sam C 1960
David A 2002
David A 2002
This should result in:
Person Registration Registration_Date
Pete A 2019
Marco A 1993
Sam B 2003
Sam C 1960
David A 2002
Hi @FLCrime
Please try this:
data have;
input Person $ Registration $ Registration_Date;
datalines;
Pete A 2019
Marco A 1993
Sam B 2002
Sam B 2003
Sam C 1960
David A 2002
David A 2002
;
run;
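/* Sort ascending so the latest Registration_Date is the last row of each Person/Registration group */
proc sort data=have out=have_sorted;
by Person Registration Registration_Date;
run;
data want;
set have_sorted;
by Person Registration Registration_Date;
/* last.Registration flags the final (latest) row of each Person/Registration group */
if last.Registration then output;
run;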
Best,
Thank you for the response. Is there a way to do this through point and click, or perhaps the Query Builder? I am not very well versed in code, and the data set has about 8 million rows.
If you want to use a task in Enterprise Guide, the Sort Data task can do this, but you may have to run it twice. The first time, sort by all 3 variables and set the sort order for Registration_Date to 'Descending', so that the most recent date is the first observation in each Person and Registration group.
Then, in a 2nd Sort Data task (run on the output data set from the 1st Sort Data task), sort only by Person and Registration, and in the Options section under 'Duplicate Records', select "Keep only the first record for each 'Sort by' group". This keeps the first (most recent) observation for each Person and Registration combination and removes the rest, including complete duplicates.
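For anyone who would rather see this as code, the two Sort Data tasks should correspond roughly to the two PROC SORT steps below. This is only a sketch: the data set names have, have_sorted and want are borrowed from the earlier reply, and I am assuming the default EQUALS behavior of PROC SORT, which preserves the incoming order within each BY group so that NODUPKEY keeps the first (latest) row. The code that Enterprise Guide actually generates may look different.

```sas
/* Step 1: sort by all three variables, with the latest   */
/* Registration_Date first within each Person/Registration */
proc sort data=have out=have_sorted;
    by Person Registration descending Registration_Date;
run;

/* Step 2: sort by Person and Registration only.           */
/* NODUPKEY keeps the first row of each BY group, which is */
/* the latest date because of the order set in step 1      */
/* (assumes the default EQUALS option is in effect).       */
proc sort data=have_sorted out=want nodupkey;
    by Person Registration;
run;
```

With the sample data above, want should contain one row per Person and Registration combination, with Sam B showing 2003.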