BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
FLCrime
Fluorite | Level 6

I have the following Three variables:

 

Person   Registration   Registration_Date

 

I would like to remove any complete duplicates and any rows that contain the same Person and Registration, keeping the latest registration. Example Below

 

Person   Registration   Registration_Date

Pete             A                 2019

Marco          A                 1993

Sam             B                  2002

Sam             B                  2003

Sam             C                 1960

David            A                  2002

David            A                  2002

 

 

 

This should result in:

 

Person   Registration   Registration_Date

Pete             A                 2019

Marco          A                 1993

Sam             B                  2003

Sam             C                 1960

David            A                  2002

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
jebjur
SAS Employee

If you want to use a task in Eguide, then the Sort Data task can be used, but you may have to use the task twice. The first time you would sort by all 3 variables, but make sure the sort order for Registration_Date is set to 'Descending', so the most recent date is the first observation for each Person and Registration group.

 

Then in the 2nd Sort Data task (used on the previously sorted output data set from the 1st Sort Data task), you would only sort by Person and Registration, and in the Options section under 'Duplicate Records', select "Keep only the first record for each 'Sort by' group" This will remove any duplicate observations for Person and Registration.

View solution in original post

3 REPLIES 3
ed_sas_member
Meteorite | Level 14

Hi @FLCrime 

 

Please try this:

data have;
	input Person $  Registration $  Registration_Date;
	datalines;
Pete             A                 2019
Marco          A                 1993
Sam             B                  2002
Sam             B                  2003
Sam             C                 1960
David            A                  2002
David            A                  2002
 ;
 run;
 
proc sort data=have out=have_sorted;
	by Person Registration Registration_Date;
run;
data want;
	set have_sorted;
	by Person Registration Registration_Date;
	if first.Registration then output;
run;

Best,

FLCrime
Fluorite | Level 6

Thank you for the response. Is there a way to do this through the point and click? Or query builder perhaps? I am not very well versed in code and the data set has about 8 million rows.

jebjur
SAS Employee

If you want to use a task in Eguide, then the Sort Data task can be used, but you may have to use the task twice. The first time you would sort by all 3 variables, but make sure the sort order for Registration_Date is set to 'Descending', so the most recent date is the first observation for each Person and Registration group.

 

Then in the 2nd Sort Data task (used on the previously sorted output data set from the 1st Sort Data task), you would only sort by Person and Registration, and in the Options section under 'Duplicate Records', select "Keep only the first record for each 'Sort by' group" This will remove any duplicate observations for Person and Registration.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 2159 views
  • 1 like
  • 3 in conversation