This is my first post, so apologies if this question is not worded/formatted correctly.
I am trying to sort a dataset that has the following rough structure:
| First Name | Last Name | Date of Birth | Unique ID |
| John | Smith | 01/01/1970 | 1000 |
| John | Smith | 01/01/1970 | 2000 |
| Jane | Smith | 01/03/1975 | 3000 |
| Jane | Smith | 06/08/1980 | 4000 |
The rough operation I want to carry out is to de-duplicate the first name, last name and dob (with all three representing a single person), and then keep the biggest number for unique ID.
What I want to achieve is this:
| First Name | Last Name | Date of Birth | Unique ID |
| John | Smith | 01/01/1970 | 2000 |
| Jane | Smith | 01/03/1975 | 3000 |
| Jane | Smith | 06/08/1980 | 4000 |
My initial thought was to try something like this:
DATA people_sorted;
SET people;
PROC SORT DATA=people_sorted;
BY id;
RUN;
PROC SORT DATA=people_sorted NODUPKEY;
BY fname lname dob;
RUN;This does not retain the original sorting as I had initially thought.
I was wondering if there was a way to 'chain' these sorts together, so to speak. Alternatively, is there a better approach to this that I'm not seeing?
Thank you in advance for your help!
No 🙂
data have;
input FirstName $ LastName $ DateofBirth :ddmmyy10. UniqueID;
format DateofBirth :ddmmyy10.;
datalines;
John Smith 01/01/1970 1000
Jane Smith 01/01/1970 2000
;
proc sort data = have;
by FirstName LastName DateofBirth UniqueID;
run;
data want;
set have;
by FirstName LastName DateofBirth UniqueID;
if last.DateofBirth;
run;
Result:
FirstName LastName DateofBirth UniqueID Jane Smith 01/01/1970 2000 John Smith 01/01/1970 1000
Try this
data have;
input FirstName $ LastName $ DateofBirth :ddmmyy10. UniqueID;
format DateofBirth :ddmmyy10.;
datalines;
John Smith 01/01/1970 1000
John Smith 01/01/1970 2000
Jane Smith 01/03/1975 3000
Jane Smith 06/08/1980 4000
;
proc sort data = have;
by FirstName LastName DateofBirth UniqueID;
run;
data want;
set have;
by FirstName LastName DateofBirth UniqueID;
if last.DateofBirth;
run;
Forgive me if I misunderstand, but would this not ignore if two people happen to have the same date of birth?
For example, if the data was:
John Smith 01/01/1970 1000
Jane Doe 01/01/1970 2000
Your solution would keep only Jane Doe, instead of keeping both people?
No 🙂
data have;
input FirstName $ LastName $ DateofBirth :ddmmyy10. UniqueID;
format DateofBirth :ddmmyy10.;
datalines;
John Smith 01/01/1970 1000
Jane Smith 01/01/1970 2000
;
proc sort data = have;
by FirstName LastName DateofBirth UniqueID;
run;
data want;
set have;
by FirstName LastName DateofBirth UniqueID;
if last.DateofBirth;
run;
Result:
FirstName LastName DateofBirth UniqueID Jane Smith 01/01/1970 2000 John Smith 01/01/1970 1000
Thank you very much, it does indeed work! I think I need to do a little more reading on FIRST and LAST then, as I don't quite understand it yet.
Thank you again 😄
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.