This is my first post, so apologies if this question is not worded/formatted correctly.
I am trying to sort a dataset that has the following rough structure:
First Name | Last Name | Date of Birth | Unique ID |
John | Smith | 01/01/1970 | 1000 |
John | Smith | 01/01/1970 | 2000 |
Jane | Smith | 01/03/1975 | 3000 |
Jane | Smith | 06/08/1980 | 4000 |
The rough operation I want to carry out is to de-duplicate the first name, last name and dob (with all three representing a single person), and then keep the biggest number for unique ID.
What I want to achieve is this:
First Name | Last Name | Date of Birth | Unique ID |
John | Smith | 01/01/1970 | 2000 |
Jane | Smith | 01/03/1975 | 3000 |
Jane | Smith | 06/08/1980 | 4000 |
My initial thought was to try something like this:
DATA people_sorted;
SET people;
PROC SORT DATA=people_sorted;
BY id;
RUN;
PROC SORT DATA=people_sorted NODUPKEY;
BY fname lname dob;
RUN;
This does not retain the original sorting as I had initially thought.
I was wondering if there was a way to 'chain' these sorts together, so to speak. Alternatively, is there a better approach to this that I'm not seeing?
Thank you in advance for your help!
No 🙂
data have;
input FirstName $ LastName $ DateofBirth :ddmmyy10. UniqueID;
format DateofBirth :ddmmyy10.;
datalines;
John Smith 01/01/1970 1000
Jane Smith 01/01/1970 2000
;
proc sort data = have;
by FirstName LastName DateofBirth UniqueID;
run;
data want;
set have;
by FirstName LastName DateofBirth UniqueID;
if last.DateofBirth;
run;
Result:
FirstName LastName DateofBirth UniqueID Jane Smith 01/01/1970 2000 John Smith 01/01/1970 1000
Try this
data have;
input FirstName $ LastName $ DateofBirth :ddmmyy10. UniqueID;
format DateofBirth :ddmmyy10.;
datalines;
John Smith 01/01/1970 1000
John Smith 01/01/1970 2000
Jane Smith 01/03/1975 3000
Jane Smith 06/08/1980 4000
;
proc sort data = have;
by FirstName LastName DateofBirth UniqueID;
run;
data want;
set have;
by FirstName LastName DateofBirth UniqueID;
if last.DateofBirth;
run;
Forgive me if I misunderstand, but would this not ignore if two people happen to have the same date of birth?
For example, if the data was:
John Smith 01/01/1970 1000
Jane Doe 01/01/1970 2000
Your solution would keep only Jane Doe, instead of keeping both people?
No 🙂
data have;
input FirstName $ LastName $ DateofBirth :ddmmyy10. UniqueID;
format DateofBirth :ddmmyy10.;
datalines;
John Smith 01/01/1970 1000
Jane Smith 01/01/1970 2000
;
proc sort data = have;
by FirstName LastName DateofBirth UniqueID;
run;
data want;
set have;
by FirstName LastName DateofBirth UniqueID;
if last.DateofBirth;
run;
Result:
FirstName LastName DateofBirth UniqueID Jane Smith 01/01/1970 2000 John Smith 01/01/1970 1000
Thank you very much, it does indeed work! I think I need to do a little more reading on FIRST and LAST then, as I don't quite understand it yet.
Thank you again 😄
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.