This is my first post, so apologies if this question is not worded/formatted correctly.
I am trying to sort a dataset that has the following rough structure:
First Name | Last Name | Date of Birth | Unique ID |
John | Smith | 01/01/1970 | 1000 |
John | Smith | 01/01/1970 | 2000 |
Jane | Smith | 01/03/1975 | 3000 |
Jane | Smith | 06/08/1980 | 4000 |
The rough operation I want to carry out is to de-duplicate the first name, last name and dob (with all three representing a single person), and then keep the biggest number for unique ID.
What I want to achieve is this:
First Name | Last Name | Date of Birth | Unique ID |
John | Smith | 01/01/1970 | 2000 |
Jane | Smith | 01/03/1975 | 3000 |
Jane | Smith | 06/08/1980 | 4000 |
My initial thought was to try something like this:
DATA people_sorted;
SET people;
PROC SORT DATA=people_sorted;
BY id;
RUN;
PROC SORT DATA=people_sorted NODUPKEY;
BY fname lname dob;
RUN;
This does not retain the original sorting as I had initially thought.
I was wondering if there was a way to 'chain' these sorts together, so to speak. Alternatively, is there a better approach to this that I'm not seeing?
Thank you in advance for your help!
No 🙂
data have;
input FirstName $ LastName $ DateofBirth :ddmmyy10. UniqueID;
format DateofBirth :ddmmyy10.;
datalines;
John Smith 01/01/1970 1000
Jane Smith 01/01/1970 2000
;
proc sort data = have;
by FirstName LastName DateofBirth UniqueID;
run;
data want;
set have;
by FirstName LastName DateofBirth UniqueID;
if last.DateofBirth;
run;
Result:
FirstName LastName DateofBirth UniqueID Jane Smith 01/01/1970 2000 John Smith 01/01/1970 1000
Try this
data have;
input FirstName $ LastName $ DateofBirth :ddmmyy10. UniqueID;
format DateofBirth :ddmmyy10.;
datalines;
John Smith 01/01/1970 1000
John Smith 01/01/1970 2000
Jane Smith 01/03/1975 3000
Jane Smith 06/08/1980 4000
;
proc sort data = have;
by FirstName LastName DateofBirth UniqueID;
run;
data want;
set have;
by FirstName LastName DateofBirth UniqueID;
if last.DateofBirth;
run;
Forgive me if I misunderstand, but would this not ignore if two people happen to have the same date of birth?
For example, if the data was:
John Smith 01/01/1970 1000
Jane Doe 01/01/1970 2000
Your solution would keep only Jane Doe, instead of keeping both people?
No 🙂
data have;
input FirstName $ LastName $ DateofBirth :ddmmyy10. UniqueID;
format DateofBirth :ddmmyy10.;
datalines;
John Smith 01/01/1970 1000
Jane Smith 01/01/1970 2000
;
proc sort data = have;
by FirstName LastName DateofBirth UniqueID;
run;
data want;
set have;
by FirstName LastName DateofBirth UniqueID;
if last.DateofBirth;
run;
Result:
FirstName LastName DateofBirth UniqueID Jane Smith 01/01/1970 2000 John Smith 01/01/1970 1000
Thank you very much, it does indeed work! I think I need to do a little more reading on FIRST and LAST then, as I don't quite understand it yet.
Thank you again 😄
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: