BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
uahmed97
Calcite | Level 5

This is my first post, so apologies if this question is not worded/formatted correctly.

 

I am trying to sort a dataset that has the following rough structure:

First NameLast NameDate of BirthUnique ID
JohnSmith01/01/19701000
JohnSmith01/01/19702000
JaneSmith01/03/19753000
JaneSmith06/08/19804000

 

The rough operation I want to carry out is to de-duplicate the first name, last name and dob (with all three representing a single person), and then keep the biggest number for unique ID.

 

What I want to achieve is this:

First NameLast NameDate of BirthUnique ID
JohnSmith01/01/19702000
JaneSmith01/03/19753000
JaneSmith06/08/19804000

 

My initial thought was to try something like this:

DATA people_sorted;
    SET people;
PROC SORT DATA=people_sorted;
    BY id;
RUN;
PROC SORT DATA=people_sorted NODUPKEY;
    BY fname lname dob;
RUN;

This does not retain the original sorting as I had initially thought.

I was wondering if there was a way to 'chain' these sorts together, so to speak. Alternatively, is there a better approach to this that I'm not seeing?

 

Thank you in advance for your help!

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

No 🙂

 

data have;
input FirstName $ LastName $ DateofBirth :ddmmyy10. UniqueID;
format DateofBirth :ddmmyy10.;
datalines;
John Smith 01/01/1970 1000
Jane Smith 01/01/1970 2000
;

proc sort data = have;
   by FirstName LastName DateofBirth UniqueID;
run;

data want;
   set have;
   by FirstName LastName DateofBirth UniqueID;
   if last.DateofBirth;
run;

 

Result:

 

FirstName LastName DateofBirth UniqueID 
Jane      Smith    01/01/1970  2000 
John      Smith    01/01/1970  1000 

 

View solution in original post

4 REPLIES 4
PeterClemmensen
Tourmaline | Level 20

Try this

 

data have;
input FirstName $ LastName $ DateofBirth :ddmmyy10. UniqueID;
format DateofBirth :ddmmyy10.;
datalines;
John Smith 01/01/1970 1000
John Smith 01/01/1970 2000
Jane Smith 01/03/1975 3000
Jane Smith 06/08/1980 4000
;

proc sort data = have;
   by FirstName LastName DateofBirth UniqueID;
run;

data want;
   set have;
   by FirstName LastName DateofBirth UniqueID;
   if last.DateofBirth;
run;
uahmed97
Calcite | Level 5

@PeterClemmensen 

 

Forgive me if I misunderstand, but would this not ignore if two people happen to have the same date of birth?

 

For example, if the data was:

John Smith 01/01/1970 1000

Jane Doe 01/01/1970 2000

 

Your solution would keep only Jane Doe, instead of keeping both people?

PeterClemmensen
Tourmaline | Level 20

No 🙂

 

data have;
input FirstName $ LastName $ DateofBirth :ddmmyy10. UniqueID;
format DateofBirth :ddmmyy10.;
datalines;
John Smith 01/01/1970 1000
Jane Smith 01/01/1970 2000
;

proc sort data = have;
   by FirstName LastName DateofBirth UniqueID;
run;

data want;
   set have;
   by FirstName LastName DateofBirth UniqueID;
   if last.DateofBirth;
run;

 

Result:

 

FirstName LastName DateofBirth UniqueID 
Jane      Smith    01/01/1970  2000 
John      Smith    01/01/1970  1000 

 

uahmed97
Calcite | Level 5

Thank you very much, it does indeed work! I think I need to do a little more reading on FIRST and LAST then, as I don't quite understand it yet.

 

Thank you again 😄

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 418 views
  • 1 like
  • 2 in conversation