BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
uahmed97
Calcite | Level 5

This is my first post, so apologies if this question is not worded/formatted correctly.

 

I am trying to sort a dataset that has the following rough structure:

First NameLast NameDate of BirthUnique ID
JohnSmith01/01/19701000
JohnSmith01/01/19702000
JaneSmith01/03/19753000
JaneSmith06/08/19804000

 

The rough operation I want to carry out is to de-duplicate the first name, last name and dob (with all three representing a single person), and then keep the biggest number for unique ID.

 

What I want to achieve is this:

First NameLast NameDate of BirthUnique ID
JohnSmith01/01/19702000
JaneSmith01/03/19753000
JaneSmith06/08/19804000

 

My initial thought was to try something like this:

DATA people_sorted;
    SET people;
PROC SORT DATA=people_sorted;
    BY id;
RUN;
PROC SORT DATA=people_sorted NODUPKEY;
    BY fname lname dob;
RUN;

This does not retain the original sorting as I had initially thought.

I was wondering if there was a way to 'chain' these sorts together, so to speak. Alternatively, is there a better approach to this that I'm not seeing?

 

Thank you in advance for your help!

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

No 🙂

 

data have;
input FirstName $ LastName $ DateofBirth :ddmmyy10. UniqueID;
format DateofBirth :ddmmyy10.;
datalines;
John Smith 01/01/1970 1000
Jane Smith 01/01/1970 2000
;

proc sort data = have;
   by FirstName LastName DateofBirth UniqueID;
run;

data want;
   set have;
   by FirstName LastName DateofBirth UniqueID;
   if last.DateofBirth;
run;

 

Result:

 

FirstName LastName DateofBirth UniqueID 
Jane      Smith    01/01/1970  2000 
John      Smith    01/01/1970  1000 

 

View solution in original post

4 REPLIES 4
PeterClemmensen
Tourmaline | Level 20

Try this

 

data have;
input FirstName $ LastName $ DateofBirth :ddmmyy10. UniqueID;
format DateofBirth :ddmmyy10.;
datalines;
John Smith 01/01/1970 1000
John Smith 01/01/1970 2000
Jane Smith 01/03/1975 3000
Jane Smith 06/08/1980 4000
;

proc sort data = have;
   by FirstName LastName DateofBirth UniqueID;
run;

data want;
   set have;
   by FirstName LastName DateofBirth UniqueID;
   if last.DateofBirth;
run;
uahmed97
Calcite | Level 5

@PeterClemmensen 

 

Forgive me if I misunderstand, but would this not ignore if two people happen to have the same date of birth?

 

For example, if the data was:

John Smith 01/01/1970 1000

Jane Doe 01/01/1970 2000

 

Your solution would keep only Jane Doe, instead of keeping both people?

PeterClemmensen
Tourmaline | Level 20

No 🙂

 

data have;
input FirstName $ LastName $ DateofBirth :ddmmyy10. UniqueID;
format DateofBirth :ddmmyy10.;
datalines;
John Smith 01/01/1970 1000
Jane Smith 01/01/1970 2000
;

proc sort data = have;
   by FirstName LastName DateofBirth UniqueID;
run;

data want;
   set have;
   by FirstName LastName DateofBirth UniqueID;
   if last.DateofBirth;
run;

 

Result:

 

FirstName LastName DateofBirth UniqueID 
Jane      Smith    01/01/1970  2000 
John      Smith    01/01/1970  1000 

 

uahmed97
Calcite | Level 5

Thank you very much, it does indeed work! I think I need to do a little more reading on FIRST and LAST then, as I don't quite understand it yet.

 

Thank you again 😄

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 946 views
  • 1 like
  • 2 in conversation