Solved: Sorting a table, de-duping three variables and keeping the first occur...

uahmed97 · Posted 02-08-2021 04:15 AM

This is my first post, so apologies if this question is not worded/formatted correctly.

I am trying to sort a dataset that has the following rough structure:

First Name	Last Name	Date of Birth	Unique ID
John	Smith	01/01/1970	1000
John	Smith	01/01/1970	2000
Jane	Smith	01/03/1975	3000
Jane	Smith	06/08/1980	4000

The rough operation I want to carry out is to de-duplicate the first name, last name and dob (with all three representing a single person), and then keep the biggest number for unique ID.

What I want to achieve is this:

First Name	Last Name	Date of Birth	Unique ID
John	Smith	01/01/1970	2000
Jane	Smith	01/03/1975	3000
Jane	Smith	06/08/1980	4000

My initial thought was to try something like this:

DATA people_sorted;
    SET people;
PROC SORT DATA=people_sorted;
    BY id;
RUN;
PROC SORT DATA=people_sorted NODUPKEY;
    BY fname lname dob;
RUN;

This does not retain the original sorting as I had initially thought.

I was wondering if there was a way to 'chain' these sorts together, so to speak. Alternatively, is there a better approach to this that I'm not seeing?

Thank you in advance for your help!

PeterClemmensen · Posted 02-08-2021 04:48 AM

No 🙂

data have;
input FirstName $ LastName $ DateofBirth :ddmmyy10. UniqueID;
format DateofBirth :ddmmyy10.;
datalines;
John Smith 01/01/1970 1000
Jane Smith 01/01/1970 2000
;

proc sort data = have;
   by FirstName LastName DateofBirth UniqueID;
run;

data want;
   set have;
   by FirstName LastName DateofBirth UniqueID;
   if last.DateofBirth;
run;

Result:

FirstName LastName DateofBirth UniqueID 
Jane      Smith    01/01/1970  2000 
John      Smith    01/01/1970  1000

The DATA to DATA Step Macro
Blog: SASnrd

View solution in original post

PeterClemmensen · Posted 02-08-2021 04:26 AM

Try this

data have;
input FirstName $ LastName $ DateofBirth :ddmmyy10. UniqueID;
format DateofBirth :ddmmyy10.;
datalines;
John Smith 01/01/1970 1000
John Smith 01/01/1970 2000
Jane Smith 01/03/1975 3000
Jane Smith 06/08/1980 4000
;

proc sort data = have;
   by FirstName LastName DateofBirth UniqueID;
run;

data want;
   set have;
   by FirstName LastName DateofBirth UniqueID;
   if last.DateofBirth;
run;

The DATA to DATA Step Macro
Blog: SASnrd

uahmed97 · Posted 02-08-2021 04:37 AM

@PeterClemmensen

Forgive me if I misunderstand, but would this not ignore if two people happen to have the same date of birth?

For example, if the data was:

John Smith 01/01/1970 1000

Jane Doe 01/01/1970 2000

Your solution would keep only Jane Doe, instead of keeping both people?

PeterClemmensen · Posted 02-08-2021 04:48 AM

No 🙂

data have;
input FirstName $ LastName $ DateofBirth :ddmmyy10. UniqueID;
format DateofBirth :ddmmyy10.;
datalines;
John Smith 01/01/1970 1000
Jane Smith 01/01/1970 2000
;

proc sort data = have;
   by FirstName LastName DateofBirth UniqueID;
run;

data want;
   set have;
   by FirstName LastName DateofBirth UniqueID;
   if last.DateofBirth;
run;

Result:

FirstName LastName DateofBirth UniqueID 
Jane      Smith    01/01/1970  2000 
John      Smith    01/01/1970  1000

The DATA to DATA Step Macro
Blog: SASnrd

uahmed97 · Posted 02-08-2021 05:01 AM

Thank you very much, it does indeed work! I think I need to do a little more reading on FIRST and LAST then, as I don't quite understand it yet.

Thank you again 😄

Sorting a table, de-duping three variables and keeping the first occurrence of a fourth

Re: Sorting a table, de-duping three variables and keeping the first occurrence of a fourth

Re: Sorting a table, de-duping three variables and keeping the first occurrence of a fourth

Re: Sorting a table, de-duping three variables and keeping the first occurrence of a fourth

Re: Sorting a table, de-duping three variables and keeping the first occurrence of a fourth

Re: Sorting a table, de-duping three variables and keeping the first occurrence of a fourth

Catch up on SAS Innovate 2026

Sorting a table, de-duping three variables and keeping the first occurrence of a fourth

Re: Sorting a table, de-duping three variables and keeping the first occurrence of a fourth

Re: Sorting a table, de-duping three variables and keeping the first occurrence of a fourth

Re: Sorting a table, de-duping three variables and keeping the first occurrence of a fourth

Re: Sorting a table, de-duping three variables and keeping the first occurrence of a fourth

Re: Sorting a table, de-duping three variables and keeping the first occurrence of a fourth

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away