Solved: Delete duplicates based on 3 variables with different priority

katblack · Posted 01-03-2020 03:15 PM

I know there have been a lot of posts about ways to delete duplicates, but I can't find exactly what I am looking for.

I have a data set of phone numbers and I am trying to save all of the unique phone numbers for each ID. I want one row per phone number. If a phone number appears multiple times for one record, I want to keep the record with the most recent date.

Here's an example of test data:

ID phone number date
1 (555) 555-5555 5/6/2018
1 (555) 555-5555 6/8/2018
1 (666) 666-6666 9/10/2018
2 (777) 777-7777 4/1/2018
2 (777) 777-7777 5/20/2018
3 (888) 888-8888 6/4/2018
4 (222) 222-2222 5/6/2018
4 (222) 222-2222 6/5/2018
4 (333) 333-3333 8/4/2018
4 (222) 222-2222 10/10/2018

I would like the resulting data set to look like this:

ID phone number date
1 (555) 555-5555 6/8/2018
1 (666) 666-6666 9/10/2018
2 (777) 777-7777 5/20/2018
3 (888) 888-8888 6/4/2018
4 (333) 333-3333 8/4/2018
4 (222) 222-2222 10/10/2018

Where within one ID, each phone number only appears once. If the phone number appears for different dates in the original data, then the most recent date is the record that is kept.

Thanks in advance for any help you can provide!

Astounding · Posted 01-03-2020 04:55 PM

This approach requires that you have real dates, and not just character strings:

proc sort data=have;
  by id phone_number date;
run;

data want;
   set have;
   by id phone_number date;
   if last.phone_number;
run;

View solution in original post

Astounding · Posted 01-03-2020 04:55 PM

This approach requires that you have real dates, and not just character strings:

proc sort data=have;
  by id phone_number date;
run;

data want;
   set have;
   by id phone_number date;
   if last.phone_number;
run;

ed_sas_member · Posted 01-04-2020 08:31 AM

Hi @katblack

Another approach could be:

proc sql;
	create table want as
	select ID, phone_number, max(date) as date format=MMDDYY10.
	from have
	group by ID, phone_number;
quit;

Delete duplicates based on 3 variables with different priority

Re: Delete duplicates based on 3 variables with different priority

Re: Delete duplicates based on 3 variables with different priority

Re: Delete duplicates based on 3 variables with different priority

Delete duplicates based on 3 variables with different priority

Re: Delete duplicates based on 3 variables with different priority

Re: Delete duplicates based on 3 variables with different priority

Re: Delete duplicates based on 3 variables with different priority

Registration is open

SAS Training: Just a Click Away