BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
katblack
Fluorite | Level 6

I know there have been a lot of posts about ways to delete duplicates, but I can't find exactly what I am looking for. 

 

I have a data set of phone numbers and I am trying to save all of the unique phone numbers for each ID. I want one row per phone number. If a phone number appears multiple times for one record, I want to keep the record with the most recent date. 

 

Here's an example of test data:


ID    phone number      date
1     (555) 555-5555     5/6/2018
1     (555) 555-5555     6/8/2018
1     (666) 666-6666     9/10/2018
2     (777) 777-7777     4/1/2018
2     (777) 777-7777     5/20/2018
3     (888) 888-8888     6/4/2018
4     (222) 222-2222     5/6/2018
4     (222) 222-2222     6/5/2018
4     (333) 333-3333     8/4/2018
4     (222) 222-2222     10/10/2018

 

I would like the resulting data set to look like this:

 

ID     phone number      date
1      (555) 555-5555     6/8/2018
1      (666) 666-6666     9/10/2018
2      (777) 777-7777     5/20/2018
3      (888) 888-8888     6/4/2018
4      (333) 333-3333     8/4/2018
4      (222) 222-2222     10/10/2018

 

Where within one ID, each phone number only appears once. If the phone number appears for different dates in the original data, then the most recent date is the record that is kept. 

 

Thanks in advance for any help you can provide! 

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

This approach requires that you have real dates, and not just character strings:

 

proc sort data=have;
  by id phone_number date;
run;

data want;
   set have;
   by id phone_number date;
   if last.phone_number;
run;

View solution in original post

2 REPLIES 2
Astounding
PROC Star

This approach requires that you have real dates, and not just character strings:

 

proc sort data=have;
  by id phone_number date;
run;

data want;
   set have;
   by id phone_number date;
   if last.phone_number;
run;
ed_sas_member
Meteorite | Level 14

Hi @katblack 

 

Another approach could be:

 

proc sql;
	create table want as
	select ID, phone_number, max(date) as date format=MMDDYY10.
	from have
	group by ID, phone_number;
quit;

Capture d’écran 2020-01-04 à 14.30.18.png

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 690 views
  • 0 likes
  • 3 in conversation