I know there have been a lot of posts about ways to delete duplicates, but I can't find exactly what I am looking for.
I have a data set of phone numbers and I am trying to save all of the unique phone numbers for each ID. I want one row per phone number. If a phone number appears multiple times for one record, I want to keep the record with the most recent date.
Here's an example of test data:
ID phone number date
1 (555) 555-5555 5/6/2018
1 (555) 555-5555 6/8/2018
1 (666) 666-6666 9/10/2018
2 (777) 777-7777 4/1/2018
2 (777) 777-7777 5/20/2018
3 (888) 888-8888 6/4/2018
4 (222) 222-2222 5/6/2018
4 (222) 222-2222 6/5/2018
4 (333) 333-3333 8/4/2018
4 (222) 222-2222 10/10/2018
I would like the resulting data set to look like this:
ID phone number date
1 (555) 555-5555 6/8/2018
1 (666) 666-6666 9/10/2018
2 (777) 777-7777 5/20/2018
3 (888) 888-8888 6/4/2018
4 (333) 333-3333 8/4/2018
4 (222) 222-2222 10/10/2018
Where within one ID, each phone number only appears once. If the phone number appears for different dates in the original data, then the most recent date is the record that is kept.
Thanks in advance for any help you can provide!
This approach requires that you have real dates, and not just character strings:
proc sort data=have;
by id phone_number date;
run;
data want;
set have;
by id phone_number date;
if last.phone_number;
run;
This approach requires that you have real dates, and not just character strings:
proc sort data=have;
by id phone_number date;
run;
data want;
set have;
by id phone_number date;
if last.phone_number;
run;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.