I am fairly new to SAS and need some assistance. My data set (Test) contains many duplicates, and I need to remove them based on a date field. Neither the NODUP nor the NODUPKEY option of PROC SORT gives me the results I need. The Test data set contains a list of accounts, with acctnum as the primary identifier. Multiple records are coming back for each account, and I only want to keep the record with the most recent date.
Can someone please help?
Explore using two sorts. The first puts the record you want to keep at the beginning of each group, ahead of any duplicates, by using DESCENDING in the BY list. The second sorts on fewer BY variables (just the key), with NODUPKEY and EQUALS specified on the PROC SORT statement so that the first record in each group is the one retained.
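A minimal sketch of the two-sort approach might look like the following. The date variable name trans_date and the output data set names test_sorted and test_latest are assumptions made for illustration; only Test and acctnum come from the question.

```sas
/* Step 1: order each account's records newest first.            */
/* trans_date is an assumed name; substitute your date variable. */
proc sort data=test out=test_sorted;
  by acctnum descending trans_date;
run;

/* Step 2: sort on the key alone. NODUPKEY keeps the first record */
/* per acctnum, and EQUALS preserves the relative order from      */
/* step 1, so the record kept is the one with the latest date.    */
proc sort data=test_sorted out=test_latest nodupkey equals;
  by acctnum;
run;
```

Writing to separate OUT= data sets leaves the original Test data set untouched, which makes it easier to compare record counts before and after.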
Another option is to use PROC SORT to get your data into the proper order (with the appropriate BY statement variables and, again, using DESCENDING in the BY list).
Then use a DATA step with a BY statement that lists the sort variables you want to test, and subset with IF FIRST.variable (or maybe IF LAST.variable). Whether to use FIRST. or LAST. depends on how you sorted your input file (with or without DESCENDING).
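The sort-then-DATA-step approach could be sketched roughly as below. As an assumption for illustration, the date variable is called trans_date and the output data sets are test_sorted and test_latest; none of these names appear in the original question.

```sas
/* Sort so the newest record comes first within each account.    */
/* trans_date is an assumed name; substitute your date variable. */
proc sort data=test out=test_sorted;
  by acctnum descending trans_date;
run;

/* BY-group processing creates FIRST.acctnum and LAST.acctnum.   */
/* The subsetting IF keeps only the first record per account,    */
/* which is the most recent one given the sort order above.      */
data test_latest;
  set test_sorted;
  by acctnum;
  if first.acctnum;
run;
```

If you sort by ascending date instead (dropping DESCENDING), the same result should come from using IF LAST.acctnum in place of IF FIRST.acctnum.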
The SAS support website, http://support.sas.com/, has SAS-hosted documentation as well as supplemental technical and conference reference materials. Here are a few Google advanced search arguments you can use to find discussions and example code on this topic:
remove duplicates equals site:sas.com
by first last processing site:sas.com
Also, this topic has been discussed on the SAS Discussion Forums, if you want to search the archives.