This is an example of my dataset:
Patient Date1 Date2 Difference
Anna Smith 14MAY2013 22JUL2013 69
Anna Smith 01MAY2013 22JUL2013 82
John Brown 05JUN2013 12JUL2013 37
John Brown 06MAY2013 12JUL2013 67
Susan Garcia 06MAY2013 06SEP2013 123
So duplicate is identified by Patient? How do you know which one you want to keep?
Proc sort with a nodupkey is an option.
A data step (assuming sorted data) with first/last are another option.
Logic is required though.
Sorry, hit enter too soon. - I want to keep records with the bigger 'Difference' / earlier Date1.
You can use the double sort method below. There's a link below that explains how it works in more details.
proc sort data=have;
by patient descending difference;
run;
proc sort data=have out=want nodupkey;
by patient;
run;
Ah. That's simple. I don't know know why I thought it needed to be more complicated that than. Thank you for your help!
Even though your double sort approach works and I've seen this recently even done in a production implementation I personally have strong reservations of using this, because:
- It relies on implicit knowledge of the sort algorithm Proc Sort uses
- I believe this approach will stop working when Proc Sort is pushed to a data base
Base SAS(R) 9.3 Procedures Guide, Second Edition
I personally prefer to either use a Proc Sort together with a Data step and first. or last. or to use a Proc SQL with a min() function and Group By / Having clause.
Patrick wrote:
Even though your double sort approach works and I've seen this recently even done in a production implementation I personally have strong reservations of using this, because:
1 It relies on implicit knowledge of the sort algorithm Proc Sort uses
2 I believe this approach will stop working when Proc Sort is pushed to a data base
Base SAS(R) 9.3 Procedures Guide, Second Edition
I personally prefer to either use a Proc Sort together with a Data step and first. or last. or to use a Proc SQL with a min() function and Group By / Having clause.
I disagree with 1, you don't need to know the sort algorithm of Proc Sort, just the concept of sorting data.
I'm not sure what you mean by 2 sort on a database.
Sorry, I didn't finish typing my question:
I want to get rid of duplicates, but specifically keep the records with the bigger 'Difference' / earlier Date1 (those records that are in bold above). How can I do this?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.