acros
Calcite | Level 5

This is an example of my dataset:

Patient         Date1        Date2        Difference
Anna Smith      14MAY2013    22JUL2013            69
Anna Smith      01MAY2013    22JUL2013            82
John Brown      05JUN2013    12JUL2013            37
John Brown      06MAY2013    12JUL2013            67
Susan Garcia    06MAY2013    06SEP2013           123

7 REPLIES
Reeza
Super User

So a duplicate is identified by Patient? How do you know which one you want to keep?

Proc sort with nodupkey is an option.

A data step (assuming sorted data) with first/last is another option.

Logic is required though.
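A minimal sketch of the data step option, assuming the record to keep is the one with the largest Difference per patient (with the earlier Date1 as a tie-breaker). The dataset names have, sorted and want are placeholders; the variable names come from the sample data above.

```sas
/* Order each patient's records so the one to keep comes first:      */
/* largest Difference first, then earliest Date1 as a tie-breaker.   */
proc sort data=have out=sorted;
  by patient descending difference date1;
run;

/* FIRST.patient is 1 only on the first record of each BY group,     */
/* so the subsetting IF keeps exactly one record per patient.        */
data want;
  set sorted;
  by patient;
  if first.patient;
run;
```

Because the keep rule is spelled out in the BY statement and the subsetting IF, changing the rule only means changing the sort order.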

acros
Calcite | Level 5

Sorry, hit enter too soon. I want to keep the records with the bigger 'Difference' / earlier Date1.

Reeza
Super User

You can use the double sort method below. There's a link below that explains how it works in more detail.

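/* First sort (in place): within each patient, put the record with  */
/* the largest Difference first.                                     */
proc sort data=have;
  by patient descending difference;
run;

/* Second sort: NODUPKEY keeps the first record per patient. This    */
/* relies on PROC SORT preserving the input order within BY groups.  */
proc sort data=have out=want nodupkey;
  by patient;
run;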

Proc sort nodup

acros
Calcite | Level 5

Ah. That's simple. I don't know why I thought it needed to be more complicated than that. Thank you for your help!

Patrick
Opal | Level 21

Even though your double sort approach works, and I've recently even seen it used in a production implementation, I personally have strong reservations about using it, because:

- It relies on implicit knowledge of the sort algorithm Proc Sort uses

- I believe this approach will stop working when Proc Sort is pushed to a database

Base SAS(R) 9.3 Procedures Guide, Second Edition

I personally prefer either a Proc Sort followed by a Data step using first. or last., or a Proc SQL with a min() function and a Group By / Having clause.
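The Proc SQL variant could look roughly like the sketch below. It assumes Date1 is stored as a numeric SAS date and that "earliest Date1 per patient" is the keep rule; the dataset names have and want are taken from the code earlier in the thread.

```sas
/* Keep, for each patient, the row(s) whose Date1 equals that        */
/* patient's minimum Date1. PROC SQL remerges the summary value      */
/* min(date1) with the detail rows and writes a note to the log.     */
proc sql;
  create table want as
  select *
  from have
  group by patient
  having date1 = min(date1);
quit;
```

Unlike the first. approach, this keeps every row that ties on the minimum Date1, so a patient with two records on the same earliest date would still appear twice.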

Reeza
Super User

Patrick wrote:

Fareeza Khurshed

Even though your double sort approach works, and I've recently even seen it used in a production implementation, I personally have strong reservations about using it, because:

1. It relies on implicit knowledge of the sort algorithm Proc Sort uses

2. I believe this approach will stop working when Proc Sort is pushed to a database

Base SAS(R) 9.3 Procedures Guide, Second Edition

I personally prefer either a Proc Sort followed by a Data step using first. or last., or a Proc SQL with a min() function and a Group By / Having clause.

I disagree with 1: you don't need to know the sort algorithm of Proc Sort, just the concept of sorting data.

I'm not sure what you mean in 2 by a sort on a database.

acros
Calcite | Level 5

Sorry, I didn't finish typing my question:


I want to get rid of duplicates, but specifically keep the records with the bigger 'Difference' / earlier Date1 (those records that are in bold above). How can I do this?
