acros
Calcite | Level 5

This is an example of my dataset:

Patient         Date1        Date2        Difference
Anna Smith      14MAY2013    22JUL2013            69
Anna Smith      01MAY2013    22JUL2013            82
John Brown      05JUN2013    12JUL2013            37
John Brown      06MAY2013    12JUL2013            67
Susan Garcia    06MAY2013    06SEP2013           123

Reeza
Super User

So a duplicate is identified by Patient? How do you know which one you want to keep?

Proc sort with nodupkey is one option.

A data step (assuming sorted data) using first./last. processing is another option.

Either way, you need a rule for which record to keep.
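As a rough sketch of the data step route mentioned above: this assumes the rule is "keep the row with the largest Difference for each patient", as the original poster states in the follow-up. The data set names have, sorted and want are placeholders, and the sample rows are simply typed in from the question.

```sas
/* Sample data from the question; the data set name HAVE is an assumption. */
data have;
    infile datalines dsd dlm=',';
    length patient $20;
    input patient :$20. date1 :date9. date2 :date9. difference;
    format date1 date2 date9.;
    datalines;
Anna Smith,14MAY2013,22JUL2013,69
Anna Smith,01MAY2013,22JUL2013,82
John Brown,05JUN2013,12JUL2013,37
John Brown,06MAY2013,12JUL2013,67
Susan Garcia,06MAY2013,06SEP2013,123
;
run;

/* Within each patient, put the largest Difference first. */
proc sort data=have out=sorted;
    by patient descending difference;
run;

/* Keep only the first row of each patient BY group. */
data want;
    set sorted;
    by patient;
    if first.patient;
run;
```

Because the data step states explicitly which row is kept (the first one in each BY group), the result does not depend on how a later dedup step picks among rows with the same key.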

acros
Calcite | Level 5

Sorry, hit enter too soon. - I want to keep records with the bigger 'Difference' / earlier Date1.

Reeza
Super User

You can use the double sort method below. The link after the code explains how it works in more detail.

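/* Step 1: within each patient, put the largest Difference first */
proc sort data=have;
    by patient descending difference;
run;

/* Step 2: keep the first record per patient */
proc sort data=have out=want nodupkey;
    by patient;
run;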

Proc sort nodup

acros
Calcite | Level 5

Ah. That's simple. I don't know why I thought it needed to be more complicated than that. Thank you for your help!

Patrick
Opal | Level 21

Even though your double sort approach works, and I've recently seen it used in a production implementation, I personally have strong reservations about using it, because:

- It relies on implicit knowledge of the sort algorithm Proc Sort uses

- I believe this approach will stop working when Proc Sort is pushed to a database

Base SAS(R) 9.3 Procedures Guide, Second Edition

I personally prefer either a Proc Sort followed by a Data step using first. or last., or a Proc SQL with a min() function and a Group By / Having clause.
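A minimal sketch of the Proc SQL variant, assuming the data set is called have (as in the earlier reply) and that "earliest Date1 per patient" is the rule for which row to keep. The output name want is a placeholder.

```sas
proc sql;
    /* For each patient, keep the row(s) whose Date1 equals the group minimum. */
    /* PROC SQL remerges the summary value with the detail rows and notes it in the log. */
    create table want as
    select patient, date1, date2, difference
    from have
    group by patient
    having date1 = min(date1);
quit;
```

Note that if two rows for the same patient share the earliest Date1, both are returned, so a further tie-breaking rule would be needed in that case.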

Reeza
Super User

Patrick wrote:

Even though your double sort approach works, and I've recently seen it used in a production implementation, I personally have strong reservations about using it, because:

1. It relies on implicit knowledge of the sort algorithm Proc Sort uses

2. I believe this approach will stop working when Proc Sort is pushed to a database

Base SAS(R) 9.3 Procedures Guide, Second Edition

I personally prefer either a Proc Sort followed by a Data step using first. or last., or a Proc SQL with a min() function and a Group By / Having clause.

I disagree with 1: you don't need to know the sort algorithm of Proc Sort, just the concept of sorting data.

As for 2, I'm not sure what you mean by a sort being pushed to a database.
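The disagreement over point 1 comes down to which row NODUPKEY keeps when several rows share a key. One way to spell that assumption out, sketched here for data that stay in Base SAS, is to request the EQUALS option explicitly on the second sort so that rows with the same BY value keep their incoming order. The intermediate data set name by_diff is a placeholder.

```sas
/* First sort: largest Difference first within each patient. */
proc sort data=have out=by_diff;
    by patient descending difference;
run;

/* Second sort: EQUALS keeps the incoming order within each patient, */
/* so NODUPKEY retains the first row, the one with the largest Difference. */
proc sort data=by_diff out=want nodupkey equals;
    by patient;
run;
```

EQUALS is the documented default for Proc Sort, so this only makes the dependency visible. It does not address the second concern about sorts that are executed inside a database.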

acros
Calcite | Level 5

Sorry, I didn't finish typing my question:


I want to get rid of duplicates, but specifically keep the records with the bigger 'Difference' / earlier Date1 (in the example above: the 82 row for Anna Smith, the 67 row for John Brown, and the single row for Susan Garcia). How can I do this?
