acros
Calcite | Level 5

This is an example of my dataset:

Patient         Date1        Date2        Difference
Anna Smith      14MAY2013    22JUL2013            69
Anna Smith      01MAY2013    22JUL2013            82
John Brown      05JUN2013    12JUL2013            37
John Brown      06MAY2013    12JUL2013            67
Susan Garcia    06MAY2013    06SEP2013           123
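
For anyone who wants to try the suggestions in this thread, here is a minimal sketch that rebuilds the sample data. The dataset name have matches the replies; treating Difference as Date2 minus Date1 in days is an assumption, though it does reproduce the values shown above.

```sas
/* Sketch only: recreate the sample data as a dataset named HAVE. */
data have;
  infile datalines truncover;
  /* "&" lets the patient name contain single blanks;
     fields must then be separated by two or more blanks. */
  input patient & $20. date1 :date9. date2 :date9.;
  /* Assumption: Difference is the number of days between the two dates. */
  difference = date2 - date1;
  format date1 date2 date9.;
  datalines;
Anna Smith  14MAY2013  22JUL2013
Anna Smith  01MAY2013  22JUL2013
John Brown  05JUN2013  12JUL2013
John Brown  06MAY2013  12JUL2013
Susan Garcia  06MAY2013  06SEP2013
;
run;
```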

Reeza
Super User

So duplicate is identified by Patient? How do you know which one you want to keep?

Proc sort with a nodupkey is an option.

A data step (assuming sorted data) with first./last. is another option.

Logic is required though.
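
A minimal sketch of the data step option, assuming the record to keep is the one with the largest Difference per patient (with the earlier Date1 as a tie-breaker). The dataset names have, sorted, and want are placeholders.

```sas
/* Sort so the preferred record comes first within each patient:
   largest Difference first, then earliest Date1 on ties. */
proc sort data=have out=sorted;
  by patient descending difference date1;
run;

/* Keep only the first record of each patient BY group. */
data want;
  set sorted;
  by patient;
  if first.patient;
run;
```

Because the rule for choosing the record is spelled out in the BY statement, changing it only means changing the sort keys.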

acros
Calcite | Level 5

Sorry, hit enter too soon. I want to keep the records with the bigger 'Difference' / earlier Date1.

Reeza
Super User

You can use the double sort method below. There's a link below that explains how it works in more detail.

proc sort data=have;
  by patient descending difference;
run;

proc sort data=have out=want nodupkey;
  by patient;
run;

Proc sort nodup

acros
Calcite | Level 5

Ah. That's simple. I don't know why I thought it needed to be more complicated than that. Thank you for your help!

Patrick
Opal | Level 21

Even though your double sort approach works, and I've recently seen it used in a production implementation, I personally have strong reservations about using it, because:

- It relies on implicit knowledge of the sort algorithm Proc Sort uses

- I believe this approach will stop working when Proc Sort is pushed to a database

Base SAS(R) 9.3 Procedures Guide, Second Edition

I personally prefer to either use a Proc Sort together with a Data step and first. or last. or to use a Proc SQL with a min() function and Group By / Having clause.
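
A rough sketch of the Proc SQL variant, assuming Date1 is a numeric SAS date and that the record to keep is the one with the earliest Date1 per patient. The dataset names have and want are placeholders.

```sas
/* Keep, for each patient, the row(s) whose Date1 equals the
   earliest Date1 in that patient's group. */
proc sql;
  create table want as
  select patient, date1, date2, difference
  from have
  group by patient
  having date1 = min(date1);
quit;
```

SAS remerges the group minimum back onto the detail rows to evaluate the HAVING clause and writes a note about this to the log. If two rows for the same patient share the earliest Date1, both are kept, so this is not a strict one-row-per-patient guarantee.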

Reeza
Super User

Patrick wrote:

Fareeza Khurshed

Even though your double sort approach works, and I've recently seen it used in a production implementation, I personally have strong reservations about using it, because:

1 It relies on implicit knowledge of the sort algorithm Proc Sort uses

2 I believe this approach will stop working when Proc Sort is pushed to a database

Base SAS(R) 9.3 Procedures Guide, Second Edition

I personally prefer to either use a Proc Sort together with a Data step and first. or last. or to use a Proc SQL with a min() function and Group By / Having clause.

I disagree with 1: you don't need to know the sort algorithm Proc Sort uses, just the concept of sorting data.

I'm not sure what you mean by 2, a sort on a database.
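
For reference, the double sort depends on the second Proc Sort preserving the order of rows within each patient that the first sort produced. A sketch that states this dependency explicitly with the EQUALS option, rather than relying on the default setting (the dataset names have, sorted, and want are placeholders):

```sas
/* First pass: largest Difference first within each patient. */
proc sort data=have out=sorted;
  by patient descending difference;
run;

/* Second pass: EQUALS asks Proc Sort to keep the incoming relative
   order of rows within each BY group, so NODUPKEY retains the first
   row per patient from the previous sort. */
proc sort data=sorted out=want nodupkey equals;
  by patient;
run;
```

Writing the first sort to a separate dataset also leaves have unchanged, which the original version does not.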

acros
Calcite | Level 5

Sorry, I didn't finish typing my question:


I want to get rid of duplicates, but specifically keep the records with the bigger 'Difference' / earlier Date1 (those records that are in bold above). How can I do this?
