Help using Base SAS procedures

Keeping one of duplicate rows

Reply
Contributor
Posts: 24

Keeping one of duplicate rows

This is an example of my dataset:

Patient                  Date1                   Date2                Difference    

Anna Smith       14MAY2013          22JUL2013                  69

Anna Smith       01MAY2013          22JUL2013                  82

John Brown        05JUN2013          12JUL2013                  37

John Brown        06MAY2013         12JUL2013                  67

Susan Garcia      06MAY2013        06SEP2013               123    

Super User
Posts: 17,744

Re: Keeping one of duplicate rows

So duplicate is identified by Patient? How do you know which one you want to keep?

Proc sort with a nodupkey is an option.

A data step (assuming sorted data) with first/last are another option. 

Logic is required though.

Contributor
Posts: 24

Re: Keeping one of duplicate rows

Sorry, hit enter too soon. - I want to keep records with the bigger 'Difference' / earlier Date1.

Super User
Posts: 17,744

Re: Keeping one of duplicate rows

You can use the double sort method below. There's a link below that explains how it works in more details.

proc sort data=have;

by patient descending difference;

run;

proc sort data=have out=want nodupkey;

by patient;

run;

Proc sort nodup

Contributor
Posts: 24

Re: Keeping one of duplicate rows

Ah. That's simple. I don't know know why I thought it needed to be more complicated that than. Thank you for your help!

Respected Advisor
Posts: 3,886

Re: Keeping one of duplicate rows

Even though your double sort approach works and I've seen this recently even done in a production implementation I personally have strong reservations of using this, because:

- It relies on implicit knowledge of the sort algorithm Proc Sort uses

- I believe this approach will stop working when Proc Sort is pushed to a data base

Base SAS(R) 9.3 Procedures Guide, Second Edition

I personally prefer to either use a Proc Sort together with a Data step and first. or last. or to use a Proc SQL with a min() function and Group By / Having clause.

Super User
Posts: 17,744

Re: Keeping one of duplicate rows

Patrick wrote:

Fareeza Khurshed

Even though your double sort approach works and I've seen this recently even done in a production implementation I personally have strong reservations of using this, because:

1 It relies on implicit knowledge of the sort algorithm Proc Sort uses

2 I believe this approach will stop working when Proc Sort is pushed to a data base

Base SAS(R) 9.3 Procedures Guide, Second Edition

I personally prefer to either use a Proc Sort together with a Data step and first. or last. or to use a Proc SQL with a min() function and Group By / Having clause.

I disagree with 1, you don't need to know the sort algorithm of Proc Sort, just the concept of sorting data.

I'm not sure what you mean by 2 sort on a database.

Contributor
Posts: 24

Re: Keeping one of duplicate rows

Sorry, I didn't finish typing my question:


I want to get rid of duplicates, but specifically keep the records with the bigger 'Difference' / earlier Date1 (those records that are in bold above). How can I do this?

Ask a Question
Discussion stats
  • 7 replies
  • 210 views
  • 0 likes
  • 3 in conversation