
Identify duplicates in a file

Contributor
Posts: 66

Identify duplicates in a file


Is there a way to identify duplicate IDs in a file with 40,000 records?

Any suggestion would be greatly appreciated.

Thanks

Super User
Posts: 18,997

Re: Identify duplicates in a file

Task > Data > Sort: under Options, look for the first-record and duplicate-record options.

Or look at proc sort.
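
A minimal PROC SORT sketch, assuming the input data set is named have and the key variable is id (both are placeholder names):

proc sort data=have out=unique nodupkey dupout=dups; /* keep the first row per id in unique */
    by id;                                           /* removed extra copies go to dups     */
run;

Note that DUPOUT= only catches the second and later occurrences of each id, so if you need every row of a duplicated id, a data step or PROC SQL approach works better.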

Contributor
Posts: 66

Re: Identify duplicates in a file

Thanks a lot. I have a few methods to try now.

Contributor
Posts: 66

Re: Identify duplicates in a file

A data step using "if not last.id then output dups; else output unique;" worked for me.

Thanks

Super User
Posts: 18,997

Re: Identify duplicates in a file

It should be "if not (last.id and first.id)" to get both of the observations that are duplicates, unless you want only the last one.
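
For reference, a minimal sketch of that corrected data step, assuming a data set named have with key variable id (placeholder names):

proc sort data=have;
    by id;
run;

data dups unique;
    set have;
    by id;
    /* first.id and last.id are both true only when an id occurs exactly once, */
    /* so negating the pair sends every row of a duplicated id to dups         */
    if not (first.id and last.id) then output dups;
    else output unique;
run;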

Contributor
Posts: 66

Re: Identify duplicates in a file

Interesting. I wanted both records. Now I've got it. Thanks for the correction. I never knew about this code (if not (last.id and first.id)).

Frequent Contributor
Posts: 102

Re: Identify duplicates in a file

This would print all the records for IDs with multiple entries:

proc sql;
    select *
    from data
    group by ID
    having count(*) > 1;
quit;

Respected Advisor
Posts: 3,156

Re: Identify duplicates in a file

40,000 does not sound like too much for PROC SQL. The following code may also help you reach your goal:

proc sql;
    select *
    from yourdata
    group by ID
    having count(*) > 1;
quit;

Haikuo
