Is there a way to identify duplicate IDs in a file with 40,000 records?
Any suggestion would be greatly appreciated.
Task > Data > Sort. Under Options, look for the first and the duplicate options.
Or look at proc sort.
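For the proc sort route, here is a minimal sketch. The data set name have and the key variable id are made up for illustration; substitute your own.

```sas
/* Hypothetical names: have = input data set, id = key variable */
proc sort data=have out=unique nodupkey dupout=dups;
    by id;   /* NODUPKEY keeps the first record per id in OUT= */
run;         /* records removed as duplicates are written to DUPOUT= */
```

Note that dups only receives the second and later records for each ID; the first record for that ID stays in unique.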
Thanks a lot. I have a few methods to try now. A data step using
if not last.id then output dups;
else output unique;
worked for me.
It should be if not (last.id and first.id) to get all of the observations for a duplicated ID. With if not last.id, the last observation for each ID is left out of dups.
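Put together, the data step approach might look like the sketch below. It assumes an input data set called have (a placeholder name) and sorts it first, since BY-group processing needs the data in id order.

```sas
/* Hypothetical input data set: have. Sort so that BY-group processing works. */
proc sort data=have out=sorted;
    by id;
run;

data dups unique;
    set sorted;
    by id;
    /* first.id and last.id are both 1 only when the id occurs exactly once */
    if not (first.id and last.id) then output dups;
    else output unique;
run;
```

Here dups gets every record for any ID that occurs more than once, and unique gets the records whose ID occurs only once.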
Interesting. I wanted both records. Now I've got it. Thanks for the correction. I never knew about this code: if not (last.id and first.id).
This would print all the records for IDs with multiple entries:
group by ID
having count(*) GT 1;
40,000 records does not sound like too much for Proc SQL. The following code may also help you reach your goal:
proc sql; select * from yourdata group by ID having count(*)>1; quit;
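If you only need to know which IDs are duplicated and how often, rather than every record, a variant along these lines may be enough. The column alias n_records is made up for this example.

```sas
proc sql;
    /* one row per duplicated ID, with the number of records it has */
    select ID, count(*) as n_records
    from yourdata
    group by ID
    having count(*) > 1;
quit;
```

This returns one row per duplicated ID, so it is a quick way to see how many IDs are affected before pulling the full records.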