Is there a way to identify duplicate IDs in a file with 40,000 records?
Any suggestions would be greatly appreciated.
Thanks
Task > Data > Sort: under Options, look for the first and the duplicate options.
Or look at proc sort.
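For the PROC SORT route, a minimal sketch might look like the following. The data set names have, unique and dups are placeholders, not names from this thread; only the variable ID comes from the question. NODUPKEY keeps the first observation for each ID, and DUPOUT= collects the observations that were dropped.

```sas
/* Assumption: the input is WORK.HAVE with a key variable ID. */
/* UNIQUE gets one observation per ID (the first one after sorting). */
/* DUPS gets the second and later observations for each repeated ID. */
proc sort data=have out=unique nodupkey dupout=dups;
  by id;
run;
```

Note that dups here holds only the extra copies, not the first occurrence of each repeated ID.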
Thanks a lot. I have a few methods to try now.
A data step using
if not last.id then output dups;
else output unique;
worked for me.
Thanks
It should be if not (last.id and first.id) if you want every observation of a duplicated ID in dups. With if not last.id, the last observation of each duplicated ID still goes to unique.
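Put together, the DATA step approach could be sketched as below. The data set name have is a placeholder; dups, unique and ID are taken from the posts above. FIRST.ID and LAST.ID only exist when the step has a BY statement, and the input must already be sorted by ID.

```sas
/* Assumption: the input is WORK.HAVE with a key variable ID. */
proc sort data=have;
  by id;
run;

data dups unique;
  set have;
  by id;  /* creates the automatic variables FIRST.ID and LAST.ID */
  /* An ID that occurs once is both first and last in its BY group. */
  if not (first.id and last.id) then output dups;
  else output unique;
run;
```

With this condition, dups receives every observation of an ID that appears more than once, and unique receives only the IDs that appear exactly once.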
Interesting. I wanted both records. Now I get it. Thanks for the correction. I never knew about this code: if not (last.id and first.id).
This would print all the records for IDs with multiple entries:
proc sql;
select *
from data
group by ID
having count(*) GT 1;
quit;
40,000 records does not sound like too much for PROC SQL. The following code may also help you reach your goal:
proc sql;
select *
from yourdata
group by ID
having count(*) > 1;
quit;
Haikuo