
Identifying duplicates from two or more sets of data

New Contributor
Posts: 2

Hi. I'm trying to compare two sets of data. The common field I'm using for the comparison is the ID field, and I'd like to identify which IDs are duplicates. I think I could possibly use PROC SORT with NODUPKEY, but that would delete the duplicate observations instead of identifying them.
I've used PROC COMPARE, but it only seems to compare the variables, not the observations. Any ideas? Here's my compare code:

proc compare base = work.A compare = work.B;
id IDCODE;
run;
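One common alternative for finding IDs that appear in both data sets is a DATA step MERGE with IN= flags. This is a different technique from PROC COMPARE, shown here only as a minimal sketch. It assumes both data sets contain IDCODE and that each IDCODE appears at most once per data set; the output names work.in_both, work.only_A and work.only_B are hypothetical.

```sas
/* MERGE with BY requires both inputs sorted by the key */
proc sort data=work.A; by IDCODE; run;
proc sort data=work.B; by IDCODE; run;

/* Route each IDCODE to one of three hypothetical output data sets */
data work.in_both work.only_A work.only_B;
   merge work.A(in=inA keep=IDCODE)
         work.B(in=inB keep=IDCODE);
   by IDCODE;
   if inA and inB then output work.in_both;   /* ID found in both A and B */
   else if inA then output work.only_A;       /* ID found only in A */
   else output work.only_B;                   /* ID found only in B */
run;
```

Every IDCODE ends up in exactly one of the three data sets, so work.in_both is the list of shared IDs and nothing is deleted from work.A or work.B.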
Super Contributor
Posts: 3,174

Re: Identifying duplicates from two or more sets of data

PROC SORT has a DUPOUT= option, so you can redirect the duplicates to a separate data set. The other option, depending on your needs, is to use a DATA step with a BY statement and use an IF statement to test FIRST.IDCODE and LAST.IDCODE, in order to perform the desired processing logic.
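Because the IDs come from two data sets, both suggestions work on a single stacked data set. Here is a minimal sketch of each, assuming work.A and work.B share the variable IDCODE with compatible attributes; the names work.both, work.unique, work.dupes, work.all_dups and the variable source are hypothetical. Note that both approaches also flag IDs repeated within a single data set, not only IDs shared between A and B.

```sas
/* Stack the two data sets and record where each row came from */
data work.both;
   set work.A(in=inA) work.B;
   length source $1;
   if inA then source = 'A';
   else source = 'B';
run;

/* Option 1: NODUPKEY keeps the first row per IDCODE in OUT=, */
/* and every later row with the same IDCODE goes to DUPOUT=   */
proc sort data=work.both out=work.unique nodupkey dupout=work.dupes;
   by IDCODE;
run;

/* Option 2: BY-group processing; FIRST./LAST. need sorted input */
proc sort data=work.both;
   by IDCODE;
run;

data work.all_dups;
   set work.both;
   by IDCODE;
   /* a row that is both first and last is the only row for its IDCODE, */
   /* so this subsetting IF keeps every row of a repeated IDCODE        */
   if not (first.IDCODE and last.IDCODE);
run;
```

work.dupes holds only the second and later occurrences of each repeated ID, whereas work.all_dups keeps every occurrence, which is often easier to review since the source variable shows which data set each row came from.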

Scott Barry
SBBWorks, Inc.
New Contributor
Posts: 2

Re: Identifying duplicates from two or more sets of data

Thanks a lot! I ended up switching to PROC SQL, joining the two data sets and then using ODS to write out the duplicates. I'm experimenting with the PROC SORT DUPOUT= option so I'll know how to use it in the future.

I appreciate your suggestions.
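The final code wasn't posted, so this is only a minimal sketch of that approach. It assumes an inner join on IDCODE and the ODS CSV destination for the output; the table name work.dup_ids and the file name dup_ids.csv are hypothetical.

```sas
/* IDs present in both data sets; DISTINCT collapses repeated matches */
proc sql;
   create table work.dup_ids as
      select distinct a.IDCODE
      from work.A as a
           inner join work.B as b
           on a.IDCODE = b.IDCODE
      order by 1;
quit;

/* Write the result to a CSV file through ODS (hypothetical path) */
ods csv file="dup_ids.csv";
proc print data=work.dup_ids noobs;
run;
ods csv close;
```

Unlike NODUPKEY, the join leaves both source data sets untouched and only produces the list of shared IDs.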