topic Re: Identifying duplicates from two or more sets of data in SAS Programming

Identifying duplicates from two or more sets of data

mitch — Wed, 17 Sep 2008 14:56:05 GMT

Hi. I'm trying to compare two sets of data. The common field that I'm using for comparison is the ID field. I'd like to be able to identify which IDs are duplicates. I think I could possibly use PROC SORT with NODUPKEY, but that would delete the observations instead of identifying them.
I've used PROC COMPARE, but it only seems to compare the variables, not the observations. Any ideas? Here's my compare code:

proc compare base=work.A compare=work.B;
    id IDCODE;
run;

Re: Identifying duplicates from two or more sets of data

sbb — Wed, 17 Sep 2008 16:17:30 GMT

PROC SORT has a DUPOUT= option (used together with NODUPKEY or NODUPRECS) so you can redirect the duplicates to a separate data set instead of just dropping them. The other option, depending on your needs, is to use a DATA step with a BY statement and test FIRST.ID and LAST.ID in an IF statement to perform whatever processing logic you need.

Scott Barry
SBBWorks, Inc.
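
For anyone landing on this thread later, here is a minimal sketch of both suggestions applied to the two data sets from the original post. It assumes WORK.A and WORK.B both contain IDCODE with the same type and length and that neither already has a variable called source; the names work.both, work.unique_ids, work.dup_rows, work.both_sorted, work.dup_groups, and source are made up for illustration. Note that an IDCODE repeated within a single data set will also be reported, which the source column helps you spot.

```sas
/* Stack the two data sets and tag each row with where it came from. */
/* (work.both and source are hypothetical names.)                    */
data work.both;
    length source $1;
    set work.A (in=inA) work.B;
    if inA then source = 'A';
    else source = 'B';
run;

/* Option 1: NODUPKEY keeps the first row per IDCODE in OUT= and     */
/* writes every later row with the same IDCODE to DUPOUT=.           */
proc sort data=work.both out=work.unique_ids
          nodupkey dupout=work.dup_rows;
    by IDCODE;
run;

/* Option 2: sort, then keep every row of any BY group that has      */
/* more than one row.                                                */
proc sort data=work.both out=work.both_sorted;
    by IDCODE;
run;

data work.dup_groups;
    set work.both_sorted;
    by IDCODE;
    /* A row that is both first and last in its group is unique. */
    if not (first.IDCODE and last.IDCODE);
run;
```

Option 1 gives you only the "extra" rows, while Option 2 keeps all rows for each duplicated IDCODE, which is usually easier to review side by side.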

Re: Identifying duplicates from two or more sets of data

mitch — Wed, 17 Sep 2008 21:42:35 GMT

Thanks a lot! I ended up switching to PROC SQL, joining the two datasets, and then using ODS to output the dups. I'm playing with PROC SORT's DUPOUT= option so I'll know how to use it in the future.

I appreciate your suggestions.
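
The exact code used was not posted, so the following is only a rough sketch of what a PROC SQL join plus ODS output could look like. It assumes the same WORK.A and WORK.B data sets keyed on IDCODE; the table name work.common_ids, the HTML destination, and the file name dups.html are assumptions, not details from the thread.

```sas
proc sql;
    /* IDs that appear in both data sets; DISTINCT collapses repeats. */
    /* (work.common_ids is a hypothetical name.)                      */
    create table work.common_ids as
    select distinct a.IDCODE
    from work.A as a
         inner join work.B as b
         on a.IDCODE = b.IDCODE;
quit;

/* Route a listing of the matches through ODS. The destination and   */
/* file name are assumptions; adjust the path for your environment.  */
ods html file='dups.html';
proc print data=work.common_ids;
run;
ods html close;
```

Unlike the PROC SORT approach, the inner join only reports IDs shared between the two data sets and ignores an ID that repeats within just one of them.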