Re: Identifying duplicates from two or more sets of data

mitch · Posted 09-17-2008 10:56 AM

Hi. I'm trying to compare two sets of data. The common field that I'm using for comparison is the ID field. I'd like to be able to identify which IDs are duplicates. I think I could possibly use PROC SORT with NODUPKEY... but that would delete the observations instead of identifying them.
I've used PROC COMPARE, but it only seems to compare the variables, not the observations... any ideas? Here's my compare code:

proc compare base = work.A compare = work.B;
id IDCODE;
run;

sbb · Posted 09-17-2008 12:17 PM

PROC SORT has a DUPOUT= option, used together with NODUPKEY, so you can redirect the duplicates to a separate data set instead of just discarding them. The other option, depending on your needs, is to use a DATA step with a BY statement and an IF statement that tests FIRST.ID and LAST.ID in order to perform whatever processing logic you need.

Scott Barry
SBBWorks, Inc.
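
For reference, here is a minimal sketch of the two suggestions above, not necessarily what either poster ran. It assumes the key variable is IDCODE (taken from the PROC COMPARE code in the first post) and that work.A and work.B are first stacked into one data set. The names work.both, work.unique, work.dups, work.sorted, work.dup_rows, and the SRC variable are hypothetical.

```sas
/* Stack both data sets and tag each row with its source.   */
/* SRC is a hypothetical helper variable, not in the post.  */
data work.both;
  length src $1;
  set work.A(in=inA) work.B;
  src = ifc(inA, 'A', 'B');
run;

/* Option 1: NODUPKEY keeps the first row per IDCODE in     */
/* work.unique; DUPOUT= sends the removed rows to work.dups */
proc sort data=work.both out=work.unique nodupkey dupout=work.dups;
  by IDCODE;
run;

/* Option 2: sort, then keep every row whose IDCODE occurs  */
/* more than once (it is not both first and last in group). */
proc sort data=work.both out=work.sorted;
  by IDCODE;
run;

data work.dup_rows;
  set work.sorted;
  by IDCODE;
  if not (first.IDCODE and last.IDCODE);
run;
```

Note that both options also flag an IDCODE that is repeated within a single input data set, not only one that appears in both A and B. The SRC variable makes it possible to tell those cases apart.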

mitch · Posted 09-17-2008 05:42 PM

Thanks a lot! I ended up switching to PROC SQL and joining the two datasets, then using ODS to output the dups. I'm playing with PROC SORT DUPOUT= so I'll know how to use it in the future.

I appreciate your suggestions.
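
The PROC SQL approach mentioned above was not posted, so the following is only a sketch of how such a join might look. It assumes IDCODE is the key in both work.A and work.B. The output table name work.dup_ids, the choice of the ODS CSV destination, and the file name dups.csv are all assumptions.

```sas
/* IDs that appear in both data sets: an inner join on the  */
/* key, with DISTINCT so each matching IDCODE is listed once */
proc sql;
  create table work.dup_ids as
  select distinct a.IDCODE
  from work.A as a
       inner join work.B as b
       on a.IDCODE = b.IDCODE;
quit;

/* Write the result out through ODS; the destination and    */
/* the file name are hypothetical choices for illustration. */
ods csv file="dups.csv";
proc print data=work.dup_ids noobs;
run;
ods csv close;
```

Unlike the stacked PROC SORT approach, the inner join lists only IDs present in both data sets, so a value repeated within just one of them is not reported.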
