Re: Repeat of BY values -- why?

joegee · Posted 11-13-2009 10:18 AM

Code:
PROC SORT DATA=ND NODUPS;
BY ITEM;
RUN;
PROC SORT DATA=DUPS;
BY ITEM;
RUN;
DATA COMBINE;
MERGE DUPS(IN=OK1) ND(IN=OK2);
BY ITEM;
IF OK1;
RUN;

The 'ND' dataset should not have any duplicate records (the log shows several were deleted). So why does the log show: "NOTE: MERGE statement has more than one data set with repeats of BY values."?

Flip · Posted 11-13-2009 10:21 AM

NODUPS removes a record only when the entire record is a duplicate. NODUPKEY would rid you of duplicate BY values.
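
A minimal sketch of that change, reusing the ND data set and ITEM variable from the original post. The DUPOUT= data set name ND_DROPPED is made up for illustration, and the option is not required:

```sas
/* NODUPKEY keeps the first observation for each ITEM value and   */
/* drops any later observation with the same ITEM, even when the  */
/* other variables differ.                                        */
/* DUPOUT= (optional) saves the dropped observations so they can  */
/* be reviewed. ND_DROPPED is a hypothetical name.                */
PROC SORT DATA=ND NODUPKEY DUPOUT=ND_DROPPED;
  BY ITEM;
RUN;
```

With one observation per ITEM left in ND, only DUPS can still have repeats of BY values, so the MERGE note should no longer appear.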

joegee · Posted 11-13-2009 10:25 AM

DUH - I should know that! Thanks, you have turned this Fri 13 into my lucky day...

sbb · Posted 11-13-2009 10:26 AM

Also, consider that in some instances (depending on your input file) you need a sufficient BY variable list to ensure that duplicate observations are sorted next to each other; otherwise NODUPS will not delete them.
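
One way to get a "sufficient" BY list is to sort on every variable, sketched below under the assumption that whole-record duplicates are what you want to remove. The output name ND_DISTINCT is hypothetical:

```sas
/* BY _ALL_ sorts on every variable in the data set, so records   */
/* that are identical in all variables end up adjacent and NODUPS */
/* can detect them.                                               */
/* ND_DISTINCT is a hypothetical output data set name.            */
PROC SORT DATA=ND OUT=ND_DISTINCT NODUPS;
  BY _ALL_;
RUN;
```

Note that this still leaves records that share an ITEM value but differ elsewhere, and the result is ordered by the data set's variable order, so it may need to be re-sorted BY ITEM before a merge.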

Scott Barry
SBBWorks, Inc.

Peter_C · Posted 11-13-2009 10:28 AM

Check the lengths of the column/variable ITEM in each of your data sets.

2541 - Multiple lengths were specified for the BY variable xxxx by input data sets
http://support.sas.com/kb/2/541.html
SUGI 28: Danger: MERGE Ahead! Warning: BY Variable with Multiple Lengths!
http://www2.sas.com/proceedings/sugi28/098-28.pdf
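
A rough sketch of how that check might look, assuming both data sets live in the WORK library and that ITEM is a character variable. The length $20 is a placeholder, not a value taken from this thread:

```sas
/* Compare the stored type and length of ITEM in both data sets.  */
/* Assumes DUPS and ND are in the WORK library.                   */
PROC SQL;
  SELECT memname, name, type, length
    FROM dictionary.columns
    WHERE libname = 'WORK'
      AND memname IN ('DUPS', 'ND')
      AND UPCASE(name) = 'ITEM';
QUIT;

/* If the lengths differ, one commonly suggested remedy is to     */
/* declare a single length before the MERGE statement.            */
/* $20 is a placeholder; use the longer of the two lengths.       */
DATA COMBINE;
  LENGTH ITEM $20;
  MERGE DUPS(IN=OK1) ND(IN=OK2);
  BY ITEM;
  IF OK1;
RUN;
```

The LENGTH statement only controls the length used in the output; values that were already truncated in the shorter data set stay truncated, so those would need to be fixed where the data set is created.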
