# Flagging up duplicates

Hi, I've got a table like this:

| ID  | fruit      | colour |
|-----|------------|--------|
| 111 | orange     | orange |
| 111 | cabbage    | green  |
| 123 | mango      | green  |
| 333 | strawberry | red    |
| 333 | strawberry | red    |
| 555 | berry      | orange |

I want SAS to look at all of the IDs above and flag each row whose ID has already appeared on an earlier row, like this:

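| ID  | fruit      | colour | duplicate? |
|-----|------------|--------|------------|
| 111 | orange     | orange | No         |
| 111 | cabbage    | green  | Yes        |
| 123 | mango      | green  | No         |
| 333 | strawberry | red    | No         |
| 333 | strawberry | red    | Yes        |
| 555 | berry      | orange | No         |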

Thanks.

Solution
09-04-2014 02:39 PM
## Re: Flagging up duplicates

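```sas
DATA HAVE;
/* :$10. avoids truncating 'strawberry' to the default 8 characters */
INPUT ID fruit :$10. colour :$10.;
DATALINES;
111 orange     orange
111 cabbage    green
123 mango      green
333 strawberry red
333 strawberry red
555 berry      orange
;
RUN;

PROC SORT DATA=HAVE;
BY ID;
RUN;

DATA WANT;
SET HAVE;
BY ID;
LENGTH DUPLICATE $3;
/* The first row of each ID group is the original; later rows are repeats */
IF NOT FIRST.ID THEN DUPLICATE='Yes';
ELSE DUPLICATE='No';
RUN;
```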

## Re: Flagging up duplicates

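```sas
proc sort data=have;
by id;
run;

data want;
set have;
by id;
length duplicate $3;
duplicate='No';
/* Flags only the LAST row of an ID group that has more than one row; */
/* with three or more rows for one ID, the middle rows stay 'No'.     */
if last.id then duplicate='Yes';
if first.id and last.id then duplicate='No';
run;
```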

## Re: Flagging up duplicates

If you want to keep the original data order (no sorting), you can use a hash object:

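```sas
data have;
input ID (fruit colour) (:$10.);
cards;
111 orange     orange
111 cabbage    green
123 mango      green
333 strawberry red
333 strawberry red
555 berry      orange
;
run;

data want;
if _n_=1 then do;
  /* Hash table holding the IDs seen so far */
  dcl hash h();
  h.definekey('id');
  h.definedone();
end;
set have;
length dup $3;
/* CHECK() returns 0 when the current ID is already in the hash */
if h.check()=0 then dup='Yes';
else do;
  dup='No';
  rc=h.add();   /* remember this ID for the rows that follow */
end;
drop rc;
run;
```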

Haikuo
## Re: Flagging up duplicates

You are basically asking for NOT FIRST.ID.

```sas
data have;
input id fruit :$10. color :$10.;
cards;
111 orange orange
111 cabbage green
123 mango green
333 strawberry red
333 strawberry red
555 berry orange
;
run;

proc sort; by id; run;

data want;
set have;
by id;
if first.id then dup='NO ';
else dup='YES';
put id fruit color dup;
run;
```

```
111 orange orange NO
111 cabbage green YES
123 mango green NO
333 strawberry red NO
333 strawberry red YES
555 berry orange NO
```
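To see what FIRST.ID and LAST.ID actually hold on each row, a small diagnostic step can help. This is a sketch assuming HAVE has already been sorted by ID as above; FLAGS, FIRST_ID and LAST_ID are illustrative names. FIRST. and LAST. variables are temporary and are not written to the output data set, so they are copied into ordinary variables here.

```sas
/* Sketch: make the temporary BY-group flags visible.       */
/* Assumes WORK.HAVE exists and is sorted by ID.            */
/* FLAGS, FIRST_ID and LAST_ID are illustrative names only. */
data flags;
  set have;
  by id;
  first_id = first.id;   /* 1 on the first row of each ID group */
  last_id  = last.id;    /* 1 on the last row of each ID group  */
run;

proc print data=flags;
run;
```

With the sample data, FIRST_ID is 0 only on the second 111 row and the second 333 row, which are exactly the rows flagged YES.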

## Re: Flagging up duplicates

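Another way, using the DUPOUT= option of PROC SORT:

```sas
/* NODUPKEY keeps the first row per ID in WANT1; DUPOUT= sends the other rows to WANT2 */
proc sort data=HAVE out=WANT1 dupout=WANT2 nodupkey;
by ID;
run;

/* Interleave by ID: within each ID, rows from WANT1 come before rows from WANT2 */
data WANT;
set WANT1 WANT2 (in=_IN);
by ID;
if _IN then DUP = "Yes";
else DUP = "No";
run;
```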
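If the question is instead read as "flag every row whose ID occurs more than once" (so the first 111 and 333 rows would be flagged too), a PROC SQL sketch that counts rows per ID could look like this. Note that this is a different rule from the expected output in the question. It assumes HAVE exists; WANT_ALL is an illustrative name. SAS writes a NOTE to the log that the query remerges summary statistics back with the original data, which is expected here.

```sas
/* Sketch: flag ALL rows of any ID that appears more than once. */
/* Assumes WORK.HAVE exists; WANT_ALL is an illustrative name.  */
proc sql;
  create table want_all as
  select *,
         case when count(*) > 1 then 'Yes' else 'No' end
           as dup length=3
  from have
  group by id;   /* COUNT(*) is computed per ID and remerged onto each row */
quit;
```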

