Solved
Contributor
Posts: 36

# Flagging up duplicates

Hi, I've got a table like this:

ID     fruit         colour
111    orange        orange
111    cabbage       green
123    mango         green
333    strawberry    red
333    strawberry    red
555    berry         orange

I want SAS to look at all of the IDs above and flag a row when the same ID has already appeared, like this:

ID     fruit         colour    duplicate?
111    orange        orange    No
111    cabbage       green     Yes
123    mango         green     No
333    strawberry    red       No
333    strawberry    red       Yes
555    berry         orange    No

Thanks.

Accepted Solutions
Solution
‎09-04-2014 02:39 PM
Regular Contributor
Posts: 233

## Re: Flagging up duplicates

DATA HAVE;
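/* :$10. reads up to 10 characters, so "strawberry" is not cut off at the default length of 8 */
INPUT ID fruit :$10. colour :$10.;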
DATALINES;
111    orange                 orange
111    cabbage                green
123     mango                 green
333    strawberry             red
333    strawberry             red
555      berry                orange
;
RUN;

PROC SORT DATA=HAVE;
BY ID;
RUN;

DATA WANT;
SET HAVE;
LENGTH DUPLICATE $3;
BY ID;
/* every row after the first one in an ID group is a duplicate */
IF not(FIRST.ID) THEN
DUPLICATE='Yes';
ELSE DUPLICATE='No';
RUN;

All Replies
Posts: 1,270

## Re: Flagging up duplicates

proc sort data=have;
by id;
run;

data want;
set have;
length duplicate $3;
by id;
duplicate='No';
/* flag every row after the first in each ID group, so groups of three or more rows are handled too */
if not first.id then duplicate='Yes';
run;

Posts: 3,167

## Re: Flagging up duplicates

If you want to maintain the original data order:

data have;
input ID (fruit colour) (:$10.);
cards;
111    orange                 orange
111    cabbage             green
123     mango                 green
333    strawberry             red
333    strawberry             red
555      berry                     orange
;

data want;
if _n_=1 then
do;
/* hash keyed on ID; it remembers every ID seen so far */
dcl hash h();
h.definekey('id');
h.definedone();
end;
set have;
length dup $3;
if h.check()=0 then
dup='Yes'; /* ID is already in the hash: duplicate */
else
do;
dup='No';
rc=h.add(); /* first occurrence: store this ID */
end;
drop rc;
run;

Haikuo

Super User
Posts: 8,127

## Re: Flagging up duplicates

You are basically asking for NOT FIRST.ID.

data have ;
input id fruit :$10. color :$10. ;
cards;
111 orange orange
111 cabbage green
123 mango green
333 strawberry red
333 strawberry red
555 berry orange
;
run;

proc sort; by id; run;

data want ;
set have ;
by id ;
if first.id then dup='NO ';
else dup='YES';
put (_all_) (:);
run;

111 orange orange NO
111 cabbage green YES
123 mango green NO
333 strawberry red NO
333 strawberry red YES
555 berry orange NO

Occasional Contributor
Posts: 8

## Re: Flagging up duplicates

Another way.

proc sort data=HAVE out=WANT1 dupout=WANT2 nodupkey;
by ID;
run;

data WANT;
set WANT1 WANT2 (in=_IN);
by ID;
if _IN then DUP = "Yes";
else DUP = "No";
run;

🔒 This topic is solved and locked.