Flagging up duplicates

Contributor
Posts: 36

Hi, I've got a table like this:

ID    fruit        colour
555   avocado      orange
111   orange       orange
111   cabbage      green
123   mango        green
333   strawberry   red
333   strawberry   red
555   berry        orange

I want SAS to look at the IDs above and flag every row whose ID has already appeared on an earlier row, like this:

ID    fruit        colour   duplicate?
555   avocado      orange   No
111   orange       orange   No
111   cabbage      green    Yes
123   mango        green    No
333   strawberry   red      No
333   strawberry   red      Yes
555   berry        orange   Yes

  Thanks.


Accepted Solutions
09-04-2014 02:39 PM
Regular Contributor
Posts: 233

Re: Flagging up duplicates

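DATA HAVE;
  /* :$10. reads up to 10 characters; a plain $ defaults to length 8
     and would truncate "strawberry" */
  INPUT ID fruit :$10. colour :$10.;
  DATALINES;
555 avocado orange
111 orange orange
111 cabbage green
123 mango green
333 strawberry red
333 strawberry red
555 berry orange
;
RUN;

PROC SORT DATA=HAVE;
  BY ID;
RUN;

DATA WANT;
  SET HAVE;
  LENGTH DUPLICATE $3;
  BY ID;
  /* the first row of each ID group is the original; every later row is a duplicate */
  IF NOT FIRST.ID THEN DUPLICATE='Yes';
  ELSE DUPLICATE='No';
RUN;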
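The code above returns the rows sorted by ID, while the requested table keeps the original row order. A minimal sketch of one way to keep the BY-group approach and still return the original order, assuming HAVE has not already been overwritten by an in-place PROC SORT: remember each row's position in a helper variable (ROWNUM, a name chosen here for illustration), flag within ID, then sort back by ROWNUM.

```sas
/* Input data as posted in the question */
data have;
  input id fruit :$10. colour :$10.;
  datalines;
555 avocado orange
111 orange orange
111 cabbage green
123 mango green
333 strawberry red
333 strawberry red
555 berry orange
;
run;

/* ROWNUM is a hypothetical helper that records each row's original position */
data have_seq;
  set have;
  rownum = _n_;
run;

/* Sort by ID; ROWNUM keeps the original order inside each ID group */
proc sort data=have_seq;
  by id rownum;
run;

/* The first row of each ID group is the original, later rows are duplicates */
data want;
  set have_seq;
  by id;
  length duplicate $3;
  if first.id then duplicate = 'No';
  else duplicate = 'Yes';
run;

/* Restore the original row order, then drop the helper */
proc sort data=want;
  by rownum;
run;

data want;
  set want(drop=rownum);
run;
```

This sorts the data twice; for large tables, the hash-object reply in this thread gets the same original-order result in one pass without sorting.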



All Replies
Trusted Advisor
Posts: 1,228

Re: Flagging up duplicates

proc sort data=have;
by id;
run;

data want;
set have;
length duplicate $3;
by id;
/* every row after the first one in its ID group is a duplicate */
duplicate='No';
if not first.id then duplicate='Yes';
run;

Respected Advisor
Posts: 3,156

Re: Flagging up duplicates

If you want to keep the original row order, you can use a hash object instead of sorting:

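data have;
     input ID (fruit colour) (:$10.);
     cards;
555 avocado orange
111 orange orange
111 cabbage green
123 mango green
333 strawberry red
333 strawberry red
555 berry orange
;

data want;
     /* on the first iteration, create an empty hash table keyed on ID */
     if _n_=1 then
           do;
                dcl hash h();
                h.definekey('id');
                h.definedone();
           end;
     set have;
     length dup $3;
     /* CHECK() returns 0 when this ID is already in the hash,
        i.e. it appeared on an earlier row */
     if h.check()=0 then
           dup='Yes';
     else
           do;
                rc=h.add();   /* first occurrence: remember this ID */
                dup='No';
           end;
     drop rc;
run;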

Haikuo

Super User
Posts: 6,842

Re: Flagging up duplicates

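You are basically asking for NOT FIRST.ID: after sorting by ID, FIRST.ID is 1 only on the first row of each ID group, so every other row is a duplicate.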

data have ;
  input id fruit :$10. color :$10. ;
cards;
555 avocado orange
111 orange orange
111 cabbage green
123 mango green
333 strawberry red
333 strawberry red
555 berry orange
;
run;

proc sort; by id; run;

data want ;
  set have ;
  by id ;
  /* 'NO ' with a trailing blank gives DUP length 3, so 'YES' fits */
  if first.id then dup='NO ';
  else dup='YES';
  put (_all_) (:);
run;

111 orange orange NO
111 cabbage green YES
123 mango green NO
333 strawberry red NO
333 strawberry red YES
555 avocado orange NO
555 berry orange YES
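The wording "flag if similar ID exist" could also be read as flagging every row whose ID occurs more than once, including the first one. The requested output does not ask for that, but if it were wanted, FIRST.ID and LAST.ID together give it directly: both are 1 only when an ID has exactly one row. A minimal sketch; the flag name REPEATED is an illustrative choice, not something from the question.

```sas
data have;
  input id fruit :$10. colour :$10.;
  datalines;
555 avocado orange
111 orange orange
111 cabbage green
123 mango green
333 strawberry red
333 strawberry red
555 berry orange
;
run;

proc sort data=have;
  by id;
run;

data want;
  set have;
  by id;
  length repeated $3;
  /* FIRST.ID and LAST.ID are both 1 only when the ID occurs exactly once */
  if first.id and last.id then repeated = 'No';
  else repeated = 'Yes';
run;
```

Here only 123 (mango) gets 'No'; both rows of 111, 333 and 555 get 'Yes'.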


Occasional Contributor
Posts: 8

Re: Flagging up duplicates

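Another way, using the NODUPKEY and DUPOUT= options of PROC SORT:

/* NODUPKEY keeps the first row for each ID in WANT1;
   DUPOUT= writes the removed later rows to WANT2 */
proc sort data=HAVE out=WANT1 dupout=WANT2 nodupkey;
  by ID;
run;

data WANT;
  /* interleaving by ID places each first row ahead of its duplicates */
  set WANT1 WANT2 (in=_IN);
  by ID;
  if _IN then DUP = "Yes";
  else DUP = "No";
run;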
