Solved: Deleting duplicate rows

ilikesas · Posted 12-17-2014 08:19 PM

Hi,

Suppose I have the following table:

ID	Name
1	Mike
1	Mike
2	George
3	Jack
3	Jack
4	Tan

Is it possible to delete the duplicate rows for IDs 1 and 3 while keeping the rows that have no duplicates, such as those for IDs 2 and 4, as they are?

Thank you,

stat_sas · Posted 12-17-2014 08:34 PM

data want;
    set have;
    by id;
    if first.id;
run;

ilikesas · Posted 12-17-2014 09:03 PM

Hi stat@sas,

I did the following including your code:

data dup;
    input id name$;
    datalines;
3 a
1 d
5 e
4 y
2 t
;
run;

data dup2;
    set dup;
    by id;
    if first.id;
run;

But I get an error message:

ERROR 180-322: BY variables not properly sorted on dataset DUP

And the result that I get is:

	id	name
1	3	a

It seems that the duplicate was deleted for the first row, but after that the code stopped running.

Thank you

stat_sas · Posted 12-17-2014 09:08 PM

You need to sort the dataset dup by id before doing BY-group processing:

proc sort data=dup;
    by id;
run;

Then try this:

data dup2;
    set dup;
    by id;
    if first.id;
run;
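
For reference, the same two steps can be run against the table from the original question. This is only a sketch: the data set names have, have_sorted, and want are placeholders, and the DATALINES rows are copied from the first post.

```sas
/* Recreate the table from the original question (data set names are placeholders). */
data have;
    input id name $;
    datalines;
1 Mike
1 Mike
2 George
3 Jack
3 Jack
4 Tan
;
run;

/* A DATA step BY statement expects the input to be sorted by the BY variable. */
proc sort data=have out=have_sorted;
    by id;
run;

/* FIRST.id is 1 only on the first row of each id group,
   so the subsetting IF keeps exactly one row per id. */
data want;
    set have_sorted;
    by id;
    if first.id;
run;
```

With this input, want should end up with one row each for IDs 1 to 4.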

Haikuo · Posted 12-17-2014 09:15 PM

In this case, why not just:

proc sort data=dup nodupkey;
    by id;
run;
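
A slightly more defensive variant of the same idea, as a sketch only: OUT= writes the de-duplicated rows to a new data set instead of overwriting dup, and DUPOUT= saves the rows that were dropped so they can be checked. The names dup_unique, dup_removed, and dup_exact are made up for illustration. Keep in mind that NODUPKEY compares only the BY variables, so two rows with the same id but different names would also be reduced to one.

```sas
/* Keep the original data set intact and capture the removed rows
   (output data set names are hypothetical). */
proc sort data=dup out=dup_unique dupout=dup_removed nodupkey;
    by id;
run;

/* To drop only rows that match on every variable, sort by all variables. */
proc sort data=dup out=dup_exact nodupkey;
    by _all_;
run;
```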

stat_sas · Posted 12-17-2014 09:22 PM

Thanks Haikuo - Yes, this is a better solution.

ilikesas · Posted 12-17-2014 09:42 PM

Thank you Hai.kuo and stat@sas !!!