Solved: Re: elimination of duplicate rows

Cyndia · Posted 04-20-2012 01:20 PM

what is the best way to eliminate the duplicate records when there is a dataset like this:

firstname middle lastname ID age school city

Debbie R Popular 12 13 xyz NY

Debbie R Popular 21 13 xyz NY

Debbie R Popular 12 13 xyz NY

Deb R Popular 12 13 xyz NY

note that only first and third records are duplicate, others are not the same. we only want to eliminate either the first or the third record in this case.

there is a distinct( ) in SQL, in SAS, is there a way to eliminate only the records that are duplicate in all columns? thanks.

Linlin · Posted 04-20-2012 01:28 PM

example:

data have;

input a b c;

cards;

1 2 3

3 2 1

1 2 3

;

proc sql;

create table want as

select distinct * from have;

quit;

proc print ;run;

View solution in original post

Linlin · Posted 04-20-2012 01:28 PM

example:

data have;

input a b c;

cards;

1 2 3

3 2 1

1 2 3

;

proc sql;

create table want as

select distinct * from have;

quit;

proc print ;run;

Haikuo · Posted 04-20-2012 01:28 PM

Use nodupkey in proc sort:

data have;

input (firstname middle lastname ID) (:$10.) age (school city) (:$10.);

cards;

Debbie R Popular 12 13 xyz NY

Debbie R Popular 21 13 xyz NY

Debbie R Popular 12 13 xyz NY

Deb R Popular 12 13 xyz NY

;

proc sort data=have out=want nodupkey;

by firstname middle lastname ID age school city;

run;

proc print;run;

07:05 Friday, April 20, 2012 67

Obs firstname middle lastname ID age school city

1 Deb R Popular 12 13 xyz NY

2 Debbie R Popular 12 13 xyz NY

3 Debbie R Popular 21 13 xyz NY

Haikuo

Update: please be aware that in this case, 'nodup' option will not work, as the duplicated records are not adjacent.

Haikuo · Posted 04-20-2012 01:33 PM

Or just using data step:

data have;

input (firstname middle lastname ID) (:$10.) age (school city) (:$10.);

cards;

Debbie R Popular 12 13 xyz NY

Debbie R Popular 21 13 xyz NY

Debbie R Popular 12 13 xyz NY

Deb R Popular 12 13 xyz NY

;

data _null_;

if 0 then set have;

dcl hash h(dataset:'have');

h.definekey(all:'y');

h.definedata(all:'y');

h.definedone();

h.output(dataset: 'want');

run;

proc print;run;

07:05 Friday, April 20, 2012 68

Obs firstname middle lastname ID age school city

1 Debbie R Popular 12 13 xyz NY

2 Deb R Popular 12 13 xyz NY

3 Debbie R Popular 21 13 xyz NY

Haikuo

MikeZdeb · Posted 04-20-2012 04:14 PM

hi ...

re: "Update: please be aware that in this case, 'nodup' option will not work, as the duplicated records are not adjacent"

if you sort by all the variables, NODUP and NODUPKEY give the same result

also, if you want to sort by all the variables, you can use _ALL_ ...

proc sort data=have out=want nodup;

by _all_;

run;