Building models with SAS Enterprise Miner, SAS Factory Miner, SAS Visual Data Mining and Machine Learning or just with programming

elimination of duplicate rows

Accepted Solution Solved
Reply
Contributor
Posts: 37
Accepted Solution

elimination of duplicate rows

what is the best way to eliminate the duplicate records when there is a dataset like this:

firstname     middle     lastname     ID     age     school               city

Debbie          R          Popular       12     13          xyz                    NY

Debbie          R          Popular       21     13          xyz                    NY

Debbie          R          Popular       12     13          xyz                    NY

Deb              R          Popular       12     13          xyz                    NY

note that only first and third records are duplicate, others are not the same.  we only want to eliminate either the first or the third record in this case.

there is a distinct( ) in SQL, in SAS, is there a way to eliminate only the records that are duplicate in all columns?  thanks.


Accepted Solutions
Solution
‎04-20-2012 01:28 PM
Super Contributor
Posts: 1,636

Re: elimination of duplicate rows


example:

data have;

 

input a b c;

  cards;

  1 2 3

  3 2 1

  1 2 3

  ;

  proc sql;

  create table want as

   select distinct * from have;

  quit;

  proc print ;run;

View solution in original post


All Replies
Solution
‎04-20-2012 01:28 PM
Super Contributor
Posts: 1,636

Re: elimination of duplicate rows


example:

data have;

 

input a b c;

  cards;

  1 2 3

  3 2 1

  1 2 3

  ;

  proc sql;

  create table want as

   select distinct * from have;

  quit;

  proc print ;run;

Respected Advisor
Posts: 3,156

Re: elimination of duplicate rows

Use nodupkey in proc sort:

data have;

input (firstname middle lastname ID) (:$10.) age (school city) (:$10.);

cards;

Debbie R Popular 12 13 xyz NY

Debbie R Popular 21 13 xyz NY

Debbie R Popular 12 13 xyz NY

Deb R Popular 12 13 xyz NY

;

proc sort data=have out=want nodupkey;

by firstname middle lastname ID age school city;

run;

proc print;run;

07:05 Friday, April 20, 2012 67

  Obs firstname middle lastname ID age school city

  1 Deb R Popular 12 13 xyz NY

  2 Debbie R Popular 12 13 xyz NY

  3 Debbie R Popular 21 13 xyz NY

Haikuo

Update: please be aware that in this case, 'nodup' option will not work, as the duplicated records are not adjacent.

Respected Advisor
Posts: 3,156

Re: elimination of duplicate rows

Or just using data step:

data have;

input (firstname middle lastname ID) (:$10.) age (school city) (:$10.);

cards;

Debbie R Popular 12 13 xyz NY

Debbie R Popular 21 13 xyz NY

Debbie R Popular 12 13 xyz NY

Deb R Popular 12 13 xyz NY

;

data _null_;

if 0 then set have;

  dcl hash h(dataset:'have');

  h.definekey(all:'y');

  h.definedata(all:'y');

  h.definedone();

  h.output(dataset: 'want');

run;

proc print;run;

07:05 Friday, April 20, 2012 68

  Obs firstname middle lastname ID age school city

  1 Debbie R Popular 12 13 xyz NY

  2 Deb R Popular 12 13 xyz NY

  3 Debbie R Popular 21 13 xyz NY

Haikuo

Valued Guide
Posts: 765

Re: elimination of duplicate rows

hi ...

re:  "Update: please be aware that in this case, 'nodup' option will not work, as the duplicated records are not adjacent"

if you sort by all the variables, NODUP and NODUPKEY give the same result

also, if you want to sort by all the variables, you can use _ALL_  ...

proc sort data=have out=want nodup;

by _all_;

run;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 449 views
  • 8 likes
  • 4 in conversation