Cyndia
Calcite | Level 5

What is the best way to eliminate duplicate records from a dataset like this:

firstname  middle  lastname  ID  age  school  city
Debbie     R       Popular   12  13   xyz     NY
Debbie     R       Popular   21  13   xyz     NY
Debbie     R       Popular   12  13   xyz     NY
Deb        R       Popular   12  13   xyz     NY

Note that only the first and third records are duplicates; the others differ in at least one column. We only want to eliminate either the first or the third record in this case.

SQL has DISTINCT. Is there a way in SAS to eliminate only the records that are duplicated in all columns? Thanks.

Accepted Solution
Linlin
Lapis Lazuli | Level 10


Example:

data have;
  input a b c;
  cards;
1 2 3
3 2 1
1 2 3
;

proc sql;
  /* DISTINCT * keeps one copy of each row that matches in every column */
  create table want as
    select distinct * from have;
quit;

proc print data=want; run;

4 REPLIES
Haikuo
Onyx | Level 15

Use nodupkey in proc sort:

data have;
  input (firstname middle lastname ID) (:$10.) age (school city) (:$10.);
  cards;
Debbie R Popular 12 13 xyz NY
Debbie R Popular 21 13 xyz NY
Debbie R Popular 12 13 xyz NY
Deb R Popular 12 13 xyz NY
;

/* every variable is in the BY list, so NODUPKEY drops only exact duplicate rows */
proc sort data=have out=want nodupkey;
  by firstname middle lastname ID age school city;
run;

proc print data=want; run;

07:05 Friday, April 20, 2012 67

  Obs  firstname  middle  lastname  ID  age  school  city
    1  Deb        R       Popular   12   13  xyz     NY
    2  Debbie     R       Popular   12   13  xyz     NY
    3  Debbie     R       Popular   21   13  xyz     NY

Haikuo

Update: please be aware that in this case, 'nodup' option will not work, as the duplicated records are not adjacent.
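
For illustration, here is a sketch of the situation where that caution does apply, assuming the BY list covers only part of the record (lastname alone is my choice for the example, as is the output name want_partial). NODUPRECS, the full name of the NODUP option, compares each observation only with the previous one written to the output. With the default EQUALS behavior, records that tie on the BY variables keep their input order, so the first and third records never end up next to each other.

```sas
data have;
  input (firstname middle lastname ID) (:$10.) age (school city) (:$10.);
  cards;
Debbie R Popular 12 13 xyz NY
Debbie R Popular 21 13 xyz NY
Debbie R Popular 12 13 xyz NY
Deb R Popular 12 13 xyz NY
;

/* Partial BY list (assumption for this sketch): all four rows tie on  */
/* lastname, so the two identical rows stay separated by the ID=21 row */
proc sort data=have out=want_partial noduprecs;
  by lastname;
run;

proc print data=want_partial; run;
```

With this BY list all four records should be kept. Sorting by every variable instead puts the identical records side by side, so one of them is dropped.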

Haikuo
Onyx | Level 15

Or just use a DATA step with a hash object:

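data have;
  input (firstname middle lastname ID) (:$10.) age (school city) (:$10.);
  cards;
Debbie R Popular 12 13 xyz NY
Debbie R Popular 21 13 xyz NY
Debbie R Popular 12 13 xyz NY
Deb R Popular 12 13 xyz NY
;

data _null_;
  if 0 then set have;           /* define the variables without reading any rows */
  dcl hash h(dataset:'have');   /* load HAVE into the hash object */
  h.definekey(all:'y');         /* every variable is part of the key, so exact duplicates collapse */
  h.definedata(all:'y');
  h.definedone();
  h.output(dataset: 'want');    /* write the unique rows to WANT */
run;

proc print data=want; run;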

07:05 Friday, April 20, 2012 68

  Obs  firstname  middle  lastname  ID  age  school  city
    1  Debbie     R       Popular   12   13  xyz     NY
    2  Deb        R       Popular   12   13  xyz     NY
    3  Debbie     R       Popular   21   13  xyz     NY

Haikuo

MikeZdeb
Rhodochrosite | Level 12

hi ...

re:  "Update: please be aware that in this case, 'nodup' option will not work, as the duplicated records are not adjacent"

If you sort by all the variables, NODUP and NODUPKEY give the same result, because the sort places identical records next to each other.

Also, if you want to sort by all the variables, you can use _ALL_ ...

proc sort data=have out=want nodup;
  by _all_;
run;
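
A quick way to check this on the sample data from earlier in the thread is sketched below: sort once with NODUPKEY and once with NODUPRECS (the full name of NODUP), both with BY _ALL_, then compare the two outputs. The dataset names want_key and want_rec are made up for this example.

```sas
data have;
  input (firstname middle lastname ID) (:$10.) age (school city) (:$10.);
  cards;
Debbie R Popular 12 13 xyz NY
Debbie R Popular 21 13 xyz NY
Debbie R Popular 12 13 xyz NY
Deb R Popular 12 13 xyz NY
;

/* Drop rows whose BY values (here: every variable) repeat */
proc sort data=have out=want_key nodupkey;
  by _all_;
run;

/* Drop rows identical to the previous row written to the output */
proc sort data=have out=want_rec noduprecs;
  by _all_;
run;

/* Report any differences between the two results */
proc compare base=want_key compare=want_rec;
run;
```

Both outputs should hold the same three records, and PROC COMPARE should report no unequal values.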
