BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Cyndia
Calcite | Level 5

what is the best way to eliminate the duplicate records when there is a dataset like this:

firstname     middle     lastname     ID     age     school               city

Debbie          R          Popular       12     13          xyz                    NY

Debbie          R          Popular       21     13          xyz                    NY

Debbie          R          Popular       12     13          xyz                    NY

Deb              R          Popular       12     13          xyz                    NY

note that only first and third records are duplicate, others are not the same.  we only want to eliminate either the first or the third record in this case.

there is a distinct( ) in SQL, in SAS, is there a way to eliminate only the records that are duplicate in all columns?  thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
Linlin
Lapis Lazuli | Level 10


example:

data have;

 

input a b c;

  cards;

  1 2 3

  3 2 1

  1 2 3

  ;

  proc sql;

  create table want as

   select distinct * from have;

  quit;

  proc print ;run;

View solution in original post

4 REPLIES 4
Linlin
Lapis Lazuli | Level 10


example:

data have;

 

input a b c;

  cards;

  1 2 3

  3 2 1

  1 2 3

  ;

  proc sql;

  create table want as

   select distinct * from have;

  quit;

  proc print ;run;

Haikuo
Onyx | Level 15

Use nodupkey in proc sort:

data have;

input (firstname middle lastname ID) (:$10.) age (school city) (:$10.);

cards;

Debbie R Popular 12 13 xyz NY

Debbie R Popular 21 13 xyz NY

Debbie R Popular 12 13 xyz NY

Deb R Popular 12 13 xyz NY

;

proc sort data=have out=want nodupkey;

by firstname middle lastname ID age school city;

run;

proc print;run;

07:05 Friday, April 20, 2012 67

  Obs firstname middle lastname ID age school city

  1 Deb R Popular 12 13 xyz NY

  2 Debbie R Popular 12 13 xyz NY

  3 Debbie R Popular 21 13 xyz NY

Haikuo

Update: please be aware that in this case, 'nodup' option will not work, as the duplicated records are not adjacent.

Haikuo
Onyx | Level 15

Or just using data step:

data have;

input (firstname middle lastname ID) (:$10.) age (school city) (:$10.);

cards;

Debbie R Popular 12 13 xyz NY

Debbie R Popular 21 13 xyz NY

Debbie R Popular 12 13 xyz NY

Deb R Popular 12 13 xyz NY

;

data _null_;

if 0 then set have;

  dcl hash h(dataset:'have');

  h.definekey(all:'y');

  h.definedata(all:'y');

  h.definedone();

  h.output(dataset: 'want');

run;

proc print;run;

07:05 Friday, April 20, 2012 68

  Obs firstname middle lastname ID age school city

  1 Debbie R Popular 12 13 xyz NY

  2 Deb R Popular 12 13 xyz NY

  3 Debbie R Popular 21 13 xyz NY

Haikuo

MikeZdeb
Rhodochrosite | Level 12

hi ...

re:  "Update: please be aware that in this case, 'nodup' option will not work, as the duplicated records are not adjacent"

if you sort by all the variables, NODUP and NODUPKEY give the same result

also, if you want to sort by all the variables, you can use _ALL_  ...

proc sort data=have out=want nodup;

by _all_;

run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to choose a machine learning algorithm

Use this tutorial as a handy guide to weigh the pros and cons of these commonly used machine learning algorithms.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 1160 views
  • 8 likes
  • 4 in conversation