SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Is there any efficient solution to delete all duplicate values?

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 12
Accepted Solution

Is there any efficient solution to delete all duplicate values?

Hi all,

during my work i came to a question that bothers me a lot.

Assuming that i have a table with dupllicate values like:

Table:

Col1       Col2

123          456

123          456

456          789

456          888

Now i would like to delete all values from Table that are completely identical considering all columns. (vIts very important that all columns are viewd of duplictes and not only col1)

How do i do this in an efficient way? I got an solution that consist of miles of code....its not efficient, simple and good in any way.

I mean if i use a proc sort with no duprecs, sas deletes only 1 to n duplicate values and not all.

Considering the table above wirh no duprecs or a control break (first. last. operator) there still will be one of those n duplicate values.

Is there any efficient solution to my qustion or do i have to any poor solution by myself?

Hope someone can help.

Thanks and regards

s


Accepted Solutions
Solution
‎05-16-2014 06:02 AM
Super User
Super User
Posts: 7,401

Re: Is there any efficient solution to delete all duplicate values?

Ok, this should do the job:

data have;
  attrib Col1 Col2 format=best.;
  infile datalines;
  input col1 col2;
datalines;
123          456
123          456
456          789
456          888
;
run;

proc sql;
  create table inter as
  select  col1,
          col2,
          tmp
  from    (select *,count(*) as tmp from have group by col1,col2)
  where   tmp=1;
quit;
  

View solution in original post


All Replies
Super User
Super User
Posts: 7,401

Re: Is there any efficient solution to delete all duplicate values?

Sorry, what should be the output from that data you have given, should it be two records the 456 ones?  If so proc sort should handle that fine.  Or do you mean where a value appears in any column?

Occasional Contributor
Posts: 12

Re: Is there any efficient solution to delete all duplicate values?

The output should be both  456 values, yes but none 123 values.

If i use a proc sort i get this result:

col1    col2

123    456

456    789

456    888

but i want:

col1    col2

456    789

456    888

is there any other proc sort option i didnt thought about?

Solution
‎05-16-2014 06:02 AM
Super User
Super User
Posts: 7,401

Re: Is there any efficient solution to delete all duplicate values?

Ok, this should do the job:

data have;
  attrib Col1 Col2 format=best.;
  infile datalines;
  input col1 col2;
datalines;
123          456
123          456
456          789
456          888
;
run;

proc sql;
  create table inter as
  select  col1,
          col2,
          tmp
  from    (select *,count(*) as tmp from have group by col1,col2)
  where   tmp=1;
quit;
  

Occasional Contributor
Posts: 12

Re: Is there any efficient solution to delete all duplicate values?

yaaay thanks!

this solution is far more short than the one i could've used!

nice, thanks!

Super User
Posts: 6,938

Re: Is there any efficient solution to delete all duplicate values?

proc sql;

create table want as select distinct * from have;

quit;run;

or

proc sort

  data=want

  out=have

  nodupkey

;

by _all_;

run;

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super Contributor
Posts: 282

Re: Is there any efficient solution to delete all duplicate values?

Hi,

If your data is already sorted as in your example then you could also try:

data want;

  set have;

  by col1 col2;

  if first.col2 and last.col2;

run;

Regards,

Amir.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 488 views
  • 0 likes
  • 4 in conversation