Is there any efficient solution to delete all duplicate values?

Solved
Occasional Contributor
Posts: 12

Is there any efficient solution to delete all duplicate values?

Hi all,

during my work i came to a question that bothers me a lot.

Assuming that i have a table with dupllicate values like:

Table:

Col1       Col2

123          456

123          456

456          789

456          888

Now i would like to delete all values from Table that are completely identical considering all columns. (vIts very important that all columns are viewd of duplictes and not only col1)

How do i do this in an efficient way? I got an solution that consist of miles of code....its not efficient, simple and good in any way.

I mean if i use a proc sort with no duprecs, sas deletes only 1 to n duplicate values and not all.

Considering the table above wirh no duprecs or a control break (first. last. operator) there still will be one of those n duplicate values.

Is there any efficient solution to my qustion or do i have to any poor solution by myself?

Hope someone can help.

Thanks and regards

s

Accepted Solutions
Solution
‎05-16-2014 06:02 AM
Super User
Posts: 8,798

Re: Is there any efficient solution to delete all duplicate values?

Ok, this should do the job:

data have;
attrib Col1 Col2 format=best.;
infile datalines;
input col1 col2;
datalines;
123          456
123          456
456          789
456          888
;
run;

proc sql;
create table inter as
select  col1,
col2,
tmp
from    (select *,count(*) as tmp from have group by col1,col2)
where   tmp=1;
quit;

All Replies
Super User
Posts: 8,798

Re: Is there any efficient solution to delete all duplicate values?

Sorry, what should be the output from that data you have given, should it be two records the 456 ones?  If so proc sort should handle that fine.  Or do you mean where a value appears in any column?

Occasional Contributor
Posts: 12

Re: Is there any efficient solution to delete all duplicate values?

The output should be both  456 values, yes but none 123 values.

If i use a proc sort i get this result:

col1    col2

123    456

456    789

456    888

but i want:

col1    col2

456    789

456    888

is there any other proc sort option i didnt thought about?

Solution
‎05-16-2014 06:02 AM
Super User
Posts: 8,798

Re: Is there any efficient solution to delete all duplicate values?

Ok, this should do the job:

data have;
attrib Col1 Col2 format=best.;
infile datalines;
input col1 col2;
datalines;
123          456
123          456
456          789
456          888
;
run;

proc sql;
create table inter as
select  col1,
col2,
tmp
from    (select *,count(*) as tmp from have group by col1,col2)
where   tmp=1;
quit;

Occasional Contributor
Posts: 12

Re: Is there any efficient solution to delete all duplicate values?

yaaay thanks!

this solution is far more short than the one i could've used!

nice, thanks!

Super User
Posts: 8,929

Re: Is there any efficient solution to delete all duplicate values?

proc sql;

create table want as select distinct * from have;

quit;run;

or

proc sort

data=want

out=have

nodupkey

;

by _all_;

run;

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super Contributor
Posts: 282

Re: Is there any efficient solution to delete all duplicate values?

Hi,

data want;

set have;

by col1 col2;

if first.col2 and last.col2;

run;

Regards,

Amir.

🔒 This topic is solved and locked.