Solved: Re: Is there any efficient solution to delete all duplicate values?

asdf_sas · Posted 05-16-2014 05:25 AM

Hi all,

during my work i came to a question that bothers me a lot.

Assuming that i have a table with dupllicate values like:

Table:

Col1 Col2

123 456

456 789

456 888

Now i would like to delete all values from Table that are completely identical considering all columns. (vIts very important that all columns are viewd of duplictes and not only col1)

How do i do this in an efficient way? I got an solution that consist of miles of code....its not efficient, simple and good in any way.

I mean if i use a proc sort with no duprecs, sas deletes only 1 to n duplicate values and not all.

Considering the table above wirh no duprecs or a control break (first. last. operator) there still will be one of those n duplicate values.

Is there any efficient solution to my qustion or do i have to any poor solution by myself?

Hope someone can help.

Thanks and regards

s

RW9 · Posted 05-16-2014 06:02 AM

Ok, this should do the job:

data have;
attrib Col1 Col2 format=best.;
infile datalines;
input col1 col2;
datalines;
123          456
123          456
456          789
456          888
;
run;

proc sql;
create table inter as
select col1,
          col2,
          tmp
from    (select *,count(*) as tmp from have group by col1,col2)
where   tmp=1;
quit;

View solution in original post

RW9 · Posted 05-16-2014 05:35 AM

Sorry, what should be the output from that data you have given, should it be two records the 456 ones? If so proc sort should handle that fine. Or do you mean where a value appears in any column?

asdf_sas · Posted 05-16-2014 05:50 AM

The output should be both 456 values, yes but none 123 values.

If i use a proc sort i get this result:

col1 col2

123 456

456 789

456 888

but i want:

col1 col2

456 789

456 888

is there any other proc sort option i didnt thought about?

RW9 · Posted 05-16-2014 06:02 AM

Ok, this should do the job:

data have;
attrib Col1 Col2 format=best.;
infile datalines;
input col1 col2;
datalines;
123          456
123          456
456          789
456          888
;
run;

proc sql;
create table inter as
select col1,
          col2,
          tmp
from    (select *,count(*) as tmp from have group by col1,col2)
where   tmp=1;
quit;

asdf_sas · Posted 05-16-2014 06:05 AM

yaaay thanks!

this solution is far more short than the one i could've used!

nice, thanks!

Kurt_Bremser · Posted 05-16-2014 05:47 AM

proc sql;

create table want as select distinct * from have;

quit;run;

or

proc sort

data=want

out=have

nodupkey

;

by _all_;

run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Amir · Posted 05-16-2014 07:42 AM

Hi,

If your data is already sorted as in your example then you could also try:

data want;

set have;

by col1 col2;

if first.col2 and last.col2;

run;

Regards,

Amir.

Registration is open