Hi all,
during my work i came to a question that bothers me a lot.
Assuming that i have a table with dupllicate values like:
Table:
Col1 Col2
123 456
123 456
456 789
456 888
Now i would like to delete all values from Table that are completely identical considering all columns. (vIts very important that all columns are viewd of duplictes and not only col1)
How do i do this in an efficient way? I got an solution that consist of miles of code....its not efficient, simple and good in any way.
I mean if i use a proc sort with no duprecs, sas deletes only 1 to n duplicate values and not all.
Considering the table above wirh no duprecs or a control break (first. last. operator) there still will be one of those n duplicate values.
Is there any efficient solution to my qustion or do i have to any poor solution by myself?
Hope someone can help.
Thanks and regards
s
Ok, this should do the job:
data have;
attrib Col1 Col2 format=best.;
infile datalines;
input col1 col2;
datalines;
123 456
123 456
456 789
456 888
;
run;
proc sql;
create table inter as
select col1,
col2,
tmp
from (select *,count(*) as tmp from have group by col1,col2)
where tmp=1;
quit;
Sorry, what should be the output from that data you have given, should it be two records the 456 ones? If so proc sort should handle that fine. Or do you mean where a value appears in any column?
The output should be both 456 values, yes but none 123 values.
If i use a proc sort i get this result:
col1 col2
123 456
456 789
456 888
but i want:
col1 col2
456 789
456 888
is there any other proc sort option i didnt thought about?
Ok, this should do the job:
data have;
attrib Col1 Col2 format=best.;
infile datalines;
input col1 col2;
datalines;
123 456
123 456
456 789
456 888
;
run;
proc sql;
create table inter as
select col1,
col2,
tmp
from (select *,count(*) as tmp from have group by col1,col2)
where tmp=1;
quit;
yaaay thanks!
this solution is far more short than the one i could've used!
nice, thanks!
proc sql;
create table want as select distinct * from have;
quit;run;
or
proc sort
data=want
out=have
nodupkey
;
by _all_;
run;
Hi,
If your data is already sorted as in your example then you could also try:
data want;
set have;
by col1 col2;
if first.col2 and last.col2;
run;
Regards,
Amir.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.