Hi,
Is PROC SORT nodupkey or nodup the best way to check for duplicates or there a better way to quickly check to see if a variable has duplicated values in a dataset?
Thanks
you can use in memory hash table to read data and print "error" on the first duplicate:
data have;
input x $1. @@;
if x ne " ";
cards;
qwertyuiopasdfghjklzxcvbnm1234567890q
;
run;
proc print;
run;
/* test for dups */
data _null_;
declare hash H();
H.defineKey("x");
H.defineDone();
do until(eof);
set HAVE end=eof curobs=curobs;
rc=H.add();
if rc then
do;
put "ERROR: Duplicate value: " x "detected in observation " curobs;
stop;
end;
end;
stop;
run;
Bart
you can use in memory hash table to read data and print "error" on the first duplicate:
data have;
input x $1. @@;
if x ne " ";
cards;
qwertyuiopasdfghjklzxcvbnm1234567890q
;
run;
proc print;
run;
/* test for dups */
data _null_;
declare hash H();
H.defineKey("x");
H.defineDone();
do until(eof);
set HAVE end=eof curobs=curobs;
rc=H.add();
if rc then
do;
put "ERROR: Duplicate value: " x "detected in observation " curobs;
stop;
end;
end;
stop;
run;
Bart
Another hash-solution (using the data provided by @yabwon 😞
data _null_;
if 0 then set have;
declare hash h(dataset: 'have', duplicate: 'e');
h.defineKey('x');
h.defineDone();
stop;
run;
And cool thing is that it can be easily extended from only single variable check to row duplicates;
data have;
input x $1. @@;
if x ne " ";
y=rank(x);
z=y*10;
cards;
qwertyuiopasdfghjklzxcvbnm1234567890q
;
run;
proc print;
run;
data _null_;
if 0 then set have;
declare hash h(dataset: 'have', duplicate: 'e');
h.defineKey(all:'yes'); /* duplicated rows */
h.defineDone();
stop;
run;
Bart
Personally I find it useful to create macros for common tasks like this. It means you can get your answer with just one statement. It also means the underlying method isn't so important.
%macro Find_Dups ( dataset =
,byvar =
,dupvar =
);
%if &dupvar = %then %let dupvar = &byvar;
proc sort data = &dataset
out = sorted
;
by &byvar;
run;
data dups;
set sorted;
by &byvar;
if not (first.&dupvar and last.&dupvar);
run;
%mend Find_Dups;
Although you will notice that I prefer to create a table with the duplicate rows.
This thread seems to be devolving into a general discussion.
Much better to post new questions on new threads. You can always include a link to some older topic.
You can use the old -sysparm option.
https://documentation.sas.com/doc/en/mcrolref/3.2/p0ajr6rtdhuhzbn199hhpkak2v8p.htm
Or you can take advantage of the new -set option.
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/hostunx/n106qouqj0hfk5n1wgqpw8iovxy2.htm
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.