Hello
I have a data set that includes one field.
Task is to find if there are any 2 (or more rows) with same value.
Data ttbl;
input x;
cards;
23455
44566
77788
55444
89899
55555
55444
34456
;
Run;
/*Task: Find if in the table there are 2 lines with same values*/
Alternatively try proc sql as below
/*Using datastep*/
Data t1;
input x;
cards;
23455
44566
77788
55444
89899
55555
55444
34456
;
Run;
proc sql;
create table want as select count(x) as count, x from t1 group by x;
quit;
Do something like this and save the duplicates in a data set.
proc sort data=ttbl out=ttbl_2 dupout=Duplicates;
by x;
run;
/*Using datastep*/
Data t1;
input x;
cards;
23455
44566
77788
55444
89899
55555
55444
34456
;
Run;
proc sort data=t1 ; /* if we use nodupkey in proc sort we didn't get proper output*/
by x;
run;
data unique duplicate;
set t1;
by x;
if first.x and last.x then output unique;
else output duplicate;
run;
/*Using Procedure*/
proc freq data=t1 ;
tables x /nocum nocol norow nopercent;
run;
Hello,
A solution with sql :
proc sql noprint;
CREATE TABLE duplicates AS
SELECT *, count(*) AS ndupl
FROM ttbl
GROUP BY x
HAVING count(*)>1
;
quit;
Alternatively try proc sql as below
/*Using datastep*/
Data t1;
input x;
cards;
23455
44566
77788
55444
89899
55555
55444
34456
;
Run;
proc sql;
create table want as select count(x) as count, x from t1 group by x;
quit;
Just for fun.
If there are some error info in LOG , then table has duplicated value.
ERROR: Duplicate key found when loading data set ttbl at line 40 column 2.
Data ttbl;
input x;
cards;
23455
44566
77788
55444
89899
55555
55444
34456
;
Run;
data _null_;
if 0 then set ttbl ;
declare hash h(dataset:'ttbl',duplicate:'e');
h.definekey(all:'y');
h.definedone();
stop;
run;
Hi sharp
why it shows in error in log
hash table cannot find duplicate values hash table only joins two tables please explain with code
I made an parameter
duplicate:'e'
in Hash Table .
It will issue an error info if there are duplicate values .
Actually Hash table could tell you which one is duplicated , but need more code ,I think you don't need it .
Hi @Ronein Proc Freq would generally be the natural inclination for most people considering it's easy
Data t1;
input x;
cards;
23455
44566
77788
55444
89899
55555
55444
34456
;
Run;
/*Count greater than 1*/
proc freq data=t1 noprint;
tables x/out=want(keep=x count where=(count>1));
run;
/*Or */
/*Count all*/
proc freq data=t1 noprint;
tables x/out=want(keep=x count);
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.