BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ronein
Onyx | Level 15

Hello

I have a data set that includes one field.

Task is to find if there are any 2 (or more rows) with same value. 

 

Data ttbl;
input x;
cards;
23455
44566
77788
55444
89899
55555
55444
34456
;
Run;

/*Task: Find if in the table there are 2 lines with same values*/
1 ACCEPTED SOLUTION

Accepted Solutions
Jagadishkatam
Amethyst | Level 16

Alternatively try proc sql as below

 

/*Using datastep*/
Data t1;
input x;
cards;
23455
44566
77788
55444
89899
55555
55444
34456
;
Run;

proc sql;
create table want as select count(x) as count, x from t1 group by x;
quit;

 

image.png

Thanks,
Jag

View solution in original post

8 REPLIES 8
PeterClemmensen
Tourmaline | Level 20

Do something like this and save the duplicates in a data set.

 

proc sort data=ttbl out=ttbl_2 dupout=Duplicates;
    by x;
run;
BrahmanandaRao
Lapis Lazuli | Level 10
/*Using datastep*/
Data t1;
input x;
cards;
23455
44566
77788
55444
89899
55555
55444
34456
;
Run;

proc sort data=t1  ;		/* if we use nodupkey in proc sort we didn't get proper output*/
by x;
run;

data unique duplicate;
set t1;
by x;
if first.x and last.x then output unique;
else output	duplicate;
run;

/*Using Procedure*/
proc freq data=t1 ;
tables x /nocum nocol norow nopercent;
run;

gamotte
Rhodochrosite | Level 12

Hello,

 

A solution with sql :

 

proc sql noprint;
    CREATE TABLE duplicates AS 
    SELECT *, count(*) AS ndupl
    FROM ttbl
    GROUP BY x
    HAVING count(*)>1
    ;
quit;
Jagadishkatam
Amethyst | Level 16

Alternatively try proc sql as below

 

/*Using datastep*/
Data t1;
input x;
cards;
23455
44566
77788
55444
89899
55555
55444
34456
;
Run;

proc sql;
create table want as select count(x) as count, x from t1 group by x;
quit;

 

image.png

Thanks,
Jag
Ksharp
Super User

Just for fun. 

If there are some error info in LOG , then table has duplicated value.

ERROR: Duplicate key found when loading data set ttbl at line 40 column 2.
Data ttbl;
input x;
cards;
23455
44566
77788
55444
89899
55555
55444
34456
;
Run;

data _null_;
 if 0 then set ttbl	;
 declare hash h(dataset:'ttbl',duplicate:'e');
 h.definekey(all:'y');
 h.definedone();
 stop;
run;

 

BrahmanandaRao
Lapis Lazuli | Level 10

Hi sharp

why it shows in error in log 

 

hash table cannot find duplicate values hash table only joins two tables please explain with code

Ksharp
Super User

I made an parameter 

duplicate:'e'

in Hash Table .

It will issue an error info if there are   duplicate values .

 

Actually Hash table could tell you which one is duplicated , but need more code ,I think you don't need it .

novinosrin
Tourmaline | Level 20

Hi @Ronein   Proc Freq would generally be the natural inclination for most people considering it's easy

 



Data t1;
input x;
cards;
23455
44566
77788
55444
89899
55555
55444
34456
;
Run;

/*Count greater than 1*/

proc freq data=t1 noprint;
tables x/out=want(keep=x count where=(count>1));
run;
/*Or */
/*Count all*/

proc freq data=t1 noprint;
tables x/out=want(keep=x count);
run;
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1973 views
  • 6 likes
  • 7 in conversation