BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ronein
Meteorite | Level 14

Hello

I have a data set that includes one field.

Task is to find if there are any 2 (or more rows) with same value. 

 

Data ttbl;
input x;
cards;
23455
44566
77788
55444
89899
55555
55444
34456
;
Run;

/*Task: Find if in the table there are 2 lines with same values*/
1 ACCEPTED SOLUTION

Accepted Solutions
Jagadishkatam
Amethyst | Level 16

Alternatively try proc sql as below

 

/*Using datastep*/
Data t1;
input x;
cards;
23455
44566
77788
55444
89899
55555
55444
34456
;
Run;

proc sql;
create table want as select count(x) as count, x from t1 group by x;
quit;

 

image.png

Thanks,
Jag

View solution in original post

8 REPLIES 8
PeterClemmensen
Tourmaline | Level 20

Do something like this and save the duplicates in a data set.

 

proc sort data=ttbl out=ttbl_2 dupout=Duplicates;
    by x;
run;
BrahmanandaRao
Lapis Lazuli | Level 10
/*Using datastep*/
Data t1;
input x;
cards;
23455
44566
77788
55444
89899
55555
55444
34456
;
Run;

proc sort data=t1  ;		/* if we use nodupkey in proc sort we didn't get proper output*/
by x;
run;

data unique duplicate;
set t1;
by x;
if first.x and last.x then output unique;
else output	duplicate;
run;

/*Using Procedure*/
proc freq data=t1 ;
tables x /nocum nocol norow nopercent;
run;

gamotte
Rhodochrosite | Level 12

Hello,

 

A solution with sql :

 

proc sql noprint;
    CREATE TABLE duplicates AS 
    SELECT *, count(*) AS ndupl
    FROM ttbl
    GROUP BY x
    HAVING count(*)>1
    ;
quit;
Jagadishkatam
Amethyst | Level 16

Alternatively try proc sql as below

 

/*Using datastep*/
Data t1;
input x;
cards;
23455
44566
77788
55444
89899
55555
55444
34456
;
Run;

proc sql;
create table want as select count(x) as count, x from t1 group by x;
quit;

 

image.png

Thanks,
Jag
Ksharp
Super User

Just for fun. 

If there are some error info in LOG , then table has duplicated value.

ERROR: Duplicate key found when loading data set ttbl at line 40 column 2.
Data ttbl;
input x;
cards;
23455
44566
77788
55444
89899
55555
55444
34456
;
Run;

data _null_;
 if 0 then set ttbl	;
 declare hash h(dataset:'ttbl',duplicate:'e');
 h.definekey(all:'y');
 h.definedone();
 stop;
run;

 

BrahmanandaRao
Lapis Lazuli | Level 10

Hi sharp

why it shows in error in log 

 

hash table cannot find duplicate values hash table only joins two tables please explain with code

Ksharp
Super User

I made an parameter 

duplicate:'e'

in Hash Table .

It will issue an error info if there are   duplicate values .

 

Actually Hash table could tell you which one is duplicated , but need more code ,I think you don't need it .

novinosrin
Tourmaline | Level 20

Hi @Ronein   Proc Freq would generally be the natural inclination for most people considering it's easy

 



Data t1;
input x;
cards;
23455
44566
77788
55444
89899
55555
55444
34456
;
Run;

/*Count greater than 1*/

proc freq data=t1 noprint;
tables x/out=want(keep=x count where=(count>1));
run;
/*Or */
/*Count all*/

proc freq data=t1 noprint;
tables x/out=want(keep=x count);
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 758 views
  • 6 likes
  • 7 in conversation