BookmarkSubscribeRSS Feed
David_Billa
Rhodochrosite | Level 12

I've the table as shown below. If I ignore the 'Serial_Num' field then there are few duplicate records. Now I want to identify the 'Serial_Num' of those duplicate records. Any help to achieve this using SQL?

 

Serial_Num Name Age Salary Gender
1 David 34 1000 M
2 David 34 1000 M
3 David 34 1000 M
4 Anu 36 900 F
5 Deepak 24 200 M
6 Deepak 24 200 M
7 James 44 1200 M
8 Anna 37 1000 F

 

Desired Output;

 

Serial_Num Name Age Salary Gender
1 David 34 1000 M
2 David 34 1000 M
3 David 34 1000 M
5 Deepak 24 200 M
6 Deepak 24 200 M

 

7 REPLIES 7
MayurJadhav
Quartz | Level 8

You can use group by and having clause in proc sql.

@David_Billa 

 

proc sql;
create table output_table as select * from input_table
group by Name, Age, Salary, Gender having count(*)>=2;
quit;
proc print data=output_table;
run;
Mayur Jadhav
BI Developer. Writer. Creative Educator.

SAS Blog → https://learnsascode.com
YouTube Channel: → https://www.youtube.com/@imayurj
David_Billa
Rhodochrosite | Level 12

@MayurJadhav  I've got the error

 

values "Serial_Num' in select clause is neither an aggregate nor in the group by clause
Tom
Super User Tom
Super User

@David_Billa wrote:

@MayurJadhav  I've got the error

 

values "Serial_Num' in select clause is neither an aggregate nor in the group by clause

Are you using PROC SQL?  Or are you running the SQL code in some foreign database?

 

Show the SAS log for the query that is failing.

David_Billa
Rhodochrosite | Level 12
Yes, I have executed the query in Snowflake.

I believe that we will get the similar error if we execute the similar
query in SAS.
Kurt_Bremser
Super User

@David_Billa wrote:
Yes, I have executed the query in Snowflake.

I believe that we will get the similar error if we execute the similar
query in SAS.

No, you won't. You'll get a NOTE about the automatic remerge, which is the purpose of the suggested code.

David_Billa
Rhodochrosite | Level 12
That's good. Any hint how to handle this query in external databases like
Snowflake?
Kurt_Bremser
Super User

Something like

create table output_table as
  select t1.* 
  from input_table as t1
  inner join (
    select Name, Age, Salary, Gender
    from input_table
    group by Name, Age, Salary, Gender
    having count(*) >= 2
  ) as t2
  on t1.name = t2.name and t1.age = t2.age and t1.salary = t2.salary and t1.gender = t2.gender

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 343 views
  • 4 likes
  • 4 in conversation