DATA Step, Macro, Functions and more

How to delete duplicate rows using proc sql?

Accepted Solution Solved
Reply
Contributor
Posts: 22
Accepted Solution

How to delete duplicate rows using proc sql?

Hi,

I wan to delete duplicate rows using proc sql (it is possible by proc sort with nodupkey)

For ex -  In the below example, i want to remove duplicates on the basis of name and age.

Input  -

id name  age company

1 aik 26 tcs

2 aik 29 infosys

3 bik 23 wns

4 bik 23 tcs

5 cik 30 infosys

6 cik 28 wns

Output -

id name  age company

1 aik 26 tcs

2 aik 29 infosys

3 bik 23 wns

5 cik 30 infosys

6 cik 28 wns


Accepted Solutions
Solution
‎02-13-2015 02:58 AM
Super User
Posts: 9,681

Re: How to delete duplicate rows using proc sql?

data have;
input id name $  age company $;
cards;
1 aik 26 tcs
2 aik 29 infosys
3 bik 23 wns
4 bik 23 tcs
5 cik 30 infosys
6 cik 28 wns
;
run;
proc sql;
 select *
  from have
   group by name,age
     having id=min(id);
quit;

Xia Keshan

View solution in original post


All Replies
Super User
Posts: 17,828

Re: How to delete duplicate rows using proc sql?

Use Select Distinct to get unique records.

proc sql;

create table want as

select distinct id, name, age, company

from have;

quit;

Contributor
Posts: 22

Re: How to delete duplicate rows using proc sql?

Hi Reeza,

Your query will create distinct rows, i am looking for distinct name and age combination only.

Thanks

Nikunj

Super User
Posts: 17,828

Re: How to delete duplicate rows using proc sql?

For duplicates do you care which company gets attached?

You can basically do a group by and summarize on those columns, a data step with first/last gives you more control.

proc sql;

create table want as

select min(id) as ID, name, age, min(company) as company

from have

group by name, age

order by id, name, age;

quit;

Solution
‎02-13-2015 02:58 AM
Super User
Posts: 9,681

Re: How to delete duplicate rows using proc sql?

data have;
input id name $  age company $;
cards;
1 aik 26 tcs
2 aik 29 infosys
3 bik 23 wns
4 bik 23 tcs
5 cik 30 infosys
6 cik 28 wns
;
run;
proc sql;
 select *
  from have
   group by name,age
     having id=min(id);
quit;

Xia Keshan

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 366 views
  • 5 likes
  • 3 in conversation