DATA Step, Macro, Functions and more

Help in removing Duplicates

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 94
Accepted Solution

Help in removing Duplicates


I have a data like this.

ID Name Amount
1 Pink(1) $50-----This is Duplicate
1 Pink(2) $50-----This is Duplicate
2 Pink(1) $20
2 Pink(1) $25
3 Pink(1) $10
4 Pink(1) $15
5 Pink(1) $20----This is not duplicate
5 Pink(1) $20 ---This is not duplicate

I am trying to remove the duplicate (Id) 1 which have Pink 1 and 2 but with same amount $50.
Even though the ID 5 seems to be same , that not duplicate.

Pink have 2 categories 1 and 2. If 1 or 2 have same value then its duplciate.Name is a char variable.

Thanks

 


Accepted Solutions
Solution
‎09-24-2017 11:45 PM
Super User
Posts: 10,850

Re: Help in removing Duplicates

Posted in reply to Kalai2008
data have;
input ID Name $ Amount $;
cards;
1 Pink(1) $50
1 Pink(2) $50
2 Pink(1) $20
2 Pink(1) $25
3 Pink(1) $10
4 Pink(1) $15
5 Pink(1) $20
5 Pink(1) $20
;
run;
proc sql;
create table a as
 select * 
  from have 
   group by id
    having count(distinct name) ne 1;

create table b as
 select * 
  from have 
   group by id
    having count(distinct name) eq 1;
quit;

data aa;
 set a;
 by id Amount;
 if first.Amount;
run;
data want;
 set aa b;
run;

View solution in original post


All Replies
Super User
Posts: 24,026

Re: Help in removing Duplicates

Posted in reply to Kalai2008

Use SCAN to pull out that 'second' variable 1/2 that's combined with your name.

Then use PROC SORT to order the data.

 

You'll likely need a data step instead and use FIRST/LAST processing.

 

Post what you've tried if you're still having issues.

Solution
‎09-24-2017 11:45 PM
Super User
Posts: 10,850

Re: Help in removing Duplicates

Posted in reply to Kalai2008
data have;
input ID Name $ Amount $;
cards;
1 Pink(1) $50
1 Pink(2) $50
2 Pink(1) $20
2 Pink(1) $25
3 Pink(1) $10
4 Pink(1) $15
5 Pink(1) $20
5 Pink(1) $20
;
run;
proc sql;
create table a as
 select * 
  from have 
   group by id
    having count(distinct name) ne 1;

create table b as
 select * 
  from have 
   group by id
    having count(distinct name) eq 1;
quit;

data aa;
 set a;
 by id Amount;
 if first.Amount;
run;
data want;
 set aa b;
run;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 2 replies
  • 188 views
  • 1 like
  • 3 in conversation