Remove duplicate values

Accepted Solution Solved
Reply
Contributor
Posts: 71
Accepted Solution

Remove duplicate values

Dear All,

My data looks as follows:

A        B        C       D  

1        4        34      5

1        4        25      8

1        4        65      2

1        4        56      8

1        3        32      9

1        3        30      0

2        5        56      10

2        5        35      11

2        5        78      13

2        5        60      12

2        5        28      16

2        5        26      18

3        7        40      19

3        7        50      22

3        7        39      20

3        7        25      11

4        9        31       22

4        9        37       27

5        9        40       21

5        9        62      25

What I need to do is to remove observations where there 2 distinct values in column B (in data above, these values are 4 and 3) for 1 distinct value (value = 1) in column A. Additionally, there are some reverse cases where for 1 value in B (value = 9) there are two values in A (values are 4 and 5). Thus, in the example provided above, I need to remove first 6 observations (where unique value in column A = 1) and last 4 observations (where unique value in column B = 9) because there are 2 distinct values for 1 distinct value in either column A or B. First six and last four observations to be removed is just an example - I can have 2 distinct values for 1 distinct value somewhere in the middle of the sample.

Any help would be hugely appreciated. I look forward to hearing from you soon.

Kind regards,

Ruslan


Accepted Solutions
Solution
‎02-22-2015 05:42 PM
Respected Advisor
Posts: 3,156

Re: Remove duplicate values

Then I believe we have a simple fix just by tweaking the existing code:

proc sql;

     create table want as select * from

(select * from have group by a having count(distinct b) < 2 )

intersect

(select * from have group by b having count(distinct a) < 2 );

quit;


Good Luck!


Haikuo

View solution in original post


All Replies
Super User
Posts: 19,815

Re: Remove duplicate values

So for the above data what would your output look like?

Contributor
Posts: 71

Re: Remove duplicate values

My desired output should look like this:

A        B        C       D  

2        5        56      10

2        5        35      11

2        5        78      13

2        5        60      12

2        5        28      16

2        5        26      18

3        7        40      19

3        7        50      22

3        7        39      20

3        7        25      11

I look forward to getting your reply.

Contributor
Posts: 71

Re: Remove duplicate values

Reeza, please help me. It is slightly urgent.

Thank you very much in advance!!!

Respected Advisor
Posts: 3,156

Re: Remove duplicate values

Well, giving your 'slightly urgent' situation, Reeza wouldn't mind my chipping in. There can be many options which are stronger on some aspects and weaker on others (the difficulty of coding, robustness, performance etc), the following is just an appetizer:

data have;

     input A        B C       D;

     cards;

1        4 34      5

1        4 25      8

1        4 65      2

1        4 56      8

1        3 32      9

1        3 30      0

2        5 56      10

2        5 35      11

2        5 78      13

2        5 60      12

2        5 28      16

2        5 26      18

3        7 40      19

3        7 50      22

3        7 39      20

3        7 25      11

4        9 31       22

4        9 37       27

5        9 40       21

5        9 62      25

;

proc sql;

     create table want as select * from

(select * from have group by a having count(distinct b) ne 2 )

intersect

(select * from have group by b having count(distinct a) ne 2 );

quit;

Contributor
Posts: 71

Re: Remove duplicate values

Thank you very much for your reply.

This code works perfectly, however how can I make it work for general cases (for example, when there are 3 distinct values in A for 1 value in B)?

What if my data looks like this:

data have;

     input A        B C       D;

     cards;

1        4 34      5

1        4 25      8

1        10 65      2

1        10 56      8

1        3 32      9

1        3 30      0

2        5 56      10

2        5 35      11

2        5 78      13

2        5 60      12

2        5 28      16

2        5 26      18

3        7 40      19

3        7 50      22

3        7 39      20

3        7 25      11

4        9 31       22

4        9 37       27

5        9 40       21

5        9 62      25

;

How should your code be rewritten? I do appreciate your help and look forward to getting your reply.

Respected Advisor
Posts: 3,156

Re: Remove duplicate values

You didn't mention anything about 3 or more distinct values until now, well, not even now, as you still haven't mention what do you want with them. do you keep them or do you want to get rid of them? Please do not assume that people can read your mind, we can't.

Contributor
Posts: 71

Re: Remove duplicate values

Sorry for not making everything clear. So, please let me state my problem again.

I have data like this:

data have;

     input A        B C       D;

     cards;

1        4 34      5

1        4 25      8

1        10 65      2

1        10 56      8

1        3 32      9

1        3 30      0

2        5 56      10

2        5 35      11

2        5 78      13

2        5 60      12

2        5 28      16

2        5 26      18

3        7 40      19

3        7 50      22

3        7 39      20

3        7 25      11

4        9 31       22

4        9 37       27

5        9 40       21

5        9 62      25

;


And I need to get this:


A        B        C       D  

2        5        56      10

2        5        35      11

2        5        78      13

2        5        60      12

2        5        28      16

2        5        26      18

3        7        40      19

3        7        50      22

3        7        39      20

3        7        25      11

So I just want to remove observations where for one distinct value in column A (value=1) there are two or more distinct values in column B (values = 4 10 3) and remove observations where for one distinct value in column B (value=9) there are two or more distinct values in column A (values = 4 5).

I look forward to getting your reply and I do apologize again for bad thoughts expression :-)


Solution
‎02-22-2015 05:42 PM
Respected Advisor
Posts: 3,156

Re: Remove duplicate values

Then I believe we have a simple fix just by tweaking the existing code:

proc sql;

     create table want as select * from

(select * from have group by a having count(distinct b) < 2 )

intersect

(select * from have group by b having count(distinct a) < 2 );

quit;


Good Luck!


Haikuo

Contributor
Posts: 71

Re: Remove duplicate values

Works fantastic!! Thanks a lot!

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 467 views
  • 4 likes
  • 3 in conversation