BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ruslan
Calcite | Level 5

Dear All,

My data looks as follows:

A        B        C       D  

1        4        34      5

1        4        25      8

1        4        65      2

1        4        56      8

1        3        32      9

1        3        30      0

2        5        56      10

2        5        35      11

2        5        78      13

2        5        60      12

2        5        28      16

2        5        26      18

3        7        40      19

3        7        50      22

3        7        39      20

3        7        25      11

4        9        31       22

4        9        37       27

5        9        40       21

5        9        62      25

What I need to do is to remove observations where there 2 distinct values in column B (in data above, these values are 4 and 3) for 1 distinct value (value = 1) in column A. Additionally, there are some reverse cases where for 1 value in B (value = 9) there are two values in A (values are 4 and 5). Thus, in the example provided above, I need to remove first 6 observations (where unique value in column A = 1) and last 4 observations (where unique value in column B = 9) because there are 2 distinct values for 1 distinct value in either column A or B. First six and last four observations to be removed is just an example - I can have 2 distinct values for 1 distinct value somewhere in the middle of the sample.

Any help would be hugely appreciated. I look forward to hearing from you soon.

Kind regards,

Ruslan

1 ACCEPTED SOLUTION

Accepted Solutions
Haikuo
Onyx | Level 15

Then I believe we have a simple fix just by tweaking the existing code:

proc sql;

     create table want as select * from

(select * from have group by a having count(distinct b) < 2 )

intersect

(select * from have group by b having count(distinct a) < 2 );

quit;


Good Luck!


Haikuo

View solution in original post

9 REPLIES 9
Reeza
Super User

So for the above data what would your output look like?

Ruslan
Calcite | Level 5

My desired output should look like this:

A        B        C       D  

2        5        56      10

2        5        35      11

2        5        78      13

2        5        60      12

2        5        28      16

2        5        26      18

3        7        40      19

3        7        50      22

3        7        39      20

3        7        25      11

I look forward to getting your reply.

Ruslan
Calcite | Level 5

Reeza, please help me. It is slightly urgent.

Thank you very much in advance!!!

Haikuo
Onyx | Level 15

Well, giving your 'slightly urgent' situation, Reeza wouldn't mind my chipping in. There can be many options which are stronger on some aspects and weaker on others (the difficulty of coding, robustness, performance etc), the following is just an appetizer:

data have;

     input A        B C       D;

     cards;

1        4 34      5

1        4 25      8

1        4 65      2

1        4 56      8

1        3 32      9

1        3 30      0

2        5 56      10

2        5 35      11

2        5 78      13

2        5 60      12

2        5 28      16

2        5 26      18

3        7 40      19

3        7 50      22

3        7 39      20

3        7 25      11

4        9 31       22

4        9 37       27

5        9 40       21

5        9 62      25

;

proc sql;

     create table want as select * from

(select * from have group by a having count(distinct b) ne 2 )

intersect

(select * from have group by b having count(distinct a) ne 2 );

quit;

Ruslan
Calcite | Level 5

Thank you very much for your reply.

This code works perfectly, however how can I make it work for general cases (for example, when there are 3 distinct values in A for 1 value in B)?

What if my data looks like this:

data have;

     input A        B C       D;

     cards;

1        4 34      5

1        4 25      8

1        10 65      2

1        10 56      8

1        3 32      9

1        3 30      0

2        5 56      10

2        5 35      11

2        5 78      13

2        5 60      12

2        5 28      16

2        5 26      18

3        7 40      19

3        7 50      22

3        7 39      20

3        7 25      11

4        9 31       22

4        9 37       27

5        9 40       21

5        9 62      25

;

How should your code be rewritten? I do appreciate your help and look forward to getting your reply.

Haikuo
Onyx | Level 15

You didn't mention anything about 3 or more distinct values until now, well, not even now, as you still haven't mention what do you want with them. do you keep them or do you want to get rid of them? Please do not assume that people can read your mind, we can't.

Ruslan
Calcite | Level 5

Sorry for not making everything clear. So, please let me state my problem again.

I have data like this:

data have;

     input A        B C       D;

     cards;

1        4 34      5

1        4 25      8

1        10 65      2

1        10 56      8

1        3 32      9

1        3 30      0

2        5 56      10

2        5 35      11

2        5 78      13

2        5 60      12

2        5 28      16

2        5 26      18

3        7 40      19

3        7 50      22

3        7 39      20

3        7 25      11

4        9 31       22

4        9 37       27

5        9 40       21

5        9 62      25

;


And I need to get this:


A        B        C       D  

2        5        56      10

2        5        35      11

2        5        78      13

2        5        60      12

2        5        28      16

2        5        26      18

3        7        40      19

3        7        50      22

3        7        39      20

3        7        25      11

So I just want to remove observations where for one distinct value in column A (value=1) there are two or more distinct values in column B (values = 4 10 3) and remove observations where for one distinct value in column B (value=9) there are two or more distinct values in column A (values = 4 5).

I look forward to getting your reply and I do apologize again for bad thoughts expression 🙂


Haikuo
Onyx | Level 15

Then I believe we have a simple fix just by tweaking the existing code:

proc sql;

     create table want as select * from

(select * from have group by a having count(distinct b) < 2 )

intersect

(select * from have group by b having count(distinct a) < 2 );

quit;


Good Luck!


Haikuo

Ruslan
Calcite | Level 5

Works fantastic!! Thanks a lot!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 955 views
  • 4 likes
  • 3 in conversation