BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SarahW13
Obsidian | Level 7

I have a question about how to identify id values that meet certain criteria.

 

Below is a sample of the data:

 

id        test

1         A

1         B

2         B

3         A

3         B

3         B

4         A

4         B

5         A

 

I am trying to figure out the code that would allow me to identify the ids that have both A and B as test values.

So in this example, I want to identify the following ids:

id

1

3

4

 

Any advice?

 

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
LaurieF
Barite | Level 11

This may look like over-egging the pudding, but it allows for any variations in the population of your source data. #ThinkingAhead

 

data start;
infile cards dsd dlm=',' firstobs=2;
attrib id length=3;
attrib test length=$ 1;
input id
      test;
cards;
id,test
1,A
1,B
2,B
3,A
3,B
3,B
4,A
4,B
5,A
;
run;

proc sort data=start noequals;
by id;
run;

data target;
set start;
by id;
attrib test_a test_b length=$ 1;
retain test_a test_b;
if first.id then 
   call missing(test_a, test_b);
select(test);
   when('A') test_a = 'Y';
   when('B') test_b = 'Y';
   otherwise;
   end;
if last.id;
if test_a ne ' ' and test_b ne ' ';
keep id;
run;

View solution in original post

6 REPLIES 6
LaurieF
Barite | Level 11

This may look like over-egging the pudding, but it allows for any variations in the population of your source data. #ThinkingAhead

 

data start;
infile cards dsd dlm=',' firstobs=2;
attrib id length=3;
attrib test length=$ 1;
input id
      test;
cards;
id,test
1,A
1,B
2,B
3,A
3,B
3,B
4,A
4,B
5,A
;
run;

proc sort data=start noequals;
by id;
run;

data target;
set start;
by id;
attrib test_a test_b length=$ 1;
retain test_a test_b;
if first.id then 
   call missing(test_a, test_b);
select(test);
   when('A') test_a = 'Y';
   when('B') test_b = 'Y';
   otherwise;
   end;
if last.id;
if test_a ne ' ' and test_b ne ' ';
keep id;
run;
hashman
Ammonite | Level 13

@SarahW13 :

Assuming that your input file is sorted by ID:

data have ;                    
  input id test:$1. ;          
  cards ;                      
1  A                           
1  B                           
2  B                           
3  A                           
3  B                           
3  B                           
4  A                           
4  B                           
5  A                           
run ;                          
                               
data want (keep = id) ;        
  do until (last.id) ;         
    set have ;                 
    by id ;                    
    if test = "A" then _A = 1 ;
    if test = "B" then _B = 1 ;
  end ;                        
  if _A and _B ;               
run ;                          

Or if your input file isn't sorted and/or you prefer SQL:

proc sql ;                                                          
  create table want as select unique id from have                          
  where  test = "A" and id in (select id from have where test = "B")
  ;                                                                 
quit ;                                                              

"A" and "B" in the WHERE clause are mutually interchangeable.

 

Kind regards

Paul D.

andreas_lds
Jade | Level 19
If the variable "test" has only the values A and B,you could try the following untested code:

proc sort data=have out=sorted nodupkey;
by id test;
run;

data want;
set sorted;
by id;
if first.id and not last.id;
keep id;
run;

Written on my way to work.
ghosh
Barite | Level 11

 

data have;
input id test $;
cards;
1 A
1 B
2 B
3 A
3 B
3 B
4 A
4 B
5 A
;

data new;
  merge have (in=a where = (test="A")) have (in=b where = (test="B"));
  by id;
  if a and b;
run;



PROC SQL; CREATE TABLE want AS SELECT DISTINCT id FROM new ; QUIT;

 

hashman
Ammonite | Level 13

@ghosh :

If you end up using SQL anyway, why precede it with the extra merge step with the assumption that the input file is sorted by ID when a single SQL step can produce the result without making the assumption, e.g.:

data have ;                              
  input id test:$1. ;                    
  cards ;                                
1  A                                     
1  B                                     
2  B                                     
3  A                                     
3  B                                     
3  B                                     
4  A                                     
4  B                                     
5  A                                     
run ;                                    
                                         
proc sql ;                               
  create table want as select unique A.id
  from   have (where = (test = "A")) A   
       , have (where = (test = "B")) B   
  where  A.id = B.id                     
  ;                                      
quit ;                                   

Of course, this is merely a join alternative to the subquery I've offered earlier. On the other hand, if the input file is sorted by ID, SQL isn't needed, as a single DATA step with BY will suffice and be more efficient to boot.

 

Kind regards

Paul D. 

 

Ksharp
Super User
data have ;                    
  input id test:$1. ;          
  cards ;                      
1  A                           
1  B                           
2  B                           
3  A                           
3  B                           
3  B                           
4  A                           
4  B                           
5  A         
; 
run ;     

proc sql;
select distinct id
 from have
  group by id
   having max(test='A') and max(test='B');
quit;
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 3382 views
  • 7 likes
  • 6 in conversation