BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SarahW13
Obsidian | Level 7

I have a question about how to identify id values that meet certain criteria.

 

Below is a sample of the data:

 

id        test

1         A

1         B

2         B

3         A

3         B

3         B

4         A

4         B

5         A

 

I am trying to figure out the code that would allow me to identify the ids that have both A and B as test values.

So in this example, I want to identify the following ids:

id

1

3

4

 

Any advice?

 

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
LaurieF
Barite | Level 11

This may look like over-egging the pudding, but it allows for any variations in the population of your source data. #ThinkingAhead

 

data start;
infile cards dsd dlm=',' firstobs=2;
attrib id length=3;
attrib test length=$ 1;
input id
      test;
cards;
id,test
1,A
1,B
2,B
3,A
3,B
3,B
4,A
4,B
5,A
;
run;

proc sort data=start noequals;
by id;
run;

data target;
set start;
by id;
attrib test_a test_b length=$ 1;
retain test_a test_b;
if first.id then 
   call missing(test_a, test_b);
select(test);
   when('A') test_a = 'Y';
   when('B') test_b = 'Y';
   otherwise;
   end;
if last.id;
if test_a ne ' ' and test_b ne ' ';
keep id;
run;

View solution in original post

6 REPLIES 6
LaurieF
Barite | Level 11

This may look like over-egging the pudding, but it allows for any variations in the population of your source data. #ThinkingAhead

 

data start;
infile cards dsd dlm=',' firstobs=2;
attrib id length=3;
attrib test length=$ 1;
input id
      test;
cards;
id,test
1,A
1,B
2,B
3,A
3,B
3,B
4,A
4,B
5,A
;
run;

proc sort data=start noequals;
by id;
run;

data target;
set start;
by id;
attrib test_a test_b length=$ 1;
retain test_a test_b;
if first.id then 
   call missing(test_a, test_b);
select(test);
   when('A') test_a = 'Y';
   when('B') test_b = 'Y';
   otherwise;
   end;
if last.id;
if test_a ne ' ' and test_b ne ' ';
keep id;
run;
hashman
Ammonite | Level 13

@SarahW13 :

Assuming that your input file is sorted by ID:

data have ;                    
  input id test:$1. ;          
  cards ;                      
1  A                           
1  B                           
2  B                           
3  A                           
3  B                           
3  B                           
4  A                           
4  B                           
5  A                           
run ;                          
                               
data want (keep = id) ;        
  do until (last.id) ;         
    set have ;                 
    by id ;                    
    if test = "A" then _A = 1 ;
    if test = "B" then _B = 1 ;
  end ;                        
  if _A and _B ;               
run ;                          

Or if your input file isn't sorted and/or you prefer SQL:

proc sql ;                                                          
  create table want as select unique id from have                          
  where  test = "A" and id in (select id from have where test = "B")
  ;                                                                 
quit ;                                                              

"A" and "B" in the WHERE clause are mutually interchangeable.

 

Kind regards

Paul D.

andreas_lds
Jade | Level 19
If the variable "test" has only the values A and B,you could try the following untested code:

proc sort data=have out=sorted nodupkey;
by id test;
run;

data want;
set sorted;
by id;
if first.id and not last.id;
keep id;
run;

Written on my way to work.
ghosh
Barite | Level 11

 

data have;
input id test $;
cards;
1 A
1 B
2 B
3 A
3 B
3 B
4 A
4 B
5 A
;

data new;
  merge have (in=a where = (test="A")) have (in=b where = (test="B"));
  by id;
  if a and b;
run;



PROC SQL; CREATE TABLE want AS SELECT DISTINCT id FROM new ; QUIT;

 

hashman
Ammonite | Level 13

@ghosh :

If you end up using SQL anyway, why precede it with the extra merge step with the assumption that the input file is sorted by ID when a single SQL step can produce the result without making the assumption, e.g.:

data have ;                              
  input id test:$1. ;                    
  cards ;                                
1  A                                     
1  B                                     
2  B                                     
3  A                                     
3  B                                     
3  B                                     
4  A                                     
4  B                                     
5  A                                     
run ;                                    
                                         
proc sql ;                               
  create table want as select unique A.id
  from   have (where = (test = "A")) A   
       , have (where = (test = "B")) B   
  where  A.id = B.id                     
  ;                                      
quit ;                                   

Of course, this is merely a join alternative to the subquery I've offered earlier. On the other hand, if the input file is sorted by ID, SQL isn't needed, as a single DATA step with BY will suffice and be more efficient to boot.

 

Kind regards

Paul D. 

 

Ksharp
Super User
data have ;                    
  input id test:$1. ;          
  cards ;                      
1  A                           
1  B                           
2  B                           
3  A                           
3  B                           
3  B                           
4  A                           
4  B                           
5  A         
; 
run ;     

proc sql;
select distinct id
 from have
  group by id
   having max(test='A') and max(test='B');
quit;

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 2391 views
  • 7 likes
  • 6 in conversation