Solved: Re: Identifying ids with two values

SarahW13 · Posted 06-30-2019 08:05 PM

I have a question about how to identify id values that meet certain criteria.

Below is a sample of the data:

id test

1 A

1 B

2 B

3 A

3 B

4 A

4 B

5 A

I am trying to figure out the code that would allow me to identify the ids that have both A and B as test values.

So in this example, I want to identify the following ids:

id

1

3

4

Any advice?

Thank you!

LaurieF · Posted 06-30-2019 11:07 PM

This may look like over-egging the pudding, but it allows for any variations in the population of your source data. #ThinkingAhead

data start;
infile cards dsd dlm=',' firstobs=2;
attrib id length=3;
attrib test length=$ 1;
input id
      test;
cards;
id,test
1,A
1,B
2,B
3,A
3,B
3,B
4,A
4,B
5,A
;
run;

proc sort data=start noequals;
by id;
run;

data target;
set start;
by id;
attrib test_a test_b length=$ 1;
retain test_a test_b;
if first.id then 
   call missing(test_a, test_b);
select(test);
   when('A') test_a = 'Y';
   when('B') test_b = 'Y';
   otherwise;
   end;
if last.id;
if test_a ne ' ' and test_b ne ' ';
keep id;
run;

View solution in original post

LaurieF · Posted 06-30-2019 11:07 PM

This may look like over-egging the pudding, but it allows for any variations in the population of your source data. #ThinkingAhead

data start;
infile cards dsd dlm=',' firstobs=2;
attrib id length=3;
attrib test length=$ 1;
input id
      test;
cards;
id,test
1,A
1,B
2,B
3,A
3,B
3,B
4,A
4,B
5,A
;
run;

proc sort data=start noequals;
by id;
run;

data target;
set start;
by id;
attrib test_a test_b length=$ 1;
retain test_a test_b;
if first.id then 
   call missing(test_a, test_b);
select(test);
   when('A') test_a = 'Y';
   when('B') test_b = 'Y';
   otherwise;
   end;
if last.id;
if test_a ne ' ' and test_b ne ' ';
keep id;
run;

hashman · Posted 07-01-2019 01:12 AM

@SarahW13 :

Assuming that your input file is sorted by ID:

data have ;                    
  input id test:$1. ;          
  cards ;                      
1  A                           
1  B                           
2  B                           
3  A                           
3  B                           
3  B                           
4  A                           
4  B                           
5  A                           
run ;                          
                               
data want (keep = id) ;        
  do until (last.id) ;         
    set have ;                 
    by id ;                    
    if test = "A" then _A = 1 ;
    if test = "B" then _B = 1 ;
  end ;                        
  if _A and _B ;               
run ;

Or if your input file isn't sorted and/or you prefer SQL:

proc sql ;                                                          
  create table want as select unique id from have                          
  where  test = "A" and id in (select id from have where test = "B")
  ;                                                                 
quit ;

"A" and "B" in the WHERE clause are mutually interchangeable.

Kind regards

Paul D.

andreas_lds · Posted 07-01-2019 01:50 AM

If the variable "test" has only the values A and B,you could try the following untested code:

proc sort data=have out=sorted nodupkey;
by id test;
run;

data want;
set sorted;
by id;
if first.id and not last.id;
keep id;
run;

Written on my way to work.

ghosh · Posted 07-01-2019 11:15 AM

data have;
input id test $;
cards;
1 A
1 B
2 B
3 A
3 B
3 B
4 A
4 B
5 A
;

data new;
  merge have (in=a where = (test="A")) have (in=b where = (test="B"));
  by id;
  if a and b;
run;



PROC SQL;
CREATE TABLE want AS
SELECT DISTINCT id
FROM new
;
QUIT;

hashman · Posted 07-01-2019 11:41 AM

@ghosh :

If you end up using SQL anyway, why precede it with the extra merge step with the assumption that the input file is sorted by ID when a single SQL step can produce the result without making the assumption, e.g.:

data have ;                              
  input id test:$1. ;                    
  cards ;                                
1  A                                     
1  B                                     
2  B                                     
3  A                                     
3  B                                     
3  B                                     
4  A                                     
4  B                                     
5  A                                     
run ;                                    
                                         
proc sql ;                               
  create table want as select unique A.id
  from   have (where = (test = "A")) A   
       , have (where = (test = "B")) B   
  where  A.id = B.id                     
  ;                                      
quit ;

Of course, this is merely a join alternative to the subquery I've offered earlier. On the other hand, if the input file is sorted by ID, SQL isn't needed, as a single DATA step with BY will suffice and be more efficient to boot.

Kind regards

Paul D.

Ksharp · Posted 07-02-2019 08:32 AM

data have ;                    
  input id test:$1. ;          
  cards ;                      
1  A                           
1  B                           
2  B                           
3  A                           
3  B                           
3  B                           
4  A                           
4  B                           
5  A         
; 
run ;     

proc sql;
select distinct id
 from have
  group by id
   having max(test='A') and max(test='B');
quit;

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away