I have a question about how to identify id values that meet certain criteria.
Below is a sample of the data:
id test
1 A
1 B
2 B
3 A
3 B
3 B
4 A
4 B
5 A
I am trying to figure out the code that would allow me to identify the ids that have both A and B as test values.
So in this example, I want to identify the following ids:
id
1
3
4
Any advice?
Thank you!
This may look like over-egging the pudding, but it allows for any variations in the population of your source data. #ThinkingAhead
data start;
infile cards dsd dlm=',' firstobs=2;
attrib id length=3;
attrib test length=$ 1;
input id
test;
cards;
id,test
1,A
1,B
2,B
3,A
3,B
3,B
4,A
4,B
5,A
;
run;
proc sort data=start noequals;
by id;
run;
data target;
set start;
by id;
attrib test_a test_b length=$ 1;
retain test_a test_b;
if first.id then
call missing(test_a, test_b);
select(test);
when('A') test_a = 'Y';
when('B') test_b = 'Y';
otherwise;
end;
if last.id;
if test_a ne ' ' and test_b ne ' ';
keep id;
run;
This may look like over-egging the pudding, but it allows for any variations in the population of your source data. #ThinkingAhead
data start;
infile cards dsd dlm=',' firstobs=2;
attrib id length=3;
attrib test length=$ 1;
input id
test;
cards;
id,test
1,A
1,B
2,B
3,A
3,B
3,B
4,A
4,B
5,A
;
run;
proc sort data=start noequals;
by id;
run;
data target;
set start;
by id;
attrib test_a test_b length=$ 1;
retain test_a test_b;
if first.id then
call missing(test_a, test_b);
select(test);
when('A') test_a = 'Y';
when('B') test_b = 'Y';
otherwise;
end;
if last.id;
if test_a ne ' ' and test_b ne ' ';
keep id;
run;
Assuming that your input file is sorted by ID:
data have ;
input id test:$1. ;
cards ;
1 A
1 B
2 B
3 A
3 B
3 B
4 A
4 B
5 A
run ;
data want (keep = id) ;
do until (last.id) ;
set have ;
by id ;
if test = "A" then _A = 1 ;
if test = "B" then _B = 1 ;
end ;
if _A and _B ;
run ;
Or if your input file isn't sorted and/or you prefer SQL:
proc sql ;
create table want as select unique id from have
where test = "A" and id in (select id from have where test = "B")
;
quit ;
"A" and "B" in the WHERE clause are mutually interchangeable.
Kind regards
Paul D.
data have;
input id test $;
cards;
1 A
1 B
2 B
3 A
3 B
3 B
4 A
4 B
5 A
;
data new;
merge have (in=a where = (test="A")) have (in=b where = (test="B"));
by id;
if a and b;
run;
PROC SQL;
CREATE TABLE want AS
SELECT DISTINCT id
FROM new
;
QUIT;
@ghosh :
If you end up using SQL anyway, why precede it with the extra merge step with the assumption that the input file is sorted by ID when a single SQL step can produce the result without making the assumption, e.g.:
data have ;
input id test:$1. ;
cards ;
1 A
1 B
2 B
3 A
3 B
3 B
4 A
4 B
5 A
run ;
proc sql ;
create table want as select unique A.id
from have (where = (test = "A")) A
, have (where = (test = "B")) B
where A.id = B.id
;
quit ;
Of course, this is merely a join alternative to the subquery I've offered earlier. On the other hand, if the input file is sorted by ID, SQL isn't needed, as a single DATA step with BY will suffice and be more efficient to boot.
Kind regards
Paul D.
data have ;
input id test:$1. ;
cards ;
1 A
1 B
2 B
3 A
3 B
3 B
4 A
4 B
5 A
;
run ;
proc sql;
select distinct id
from have
group by id
having max(test='A') and max(test='B');
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.