I have a question about how to identify id values that meet certain criteria.
Below is a sample of the data:
id test
1 A
1 B
2 B
3 A
3 B
3 B
4 A
4 B
5 A
I am trying to figure out the code that would allow me to identify the ids that have both A and B as test values.
So in this example, I want to identify the following ids:
id
1
3
4
Any advice?
Thank you!
This may look like over-egging the pudding, but it allows for any variations in the population of your source data. #ThinkingAhead
data start;
infile cards dsd dlm=',' firstobs=2;
attrib id length=3;
attrib test length=$ 1;
input id
test;
cards;
id,test
1,A
1,B
2,B
3,A
3,B
3,B
4,A
4,B
5,A
;
run;
proc sort data=start noequals;
by id;
run;
data target;
set start;
by id;
attrib test_a test_b length=$ 1;
retain test_a test_b;
if first.id then
call missing(test_a, test_b);
select(test);
when('A') test_a = 'Y';
when('B') test_b = 'Y';
otherwise;
end;
if last.id;
if test_a ne ' ' and test_b ne ' ';
keep id;
run;
This may look like over-egging the pudding, but it allows for any variations in the population of your source data. #ThinkingAhead
data start;
infile cards dsd dlm=',' firstobs=2;
attrib id length=3;
attrib test length=$ 1;
input id
test;
cards;
id,test
1,A
1,B
2,B
3,A
3,B
3,B
4,A
4,B
5,A
;
run;
proc sort data=start noequals;
by id;
run;
data target;
set start;
by id;
attrib test_a test_b length=$ 1;
retain test_a test_b;
if first.id then
call missing(test_a, test_b);
select(test);
when('A') test_a = 'Y';
when('B') test_b = 'Y';
otherwise;
end;
if last.id;
if test_a ne ' ' and test_b ne ' ';
keep id;
run;
Assuming that your input file is sorted by ID:
data have ;
input id test:$1. ;
cards ;
1 A
1 B
2 B
3 A
3 B
3 B
4 A
4 B
5 A
run ;
data want (keep = id) ;
do until (last.id) ;
set have ;
by id ;
if test = "A" then _A = 1 ;
if test = "B" then _B = 1 ;
end ;
if _A and _B ;
run ;
Or if your input file isn't sorted and/or you prefer SQL:
proc sql ;
create table want as select unique id from have
where test = "A" and id in (select id from have where test = "B")
;
quit ;
"A" and "B" in the WHERE clause are mutually interchangeable.
Kind regards
Paul D.
data have;
input id test $;
cards;
1 A
1 B
2 B
3 A
3 B
3 B
4 A
4 B
5 A
;
data new;
merge have (in=a where = (test="A")) have (in=b where = (test="B"));
by id;
if a and b;
run;
PROC SQL;
CREATE TABLE want AS
SELECT DISTINCT id
FROM new
;
QUIT;
@ghosh :
If you end up using SQL anyway, why precede it with the extra merge step with the assumption that the input file is sorted by ID when a single SQL step can produce the result without making the assumption, e.g.:
data have ;
input id test:$1. ;
cards ;
1 A
1 B
2 B
3 A
3 B
3 B
4 A
4 B
5 A
run ;
proc sql ;
create table want as select unique A.id
from have (where = (test = "A")) A
, have (where = (test = "B")) B
where A.id = B.id
;
quit ;
Of course, this is merely a join alternative to the subquery I've offered earlier. On the other hand, if the input file is sorted by ID, SQL isn't needed, as a single DATA step with BY will suffice and be more efficient to boot.
Kind regards
Paul D.
data have ;
input id test:$1. ;
cards ;
1 A
1 B
2 B
3 A
3 B
3 B
4 A
4 B
5 A
;
run ;
proc sql;
select distinct id
from have
group by id
having max(test='A') and max(test='B');
quit;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.