Hello,
I am trying to write a statement to include only certain variables. Take the sample data below:
ID Index
1 A
1 B
1 B
2 A
3 A
3 B
So in this instance, I am only trying to include the IDs when they only have an A Index, and no B Index. So using this sample data I would only want ID #2 because that is the only ID that doesn't also have a B Index.
Does this make sense? How can I code for this?
Thank you.
proc sql;
create table temp select distinct ID from have
where index='B;
create table want as select * from have
where ID not in (select ID from temp);
quit;
This should be easy to accomplish with SQL. If your dataset is huge, there might be more efficient ways to do it.
data have;
infile datalines;
input ID :8. Index $1.;
datalines;
1 A
1 B
1 B
2 A
3 A
3 B
;
run;
proc sql;
create table want as
select id
from have
where index='A'
and id not in (SELECT id from have where Index = 'B')
order by id;
quit;
ID 2
If the data are sorted by ID, then:
data have;
infile datalines;
input ID :8. Index $1.;
datalines;
1 A
1 B
1 B
2 A
3 A
3 B
run;
data want (drop=_:);
set have (where=(index^='A'))
have (where=(index='A'));
by id;
if first.id=1 and index='A' then _keep+1;
else if first.id then _keep=0;
if _keep;
run;
Proc SQL with autoremerge provides this simple solution:
proc sql;
create table want as
select *
from have
group by ID
having sum(Index ne "A") = 0;
quit;
The fastest method would be a data step, if your data is already sorted:
data want;
merge
have
have (
in=delete
keep=id index
rename=(index=_i)
where=(_i = 'B')
)
;
by id;
if not delete;
drop _i;
run;
If it's not sorted, a hash table can do it:
data want;
set have;
if _N_ = 1
then do;
declare hash h (dataset:"have (where=(index = 'B'))");
h.definekey("id");
h.definedone();
end;
if h.check() ne 0;
run;
The hash declaration will automatically discard duplicate entries for ID, so the in-memory table should be fairly small.
Both methods should outperform SQL if the table is quite large.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.