Hello,
I am trying to write a statement to include only certain variables. Take the sample data below:
ID Index
1 A
1 B
1 B
2 A
3 A
3 B
So in this instance, I am only trying to include the IDs when they only have an A Index, and no B Index. So using this sample data I would only want ID #2 because that is the only ID that doesn't also have a B Index.
Does this make sense? How can I code for this?
Thank you.
proc sql;
create table temp select distinct ID from have
where index='B;
create table want as select * from have
where ID not in (select ID from temp);
quit;
This should be easy to accomplish with SQL. If your dataset is huge, there might be more efficient ways to do it.
data have;
infile datalines;
input ID :8. Index $1.;
datalines;
1 A
1 B
1 B
2 A
3 A
3 B
;
run;
proc sql;
create table want as
select id
from have
where index='A'
and id not in (SELECT id from have where Index = 'B')
order by id;
quit;
ID 2
If the data are sorted by ID, then:
data have;
infile datalines;
input ID :8. Index $1.;
datalines;
1 A
1 B
1 B
2 A
3 A
3 B
run;
data want (drop=_:);
set have (where=(index^='A'))
have (where=(index='A'));
by id;
if first.id=1 and index='A' then _keep+1;
else if first.id then _keep=0;
if _keep;
run;
Proc SQL with autoremerge provides this simple solution:
proc sql;
create table want as
select *
from have
group by ID
having sum(Index ne "A") = 0;
quit;
The fastest method would be a data step, if your data is already sorted:
data want;
merge
have
have (
in=delete
keep=id index
rename=(index=_i)
where=(_i = 'B')
)
;
by id;
if not delete;
drop _i;
run;
If it's not sorted, a hash table can do it:
data want;
set have;
if _N_ = 1
then do;
declare hash h (dataset:"have (where=(index = 'B'))");
h.definekey("id");
h.definedone();
end;
if h.check() ne 0;
run;
The hash declaration will automatically discard duplicate entries for ID, so the in-memory table should be fairly small.
Both methods should outperform SQL if the table is quite large.
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.