Here's a sample of what I have:
var1 | var2 |
AA | Paul |
BB | Paul |
XX | Paul |
dd | Mike |
pp | Mike |
cc | Dave |
ee | Dave |
ff | Dave |
gg | Dave |
XX | Dave |
hh | Ken |
ii | Ken |
jj | Ken |
XX | Ken |
kk | Rick |
ll | Rick |
mm | Rick |
Here's a sample of what I want:
var1 | var2 |
AA | Paul |
BB | Paul |
XX | Paul |
cc | Dave |
ee | Dave |
ff | Dave |
gg | Dave |
XX | Dave |
hh | Ken |
ii | Ken |
jj | Ken |
XX | Ken |
Or a data step hash table lookup
data want;
if _n_=1 then
do;
dcl hash h1(dataset:'have(where=(var1=:"XX"))');
h1.defineKey('var2');
h1.defineDone();
end;
set have;
if h1.check()=0;
run;
Any suggestion would be welcome.
Please post your sample data as code (DATA step with CARDS statement). Also please post the code you have tried. This will help others help you.
Trying to guess what you mean it sounds like you want all observations for any person that has VAR1='XX'. (Note question are easier to read if the details are in the body of the question and not just the subject line.)
SQL should make that simple.
proc sql;
create table want as
select * from have
where var2 in
(select var2 from have where var1='XX')
;
quit;
Double DO method:
data want;
do until (last.var2);
set have;
by var2 notsorted;
if var1 = "XX" then flag = 1;
end;
do until (last.var2);
set have;
by var2 notsorted;
if flag then output;
end;
drop flag;
run;
Or a data step hash table lookup
data want;
if _n_=1 then
do;
dcl hash h1(dataset:'have(where=(var1=:"XX"))');
h1.defineKey('var2');
h1.defineDone();
end;
set have;
if h1.check()=0;
run;
Most efficient of all the solutions offered.
If the dataset is not particularly big, then I think your best option is the hash object technique suggested by @Patrick. The code is simple.
That solution requires reading the data in two unsynchronized data streams, which could create a performance hit in the case of large datasets, due to disk activity.
Now if the data were sorted by VAR2, you could do:
data want;
merge have (where=(var1='XX') in=wanted)
have;
by var2;
if wanted;
run;
which synchronizes the data streams and reduces disk activity.
Your data are not sorted by VAR2, although it is grouped by VAR2. So you can generate data stream synchronization with code such as this:
data want (drop=_:);
set have (where=(var1='XX') rename=(var2=_var2));
do until (last.var2=1 and var2=_var2);
set have;
by var2 notsorted;
if var2=_var2 then output;
end;
run;
The above assumes no more than one var1='XX' for any given VAR2. If you can have two or more var1='XX' cases, then:
data want (drop=_:);
set have (where=(var1='XX') rename=(var2=_var2));
by _var2 notsorted;
if last._var2 then do until (last.var2=1 and var2=_var2);
set have;
by var2 notsorted;
if var2=_var2 then output;
end;
run;
And yes, the degree of synchronization achieved would depend on the distribution of the "XX" cases.
Just to have (more) complete picture.
The power of IN operator over a macro variable with constant list of values:
data have;
input var1 $ var2 $;
cards;
AA Paul
BB Paul
XX Paul
dd Mike
pp Mike
cc Dave
ee Dave
ff Dave
gg Dave
XX Dave
hh Ken
ii Ken
jj Ken
XX Ken
kk Rick
ll Rick
mm Rick
;
run;
proc print data=have;
run;
proc SQL noprint;
select
distinct quote(var2)
into :myList separated by " "
from have
where var1 = "XX"
;
quit;
data want;
set have;
where var2 in (&myList.);
run;
proc print data=want;
run;
BTW. Just for fun, here is an article "Route Sixty-Six to SAS Programming, or 63 (+3) Syntax Snippets for a Table Look-up Task, or How to L..." that provides 66 examples of solving the table-lookup task 🙂
All the best
Bart
Or you would like to get it automatically ? and not type XX by hand ?
data have;
input var1 $ var2 $;
cards;
AA Paul
BB Paul
XX Paul
dd Mike
pp Mike
cc Dave
ee Dave
ff Dave
gg Dave
XX Dave
hh Ken
ii Ken
jj Ken
XX Ken
kk Rick
ll Rick
mm Rick
;
proc sql;
create table want as
select * from have where var2 in
(
select distinct var2 from have group by var1 having count(distinct var2)>1
)
;
quit;
I'm not sure if this is possible to have such data, but what about "Joe" (last record) ?
data have;
input var1 $ var2 $;
cards;
AA Paul
BB Paul
XX Paul
dd Mike
pp Mike
cc Dave
ee Dave
ff Dave
gg Dave
XX Dave
hh Ken
ii Ken
jj Ken
XX Ken
kk Rick
ll Rick
mm Rick
AA Joe
;
Bart
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.