Hi all,
I have a (small) dataset with a datavariable x, of type string. I want to use its values to select data from another (big) set. Doing an inner join as follows seems to be too time-consuming.
proc sql;
create table aa as
select
a.*
from big a
inner join small b
on a.x = b.x
;
quit;
Is there a better way? For example, first put the x values from the small set into a macro list, and then subset on the big one using a where statement.
Thanks in advance.
Dimitri
Instead of the inner join, you can use a WHERE clause in SQL:
proc sql;
create table aa as
select *
from big
where x in (select distinct x from small)
;
quit;
Use a hash object. No sorting required (the hash sorts internally), will be fastest.
data aa;
set big;
if _n_ = 1
then do;
declare hash b (dataset:"small (keep=x)");
b.definekey('x');
b.definedone();
end;
if b.check() = 0;
run;
If I am understanding you properly, you want to use a RIGHT JOIN instead of an INNER JOIN
Instead of the inner join, you can use a WHERE clause in SQL:
proc sql;
create table aa as
select *
from big
where x in (select distinct x from small)
;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.