Hi all,
I have a (small) dataset with a datavariable x, of type string. I want to use its values to select data from another (big) set. Doing an inner join as follows seems to be too time-consuming.
proc sql;
create table aa as
select
a.*
from big a
inner join small b
on a.x = b.x
;
quit;
Is there a better way? For example, first put the x values from the small set into a macro list, and then subset on the big one using a where statement.
Thanks in advance.
Dimitri
Instead of the inner join, you can use a WHERE clause in SQL:
proc sql;
create table aa as
select *
from big
where x in (select distinct x from small)
;
quit;
Use a hash object. No sorting required (the hash sorts internally), will be fastest.
data aa;
set big;
if _n_ = 1
then do;
declare hash b (dataset:"small (keep=x)");
b.definekey('x');
b.definedone();
end;
if b.check() = 0;
run;
If I am understanding you properly, you want to use a RIGHT JOIN instead of an INNER JOIN
Instead of the inner join, you can use a WHERE clause in SQL:
proc sql;
create table aa as
select *
from big
where x in (select distinct x from small)
;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.