I am running the following sql-step:
proc sql;
create table id_map as select
a.securityId
,b.name
,b.companyId
from eq_securityIds as a
left join FOUNDATION_SECURITY as b on a.securityId=b.securityId
;
quit;
where the dataset eq_securityId consists of ~16000 observations (here denoted a small dataset) and the dataset FOUNDATION_SECURITY consist of ~60 000 000 observations (here denoted a large dataset).
As of now the runtime to execute this command is about 10min, is there anything I can do to speed up the sql-step? e.g, is it in general faster to join a small dataset on to a large dataset instead of me joining the large dataset onto the small dataset? could some kind of indexation of the large dataset help me speeding up the computations?
I noted that the command ran a lot faster (~3min) when I excluded the variable b.name in the same sql-step, is this a general result or was it just a fluke?
I run SAS 9.4 (in enterprise guide 7.1)
Indexing the large table is a good idea, especially if it's used in other use cases.
This would might trigger the hash join method in SQL, which is quite efficient (use _method PROC SQL option).
Keep both datasets sorted by securityid, and use a data step merge for the selection of your subset.
Alternatively, create a format from eq_securityIds where all contained securityid keys get a label of 'yes' and the OTHER value a 'no'. Then you can use the format in a subsetting if in a data step.
Indexing the large table is a good idea, especially if it's used in other use cases.
This would might trigger the hash join method in SQL, which is quite efficient (use _method PROC SQL option).
Thx!
by running the command
proc sql;
create index securityId on FOUNDATION_SECURITY;
quit;
the sql-step is executed a lot faster. It takes about 2min to create the index but since I am running similar join-queries further down the in the program with the large dataset then this helped a lot. Thx!
1) As @LinusH said , make a index on the large table is a good choice.
create index securityid on BigTable;
2)Hash Table
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.