I am running the following sql-step:
proc sql;
create table id_map as select
a.securityId
,b.name
,b.companyId
from eq_securityIds as a
left join FOUNDATION_SECURITY as b on a.securityId=b.securityId
;
quit;
where the dataset eq_securityId consists of ~16000 observations (here denoted a small dataset) and the dataset FOUNDATION_SECURITY consist of ~60 000 000 observations (here denoted a large dataset).
As of now the runtime to execute this command is about 10min, is there anything I can do to speed up the sql-step? e.g, is it in general faster to join a small dataset on to a large dataset instead of me joining the large dataset onto the small dataset? could some kind of indexation of the large dataset help me speeding up the computations?
I noted that the command ran a lot faster (~3min) when I excluded the variable b.name in the same sql-step, is this a general result or was it just a fluke?
I run SAS 9.4 (in enterprise guide 7.1)
Indexing the large table is a good idea, especially if it's used in other use cases.
This would might trigger the hash join method in SQL, which is quite efficient (use _method PROC SQL option).
Keep both datasets sorted by securityid, and use a data step merge for the selection of your subset.
Alternatively, create a format from eq_securityIds where all contained securityid keys get a label of 'yes' and the OTHER value a 'no'. Then you can use the format in a subsetting if in a data step.
Indexing the large table is a good idea, especially if it's used in other use cases.
This would might trigger the hash join method in SQL, which is quite efficient (use _method PROC SQL option).
Thx!
by running the command
proc sql;
create index securityId on FOUNDATION_SECURITY;
quit;
the sql-step is executed a lot faster. It takes about 2min to create the index but since I am running similar join-queries further down the in the program with the large dataset then this helped a lot. Thx!
1) As @LinusH said , make a index on the large table is a good choice.
create index securityid on BigTable;
2)Hash Table
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.