topic Re: How to speed up proc sql when joining a large datset to a small dataset? in SAS Procedures

How to speed up proc sql when joining a large datset to a small dataset?

metuzalem — Tue, 27 Feb 2018 07:51:02 GMT

I am running the following sql-step:

proc sql;

create table id_map as select

a.securityId

,b.name

,b.companyId

from eq_securityIds as a

left join FOUNDATION_SECURITY as b on a.securityId=b.securityId

;

quit;

where the dataset eq_securityId consists of ~16000 observations (here denoted a small dataset) and the dataset FOUNDATION_SECURITY consist of ~60 000 000 observations (here denoted a large dataset).

As of now the runtime to execute this command is about 10min, is there anything I can do to speed up the sql-step? e.g, is it in general faster to join a small dataset on to a large dataset instead of me joining the large dataset onto the small dataset? could some kind of indexation of the large dataset help me speeding up the computations?

I noted that the command ran a lot faster (~3min) when I excluded the variable b.name in the same sql-step, is this a general result or was it just a fluke?

I run SAS 9.4 (in enterprise guide 7.1)

Re: How to speed up proc sql when joining a large datset to a small dataset?

Kurt_Bremser — Tue, 27 Feb 2018 08:02:42 GMT

Keep both datasets sorted by securityid, and use a data step merge for the selection of your subset.

Alternatively, create a format from eq_securityIds where all contained securityid keys get a label of 'yes' and the OTHER value a 'no'. Then you can use the format in a subsetting if in a data step.

Re: How to speed up proc sql when joining a large datset to a small dataset?

LinusH — Tue, 27 Feb 2018 10:00:43 GMT

Indexing the large table is a good idea, especially if it's used in other use cases.

This would might trigger the hash join method in SQL, which is quite efficient (use _method PROC SQL option).

Re: How to speed up proc sql when joining a large datset to a small dataset?

Ksharp — Tue, 27 Feb 2018 12:31:42 GMT

1) As @LinusH said , make a index on the large table is a good choice.

create index securityid on BigTable;

2)Hash Table

Re: How to speed up proc sql when joining a large datset to a small dataset?

metuzalem — Tue, 27 Feb 2018 13:46:14 GMT

Thx!

by running the command

proc sql;

create index securityId on FOUNDATION_SECURITY;

quit;

the sql-step is executed a lot faster. It takes about 2min to create the index but since I am running similar join-queries further down the in the program with the large dataset then this helped a lot. Thx!