Help using Base SAS procedures

How to speed up proc sql when joining a large datset to a small dataset?

Accepted Solution Solved
Reply
New Contributor
Posts: 2
Accepted Solution

How to speed up proc sql when joining a large datset to a small dataset?

I am running the following sql-step:

 

proc sql;

   create table id_map as select

           a.securityId

           ,b.name

           ,b.companyId

   from eq_securityIds as a

   left join FOUNDATION_SECURITY as b on a.securityId=b.securityId

       ;

quit;

 

where the dataset eq_securityId consists of ~16000 observations (here denoted a small dataset) and the dataset FOUNDATION_SECURITY consist of ~60 000 000 observations (here denoted a large dataset).

As of now the runtime to execute this command is about 10min, is there anything I can do to speed up the sql-step? e.g, is it in general faster to join a small dataset on to a large dataset instead of me joining the large dataset onto the small dataset? could some kind of indexation of the large dataset help me speeding up the computations?

I noted that the command ran a lot faster (~3min) when I excluded the variable b.name in the same sql-step, is this a general result or was it just a fluke?

 

 I run SAS 9.4 (in enterprise guide 7.1)

 


Accepted Solutions
Solution
‎02-27-2018 08:37 AM
Super User
Posts: 5,888

Re: How to speed up proc sql when joining a large datset to a small dataset?

Posted in reply to metuzalem

Indexing the large table is a good idea, especially if it's used in other use cases.

This would might trigger the hash join method in SQL, which is quite efficient (use _method PROC SQL option).

Data never sleeps

View solution in original post


All Replies
Super User
Posts: 10,283

Re: How to speed up proc sql when joining a large datset to a small dataset?

Posted in reply to metuzalem

Keep both datasets sorted by securityid, and use a data step merge for the selection of your subset.

 

Alternatively, create a format from eq_securityIds where all contained securityid keys get a label of 'yes' and the OTHER value a 'no'. Then you can use the format in a subsetting if in a data step.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Solution
‎02-27-2018 08:37 AM
Super User
Posts: 5,888

Re: How to speed up proc sql when joining a large datset to a small dataset?

Posted in reply to metuzalem

Indexing the large table is a good idea, especially if it's used in other use cases.

This would might trigger the hash join method in SQL, which is quite efficient (use _method PROC SQL option).

Data never sleeps
New Contributor
Posts: 2

Re: How to speed up proc sql when joining a large datset to a small dataset?

Thx!

by running the command

proc sql;

    create index securityId on FOUNDATION_SECURITY;

quit;

the sql-step is executed a lot faster. It takes about 2min to create the index but since I am running similar join-queries further down the in the program with the large dataset then this helped a lot. Thx!

 

 

 

Super User
Posts: 10,787

Re: How to speed up proc sql when joining a large datset to a small dataset?

Posted in reply to metuzalem

1) As @LinusH said , make a index on the large table is a good choice.

 

create index securityid on BigTable;

 

2)Hash Table 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 310 views
  • 0 likes
  • 4 in conversation