topic Re: Proc sql join performance with Index in SAS Programming

Proc sql join performance with Index

set_all__ — Wed, 12 Apr 2017 19:39:47 GMT

I have two datasets - Big (100 million rows) and small (200,000 rows). The Big dataset is indexed by a unique key.

I have two queries in sequence doing a left join from the small to the big dataset which looks like this:

proc sql;
select *
from a left join b
on a.key=b.key;
quit;

The first time the query runs for 20-30 minutes, but the second time takes only 2-3 minutes. Why does this happen? Is it possible to do something to reduce the runtime for the first query as well?

Re: Proc sql join performance with Index

Bal23 — Wed, 12 Apr 2017 20:29:00 GMT

The big dataset does need that much time to run.

Below is some information might help.

http://analytics.ncsu.edu/sesug/2011/BB08.Lafler.pdf

There are a number of common reasons for slow-running queries and updates:

Slow network communication.
Inadequate memory in the server computer, or not enough memory available .
Lack of useful statistics
Lack of useful indexes.
Lack of useful indexed views.
Lack of useful data striping.
Lack of useful partitioning.

Re: Proc sql join performance with Index

SASKiwi — Wed, 12 Apr 2017 20:33:09 GMT

The most likely reason for the faster second join is that the data has been cached by your server hardware / OS. You could prove this by swapping the queries around, if possible, to see a similar effect.

A faster solution would be to use a SAS format built from the small table to do the data lookup. That way you avoid joining entirely.

Re: Proc sql join performance with Index

jklaverstijn — Wed, 12 Apr 2017 20:35:26 GMT

Your "small" dataset was probably found in the filesystem cache the second time. A normal behaviour that speeds up access to frequently used files by keeping them in memory.

Re: Proc sql join performance with Index

LinusH — Thu, 13 Apr 2017 03:16:51 GMT

Before jumping to any conclusions you need more information.
Add
Options msglevel =i fullstimer;
This will give more information about basic processing and systems constraints.

Proc SQL _method;
Will give you the SQL plan.