DATA Step, Macro, Functions and more

Proc sql join performance with Index

Occasional Contributor
Posts: 8

Proc sql join performance with Index

I have two datasets - Big (100 million rows) and small (200,000 rows). The Big dataset is indexed by a unique key.


I have two queries in sequence doing a left join from the small to the big dataset which looks like this:



proc sql;
select *
from a left join b
on a.key=b.key;

The first time the query runs for 20-30 minutes, but the second time takes only 2-3 minutes. Why does this happen? Is it possible to do something to reduce the runtime for the first query as well?

Super Contributor
Posts: 336

Re: Proc sql join performance with Index

The big dataset does need that much time to run.


Below is some information might help.



There are a number of common reasons for slow-running queries and updates:

  • Slow network communication.

  • Inadequate memory in the server computer, or not enough memory available .

  • Lack of useful statistics

  • Lack of useful indexes.

  • Lack of useful indexed views.

  • Lack of useful data striping.

  • Lack of useful partitioning.

Super User
Posts: 3,115

Re: Proc sql join performance with Index

The most likely reason for the faster second join is that the data has been cached by your server hardware / OS. You could prove this by swapping the queries around, if possible, to see a similar effect.


A faster solution would be to use a SAS format built from the small table to do the data lookup. That way you avoid joining entirely. 

Super Contributor
Posts: 408

Re: Proc sql join performance with Index

Your "small" dataset was probably found in the filesystem cache the second time. A normal behaviour that speeds up access to frequently used files by keeping them in memory.

Super User
Posts: 5,260

Re: Proc sql join performance with Index

Before jumping to any conclusions you need more information.
Options msglevel =i fullstimer;
This will give more information about basic processing and systems constraints.

Proc SQL _method;
Will give you the SQL plan.
Data never sleeps
Ask a Question
Discussion stats
  • 4 replies
  • 5 in conversation