DATA Step, Macro, Functions and more

Order of tables in Join Condition in Proc SQL

Reply
Contributor
Posts: 30

Order of tables in Join Condition in Proc SQL

Does order of table in join condition matter in terms of performance? 

 

ex:- let's say table a has 10,000 records and table b has 1 MM records then which solution works better 

 

solution 1

 

proc sql;

select * from a  b ;

quit;

 

soluton 2 

 

proc sql;

select * from b a;

quit;

 

Thanks,

nkm

Respected Advisor
Posts: 4,654

Re: Order of tables in Join Condition in Proc SQL

With only two tables involved, order makes no difference. SAS/SQL utilises a query optimizer that takes the sizes of the tables into account, among other factors.
PG
Contributor
Posts: 30

Re: Order of tables in Join Condition in Proc SQL

But if this is nested loop join (no equi join condition so assume sql will use loop join algorithim) then wouldn't it be nice to have small table in right side and big table to left side.  By below process if have small table in Right will minimize less loop and less disk read. Is my understanding correct?

 

Nested Loop Join – When an equality condition is not specified, a read of the complete contents of the right table is processed for each row in the left table.

 

For these processes, a page of data is first read from the left table and then as much data as can fit in memory is
read from the right table. All merges are made (between any observation in the page from LeftT and all observations
in memory that came from RightT). The results are then output. Then the observations from RightT are flushed from
memory and a new read of RightT pulls in as many observations as can fit in memory. All possible matches are made
(between any observation in the page from LeftT and all observations in memory that came from RightT) and results
are output. This process continues until SQL has looped through all of the table RightT.

 

Then SQL takes a step in the left table and reads a new page of data from LeftT into memory. The process of
looping through right table is repeated for the second page from the left table. Then a third page is read from the left
table and the looping through the right hand table is performed again. The process continues until all the
observations in the left table have been read and matched against every observation in the right hand table.

 

Super User
Posts: 10,516

Re: Order of tables in Join Condition in Proc SQL


nkm123 wrote:

But if this is nested loop join (no equi join condition so assume sql will use loop join algorithim) then wouldn't it be nice to have small table in right side and big table to left side.  


 

Since looping and such get involved then it is a good idea to reduce the size of as many tables as possible as early as possible.

You could find your code repeatedly selecting the same records in a loop which can lead to serious performance issues.

 

 

Ask a Question
Discussion stats
  • 3 replies
  • 263 views
  • 1 like
  • 3 in conversation