Does order of table in join condition matter in terms of performance?
ex:- let's say table a has 10,000 records and table b has 1 MM records then which solution works better
solution 1
proc sql;
select * from a b ;
quit;
soluton 2
proc sql;
select * from b a;
quit;
Thanks,
nkm
But if this is nested loop join (no equi join condition so assume sql will use loop join algorithim) then wouldn't it be nice to have small table in right side and big table to left side. By below process if have small table in Right will minimize less loop and less disk read. Is my understanding correct?
Nested Loop Join – When an equality condition is not specified, a read of the complete contents of the right table is processed for each row in the left table.
For these processes, a page of data is first read from the left table and then as much data as can fit in memory is
read from the right table. All merges are made (between any observation in the page from LeftT and all observations
in memory that came from RightT). The results are then output. Then the observations from RightT are flushed from
memory and a new read of RightT pulls in as many observations as can fit in memory. All possible matches are made
(between any observation in the page from LeftT and all observations in memory that came from RightT) and results
are output. This process continues until SQL has looped through all of the table RightT.
Then SQL takes a step in the left table and reads a new page of data from LeftT into memory. The process of
looping through right table is repeated for the second page from the left table. Then a third page is read from the left
table and the looping through the right hand table is performed again. The process continues until all the
observations in the left table have been read and matched against every observation in the right hand table.
@nkm123 wrote:But if this is nested loop join (no equi join condition so assume sql will use loop join algorithim) then wouldn't it be nice to have small table in right side and big table to left side.
Since looping and such get involved then it is a good idea to reduce the size of as many tables as possible as early as possible.
You could find your code repeatedly selecting the same records in a loop which can lead to serious performance issues.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.