DATA Step, Macro, Functions and more

Proc Sql Left Join VS Inner Join Question

Reply
Contributor
Posts: 21

Proc Sql Left Join VS Inner Join Question

Hi there,

i have two tables. table1 has 10 observations with a key field ID.
table2 has one million observations with a key field ID and with other detail information.

So, now I need to get for the table1 ids some addition detail information from table2.
And the key thing is the table2 serves as a master table where you will have all the ids with detail information.

So, left join and inner join both are giving me the same result. But my question is,
which join is the right join and which is more efficient.
there is a way of doing in Hashing also but which is more efficient?

Help will be appreciated.
Aravind
Frequent Contributor
Frequent Contributor
Posts: 94

Re: Proc Sql Left Join VS Inner Join Question

I doubt either join is more efficient. The key difference is in what you need your results to show.

For example if you're doing something like:

select
*
from
table1
left join table2 on table1.key = table2.key

The results will be identical to inner join if there is a relevant row for every table1.key in table2.

Inner join shows only where there is a key match in both tables. Left join shows everything from table 1, and will show blanks in the table2 fields if there is no matching row.

SAS has quite a few different options for this. You could use the data step Merge function, or as you say you could use hash tables to do the lookups. Which is the best is a very subjective question to answer - certainly I think you'd need to provide some more detail about the relationship between the two tables before we could answer that.

Hope that's of some help anyway.

David.
Contributor
Posts: 21

Re: Proc Sql Left Join VS Inner Join Question

Hi David,

Thanks for your time.
Well,Let me give some more information on the tables.
Table1
ID
1
2
3
4
.
.
10
Table2
ID FirstName LastName
1
2
3
4
.
.
a Million rows.

Proc sql norpint;
create table way1 as
select t1.*,
t2.firstname,
t2.lastname
from table1 as t1 left join table2 as t2
on t1.id=t2.id; quit;

Proc sql norpint;
create table way2 as
select t1.*,
t2.firstname,
t2.lastname
from table1 as t1 inner join table2 as t2
on t1.id=t2.id; quit;

Here the Table2 is a Master Dataset which will have all the ids with additional detail information.
For every id in Table1, there will be a row in Table2 with the same Id.
So, now my question is which is the best way to go!

Aravind
Valued Guide
Posts: 2,177

Re: Proc Sql Left Join VS Inner Join Question

calling table1 requests and table2 master helps a little
Left join might perform poorer depending on the SQL engine, but it guarantees a row on output if you "request" something not in "master".
peterC
Super User
Posts: 19,870

Re: Proc Sql Left Join VS Inner Join Question

I wonder if a where statement in a data step or Proc SQL would be more efficient if you only have 10 observations you're looking for. If you're not sure that the ids would be in table2 then a left join is the safest thing to do.

I have no idea though and did a quick google search and didn't come up with anything.
Regular Contributor
Posts: 171

Re: Proc Sql Left Join VS Inner Join Question

Have you tried using an index to combine the tables?

[pre]
proc sql;
create unique index ID
on table2 (ID);
quit;
data table1;
set table1;
set table2(keep=ID FirstName LastName) key=ID;
if _iorc_ ne 0 then do;
_error_=0;
call missing(FirstName, LastName);
end;
run;[/pre]

A hash object might not be a viable alternative given that the master table contains a million records which would need to be loaded into memory. Message was edited by: polingjw
Super User
Posts: 5,438

Re: Proc Sql Left Join VS Inner Join Question

Indexes will work when joining tables with SQL as well.
Always use inner joins when possible (easier for the SQL planner to optimize).
SAS SQL planner does also use hashing, but only implicit - you can't code it...

/Linus
Data never sleeps
Contributor
Posts: 29

Re: Proc Sql Left Join VS Inner Join Question

Hi,
1) look at the data and understand the cardinality of ID or Join Key in your smaller table (t1) vs the bigger table (t2) ( cardinality meaning -the no of distinct values).
2) Looking at your request the master table(t1) is unique by id. So if that's the case you are matching against one million unique observation.
3) Without having an index and process 1M records against your 10 records will perform definitely poor.
4) In any case creating any index for your table(t2) should perform better.
5) Remember one of the thumbrule to use the index is requesting <15% of your total records. which is true in your case.
6) Go for left join only if you think there is a necessary to go( I mean if your data is not the subset of the master table (t2))

Hope this helps. Message was edited by: jonam
Ask a Question
Discussion stats
  • 7 replies
  • 5805 views
  • 0 likes
  • 7 in conversation