BookmarkSubscribeRSS Feed
Fluorite | Level 6
Hi there,

i have two tables. table1 has 10 observations with a key field ID.
table2 has one million observations with a key field ID and with other detail information.

So, now I need to get for the table1 ids some addition detail information from table2.
And the key thing is the table2 serves as a master table where you will have all the ids with detail information.

So, left join and inner join both are giving me the same result. But my question is,
which join is the right join and which is more efficient.
there is a way of doing in Hashing also but which is more efficient?

Help will be appreciated.
Fluorite | Level 6 DF
Fluorite | Level 6
I doubt either join is more efficient. The key difference is in what you need your results to show.

For example if you're doing something like:

left join table2 on table1.key = table2.key

The results will be identical to inner join if there is a relevant row for every table1.key in table2.

Inner join shows only where there is a key match in both tables. Left join shows everything from table 1, and will show blanks in the table2 fields if there is no matching row.

SAS has quite a few different options for this. You could use the data step Merge function, or as you say you could use hash tables to do the lookups. Which is the best is a very subjective question to answer - certainly I think you'd need to provide some more detail about the relationship between the two tables before we could answer that.

Hope that's of some help anyway.

Fluorite | Level 6
Hi David,

Thanks for your time.
Well,Let me give some more information on the tables.
ID FirstName LastName
a Million rows.

Proc sql norpint;
create table way1 as
select t1.*,
from table1 as t1 left join table2 as t2
on; quit;

Proc sql norpint;
create table way2 as
select t1.*,
from table1 as t1 inner join table2 as t2
on; quit;

Here the Table2 is a Master Dataset which will have all the ids with additional detail information.
For every id in Table1, there will be a row in Table2 with the same Id.
So, now my question is which is the best way to go!

Rhodochrosite | Level 12
calling table1 requests and table2 master helps a little
Left join might perform poorer depending on the SQL engine, but it guarantees a row on output if you "request" something not in "master".
Super User
I wonder if a where statement in a data step or Proc SQL would be more efficient if you only have 10 observations you're looking for. If you're not sure that the ids would be in table2 then a left join is the safest thing to do.

I have no idea though and did a quick google search and didn't come up with anything.
Quartz | Level 8
Have you tried using an index to combine the tables?

proc sql;
create unique index ID
on table2 (ID);
data table1;
set table1;
set table2(keep=ID FirstName LastName) key=ID;
if _iorc_ ne 0 then do;
call missing(FirstName, LastName);

A hash object might not be a viable alternative given that the master table contains a million records which would need to be loaded into memory. Message was edited by: polingjw
Tourmaline | Level 20
Indexes will work when joining tables with SQL as well.
Always use inner joins when possible (easier for the SQL planner to optimize).
SAS SQL planner does also use hashing, but only implicit - you can't code it...

Data never sleeps
Calcite | Level 5
1) look at the data and understand the cardinality of ID or Join Key in your smaller table (t1) vs the bigger table (t2) ( cardinality meaning -the no of distinct values).
2) Looking at your request the master table(t1) is unique by id. So if that's the case you are matching against one million unique observation.
3) Without having an index and process 1M records against your 10 records will perform definitely poor.
4) In any case creating any index for your table(t2) should perform better.
5) Remember one of the thumbrule to use the index is requesting <15% of your total records. which is true in your case.
6) Go for left join only if you think there is a necessary to go( I mean if your data is not the subset of the master table (t2))

Hope this helps. Message was edited by: jonam

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 7 in conversation