BookmarkSubscribeRSS Feed
aravind
Fluorite | Level 6
Hi there,

i have two tables. table1 has 10 observations with a key field ID.
table2 has one million observations with a key field ID and with other detail information.

So, now I need to get for the table1 ids some addition detail information from table2.
And the key thing is the table2 serves as a master table where you will have all the ids with detail information.

So, left join and inner join both are giving me the same result. But my question is,
which join is the right join and which is more efficient.
there is a way of doing in Hashing also but which is more efficient?

Help will be appreciated.
Aravind
7 REPLIES 7
DF
Fluorite | Level 6 DF
Fluorite | Level 6
I doubt either join is more efficient. The key difference is in what you need your results to show.

For example if you're doing something like:

select
*
from
table1
left join table2 on table1.key = table2.key

The results will be identical to inner join if there is a relevant row for every table1.key in table2.

Inner join shows only where there is a key match in both tables. Left join shows everything from table 1, and will show blanks in the table2 fields if there is no matching row.

SAS has quite a few different options for this. You could use the data step Merge function, or as you say you could use hash tables to do the lookups. Which is the best is a very subjective question to answer - certainly I think you'd need to provide some more detail about the relationship between the two tables before we could answer that.

Hope that's of some help anyway.

David.
aravind
Fluorite | Level 6
Hi David,

Thanks for your time.
Well,Let me give some more information on the tables.
Table1
ID
1
2
3
4
.
.
10
Table2
ID FirstName LastName
1
2
3
4
.
.
a Million rows.

Proc sql norpint;
create table way1 as
select t1.*,
t2.firstname,
t2.lastname
from table1 as t1 left join table2 as t2
on t1.id=t2.id; quit;

Proc sql norpint;
create table way2 as
select t1.*,
t2.firstname,
t2.lastname
from table1 as t1 inner join table2 as t2
on t1.id=t2.id; quit;

Here the Table2 is a Master Dataset which will have all the ids with additional detail information.
For every id in Table1, there will be a row in Table2 with the same Id.
So, now my question is which is the best way to go!

Aravind
Peter_C
Rhodochrosite | Level 12
calling table1 requests and table2 master helps a little
Left join might perform poorer depending on the SQL engine, but it guarantees a row on output if you "request" something not in "master".
peterC
Reeza
Super User
I wonder if a where statement in a data step or Proc SQL would be more efficient if you only have 10 observations you're looking for. If you're not sure that the ids would be in table2 then a left join is the safest thing to do.

I have no idea though and did a quick google search and didn't come up with anything.
polingjw
Quartz | Level 8
Have you tried using an index to combine the tables?

[pre]
proc sql;
create unique index ID
on table2 (ID);
quit;
data table1;
set table1;
set table2(keep=ID FirstName LastName) key=ID;
if _iorc_ ne 0 then do;
_error_=0;
call missing(FirstName, LastName);
end;
run;[/pre]

A hash object might not be a viable alternative given that the master table contains a million records which would need to be loaded into memory. Message was edited by: polingjw
LinusH
Tourmaline | Level 20
Indexes will work when joining tables with SQL as well.
Always use inner joins when possible (easier for the SQL planner to optimize).
SAS SQL planner does also use hashing, but only implicit - you can't code it...

/Linus
Data never sleeps
jonam
Calcite | Level 5
Hi,
1) look at the data and understand the cardinality of ID or Join Key in your smaller table (t1) vs the bigger table (t2) ( cardinality meaning -the no of distinct values).
2) Looking at your request the master table(t1) is unique by id. So if that's the case you are matching against one million unique observation.
3) Without having an index and process 1M records against your 10 records will perform definitely poor.
4) In any case creating any index for your table(t2) should perform better.
5) Remember one of the thumbrule to use the index is requesting <15% of your total records. which is true in your case.
6) Go for left join only if you think there is a necessary to go( I mean if your data is not the subset of the master table (t2))

Hope this helps. Message was edited by: jonam

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 17731 views
  • 0 likes
  • 7 in conversation