%put Hi _ALL_;
I was wondering about OUTER JOINS;
It is written in some books for eg.: like Kirk Paul Lafler's book 'PROC SQL: Beyound the basics...' that OUTER JOINS can be used with... and it is a quote:
Outer Joins
A maximum of two tables are referenced in a FROM and ON clause of a SELECT statement.
I was wondering why and when it could occur becouse we can easily use multiple left join or right join or full join in a query;
Could You tell, show me an example and info from the sas log with such a 'wrong' example.
I would be thankfull;
Perhaps you are overthinking this statement.
It's just about syntax. And like you say, you can have multiple outer joins in the same query (but not in the same ON clause).
So not sure what else need to be said.
@LinusH wrote:Perhaps you are overthinking this statement.
It's just about syntax. And like you say, you can have multiple outer joins in the same query (but not in the same ON clause).
So not sure what else need to be said.
Thank You for quick response.
Maybe You are right but then is no difference between INNER JOIN and OUTER JOIN syntax and it is also written (in the same book) that:
INNER JOINS:
A join that only retrieves rows with matching values from two or more tables (maximum of 256 tables). This type of join is referred to as a conventional type of join.
OUTER JOINS
Ajoin that retrieves rows with matching values while preserving some or all of the unmatched rows from one or both tables (maximum of 2 tables).
If You can show me an example of using an "ON" clause with two or more tables.
So why Maxiumum of 2 tables? It is same syntax like inner join.
Looks to me like bad wording in the documentation.
Report to Tech support so they can request that it is fixed (you might have to insist a bit and defend your point to get it accepted).
[Edit: One of the book is not an official SAS book, but it is edited by SAS Publications, so I reckon they should stand by both books you mention in your recent reply.]
The correct word for SAS SQL syntax is FULL JOIN.
Maxim 4: Try It.
data a;
input id $;
datalines;
A
B
C
;
data b;
input id $;
datalines;
D
E
F
;
data c;
input id $;
datalines;
G
H
I
;
proc sql;
create table want as
select coalesce(a.id,b.id,c.id) as id
from a
full join b
on a.id = b.id
full join c
on a.id = c.id
order by calculated id
;
quit;
You can always have multiple joins in a single query, but only two tables in a single join.
@Kurt_Bremser wrote:The correct word for SAS SQL syntax is FULL JOIN.
Maxim 4: Try It.
data a; input id $; datalines; A B C ; data b; input id $; datalines; D E F ; data c; input id $; datalines; G H I ; proc sql; create table want as select coalesce(a.id,b.id,c.id) as id from a full join b on a.id = b.id full join c on a.id = c.id order by calculated id ; quit;
You can always have multiple joins in a single query, but only two tables in a single join.
Tnak You for quick response !
I see that it is a good example of correct syntax.
But I want a wrong syntax to prove that OUTER JOIN is restricted to max two tables.
In SQL a statement is from a "select"(becouse select is a statement) to "quit" is a example of query (and it contains clauses like "ON" "WHERE" "FROM" ETC) and we know that we can nest queries with other queries. "ON" is a clause and it is a part of a OUTER JOIN and also of an INNER JOIN so why it is written that max 2 tables is resticted when INNER JOIN and OUTER JOIN is the same syntax.
Can You show me a example of a WRONG example that uses more then two tables in OUTER QUERY to show how I can make the sytax wrong with the single join.
Its like.
SELECT *
FROM a FULL JOIN b
on a.id=b.id and C.id=a.id;
QUIT;
the error is I put C table in On clause and I didnt JOIN it.
It is no difference with INNER JOIN and I still dont understand why INNER JOIN is for 256 tables and OUTER JOIN is for max 2.
Hi @ToTylkoGra,
To review, INNER joins produce a result that includes all matching records from multiple tables based on one or more conditions. That's why it makes sense that we could pull these from as many tables as we want, as we'll get values only from the records that match.
OUTER joins produce a result that pulls ALL records from one or two tables, matching records when possible and adding null values when there isn't a match. FULL outer join pulls all records from two tables, LEFT outer join pulls all records from the first table in the FROM and matching records from the second table, where RIGHT outer join pulls matching records from the first table and all records from the second. This is usually represented by a Venn diagram to show the result, always with just two circles (one for each table).
See Selecting Data from More Than One Table By Using Joins in the SAS documentation for a better, more complete explanation of what is possible. Ultimately you can join records from as many tables as you want, but when combining OUTER join types it's best to think in terms of result sets, where the results from one join are then joined with another table or result set (so, usually joining these in pairs). Ordering your joins and specifying the correct join conditions can be somewhat of an artform, especially when you are trying to optimize for the best performance.
In case you're wondering, this is one of the big differences between DATA step and PROC SQL. In DATA step you tell SAS exactly how to combine tables to get what you want. In SQL you tell SAS what you want, and SAS will optimize the SQL to make it happen. See this video tutorial for more information on SQL vs DATA step.
@ChrisHemedinger I don't understand how you are pitting INNER against OUTER joins in terms of table or data handling.
INNER joins are also represented on a 2-circle Venn diagram, only merge 2 tables per join, and otherwise behave identically in terms of data-matching to OUTER joins.
There is no difference in my eyes.
In case you are thinking about the SQL syntax that uses a comma-separated list of table names followed by a WHERE clause -instead of using the word JOIN and a ON clause- is not specific to INNER joins and is used to create all types of joins. So I am confused by your reply.
In all cases, the logic is: join two tables by matching the rows according to the keys and by dropping the rows as requested, and optionally match more tables following a similar logic.
@ChrisNZ Maybe I'm trying to simplify too much for the sake of the OP here, and yes, I was thinking of the comma syntax in PROC SQL that -- in my experience -- is usually used for the intent of INNER join (focusing only on what's common among all named tables).
Thank You @ChrisNZ for this point of view.
I wrote more information about the problem under the @ChrisHemedinger post.
Thank You guys for responses.
and
@ChrisHemedingerour Community Manager.
It is good to hear/see that pandemic did not kill everyone.
Thank You @ChrisHemedinger for the link to SAS Documentation. It is always the best foundation.
I have not found there anything sugesting that outer joins are restricted to max two tables.
Venn diagram is a vizualization of kind of joins as You mantioned in your post.
And also thanks for the wideo about merge vs sql join is great review and I recommend it to everyone. (I could hardly focus on the content but finnaly I made it )
I think @DominiqueWeatherspoon should mention about that SQL JOINS also have a ‘control processing’ with some proc sql options like magic=101, magic=102 magic=103.
Lets get back to the main topic.
I will start from the beggining. If someone is not familiar what is going on please read all of the earlier information posted.
My question is based on few excerpts from two books published by SAS. The first book is:
PROC SQL, 3rd Edition
by Kirk Paul Lafler
Published by SAS Institute, 2019
And the second book is:
SAS® Certification Prep Guide
Advanced Programming for SAS®9 Fourth Edition
In this topic we will try only talk about HORIZONTAL JOINS. OUTER JOIN(left, right, full) is a part of the horizontal join and INNER JOIN is also a horizontal join.
In the first book author mentioned two times about OUTER JOINS like
And the Certification Book:
Creating an Inner Join with Outer Join-Style Syntax.
„OUTER JOIN why it is restricted to max 2 tables” is the main question.
The question says that OUTER JOINS are restricet to 2 tables but realy are they ?
Why it is written in thouse sources that there is a restriction?
Basing on the information as I mentioned earlier from the two books we can make two questions/hypothesis.
H0 - OUTER JOINS are restricted to two tables
H1 - OUTER JOINS are not restricted two tables
Lets consider folowing example:
The macro is as simple as it can be. It is creating a join with a type of your choice and the number of tables. If someone do not understand the macro I will try to explain it in priv msg if someone wants or here.
options mprint mcompilenote=all;
%macro innerouter(join=inner, numberoftables=255);
proc sql;
select a.z1
from a
%do i=1 %to &numberoftables;
&join join a
as a&i
on a.z1=a&i..z1
%end;
;
quit;
%mend;
Invocation of the macro and not producting error in the log:
%innerouter(join=left, numberoftables=255)
%innerouter(join=right, numberoftables=255)
%innerouter(join=full, numberoftables=255)
%innerouter(join=inner, numberoftables=255)
Invocation of the macro and producing error in the log:
%innerouter(join=left, numberoftables=256)
%innerouter(join=right, numberoftables=256)
%innerouter(join=full, numberoftables=256)
%innerouter(join=inner, numberoftables=256)
LOG:
ERROR: A maximum of 256 tables can be processed in a single PROC SQL statement.
ERROR: Column z1 could not be found in the table/view identified with the correlation name A256.
ERROR: Column z1 could not be found in the table/view identified with the correlation name A256.
And the Note form the CERT BOOK ALSO:
Following this logic it proves that every kind of horizontal join is processed in a single join and it is capable to proces a maxiumum of 256 table at a time. To be completly accurate to the information from the macro - numberoftables=255 is 255 tables + first table = 256 (the rest tables are self joined iterations).
How to understand those informations from those books related to this point of view ?
What author was thinking mentioning about maxiumum of two tables at a time?
Is it a posible misleading text in those books?
If I wrote something that is a crack in my logic or something that is not true just feel free and write want is wrong and what You think.
%put Greetings _ALL_;
Any time you use the JOIN keyword, you can only combine two tables with it; further tables need additional JOINs.
The (usually inner) join method that works with a comma-separated list of tables in the FROM clause and a WHERE condition accepts any arbitrary number of tables.
> comma-separated list of tables in the FROM clause and a WHERE condition accepts any arbitrary number of tables.
That is not my experience. 256 tables is the maximum regardless of the syntax used.
@ChrisNZ wrote:
> comma-separated list of tables in the FROM clause and a WHERE condition accepts any arbitrary number of tables.
That is not my experience. 256 tables is the maximum regardless of the syntax used.
Then let's say "arbitrary within the overall limits of the SQL procedure".
The comma separated list of tables are more like full joins. Or as the log notes call it "Cartesian product joins".
You can filter the results with a WHERE clause to make the results look like an inner join.
385 data age(keep=age) sex(keep=sex); 386 do age=1 to 10 ; output age; end; 387 do sex='M','F'; output sex; end; 388 run; NOTE: The data set WORK.AGE has 10 observations and 1 variables. NOTE: The data set WORK.SEX has 2 observations and 1 variables. NOTE: DATA statement used (Total process time): real time 0.09 seconds cpu time 0.03 seconds 389 390 proc sql; 391 create table test as 392 select * 393 from age,sex 394 ; NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized. NOTE: Table WORK.TEST created, with 20 rows and 2 columns. 395 quit; NOTE: PROCEDURE SQL used (Total process time): real time 0.12 seconds cpu time 0.03 seconds
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.