BookmarkSubscribeRSS Feed

Diving Deeper: How Many Ways Can You Join SAS® Tables? Q&A, Slides, and On-Demand Recording

Started ‎03-07-2024 by
Modified ‎03-07-2024 by
Views 323

Watch this Ask the Expert session to learn various join algorithms and see comparisons between them. 

 

Watch the Webinar

 

You will learn about:

  • SQL implicit join.
  • SQL explicit join.
  • FEDSQL join in SAS® Viya®.

 

The questions from the Q&A segment held at the end of the webinar are listed below and the slides from the webinar are attached.

 

Q&A

Isn't the natural join the same as an inner join?

The natural join is like an inner join, but there are some differences.

  1. A natural Join joins two tables based on same attribute name and datatypes while an Inner Join joins two table based on the column which is explicitly specified in the ON clause. So, a natural join is going to implicitly join on same named columns.
  2. The other difference is that a Natural join returns only one instance of a common join column, while inner join returns duplicate join columns from both tables.

 

Can you explain when case matters in column names in SQL joins? I have been "caught" by that in the past.

Case should not matter unless you are working with dictionary tables – in that case, column and table names are now data in a table. The Libname(library name) and Memname(table name) columns are stored in all uppercase while Name (column name) is stored in mixed case.

 

Since the UPCASE function is used three times in one SQL statement (GROUP BY, WHERE, etc.), would it be more efficient to create a single alias variable and subsequent reference that...so that UPCASE only has to be called once?

Possibly - UPCASE should not require many resources to process but referring to a function fewer times if possible is always good practice.

 

So, the way to avoid generation of a cartesian product is to use the ON clause?

Yes - but to be more specific, be very careful how the ON clause is written, e.g., be sure that you know the data well enough to match on all necessary columns. Review metadata by checking out dictionary tables and really get to know your data before attempting a join to avoid an unnecessary cartesian product.

 

When you specify column names in the join or in a where statement, within PROC SQL, does case matter?

It does not - outside of the macro language. In SAS, case only matters where there are quoted strings – column names are not case-sensitive.

 

You mean in the in the syntax itself when I specify the names of the columns, the names of the tables? Anything here that you have written here you can do. SAS is case agnostic. One of the few times you need to respect case is anything enclosing quotations, which is a string value, needs to be case sensitive. If you are saying I am looking for everybody from VEGAS, and VEGAS in your data is stored in uppercase and you are putting it in quotations, then that needs to be in uppercase dictionary table. There are some things in dictionary tables, like libname statements, that are stored in uppercase.

 

When specifying columns in the select, why add the drop / keep statements also?

Mostly personal preference & also serves as documentation. The keep=explicitly defines what columns are kept table & can serve as a good visual documentation reminder. With that said, you can certainly use comma separators on SELECT instead of KEEP – it is just as easy and follows standard SQL syntax. DROP, however, can be a nice way to type less (lazy people like me like that) if you remember you are not using standard SQL syntax (dropping four columns from a 400-column table is far easier than listing 396 columns).

 

 

SQL versus Data step MERGE - efficiency, ease of use?

Great (and tough to answer) question! DATA step: more granularity, great for First. Last. Operations.

SQL: greater flexibility-no compulsion to use same named columns for joins, flexibility on join conditions (unlike data step that joins only on equality, SQL joins can be based on inequality as well) Both: speed – it depends on multiple things like just what you are doing, where the data lives (in SAS? On a DBMS?). You must test, unfortunately. A data step hash might be the best choice sometimes, a merge other times, and a straight SQL join in other cases.

 

When would you use where instead of on? (Besides join)

I can join tables either using the ON clause or the Where clause. In all my examples, I have shown you the ON clause. If I remove the ON clause and I want to join the two tables, I can just say from SAS. authors, SAS. Books. Now that's not ANSI (American National Standards Institute) standard. As far as possible, I want to keep my code ANSI standard in the event I am going to do some pass throughs. The other reason I am using the ON clause is because if you think of logical query processing order, we did not get into that. The first statement that is logically processed is the FROM. The next statement that is processed is the WHERE. So, you will see that the ON is part of the FROM. In terms of efficiency, the FROM comes before the WHERE, so you get the tad bit of efficiency by using the ON clause for a join. Are there any other places where you would use the ON clause?  No, a JOIN is where you would use an ON clause. I can always use the WHERE clause to act like a filter. Maybe I want to filter for country, etc.

 

What is the difference between the right join and right outer join in proc SQL?

Nothing but order and preference. LEFT is more commonly seen and is what I stick with, but I could write any left as a right or vice versa.

 

In a many-to-many situation, how applicable is FULL JOIN?

Full outer joins in a many-to-many relationship can be helpful when attempting to find mismatched, orphaned data, from both tables and wanted the result sets, not just matches.

 

Here is an example of performing a full outer join in a many-to-many situation.

data a;

input ID   $ Type   $;

datalines;

R      Up          

S      DOWN    

T      UP        

X      UP        

Y      DOWN      

Z      UP        

;

run;

data b;

input  ID   $  Type   $;

datalines;

1      IN

2      IN

3      OUT

4      OUT

5      IN

6      OUT

;

run;

data ab;

input  AID   $  BID   $;

datalines;

R     3

T      3

T      5

Z      6

;

run;

proc sql;

 

select *

from A

 

full outer join

AB on

A.ID = AB.AID

 

full outer join

B

on AB.BID = B.ID

;

 

Implicit vs explicit - which one is faster on joining huge datasets?

Explicit passthrough may technically be speedier when joining large database tables because you are speaking the language of the database and not shipping PROC SQL code to be interpreted & translated.

 

What was the class & link for more information about Dictionary Tables?

Check out this link for my paper on Dictionary tables where I have discussed many useful nuggets on dictionary tables and their value to get to know your data

 

Where you have multiple inner joins, is there order of processing the inner join?

Because you are using only inner joins, only the matches in all three tables will show. To view the execution plan, use the _Method option on the PROC SQL statement to learn more.

 

To me, the WHERE and HAVING options seem similar, what' s the difference between the 2?

Both the WHERE and the HAVING clauses filter data, with a difference. The Where clause filters row level, detail data. The Having clause filters summarized, grouped data.

Here is a video tutorial that explains the difference at 27:14 

 

 

Recommended Resources

Life saver tip for comparing PROC SQL join with SAS data step merge

One-to-One, One-to-Many, and Many-to-Many Joins Using PROC SQL

Understanding the SQL Pass-Through Available in PROC SQL

FEDSQL Procedure

Comparing Implicit SQL Pass-Through

Please see additional resources in the attached slide deck.

 

Want more tips? Be sure to subscribe to the Ask the Expert board to receive follow up Q&A, slides and recordings from other SAS Ask the Expert webinars.

Version history
Last update:
‎03-07-2024 01:55 PM
Updated by:
Contributors

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Article Labels
Article Tags