BookmarkSubscribeRSS Feed

Top 5 Handy PROC SQL Tips Q&A, Slides, Code, and On-Demand Recording

Started ‎10-12-2021 by
Modified ‎01-25-2022 by
Views 7,947

Watch this Ask the Expert session to learn five tips you may not know about PROC SQL and how to elegantly maximize human and computing efficiency. Users of all skill levels who are continuously improving processes and looking to stay within PROC SQL to analyze and process data will benefit from this session.

 

Watch the webinar

 

You will learn how to:

  • Join tables dynamically.
  • Create in-line views for joins.
  • Pivot like a dancer with the Boolean.
  • Recognize patterns in your data.
  • Create running totals.

 

The questions from the Q&A segment held at the end of the webinar are listed below. The slides, code and data sets from the webinar are attached.

 

Q&A

Instead of using ‘where’, can I use ‘on’?

Yes, you can use ‘where’ or ‘on’. The syntax is slightly different.

 

Would the PROC PRINT method run faster with a WHERE option on the data set versus a WHERE statement within the PROC?

It should run similarly.

 

PROC SQL join does not require the data step sorting first. Does PROC SQL sort first behind the scenes?

Yes, PROC SQL does a variety of tasks behind the scenes. You have less control with PROC SQL, but it tries to be as efficient as possible.

 

Do you need to create an alias for a join table?

An alias is not required, but personally, I find it much easier to work with. Especially as the join becomes more advanced.

 

How can I find the list of all dictionary tables?

https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/sqlproc/n02s19q65mw08gn140bwfdh7spx7.htm

 

Will Charu be showing how CASE WHEN works?

Here is the documentation for you:  https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/sqlproc/n0a85s0ijz65irn1h3jtariooea5.htm

 

Is there a performance difference between using SUM(CASE WHEN <conditions> then 1 else0) and the Boolean method shown?

In a case like this, formal benchmarking would most likely be required to test each method. What you would want to do is run the program three times one way, then three times the other. Document the time it took for each group and average it. Then compare. Depending on the data and what else you are doing it could differ.

 

Do you have any links on the PRXMATCH function?

Here is the official documentation:  https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lefunctionsref/n0bj9p4401w3n9n1gmv6tf**ble...

Here is a SAS paper: https://support.sas.com/resources/papers/proceedings/proceedings/forum2007/223-2007.pdf

 

How does PRXMATCH know what pattern to look for or is it just looking for 4 consecutive digits?

It uses PERL regular expressions that you specify. Here is the SAS documentation with a few examples: https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lefunctionsref/n0bj9p4401w3n9n1gmv6tf**ble...

You can also google regular expressions and get a variety of great information. Regular expressions are not SAS specific.

 

View seems very handy, but for those of us who need to maintain an audit trail with our code and data, can you suggest a way to make views reproducible short of saving a data set?

Inline view is very different from SQL view. Inline view is basically a query inside a query. Save the code and document the comment line.

 

I notice that you don't use the JOIN clause, but instead have both tables in the FROM clause and use WHERE to qualify the join. Is there a reason for this?

Not really. I could have easily used the ON clause to Join tables instead of the WHERE. These are 2 alternate ways to perform a join.

 

Do joins in the WHERE clause perform faster than explicitly using the JOIN statement or, do they perform the same?

Benchmarking would be the best way to determine the answer to this ‘it depends’ question.

 

Is joining using the WHERE clause more efficient than using ON to join?

Typically, the ON clause is usually a tad more efficient because it’s executed before the WHERE.

 

Is this different than sum(case when [condition] and [condition] then [metric] else 0 end?

This is the SQL form of conditional processing.

 

PERL ... was not clear as to what pattern she was looking for?

See the first slide on tip #4. Find Patterns in Data. In the example, she was looking for 4 digits followed by a period.

 

I have a table in Oracle with a datetime variable. What is the correct SQL syntax to get all rows that are a certain year?

It depends on if you are creating a SAS dataset out of it. If you are, you can filter through the data and use a year function. If it’s not a SAS dataset, you would need to read it into Oracle. 

 

How can you use SQL to find rank of the value in a field?

Normally, you would use PROC SURVEY SELECT do that work, but in PROC SQL you can mimic ranking with:

proc sql outobs=10;

select model, avg(msrp) as AVGMSRP

from sashelp.cars

group by model

 

order by AVGMSRP desc;

 

Does SAS PROC SQL have window functions like other SQL applications?

See question below.

 

The inline view query expression is still using a physical table, so when can the programmer take advantage of having everything in the memory as it was mentioned in the presentation?

Certainly, the inline view in the example I used brings in an input physical table, however I joined it with a virtual table in memory to showcase the value of the inline view. Other usages of an inline view can be when all tables are virtual (created via a query that goes on the SELECT statement). Saving the user a lot of I/O movement.

 

Windowing functions in SQL, means time interval functions, such as, last week, last year, etc.?

The INT family of functions are all available in PROC SQL. In fact, all functions you use in the DATA step are available in PROC SQL.

 

Do all the DBMS vendors’, like Oracle, database functions apply in PROC SQL?

PROC SQL is ANSI compliant, so it respects ANSI functions. If an external DBMS vendor like Oracle etc. has specific Oracleisms being used, those may not be applicable in PROC SQL.

 

Regarding the ranking, I think you may use PROC SQL OUTOBS=10; ..... ORDER BY XXX DESC;?

Certainly can. For more sophisticated features all wrapped up in a neat PROC, I would suggest PROC RANK.

 

Which has better performance with inline view? IN SELECT statement or FROM Statement?

Benchmarking would be the best way to answer this question.

 

Could you use COUNT instead of SUM in the case of the haves and have nots?

No. The COUNT function only counts the number of rows of a column by its grouping. We need the SUM function to get the haves & havenots.

 

How many inline views can you use in PROC SQL?

There is no limit to inline views you can use in PROC SQL. The only limits are the computing resources available, I/O, memory, storage, CPU.

 

Adding a query in the WHERE is a subquery, in the FROM is an inline view, what do you call a query inside of the SELECT statement?

A subquery.

 

What version of SAS are you using? Looks more user friendly and easy to use...

For the presentation, I used SAS Enterprise Guide Version 8.2 running on SAS 9.4 M6.

 

Does the sequence of commands that was mentioned only apply to PROC SQL vs regular SQL (i.e. MSSQL)?

Because PROC SQL respects ANSI and also has SASisms for those times when ANSI falls short, there are some functions/formats that cannot be ported over to external database SQL.

 

What is the difference between WHERE and HAVING?

The WHERE filters on existing data, that is unprocessed. The HAVING filters grouped data which is processed.

 

Is joining datasets using PROC SQL faster and similar to the SAS merge data step? The SAS merge data step requires the data to be sorted before merging datasets.

PROC SQL is more flexible with joins due to its ability to join on inequality & other conditions. Regarding your SORT question, PROC SQL does an internal ordering, which is implicit while the DATA step is more explicit & requires a PROC SORT. So, in terms of actual work being performed to get the datasets in order there may not be an efficiency gain of one over the other. The only way to find out with certainty is to benchmark your steps.

 

For running totals, won't the data step code be a lot more efficient? It seems like the SQL solution is pulling a new copy of the base table for each row's running total amount.

Yes, the DATA step would be more efficient to get running totals. As I mentioned in my presentation, use the DATA step for granular work, running totals, first in the group etc. The purpose of my sharing the SQL code was for users who specifically requested this task to be performed in SQL.

 

In the PROC SQL for shoes, I noticed that you called a column from table B before you referenced the actual table. Could you explain why that works?

I pre-worked the Shoes dataset to store the data step iterator _N_ as a unique row value. I then used this unique row value to create running totals in PROC SQL as SQL doesn’t have an automatic row counter.

 

What is the most optimal access module for SQL data? ODBC, OLEDB or native SQL/Access?

The answer to this question is dependent on user needs and resources. What products you have licensed, where you are pulling your data from, what computing resources do you have available and what are you trying to conserve. Check out my blog post if you would like to learn more. In the post, I’ve also attached a pdf to the entire presentation. 

 

Can PERL look for text patterns as well?

Yes, it can. You would use a \w instead of a \d that I used in my example. Here’s the PERL cheat sheet I also provided in the links section of the presentation. 

 

Can we use all the OPTION's of DATA step in PROC SQL?

Most dataset options can be used in proc SQL;

There may be a few differences in usage. See an example below to bring all columns beginning with the letter G. 

 

proc print data=sashelp.citiqtr;

var g:;

run;

  

proc sql;

select *  from sashelp.citiqtr(keep = G:);

 

How do you use PERL to find a pattern with mixed alpha and numeric values?

Check out this PERL cheat sheet for a variety of ways you can customize your pattern search with PERL. 

 

In SAS Viya, FEDSQL is used more often than SQL. Can you please explain which tips you covered will also work in FEDSQL (in the way you showed for SQL)?

Here's the official documentation on FEDSQL.

 

The PRXMATCH seems very powerful. Can you build combinations of alphanumeric patterns?

Check out the PERL cheat sheet for more possibilities.

 

Do you find the _method option useful when using PROC SQL? If so, could you speak to that a little bit?

_METHOD is useful for looking under the covers to examine what PROC SQL executed. Here’s a paper if interested in a deeper dive.

 

Does the WHERE in SQL effectively have the same efficiency gain as using a WHERE in a standard SET statement?

Any efficiency comparison may be best answered by benchmarking data. I’ve covered top 10 programming efficiencies here. 

 

Why don't you need AS when creating a field using a Boolean statement? Example sum(salary < 60000 and gender='M') "Males < 60000'

Since I was creating a table with column names, I didn’t use the alias AS. This was simply a report so the label ‘Males <6000’ sufficed for my purposes.

 

To find a pattern, can you also use LIKE (wild card)?

Yes, you can. PERL provides many more options which is why I felt it might be valuable to share.

 

For the view/nested idea, can we use partition over in SAS?

Partition over is not available in PROC SQL.

 

Is the comma the same as a left join?

No. The comma is not used for a Left join.

 

Does this work for Teradata too?

So long as the Teradata install is using ANSI SQL, your PROC SQL code will work. Outside of ANSI any unique Teradata/PROC SQL features will only work in the specific installation of SQL.

 

Recommended Resources

FULLSTIMER System Option

Top 10 SQL tricks in SAS®

SAS Helps Data Sets Descriptions

Find your data pattern with PERL

PERL Regular Expressions Tip Sheet

Using The Boolean operation in PROC SQL

Know thy Data: Techniques for Data Exploration

The Power of SAS SQL

Step-by-Step PROC SQL

 

Want more tips? Be sure to subscribe to the Ask the Expert board to receive follow up Q&A, slides and recordings from other SAS Ask the Expert webinars.  

Version history
Last update:
‎01-25-2022 01:21 PM
Updated by:

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Article Labels
Article Tags