BookmarkSubscribeRSS Feed

SAS SQL: An Introduction

Started ‎03-14-2017 by
Modified ‎05-02-2018 by
Views 10,242

 

Did you miss the Ask the Expert session on Using SQL within SAS? Not to worry, you can catch it on-demand at your leisure.

 

Watch the webinar

 

 The session covers basic SQL concepts, including:

  • Basic SQL queries and syntax.
  • Creating columns.
  • Subsetting and conditional logic.
  • Introduction to joins.
  • Creating macro variables.

 

I have added the most recent Q&A below.

 

When manipulating and accessing data, which is the best way to proceed: SQL or DATA step programming?

­

The DATA step and SQL should be considered complimentary techniques, as they join the data differently behind the scenes.  The paper below outlines some similarities and difference between the 2 techniques:

http://www2.sas.com/proceedings/sugi30/249-30.pdf

 

­How do you know to use percent6 as the format?  Are there other options?

­

Yes, there are many choices for formats.  You can find each format along with its description here: http://support.sas.com/documentation/cdl/en/leforinforref/64790/HTML/default/viewer.htm#p0z62k899n6a...

 

­Can you explain why it's useful to store a macro variable in the log?

­

The macro variable is not actually stored in the log.  It is stored in an area of memory called the symbol table.  We viewed the macro variables’ value in the log by using the %PUT statement.

 

­Is there a way to use DO loop in PROC SQL?

­

PROC SQL does not support DO loop processing.

 

What does the STIMER option do in Proc SQL­?

 

It can be used during development and debugging to show how fast PROC SQL is running. An example can be found here:

http://support.sas.com/documentation/cdl/en/sqlproc/63043/HTML/default/viewer.htm#n0a693shdq473qn1sv...

 

In SAS Enterprise Guide, where can PROC SQL be found?

 

SAS Enterprise Guide incorporates PROC SQL behind the scenes in the Query Builder as well as some tasks.

­

­Can you have multiple INTO keywords in a PROC SQL step?

­

You can have one INTO keyword, but you can use it to create multiple macro variables.  Please see the following link for more examples:

http://go.documentation.sas.com/?docsetId=sqlproc&docsetTarget=p0hwg3z33gllron184mzdoqwpe3j.htm&docs...

 

­Can you explain the difference between merging using the ON clause versus the WHERE clause?

­

The ON clause provides the join criteria when we are using outer joins.  This is ANSI standard.  The WHERE clause is used for the join criteria when we are working with inner joins.

 

­When you performed the inner join where c.providerID = p.providerID, SAS knew to join the tables based on Cartesian product. Is that correct?  If so, what happens if the fields do not match?

­

By definition, joins within SQL will perform a Cartesian product of the matching values.  SQL will use the definition of the type of join we are using (inner, right join, left join, full join) to determine what happens to the non-matching values.

 

­Why did the SELECT statement show different column names than what was selected? The column seemed to be more "legible" than a simple column name. e.g. "prodID" would show up as "product ID"­.

 

I had the SAS system option LABEL set such that the descriptive labels will be used if they are available. 

 

­Which SQL ANSI standard is supported?­

 

PROC SQL follows most of the guidelines set by the American National Standards Institute (ANSI) in its implementation of SQL. However, it is not fully compliant with the current ANSI standard for SQL­. Here is a link that will describe the differences.

 

­How do you select only the first 50 rows?

 

One way in which you could do this is to use the OBS= data set option. A portion of the syntax follows:

from health.claims_sample (obs=50)

 

Since there is no CREATE TABLE statement in these examples, I'm assuming that it's not creating a physical SAS dataset, correct? Just a display in the output window?­

 

Yes, in many of our examples the results set is a report. We added the CREATE TABLE statement in order to create an output table from the results set.

 

­What if CHGAMT is 0 in the calculation to create percentage (EERESP/CHGAMT)?

 

The resulting value is missing, as this would represent division by zero.

 

­Can calculated values be referenced by alias or does the calculation have to be specified again for additional uses. IE: PERCENTAGE is calculated, can I do CASE PERCENTAGE... or do I have to do CASE EERESP/CHGAMT....?­

 

You can use the CALCULATED keyword in order to specify to SAS that you are using a calculated column.

 

­

Why did the format=percent6.1 did not always show 1 decimal position in the output?­

 

We could increase the column width and have all values accurately represented. The format requires a print position for all characters as well as the possibility of parentheses for negative numbers.  For instance, format=percent10.2 would accurately show all decimal positions in our example.  I have changed thee example to be more clear for future reference.

 

­Are xxx.DAT files supported?­

 

PROC SQL does not read raw data files. SQL can read existing SAS data sets (tables) or relational databases if the appropriate SAS/Access engine is licensed.

 

­Can I download a free version of SAS and setup personally?­

 

Yes, there is a free version of the SAS University for personal use. You can find it at this location.

 

Is the CALCULATED keyword something that works only within the PROC SQL?­

 

The CALCULATED keyword is a SAS enhancement to the ANSI standard.

 

So to be clear, an outer join is basically inner join with additional parts of the table? (e.g. if you do an outer join you are by default including an inner join?)­

 

The outer join does include the matching values. However, with outer joins, you are limited to 2 tables, whereas the same limit does not apply to inner joins.

 

­For joining two tables, which technique is preferable - a MERGE in the data step or a join in PROC SQL where both the tables are large in size?­

 

Since the two processes are different, you must decide what the desired output should look like. PROC SQL does a Cartesian product on the matching values, and the data step does not.  After you decide which process yields the desired results, then benchmarking is recommended to see which method is more efficient.

 

The option NUMBER appears on the PROC SQL code as see in the following. What does it do?

PROC SQL NUMBER;­ 

 

The NUMBER option provides line numbers in the output for ease of reading the results set. 

 

Is there a link that shows all of the different statements for each type of join?­

 

Here is a good place to start:

 

­I wonder the performance between SQL join and DATA step merge.­

 

It is recommended that you perform benchmarking for production jobs to evaluate performance between PROC SQL and the data step.

 

Can we use PROC SQL in place of first. and last. logic?

 

First. and Last. programming is best implemented in Data step programming.

 

­In the comparison between SQL and the data step, you mentioned "no equality required for SQL". I am not quite sure I understand.­

 

SQL does not require that data be joined on an equality, for instance, a.providerID=b.providerID. 

 

Same question regarding "coalesce". Is COALESCE SQL ANSI standard or is it a SAS enhancement in PROC SQL?­

 

COALESCE is found in some implementations of the ANSI standard. In others the equivalent is IFNULL. See here for more information.

 

­Does INTO make the macro variable global to allow you to use it in the other PROC SQL steps, or was there something else happening?­

 

In our example, the macro variable was created as a global variable. 

 

­Can we use proc sql in SAS student edition?­

 

Yes the university edition is for private use and can be downloaded from this link.

 

­Do all functions like INDEX,PIVOT and CAST work in PROC SQL?­

 

Please see here for comparisons between PROC SQL and the ANSI standard.

 

­Do we always have to use QUIT; for every proc SQL step?­

 

No. PROC SQL will see QUIT; or the beginning of another step as a step boundary.

 

Prior Q&A

­

 

When will SQL accept windowing functions?­

 

SAS SQL does not currently support windowing functions. Please see the following post for more information:

 

­

At some point, can you speak about pass-through queries in SAS and the correct syntax for the connection string?­

 

Pass-through queries can be used in SAS to send DBMS-specific statements to a DBMS for processing. The syntax for the connection string varies based upon which database you are accessing.  The following introduces SQL pass-through and provides some examples relevant for connecting to Oracle.

 

There is also an Ask the Expert session that addresses this topic. It is entitled ‘The What, When, and Why of SAS/ACCESS Interfaces.'

 

Is a data view akin to a dashboard?­

 

These are not equivalent. A view is a type of SAS data set that stores a set of instructions for surfacing the data when the view is used.  Views can be created with SQL or the data step. A dashboard provides views about a particular business process or metric, much more like a report. Please see the following for more information about SQL views:

 

­

What about using proc SQL against Hive or Impala tables? Will this be addressed? ­

 

For our introductory session, we focus on reading SAS data only. Please see the following for an introduction to reading Hive or Impala tables.

 

Can I use PROC SQL to read a CSV file? If so, what is the syntax?­

 

The SELECT statement in PROC SQL is appropriate for reading SAS data. It can also connect to a database provided that you have the appropriate SAS/Access engine licensed.

 

­Why do you use a QUIT statement when your next step is PROC SQL?­

 

The QUIT statement is not required when the next statement is PROC SQL. It is included in the demonstration to allow the presenter a modular approach that is flexible for demonstration purposes.

 

­Is there an equivalent to having a DROP= option on a SET statement in a data step?­

 

Yes, on the FROM clause and following the name of the data set, you can use the DROP= data set option. Please see the following for more information.

 

Will there be anything on subqueries?

 

This presentation does not contain a discussion of subqueries. Information on subqueries can be found here.

 

What does the 6 represent in the format=percent6.1?

 

The 6 indicates that we are asking for 6 total columns to display the value. Anything that requires a position would be included in that number, including decimals and percent signs.

­

 

When adding labels in the select clause, are you still bound by the variable naming rules of SAS?­

 

No, you are not bound by SAS naming conventions. Labels are designed to be descriptive and can include blanks and other special characters.

­

 

Please list the advantages and disadvantages of PROC SQL over MERGE statement.­

 

An introduction to this topic is addressed here.

 

What is difference between WHERE and HAVING?­

 

A WHERE clause subsets one row at a time. The HAVING clause is used to subset a group of data, possibly after performing some sort of summarization.

­What does the code else "refund" do?­

 

The ELSE is used to specify a value that will be used if none of the previous conditions in a CASE expression are met.

­

 

What is the order that the CASE statement is executed? Is it after the FROM clause?

 

Yes, the FROM clause is executed 1st

­

 

What's best way to get log info from PROC SQL while in display manager?­

 

You can ­view the log as usual. No special operations are required for PROC SQL­.

­

 

After you create a table using SQL, the table is a temporary table in the WORK library. How can we create the table in a permanent library?

 

 Assign a permanent library using the LIBNAME statement.  Then, on the CREATE statement in PROC SQL, choose a 2 level name such as the following:

­create table perm.table1 ...

 

The presentation mentions horizontal joins. Can SAS SQL also be used for vertical joins?­

 

Yes, you can use SET operators to accomplish this. The kind of vertical join that we might be most familiar with is the equivalent of appending or concatenating data using the data step.  This can be accomplished with the OUTER UNION CORR operation in PROC SQL.

­

 

Why is the presenter using SQL 89 vs SQL 92? Does SAS have any limitations using SQL 92?­

 

PROC SQL does closely align with the ANSI SQL-92 standard. If you are referring to the syntax used for the joins, either of the following could be used:

…FROM one, two;

Or

…FROM one INNER JOIN two;

 

What does the "number" option mean on the PROC SQL statement? ­

 

It is a way for us to request that row numbers be displayed in the output.

­

 

Is it possible to create a variable with sum using a group by and another variable with overall sum in the same SQL step?­

 

This would likely require 2 SELECT statements in PROC SQL. This might be a time when PROC MEANS/SUMMARY with a CLASS would be a good fit for the task. 

 

Is there a data step equivalent for the SQL FULL JOIN?

 

The data step process that is most like the FULL JOIN is the data step MERGE. Please be aware that the process by which SQL and the data step join their data is different.  Thus the results will not always be the same between a FULL JOIN and a data step MERGE.

 

Is the SAS SQL language the same for z/OS?­

 

Yes. Since Base SAS includes PROC SQL, it runs on any platform­.

­

 

Is there a web session that covers topics such as HAVING in more detail?­

 

Currently, there is no session that addresses concepts from this webinar in more detail. We do have a session that addresses reading database data.  It is ‘The What, When and Why or SAS Access Engines’

­

 

Is there a way to outer join two tables and have the resulting output replace one of the original source tables? ­

 

Yes. On the CREATE statement, use the name of the table that you wish to replace.

 

How do I create a new variable in PROC SQL with the following calculation?

 

Newvariable=(0.5 x √[SumX]) + (0.8 x √[SumY]) + (0.6 x (log([Z] +8))) + (0.06 x ([Zx 5])) + 0.32

This expression might be best accommodated by the data step. You will need to use the SQRT function to perform square roots, the * operator for multiplication, the + operator, and the LOG function. 

­

 

Can I use PROC SQL to create a random sample? For instance, PROC SQL can be used to combine data from different tables using a common column like ID. If I only want to keep 150 randomly selected rows from the results set, how could I do this?

 

This is an example of a task that might be more typically accomplished using the data step. If you would like to use PROC SQL, the following paper addresses this topic:

 

The previous analyst provided me with a SAS format library to use, but they used SAS 32 bit. This is incompatible with my 64 bit. How can I get PROC SQL to open this?

 

Please contact the person who sent the information to you in order to resolve this question.

 

Can you address a little more what the COALESCE function does? Is it needed only when making reports from PROC SQL, or when making tables (datasets) from PROC SQL?

 

It can be used in either situation. For example, it can be used to consolidate all X values into a single column for reporting purposes. Here and here are a few helpful links.

 

Can you provide examples that use both the Group By and the Order By?

 

Below is an example from the session that includes an added ORDER BY clause. The GROUP BY indicates the grouping values that are to be used for summarization.  The ORDER BY is used for ordering the results set.  You can think of an ORDER BY as a sorting of the data.

 

proc sql number;

   select c.providerID, providerName, sum(chgamt) as sumChg, sum(lndiscamt) as sumdisc,

         sum(eeresp) as sumeeresp, sum(paidamt) as sumpaidamt

     from health.claims_sample as c, health.provider as p

     where c.providerID = p.providerID

      group by c.providerID, providerName

      order by providerID

;

quit;

 

I would like to know more about PROC SQL and the macro language. Are there other things to know besides the INTO clause?

 

Here and here are some helpful links for further investigating PROC SQL and the macro language:

 

 

Recommended Resources

Course: SAS SQL 1: Essentials
Book: PROC SQL by Example: Using SQL within SAS®

 

Want more tips? Be sure to subscribe to the Ask the Expert Community Library to receive follow up Q/A, slides and recordings from other SAS Ask the Expert webinars. From the Ask the Expert Library, just click Subscribe from the orange bar underneath the list of the recent articles.

 

NOTE: For best results when opening the attached slides, click on the “download” icon.

Comments

I know you mentioned the statements for each type of join. What I am wondering is what are the types of join that can be used? I assume Inner, Right, Left, and Full - but it doesn't specify.

 

Thanks,

 

Shawn

fencepencil

Hi!  Yes, the types of joins that are supported are inner, left, right, full.  Also supported are specialty joins, which are special cases of standard join types.  You can find this information here: http://go.documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.2&docsetId=sqlproc&docsetTarget=p0...

That is a fantastic resource, and will be very helpful. Thank you so much 🙂

 

Shawn

fencepencil

 Hello,

Your Webinar was very helpful. However, you did speak about cartesian product interactions for Inner Join but not for Full Join. I run a sas program and had a problem where observations that should be 57Million is now over 300Million.

This was what i run;

proc sql;
 create table t12 AS
  select 
        coalesce(t1.ID,t2.ID) as ID,
        t1.Evnop, t1.Syst, t1.Conyst, t1.Contr, t1.Metype, t1.Steno, t1.Epidem, 
        coalesce(t1.Entrant,t2.Entrant) as Entrant, 
        coalesce(t1.Adde,t2.Adde) as Adde,
        t2.Set1, t2.Set2, t2.Set3, t2.Set4, t2.Set5,t2.Set6,t2.Set7
    from table1 t1 FULL JOIN table2 t2 on (t1.Adde = t2.Adde);
  quit;

 What is the probable sas code i have to run to avoid the cartesian product interaction happening in my case, when i use a Full Join?

Thank you.

Hi, Thank you for your question.  ANSI standard SQL produces a Cartesian product for the matching rows in an inner join or any outer join.  (left, right or full)  Since SAS abides by ANSI standard SQL, this behavior cannot be changed when using SQL.  The data step does not produce a Cartesian product.  This would be the alternative to using SQL.

That is a fantastic resource, and will be very helpful.

The procedure PROC SQL is used to process the SQL statements. This procedure can not only give back the result of an SQL query, it can also create SAS tabl...Read More

Version history
Last update:
‎05-02-2018 01:49 PM
Updated by:
Contributors

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Article Labels
Article Tags