BookmarkSubscribeRSS Feed

Proc SQL or Proc FEDSQL: Which Should a Programmer Use? Q&A, Slides, and On-Demand Recording

Started ‎02-03-2021 by
Modified ‎02-03-2021 by
Views 5,315

Proc SQL or Proc FEDSQL: Which Should a Programmer Use? Q&A, Slides, and On-Demand Recording

 

Watch this Ask the Expert session to learn when and how to use PROC FedSQL and when it offers benefits over PROC SQL. 

 

Watch the webinar

 

Join T Winand as he discusses the differences between PROC SQL and PROC FedSQL. This webinar is ideal for SAS®9 and SAS® Viya® programmers who access and query data using DATA step or PROC SQL. Watch this webinar to learn:

 

  • When and how to use PROC FedSQL.
  • When to use PROC FedSQL versus PROC SQL or DATA step.
  • The differences of running PROC FedSQL in SAS®9 (or the Viya Compute Server) versus in the CAS Server.

 

The questions from the Q&A segment held at the end of the webinar are listed below and the slides from the webinar are attached.

 

Why does PROC SQL not run in CAS (Cloud Analytics Services)?

PROC SQL does not conform to ANSI 1999 standards and does not allow for distributed, multi-threaded processing that CAS requires. In Viya, PROC SQL would run on the compute server.

 

In Viya, when does PROC FedSQL run in Cloud Analytics Services?

In Viya PROC FedSQL can run on both the Compute Server and in CAS. To execute FedSQL statements on the CAS server, specify the SESSREF= (or SESSUUID=) connection option with a CAS session name in the procedure statement. Either load tables into your CAS session using other tools and query the tables with PROC FEDSQL, or set an active caslib for your CAS session and query tables from the caslib by name. When you query unloaded tables from the active caslib, FedSQL passes requests that are eligible for implicit pass-through to the data source for processing or dynamically loads the external data into your CAS session for processing.

 

Does it make sense to convert my SAS 9 PROC SQL code step to Proc FedSQL?

As indicated in the presentation, consider “Syntax, Performance, and Output” when determining if it makes sense. What would be the performance gains? If, for instance, the SQL query is relatively simple, against a local SAS dataset, and runs quickly, then there are likely no advantages for changing syntax to run on PROC FedSQL. However, if the query is accessing data from one or more external sources, the query is complex, the external source data types are not supported with PROC SQL, and/or performance is poor, then it may make sense to convert the step to PROC FedSQL. Of course, the resources and architecture of your SAS need to be considered. And, I recommend performing benchmark tests – running the same query using PROC SQL and PROC FEDSQL.

 

How do I debug issues when using PROC SQL?

Of course, in general, use the SAS log. However, by default, there is no indication regarding the success or failure of the SAS/ACCESS engine or data connector to generate SQL code that is passed to the database. The SASTRACE= option enables you to examine the SDQL that is submitted to the DBMS: https://go.documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.5&docsetId=acreldb&docsetTarget=n...

 

How do I find out what version of SAS utilizes if using SAS Studio?

You can click on “?” or Help on the top right corner, then click on “About SAS Studio”. This provides you with the SAS Release (e.g. SAS release: 9.04.01M3P06242015 = 9.4 M3). You can also run the code: “proc setinit; run;”

 

Can you run an update query on one table using data in another table with PROC FedSQL? This is a little difficult with PROC SQL.

Yes, you can, but not in CAS. You would use the UPDATE statement documented here:

https://go.documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.5&docsetId=fedsqlref&docsetTarget...

And there is a simple example in the FedSQL expressions section that shows how this could be performed by using a subquery:

https://go.documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.5&docsetId=fedsqlref&docsetTarget...

 

Can you also use FedSQL in Enterprise Guide?

Yes. You can in SAS 9 in Enterprise Guide when programming. There is no task that uses PROC FedSQL. The query builder uses PROC SQL. If SAS Enterprise Guide in SAS 9 is architected to connect to Viya and CAS, you can also start a CAS session, run steps in CAS, and run PROC FedSQL in CAS as well.

 

What does DSN stand for?

Data Source Name. It is the name that applications use to request a connection to an ODBC Data Source. In other words, it is a symbolic name that represents the ODBC connection. It stores the connection details like database name, directory, database driver, UserID, password, etc. when making a connection to the ODBC.

 

Is a separate SAS server (FedSQL server) needed to run PROC FedSQL?

No. PROC FedSQL is a procedure included with Base SAS starting in SAS 9.4.

 

Does PROC FedSQL implicit equal to PROC SQL explicit pass thru?

No. PROC FedSQL explicit pass-through is equal to PROC SQL explicit pass-through. Both enable you to create a query using DBMS specific SQL: syntax and to “push” that query directly to the DBMS for processing.

 

Can you submit multiple SQL statements in one FedSQL step? Sometimes it takes multiple steps (multiple temp tables) inside the DBMS, then select rows from the last table and save them into SAS dataset.

You can have multiple select statements within a PROC SQL step, but it depends what those steps or statements are. If you are accessing different tables, you may be able to reduce the number of steps needed using FedSQL.

 

While connecting to SQL server. Fieldnames are truncating, is there a way to keep the complete name of the field? Any fix for that?

Here are a couple of suggestions. One from SAS Community postings: https://communities.sas.com/t5/General-SAS-Programming/Truncation-of-DBMS-column-names/td-p/439477 And one from a Stack Overflow posting: https://stackoverflow.com/questions/54099588/copy-table-from-sql-server-to-oracle-in-sas-column-name.... I am not sure of your specific situation, but I bet one of these suggestions will work.

 

With the introduction of FedSQL is there a query profiler that identifies if Proc SQL or FedSQL is best option?

That would be very nice! It does not exist yet. But I suspect that R&D might be working on something like that for the future. Note that in SAS Studio 5.2 the Query Builder Task allows you to choose whether the task generates PROC SQL code or PROC FedSQL code.

 

All the examples of connections require PW to be included.  I've been told that when the credentials to the database are Network driven (UN and PW) that those parameters are not needed, but rather there is alternative syntax whereby Viya passes the users info. to the database.  Is that true?  I'm specifically interested in Oracle connections.

In the quick reference the example CASLIB for Oracle shows the following:  caslib mycaslib desc="Oracle Caslib"

     datasource=(srctype="oracle",

                 username="myuser1",

                 password="myPwd",

                 path="//machine.lnx.com:1521/exadat");

This include un and pw.  And, of course, something like PROC PWENCODE can be used to encode the password.  But you are correct according to the following documentation link: https://go.documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.5&docsetId=casref&docsetTarget=p1... “You can associate data connectors that require logins with a domain on the CAS server. When users connect to the data source through a data source name (DSN), the domain name is used to retrieve user credentials that are associated with that data connector. The credentials are then passed to the third-party data source. User credentials are stored on your system and are accessible by the CAS server.”

 

EG Generates PROC SQL in Query Builder. Can EG also optionally generate PROC FEDSQL code or do we have to manually code PROC FEDSQL from EG.

No. The SAS Enterprise Guide Query Builder only generates PROC SQL code. However, in SAS Studio 5.2 the Query Builder Task allows you to choose whether the task generates PROC SQL code or PROC FedSQL code. It is an option that you can set.

 

What does Fed stand for?

Fed stands for Federated. In our context, a Federated query provides the ability to take a query and delivers results based on information from many different sources.

 

Can we use Studio to join Oracle and Snowflake?

Yes. You can use SAS Studio or SAS Enterprise Guide. How you do that depends on the SAS/ACCESS engines or Data Connectors deployed in your organization – direct access engine, ODBC access engine etc. And it depends upon whether you are connecting to these data sources in SAS 9 or CAS.  But the answer is yes.

 

You said PROC FedSQL can access more than one DB at a time, does that mean it can access a HADOOP table and SAS dataset at the same time?  I'm not able to create HADOOP tables, so I have to pull all the HADOOP table data down to SAS, then merge the data with a SAS dataset.

Yes.  PROC FedSQL can access any data source that conforms to the ANSI 1999 standard.  Hadoop does conform. However, PROC FedSQL still needs to make connections to the different data sources. And that connection is made using a SAS LIBREF or CASLIB that uses a SAS/ACCESS (like SAS/ACCESS Interface to Hadoop) or DATA CONNECTOR engine to make that connection.

The SAS documentation is not quite clear whether CASLIBs contain permanent, temporary, or even data in transient state (cf. also Pendergrass, 2017, p.8): “All tables in CAS are permanent unless declared temporary.”

Pendergrass, Jerry (2017). The Architecture of the SAS® Cloud Analytic Services in SAS® Viya™ Pendergrass, SAS Institute Inc. Paper SAS309-2017

I think it's the other way around...

I understand the confusion. I skimmed through the paper you reference, and it looks like it was well written. I believe that the confusion comes from the fact that a CASLIB has two pointers – not just one. One pointer is to the data source located on disk. The source could be a SAS Dataset, a SAS HDAT file, a Teradata table, a Hadoop Hive table, a Parquet Table on S3, …  These tables persist. But when these tables associated with CASLIBS are loaded into memory for processing in CAS, the in-memory tables have a Session scope or a Global scope (if the CASLIB is a global CASLIB and the tables are promoted). Session scope tables in memory are temporary in that they disappear when the session ends. But the table that resides in persistent storage associated with the CASLIB still remains.

 

How do you use PROC FedSQL to connect to multiple databases using explicit pass-through?

You do not. The capability of Proc FedSQL to connect to multiple data sources is established with SAS LIBREFs or CASLIBs. Proc FedSQL could use explicit pass-through to query each source individually using that sources own SQL syntax. But that defeats the value of FedSQL.  Since FedSQL can submit a query that all conforming data sources can understand, then implicit pass-through is the preferred method when querying multiple ANSI 1999 compliant sources.

 

In PROC SQL we create a macro variable with into but how do you do this in PROC FedSQL?

You cannot. Sorry. If you need to create dynamic MACRO variables from SQL query, then use PROC SQL. PROC SQL has a lot of built-in SAS programming capabilities that make it great for everyday SAS programming tasks such as this. PROC FEDSQL, since it needs to support specific ANSI 1999 standards cannot provide this

 

Can we use PROC CAS to access Microsoft SharePoint files?

No. Not to my knowledge and my search/queries do not indicate that this is possible. SAS 9 or the Viya compute server can access SharePoint files there are different ways this could be done. Here is a communities posting on the topic: https://communities.sas.com/t5/SAS-Procedures/Access-Sharepoint-using-SAS/td-p/491527. And here is a BLOG post from Chris Hemedinger on using SAS with MS 365:  https://blogs.sas.com/content/sasdummy/2020/07/09/sas-programming-office-365-onedrive/ Then you could upload the files into CAS. Whether there would be advantages in working with these files in CAS versus on the Compute Server would depend on the nature of the work you are intending and the size/complexity of the data. 

 

I thought the difference between SQL and FedSQL was that FedSQL can access multiple sources.  But your live example for performance showed both SQL and FedSQL accessing multiple sources, right?

(Response to be provided in a follow-up post)

 

Given the performance improvements using FedSQL.execDirect over PROC FedSQL as shown in your last example, is it generally beneficial to learn the CAS syntax and always use the execDirect function?

(Response to be provided in follow-up post)

 

It would be nice to have the .sas program from the demo, with the examples. Is that possible?

The programs are attached here in this post.

 

 

Recommended Resources

SQL Procedure (documentation)

FedSQL Procedure (documentation)

SAS® Viya® 3.5: FedSQL Programming for SAS® Cloud Analytic Services (documentation)  

High-Performance Data Access with FedSQL and DS2 (paper)

Anything You Can Do I Can Do Better: PROC FEDSQL VS PROC SQL (paper)

SAS® Viya®: How to Emulate PROC SQL Using CAS-Enabled PROC FedSQL (blog)

 

Want more tips? Be sure to subscribe to the Ask the Expert board to receive follow up Q&A, slides and recordings from other SAS Ask the Expert webinars.  

Version history
Last update:
‎02-03-2021 07:31 PM
Updated by:
Contributors

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Article Tags