BookmarkSubscribeRSS Feed

FedSQL Implicit Pass-Through Facility for CAS

Started ‎05-03-2018 by
Modified ‎05-03-2018 by
Views 4,313

Everyone knows the SQL pass-through capabilities, whether it is implicit or explicit, that we have had for a very long time in the versions 6, 8 and 9 of SAS and its popular SAS/ACCESS module. Who remembers explicit pass-through was already available in version 6 in addition to the old way of accessing DBMS data with access descriptors and view descriptors using PROC ACCESS! OK, stop talking about the past. Let’s look at the present and future.

 

In SAS Viya 3.3, there are 2 SQL pass-through facilities:

 

  • The “traditional” one powered by the SAS/ACCESS engines in the SPRE (SAS Programming Runtime Environment) that is part of any Viya deployment. CAS is not involved with this one.
  • A new one that is tied to CAS and accessible through FedSQL language, powered by the SAS Data Connectors.

This article focuses on the second one.

 

I could also have added to this list the WHERE clause processing that is available in the CASUTIL procedure for loading data coming from databases into CAS:

 

  • SAS syntax WHERE clause processing, similar to implicit pass-through: the SAS syntax is translated, when possible, to an appropriate SQL syntax that is passed down to the database (WHERE option). This is new in Viya 3.3.
  • DBMS syntax WHERE clause processing, similar to explicit pass-through; the DBMS syntax is passed down to the database as is (DBMSWHERE option).

 

But let’s talk about FedSQL.

 

Among the new features of FedSQL in Viya 3.3, there is the new implicit pass-through facility. Basically, it allows users to pass a SQL query down to the database and get the results loaded into CAS.

 

Back in CAS/Viya 3.2, what is the FedSQL default behavior?

 

When a FedSQL query is accessing one or more tables from one or more databases, the following happens:

 

  • If all of the tables are already loaded in memory in CAS, then the FedSQL query runs in CAS
  • If any of the tables is not already loaded in memory in CAS, then this table is loaded temporarily in memory in CAS on-the-fly and, once every table is in CAS, the FedSQL query runs in CAS

So, in Viya 3.2, the FedSQL query always runs in CAS but this is not mandatory to “pre-load” the input tables in CAS to make it work. The “on-the-fly” loading loads a table entirely in CAS. No where clause nor variable selection is taken into account.  

 

 

Now, in CAS/Viya 3.3, what is the FedSQL default behavior?

 

  • If all of the tables are NOT loaded in memory in CAS AND they belong to the same DBMS CASLIB, an attempt is made to implicitly pass the full query down to the data source for processing, and only the result gets loaded into CAS
  • If pass-through is not possible (use of specific SAS functions, different CASLIBs involved, one table at least is already loaded in CAS), the physical source tables are loaded temporarily into CAS on-the-fly and the request is processed in CAS

 

Let’s take 3 examples.

 

Example 1:

 

caslib pg_db1 datasource=(srctype="postgres",username="mypguser",password="XXXXX",
   server="myserver.mydomain.sas.com",database="dvdrental",schema="public") ;

proc fedsql sessref=mySession _method ;
   create table casuser.myresults{options replace=true} as 
      select film.title, film_category.category_id
   from pg_db1."film" as film,
        pg_db1."film_category" as film_category
   where film.film_id=film_category.film_id and 
         film_category.category_id=1 ;
quit ;

 

In this example, both tables FILM and FILM_CATEGORY belong to the same PG_DB1 CASLIB, and the SQL syntax looks ANSI-compliant. If both tables are not already loaded into the PG_DB1 CASLIB, then pass-through will be enabled.

 

You can observe this in the SAS log:

 

 Methods for full query plan
 ----------------------------
 Number of Joins Performed is : 1
         MergeJoin (INNER) 
           Sort 
             SeqScan with qual from PG_DB1.film_category 
           Sort 
             SeqScan from PG_DB1.film 
  
 Offloaded SQL statement
 ------------------------
  
         select "FILM"."title", "FILM_CATEGORY"."category_id" from "public"."film" "FILM", "public"."film_category" "FILM_CATEGORY" 
 where  ( ( ("FILM"."film_id"="FILM_CATEGORY"."film_id")  and  ("FILM_CATEGORY"."category_id"=1) ) ) 
  
         CAS output table: CASUSER(mysasuser).MYRESULTS
  
 NOTE: The SQL statement was fully offloaded to the underlying data source via full pass-through

 

The SQL statement is fully offloaded to the PostgreSQL database. The result will be injected in CAS in the output table.  

 

 

Example 2:

 

caslib pg_db1 datasource=(srctype="postgres",username="mypguser",password="XXXXX",
   server="myserver.mydomain.sas.com",database="dvdrental",schema="public") ;
caslib pg_db2 datasource=(srctype="postgres",username="mypguser",password="XXXXX",
   server="myserver.mydomain.sas.com",database="dvdrental2",schema="public") ;

proc fedsql sessref=mySession _method ;
   create table casuser.myresults{options replace=true} as 
      select film.title, film_category.category_id
   from pg_db1."film" as film,
        pg_db2."film_category" as film_category
   where film.film_id=film_category.film_id and 
         film_category.category_id=1 ;
quit ;

 

In this example, both tables FILM and FILM_CATEGORY do NOT belong to the same CASLIB. They belong to different databases in PostgreSQL, addressed by 2 different CASLIBs. Pass-through is not possible but the FedSQL query won’t fail. If both tables are not already loaded respectively into the PG_DB1 and PG_DB2 CASLIBs, they will be loaded on-the-fly, and the FedSQL query will take place in CAS.

 

Here is the log for this example:

 

 Methods for full query plan
 ----------------------------
 Number of Joins Performed is : 1
         MergeJoin (INNER) 
           Sort 
             SeqScan with qual from PG_DB2.film_category 
           Sort 
             SeqScan from PG_DB1.film 
  
 Methods for stage 1
 --------------------
         HashJoin (INNER) 
           SeqScan with _pushed_ qual from PG_DB2.film_category 
           SeqScan from PG_DB1.film 
  
         Stage query: create table "CASUSER(mysasuser)"."MYRESULTS" {options  replace=true tableID=3} as select "T2"."title", 
 "T1"."category_id" from "PG_DB2"."film_category" {options tableID=1} T1 _hash_ inner join "PG_DB1"."film" {options REPL=YES 
 tableID=2} T2 on  ("T1"."film_id"="T2"."film_id")  where  ( ("T1"."category_id"=1) ) 
  
         Number of SQL I/O threads: 4
  
 NOTE: Table MYRESULTS was created in caslib CASUSER(mysasuser) with 64 rows returned.

 

Be careful, to enable pass-through or on-the-fly loading with case-sensitive databases, the FedSQL syntax might need to be adjusted to preserve case. By default, FedSQL defaults to upper-casing identifier names that are not quoted. The way to preserve a name containing lowercase/uppercase characters is by adding the double-quotes.  

 

Sounds good? In addition to the pass-through facility, FedSQL brings 2 options to control it through the FedSQL Query Planner:

 

  • disablePassThrough: disables implicit FedSQL pass-through. Referenced tables are loaded into the CAS server for processing.
  • requireFullPassThrough: stops processing the FedSQL request when implicit pass-through of the full query cannot be achieved.

 

  Example 3:

 

proc fedsql sessref=mySession _method 
            cntl=(requireFullPassThrough) ;
   create table pg_db1.myresults{options replace=true} as 
      select * from pg_db1."customer"
      where put(create_date,ddmmyy10.)='14/02/2006' ;
quit ;

 

This example illustrates the use of the FedSQL Query Planner option called requireFullPassThrough. It is useful when pushdown of the SQL statement is not possible (here the FedSQL is not ANSI-compliant and makes use of SAS functions) and you don’t want to load the whole and possibly big database table in CAS. Instead the query will stop and you will get the following message in the SAS log:

 

 Methods for full query plan
 ----------------------------
         SeqScan with qual from PG_DB1.customer 
  
 NOTE: Failure of broadcastDSSTextAndLen prevents full pass-through.
 NOTE: Full pass-through to the underlying data source was not possible. Stopping execution.

 

Finally, what do we mean by “full”-query?

 

The pass-through capability in Viya 3.3 only supports full-query pass-through. That is to say, the FedSQL query will be passed-through to the database only if the entire query can be pushed to the data source.

 

If we come back to the example 2, where we have the following WHERE clause:

 

   where film.film_id=film_category.film_id and 
         film_category.category_id=1 ;

 

The first part of the WHERE clause is a join WHERE clause. The second one is simply a filter.

 

We can think that the second WHERE clause could be passed through to the database even if FedSQL deals with multiple CASLIBs. This is a “partial”-query pass-through example and it will be supported in a future Viya realease.

 

Documentation on the FedSQL pass-through facility can be found here.

 

NB: the FedSQL Implicit Pass-Through Facility for CAS is supported with the following data sources: Hadoop, Impala, ODBC, Oracle, PostgreSQL, Teradata, Amazon Redshift, DB2 and SAP HANA. It is not supported with Microsoft SQL Server.

Comments

Perhaps not in scope of this paper, but I wonder if SAS really need to cater for two different SQL dialects (three if you care to count SPDS)?

Since roughly 99% of the existing SQL code base is PROC SQL, wouldn't be nice to merge them at som point?

Hello @LinusH

Thanks for your comment. FedSQL is the SQL implementation that has been chosen to run in CAS. For the time being, if you want to run SQL queries in CAS (in parallel, in-memory) then you must use FedSQL. PROC SQL is able to read CAS tables but will run in SAS. I'm not aware of any short-term plan to merge them.

Thanks.

and what is CAS???

Hello @ccaulkins912,

 

CAS stands for Cloud Analytic Services. You can read all about it here: https://support.sas.com/resources/papers/proceedings17/SAS0309-2017.pdf

 

See in my part of the world - my little world, some might say - CAS is the Core Accounting System for contracting operations at USCG - Coast Guard

Implicit pass-through was added to FedSQL for CAS in Viya 3.3. Limited explicit pass-through in the CONNECTION TO component of the SELECT statement FROM clause was added in SAS Viya 3.4.

 

For details, see information about each of them here:

https://go.documentation.sas.com/?cdcId=sasstudiocdc&cdcVersion=3.8&docsetId=casfedsql&docsetTarget=...

Version history
Last update:
‎05-03-2018 04:19 AM
Updated by:
Contributors

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Tags