What is FedSQL? It’s SAS’ proprietary implementation of the ANSI SQL:1999 core standard. It’s a vendor-agnostic SQL dialect. Therefore, one does not have to write SQL code in data source specific SQL. Most importantly, it executes in Cloud Analytic Services (CAS) whereas PROC SQL does not. In CAS, FedSQL provides a scalable, threaded, high-performance way to run queries. These queries can perform data management transformations and manipulations by querying the data in existing CAS tables and creating new CAS tables/result sets.
FedSQL statements can be submitted to the CAS server from a SAS client or SAS Viya session using the FedSQL procedure or the fedsql.execDirect CAS action. The fedsql.execDirect action can be called from a SAS Viya CASL (CASL = Cloud Analytics Services Language) program using PROC CAS or a Python, R, or Lua program.
When FedSQL statements are executed by the CAS server, the FedSQL result set can be an in-memory CAS table or the FedSQL result set can be sent back to the client without creating a CAS table. The latter situation occurs if the CAS query contains a SELECT statement without a CREATE TABLE AS statement. You can use other CAS actions to persist the result set on the CAS server or to save the result set to an external data source for further processing.
With the upcoming release of Viya 3.5 in November 2019, several new capabilities have been added to FedSQL in CAS.
- UNION queries are supported. The new UNION set operator combines the result sets of two SELECT queries and returns unique rows from both result sets. Columns are combined by position by default. There is an optional CORRESPONDING keyword that enables you to combine columns based on column name instead. The optional BY keyword enables you to submit a column list to the CORRESPONDING keyword. A simple example is as follows:
data casuser.cars1 (keep= make model type origin);
set sashelp.cars;
run;
data casuser.cars2 (keep= make model type origin);
set sashelp.cars;
where origin = 'USA';
run;
proc fedsql sessref=mysess;
create table casuser.cars_union {options replace=true} as
select * from casuser.cars1
union
select * from casuser.cars2;
quit;
- The CAST function is supported allowing you to convert a value from one data type to another. In this example, the data types for msrp and invoice have been changed from numeric to char(8).
proc fedsql sessref=mysess;
create table casuser.cars_cast {options replace=true} as
select cast(msrp as char(8)),
cast(invoice as char(8))
from casuser.cars;
quit;
proc contents data=casuser.cars_cast; run;
- Scalability improvements involving computed columns. In SAS Viya 3.4 and earlier the fedsql.execDirect action serialized many operations on one worker. Beginning with SAS Viya 3.5, operations with computed columns are partitioned by creating temporary intermediate tables. The following operations, which were previously serialized on a one worker, can now be performed on multiple workers:
- GROUP BY aggregations where one or more group expressions are computed columns.
- Joins on columns that use different SAS formats.
- SELECT DISTINCT with computed columns.
- FULL outer joins except for those where the join is an inequality condition (t1.x <> t2.x) or an equality condition where one side of the equation contains column references from more than one table (t1.x + t2.y = 7).
- LIMIT and OFFSET operations are still serialized on one worker.
- Optimization for simple queries such as SELECT COUNT(*) FROM CASLIB.TABLE (no other columns selected, no WHERE clause, etc.). The COUNT is read from table metadata and returned rather than reading all rows in the table.
- The fedsql.execDirect action has four new instructions/parameters for the CNTL= option.
- showStages which displays details of query execution, including times for intermediate stages, along with the FedSQL query plan. This parameter returns much less detail in the log and output is much more user friendly.
- dynamicCardinality directs the FedSQL query planner to perform a cardinality estimation prior to creating a query plan.
- optimizeVarbinaryPrecision and optimizeVarcharPrecision optimizes Varbinary and Varchar precisions respectively by using a precision that’s appropriate to the actual data, rather than the declared precision for said variables.
- In SAS Viya 3.4 and earlier, the DROP TABLE statement and REPLACE= table option considered tables with global scope when enforcing name exclusivity rules and would consequently drop and replace a global table. Beginning in SAS Viya 3.5, the DROP TABLE statement and REPLACE= table option operate exclusively on CAS tables with session scope.
- FedSQL now supports the VARBINARY data type. This was not supported in previous versions of SAS Viya. One example of the use of VARBINARY is with image processing. When an image is loaded to CAS, its data type becomes VARBINARY. The observation containing the image also contains metadata columns about the image such as name, size, position, resolution, etc. The CAS table containing the image can now be used in FedSQL queries for operations such as:
- UNION – combine multiple input tables into one output table.
- JOIN – combine tables based on the matching of certain attributes, while also including VARBINARY columns in the output table.
- Consult the current documentation for additional information on the image action set.
These are just some of the major improvements made to FedSQL and the fedsql.execDirect action in SAS Viya 3.5. Details about additional enhancements can be found in the documentation once it’s available. These include:
- Sorting and Grouping in CAS
- Aggregate Processing when a GROUP BY statement is not used
- Handling of DATE, TIME, and TIMESTAMP vales in CAS
A big thank you to Andy Therber for his contributions to this article.