BookmarkSubscribeRSS Feed

SAS & Amazon Redshift Series: Process Redshift data in place from CAS

Started ‎06-05-2020 by
Modified ‎06-05-2020 by
Views 1,891

In this series, I want to share more general information about the integration between SAS and Redshift. After loading Redshift data in CAS, saving CAS data in Redshift, let’s talk a little bit about how to send some SQL statements directly to Redshift for execution from CAS.

From a SAS Viya perspective, there is not much to do about processing Redshift data in place. SAS Viya is a data-crunching platform, and its in-memory engine must be fed with data coming from various data sources before you can exploit the SAS Viya capabilities. The data loading phase is essential in a SAS Viya project. However, there are some SQL pass-through techniques that can be helpful when you want to load the result of an optimized user-written SQL query directly into CAS.

 

 

FedSQL Implicit Pass-Through Facility

 

FedSQL is the SAS SQL implementation that runs in CAS. It allows users to run their SQL queries in a scalable, threaded, and high-performance way. By default, it expects the input tables to be loaded in CAS before running the SQL query. However, if the input tables are not yet loaded in CAS, it is smart enough to automatically push the SQL query down to the database so that it is executed by the database and only the result set is loaded in CAS.

In this example, five tables are not loaded in CAS before running the FedSQL query, so the SQL query is completely offloaded to Redshift:

 

92   proc fedsql sessref=mysession _method ;
93      create table rs.orders as select
94      p.p_name, p.p_color, p.p_type, p.p_size,
95      s.s_nation, s.s_region,
96      c.c_nation, c.c_region,
97      d.d_month, d.d_year,
98      lo.lo_quantity, lo.lo_revenue
99      from rs.lineorder lo inner join rs.part p on lo.lo_partkey=p.p_partkey
100        inner join rs.supplier s on lo.lo_suppkey=s.s_suppkey
101           inner join rs.customer c on lo.lo_custkey=c.c_custkey
102              inner join rs.dwdate d on lo.lo_orderdate=d.d_datekey
103     ;
 
Methods for full query plan
----------------------------
Number of Joins Performed is : 4
        MergeJoin (INNER) 
          Sort 
            MergeJoin (INNER) 
              Sort 
                MergeJoin (INNER) 
                  Sort 
                    MergeJoin (INNER) 
                      Sort 
                        SeqScan from RS.LINEORDER 
                      Sort 
                        SeqScan from RS.DWDATE 
                  Sort 
                    SeqScan from RS.SUPPLIER 
              Sort 
                SeqScan from RS.CUSTOMER 
          Sort 
            SeqScan from RS.PART 
 
Offloaded SQL statement
------------------------
 
        select "P"."p_name", "P"."p_color", "P"."p_type", "P"."p_size", "S"."s_nation", "S"."s_region", "C"."c_nation", 
"C"."c_region", "D"."d_month", "D"."d_year", "LO"."lo_quantity", "LO"."lo_revenue" from "public"."LINEORDER" "LO" inner join 
"public"."PART" "P" on  ("LO"."lo_partkey"="P"."p_partkey")  inner join "public"."SUPPLIER" "S" on  
("LO"."lo_suppkey"="S"."s_suppkey")  inner join "public"."CUSTOMER" "C" on  ("LO"."lo_custkey"="C"."c_custkey")  inner join 
"public"."DWDATE" "D" on  ("LO"."lo_orderdate"="D"."d_datekey") 
 
        CAS output table: RS.ORDERS
 
NOTE: The SQL statement was fully offloaded to the underlying data source via full pass-through
104  quit ;
NOTE: PROCEDURE FEDSQL used (Total process time):
      real time           0.79 seconds
      cpu time            0.01 seconds

 

If pass-through is not possible, the physical source tables are loaded temporarily into CAS on the fly and the request is processed in CAS (using load-on-demand capability). To be eligible for implicit pass-through:

  • All of the tables in the FedSQL request must exist in the same CASLIB.
  • The tables cannot already have been loaded into CAS. If one of the tables is already loaded in CAS, the remaining tables are temporarily loaded into CAS on demand.

 

 

FedSQL Explicit Pass-Through Facility

 

Starting in SAS Viya 3.4, CAS and FedSQL introduced a new explicit pass-through facility. It allows to send native SQL SELECT-type syntax to a database for execution. The result set produced by the database is then loaded in CAS into memory for further processing. Here is an example of a FedSQL explicit pass-through syntax:

 

proc fedsql sessref=mysession _method ;
   create table rs.extract as
   select * from connection to rs
      (
         select c_nation as customer_country, s_nation as supplier_country,
                sum(lo_quantity) as qty, sum(lo_revenue) as revenue
         from public.orders
         group by c_nation, s_nation
      ) ;
   ;
quit ;

 

This article is an extract of the SAS Global Forum paper SAS® and Amazon Redshift: Overview of Current Capabilities. Thanks for reading.

Version history
Last update:
‎06-05-2020 05:11 PM
Updated by:
Contributors

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Tags