In this series, I want to share more general information about the integration between SAS and Redshift. After loading Redshift data in CAS, saving CAS data in Redshift, let’s talk a little bit about how to send some SQL statements directly to Redshift for execution from CAS.
From a SAS Viya perspective, there is not much to do about processing Redshift data in place. SAS Viya is a data-crunching platform, and its in-memory engine must be fed with data coming from various data sources before you can exploit the SAS Viya capabilities. The data loading phase is essential in a SAS Viya project. However, there are some SQL pass-through techniques that can be helpful when you want to load the result of an optimized user-written SQL query directly into CAS.
FedSQL is the SAS SQL implementation that runs in CAS. It allows users to run their SQL queries in a scalable, threaded, and high-performance way. By default, it expects the input tables to be loaded in CAS before running the SQL query. However, if the input tables are not yet loaded in CAS, it is smart enough to automatically push the SQL query down to the database so that it is executed by the database and only the result set is loaded in CAS.
In this example, five tables are not loaded in CAS before running the FedSQL query, so the SQL query is completely offloaded to Redshift:
92 proc fedsql sessref=mysession _method ; 93 create table rs.orders as select 94 p.p_name, p.p_color, p.p_type, p.p_size, 95 s.s_nation, s.s_region, 96 c.c_nation, c.c_region, 97 d.d_month, d.d_year, 98 lo.lo_quantity, lo.lo_revenue 99 from rs.lineorder lo inner join rs.part p on lo.lo_partkey=p.p_partkey 100 inner join rs.supplier s on lo.lo_suppkey=s.s_suppkey 101 inner join rs.customer c on lo.lo_custkey=c.c_custkey 102 inner join rs.dwdate d on lo.lo_orderdate=d.d_datekey 103 ; Methods for full query plan ---------------------------- Number of Joins Performed is : 4 MergeJoin (INNER) Sort MergeJoin (INNER) Sort MergeJoin (INNER) Sort MergeJoin (INNER) Sort SeqScan from RS.LINEORDER Sort SeqScan from RS.DWDATE Sort SeqScan from RS.SUPPLIER Sort SeqScan from RS.CUSTOMER Sort SeqScan from RS.PART Offloaded SQL statement ------------------------ select "P"."p_name", "P"."p_color", "P"."p_type", "P"."p_size", "S"."s_nation", "S"."s_region", "C"."c_nation", "C"."c_region", "D"."d_month", "D"."d_year", "LO"."lo_quantity", "LO"."lo_revenue" from "public"."LINEORDER" "LO" inner join "public"."PART" "P" on ("LO"."lo_partkey"="P"."p_partkey") inner join "public"."SUPPLIER" "S" on ("LO"."lo_suppkey"="S"."s_suppkey") inner join "public"."CUSTOMER" "C" on ("LO"."lo_custkey"="C"."c_custkey") inner join "public"."DWDATE" "D" on ("LO"."lo_orderdate"="D"."d_datekey") CAS output table: RS.ORDERS NOTE: The SQL statement was fully offloaded to the underlying data source via full pass-through 104 quit ; NOTE: PROCEDURE FEDSQL used (Total process time): real time 0.79 seconds cpu time 0.01 seconds
If pass-through is not possible, the physical source tables are loaded temporarily into CAS on the fly and the request is processed in CAS (using load-on-demand capability). To be eligible for implicit pass-through:
Starting in SAS Viya 3.4, CAS and FedSQL introduced a new explicit pass-through facility. It allows to send native SQL SELECT-type syntax to a database for execution. The result set produced by the database is then loaded in CAS into memory for further processing. Here is an example of a FedSQL explicit pass-through syntax:
proc fedsql sessref=mysession _method ; create table rs.extract as select * from connection to rs ( select c_nation as customer_country, s_nation as supplier_country, sum(lo_quantity) as qty, sum(lo_revenue) as revenue from public.orders group by c_nation, s_nation ) ; ; quit ;
This article is an extract of the SAS Global Forum paper SAS® and Amazon Redshift: Overview of Current Capabilities. Thanks for reading.
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.