BookmarkSubscribeRSS Feed

Redshift and SAS

Started ‎05-25-2017 by
Modified ‎05-26-2017 by
Views 7,871

The universe is expanding. You can tell by studying the cosmological redshift. The Cloud Computing universe is also expanding. I can tell by counting the requests for information on Amazon Web Services' Redshift.  AWS is arguably the biggest player in the Cloud space. So their database, Redshift, is naturally popular.

 

Amazon Redshift in a Nutshell

Redshift is an MPP database designed to support reporting, analytics, dashboards, and decisioning. Like Teradata, Redshift distributes its data and processing over multiple hosts allowing it to scale for large implementations. For more details on Redshift, check out this FAQ.

 

SAS integration with Redshift

As of SAS 9.4m4, SAS offers the following Redshift integration:

 

  • A Redshift Libname Engine
  • SQL Pass-Through
  • Bulk Loading
  • In-Database Execution of FREQ, MEANS, RANK, REPORT, SORT, and TABULATE

For a great discussion of all of these features, check out this excellent paper by Chris Dehart and GEL expat, Jeff Bailey. The paper is behind a release but actually talks about the more recent features in its futures section. The paper's highlights include load method comparisons as well as an excellent description of SAS to Redshift Bulkload mechanism.

 

Want to see it in action? Check out Chris Dehart's SAS Tech Talk video:

 

 

Two features standout when discussing Redshift -- Bulk-loading and Pass-through/Push-down. Since Redshift will always exist in an AWS Cloud and your SAS server might not, it is critical that SAS can pass data to Redshift as efficiently as possible and that SAS can pass instructions to Redshift so that data is not needlessly passing from Redshift to SAS for processing.

 

Bulkload

As Jeff and Chris' paper details, SAS' bulkload mechanism works by pushing the output data to AWS S3 and then issuing an S3 COPY command. As of the m4 release, this is all done behind the scenes. Users don't have to know anything about the COPY command or S3. They simply have to specify the required bulkload options (e.g. bulkload=yes) as shown in the following example:

 

libname libred redshift server=rsserver db=rsdb user=myuserID pwd=myPwd port=5439;

DATA libred.myclass(
        bulkload=yes   bl_bucket=myBucket   bl_key=99999   
        bl_secret=12345   bl_default_dir='/tmp'   bl_region='us-east-1');
SET sashelp.class;
RUN;

 

Despite the extra I/O, utilizing the S3 Copy Bulkload mechanism gives SAS impressive performance over even the already-impressive performance of native SAS/Access to Redshift. The following table compares the use-case results presented by Jeff and Chris using different load techniques:

 

SAS to Redshift Load Performance
SAS Access to ODBC (Amazon Driver) 28 minutes
SAS Access to Redshift 54 seconds
SAS Bulkload Mechanism 12.5 seconds

 

Pass-Through / Push-Down

Equally as important as load performance, the capability to pass instructions from SAS to Redshift eliminates data transfers all together since data required for SAS queries/processing can stay in Redshift where it is processed. SAS can pass instructions in three different ways -- Implicit SQL Pass-through; Explicit SQL Pass-Through; And, In-Database Execution of SAS procedures.

 

Implicit Pass-Through

As of SAS 9.4m4, SAS can pass 46 SQL functions as well as most joins to Redshift.

 

Explicit Pass-Through

As with most relational databases, SAS can not only pass PROC SQL code implicitly to Redshift, SAS can also send explicit Redshift commands. This allows for non-standard SQL commands and complete control of the requests being passed. In addition to writing your own SQL, SAS DI Studio can generate explicit SQL giving SAS a user interface for Explicit SQL to Redshift.

 

In-Database Procedures

Finally, SAS can push the FREQ, RANK, REPORT, SORT, MEANS, and TABULATE procedures into Redshift for execution as SQL queries. As with all target platforms, be aware of any limitations.

Version history
Last update:
‎05-26-2017 11:13 AM
Updated by:

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Labels
Article Tags