The universe is expanding. You can tell by studying the cosmological redshift. The Cloud Computing universe is also expanding. I can tell by counting the requests for information on Amazon Web Services' Redshift. AWS is arguably the biggest player in the Cloud space. So their database, Redshift, is naturally popular.
Redshift is an MPP database designed to support reporting, analytics, dashboards, and decisioning. Like Teradata, Redshift distributes its data and processing over multiple hosts, allowing it to scale for large implementations. For more details on Redshift, check out this FAQ.
For a great discussion of all of these features, check out this excellent paper by Chris Dehart and GEL expat Jeff Bailey. The paper is a release behind, but it actually covers the more recent features in its futures section. The paper's highlights include load-method comparisons as well as an excellent description of the SAS-to-Redshift bulk-load mechanism.
Want to see it in action? Check out Chris Dehart's SAS Tech Talk video:
Two features stand out when discussing Redshift -- bulk loading and pass-through/push-down. Since Redshift will always live in an AWS Cloud and your SAS server might not, it is critical that SAS can pass data to Redshift as efficiently as possible, and that SAS can pass instructions to Redshift so that data is not needlessly transferred from Redshift to SAS for processing.
As Jeff and Chris' paper details, SAS' bulk-load mechanism works by pushing the output data to AWS S3 and then issuing a Redshift COPY command that loads the data from S3. As of the m4 release, this is all done behind the scenes. Users don't have to know anything about the COPY command or S3. They simply have to specify the required bulk-load options (e.g. bulkload=yes) as shown in the following example:
```sas
libname libred redshift server=rsserver db=rsdb
        user=myuserID pwd=myPwd port=5439;

data libred.myclass(bulkload=yes
                    bl_bucket=myBucket
                    bl_key=99999
                    bl_secret=12345
                    bl_default_dir='/tmp'
                    bl_region='us-east-1');
   set sashelp.class;
run;
```
Despite the extra I/O, the S3-based bulk-load mechanism gives SAS impressive performance over even the already-impressive performance of native SAS/ACCESS to Redshift. The following table compares the use-case results presented by Jeff and Chris for different load techniques:
|SAS to Redshift Load Performance|Time|
|---|---|
|SAS/ACCESS to ODBC (Amazon driver)|28 minutes|
|SAS/ACCESS to Redshift|54 seconds|
|SAS bulk-load mechanism|12.5 seconds|
Equally important as load performance, the capability to pass instructions from SAS to Redshift eliminates data transfers altogether, since data required for SAS queries and processing can stay in Redshift, where it is processed. SAS can pass instructions in three different ways -- implicit SQL pass-through, explicit SQL pass-through, and in-database execution of SAS procedures.
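As a rough sketch of implicit pass-through (reusing the hypothetical libred libref and the sashelp.class columns from the earlier example), an ordinary PROC SQL query against a Redshift libref is translated to Redshift SQL behind the scenes; the SASTRACE option reveals the generated SQL in the log:

```sas
/* Show the SQL that SAS/ACCESS generates and passes to Redshift */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* This summary query is passed implicitly to Redshift;   */
/* only the aggregated result rows travel back to SAS.    */
proc sql;
   select sex, avg(weight) as avg_weight
   from libred.myclass
   group by sex;
quit;
```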
As with most relational databases, not only can SAS pass PROC SQL code implicitly to Redshift, it can also send explicit Redshift commands. This allows for non-standard SQL commands and complete control of the requests being passed. In addition to letting you write your own SQL, SAS DI Studio can generate explicit SQL, giving SAS a user interface for explicit SQL to Redshift.
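A minimal sketch of explicit pass-through, assuming the same hypothetical server, database, and credentials as the libname example above. Everything inside EXECUTE and the CONNECTION TO query is sent to Redshift verbatim, so Redshift-specific commands work too:

```sas
proc sql;
   connect to redshift (server=rsserver database=rsdb
                        user=myuserID password=myPwd port=5439);

   /* Run a Redshift-specific command, e.g. ANALYZE */
   execute (analyze myclass) by redshift;

   /* Pull back only the result of a query run in Redshift */
   select * from connection to redshift
      (select sex, count(*) as n from myclass group by sex);

   disconnect from redshift;
quit;
```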
Finally, SAS can push the FREQ, RANK, REPORT, SORT, MEANS, and TABULATE procedures into Redshift for execution as SQL queries. As with any target platform, be aware of each procedure's in-database limitations.
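The procedure push-down above is controlled by the SQLGENERATION system option. A brief sketch, again assuming the hypothetical libred libref from the earlier example:

```sas
/* Ask SAS to generate SQL for supported procedures and run */
/* them inside the database when possible.                  */
options sqlgeneration=dbms;

/* With in-database processing enabled, this PROC MEANS is  */
/* converted to an aggregating SQL query that Redshift runs; */
/* only the summary statistics return to SAS.               */
proc means data=libred.myclass mean min max;
   class sex;
   var height weight;
run;
```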