We’re smarter together. Learn from this collection of community knowledge and add your expertise.

SAS In-Database Technologies – PROC TRANSPOSE push-down

by SAS Employee Tatyana on ‎04-21-2017 03:06 PM - edited on ‎04-24-2017 10:06 AM by Community Manager (2,061 Views)

In one of the earlier Communities articles, we reviewed how a subset of SAS procedures can be pushed down for execution in database with the help of the corresponding SAS/ACCESS Interface. SAS procedures that can be pushed down in this fashion are FREQ, MEANS, RANK, REPORT, SORT, SUMMARY, and TABULATE. The push down of these procedures is achieved via the implicit translation of the SAS procedure code into SQL.

 

In this blog, we will talk about a different type of a procedure push down. While the goal of the push down is the same - delegate as much processing as possible to the database that is hosting the original data source - the approach of the push down orchestration is different, and the technologies that enable this processing delegation are different as well.

 

This type of the PROC push down is sometimes referred as SAS In-Database Technologies-based or SAS Embedded Process-based. And it's driven by, you guessed it, SAS Embedded Process which must be deployed to a database and serve as a SAS liaison that is skilled in both, SAS and database operations.

 

Unlike the SAS/ACCESS -based procedures push down, which uses SQL to talk to the database, SAS In-Database Technologies-based procedures push down talks DS2 language. Once pointed to the database for execution, the SAS procedure gets translated into DS2 programs. DS2 programs are then delivered to the database and are orchestrated by SAS Embedded Process to run in parallel across the database cluster.

 

 

This push down approach is beneficial in a number of ways:

  1. DS2 language supports threaded programming. When your DS2 program is designed to keep data manipulations and calculations within the thread part of the program, SAS Embedded Process can channel these threads to execute in parallel across the database.
  2. DS2 is a flexible and powerful language that supports modern programming constructs, rich set of data types, while staying SAS-friendly, i.e. it talks SAS functions and operates SAS concepts effectively. It provides a lot of flexibility around custom programming and can serve many purposes. This makes translation of the most intricate SAS procedures into DS2 format possible.
  3. This approach aims to provide complete SAS program execution in database with no post-processing in SAS and no data movement outside of the database for any portions of the program execution. This goal is achievable if requirements for using SAS In-Database Technologies are met for your SAS and data platform environments, and your SAS programs are complying with the basic in-database principles (for example, your input and output data sources should both reside in the same database).

 

PROC TRANSPOSE is the first procedure that was selected to take advantage of this in-database technology.

Transpositions are very common in data preparation cycles and are known for their heavy CPU intensity. This made PROC TRANSPOSE a solid candidate to benefit from executing in database.

 

It's time to see an example of a PROC TRANSPOSE push-down. We'll use Teradata as the data platform. I also have SAS In-Database Technologies for Teradata licensed and SAS Embedded Process deployed to Teradata database.   

 

/* Create fishdata table in Teradata */

data TDLib.fishdata;
   infile datalines missover;
   input Location & $10. Date date7.
         Length1 Weight1 Length2 Weight2 Length3 Weight3
         Length4 Weight4;
   format date date7.;
   datalines;
Cole Pond   2JUN95 31 .25 32 .3  32 .25 33 .3
Cole Pond   3JUL95 33 .32 34 .41 37 .48 32 .28
Cole Pond   4AUG95 29 .23 30 .25 34 .47 32 .3
Eagle Lake  2JUN95 32 .35 32 .25 33 .30
Eagle Lake  3JUL95 30 .20 36 .45
Eagle Lake  4AUG95 33 .30 33 .28 34 .42
;


/* Transpose fishdata table in Teradata*/

OPTIONS SQLGENERATION=DBMS;

proc transpose data=TDLib.fishdata
     out=TDLib.fishlength1 INDB=YES LET;
   var length1-length4;
   by location;
   id date;
run;

 

 

The notable components of this code that help its execution in database are:

  • Presence of SQLGENERATION=DBMS system option
  • Source table is in Teradata
  • Target table is in Teradata
  • Presence of INDB=YES option
  • Presence of a BY variable. The BY variable directs which groups of observations to use for parallelization.
  • Presence of an ID variable to help map input rows to output columns.

 

A peek at the log reveals a successful in-database execution.

 

PROC TRANSPOSE push-down log.jpg

 

In SAS 9.4M4, PROC TRANSPOSE push-down capability is available for Hadoop and Teradata platforms and requires SAS In-Database Code Accelerator or SAS In-Database Technologies (which includes SAS In-Database Code Accelerator) for the corresponding platform to be licensed.

 

Much more details on this capability, with examples and how-tos, can be found in the white paper The Future of Transpose: How SAS® Is Rebuilding Its Foundation by Making What Is Old New Again by Scott Mebust from SAS.

 

If you'd like to learn more about SAS In-Database Technologies in general, check out Befriend SAS In-Database Technologies to Accelerate SAS Integration with Your Data Platform white paper.

Contributors
Your turn
Sign In!

Want to write an article? Sign in with your profile.