In one of the earlier Communities articles, we reviewed how a subset of SAS procedures can be pushed down for execution in database with the help of the corresponding SAS/ACCESS Interface. SAS procedures that can be pushed down in this fashion are FREQ, MEANS, RANK, REPORT, SORT, SUMMARY, and TABULATE. The push down of these procedures is achieved via the implicit translation of the SAS procedure code into SQL.
In this blog, we will talk about a different type of a procedure push down. While the goal of the push down is the same - delegate as much processing as possible to the database that is hosting the original data source - the approach of the push down orchestration is different, and the technologies that enable this processing delegation are different as well.
This type of the PROC push down is sometimes referred as SAS In-Database Technologies-based or SAS Embedded Process-based. And it's driven by, you guessed it, SAS Embedded Process which must be deployed to a database and serve as a SAS liaison that is skilled in both, SAS and database operations.
Unlike the SAS/ACCESS -based procedures push down, which uses SQL to talk to the database, SAS In-Database Technologies-based procedures push down talks DS2 language. Once pointed to the database for execution, the SAS procedure gets translated into DS2 programs. DS2 programs are then delivered to the database and are orchestrated by SAS Embedded Process to run in parallel across the database cluster.
This push down approach is beneficial in a number of ways:
PROC TRANSPOSE is the first procedure that was selected to take advantage of this in-database technology.
Transpositions are very common in data preparation cycles and are known for their heavy CPU intensity. This made PROC TRANSPOSE a solid candidate to benefit from executing in database.
It's time to see an example of a PROC TRANSPOSE push-down. We'll use Teradata as the data platform. I also have SAS In-Database Technologies for Teradata licensed and SAS Embedded Process deployed to Teradata database.
/* Create fishdata table in Teradata */ data TDLib.fishdata; infile datalines missover; input Location & $10. Date date7. Length1 Weight1 Length2 Weight2 Length3 Weight3 Length4 Weight4; format date date7.; datalines; Cole Pond 2JUN95 31 .25 32 .3 32 .25 33 .3 Cole Pond 3JUL95 33 .32 34 .41 37 .48 32 .28 Cole Pond 4AUG95 29 .23 30 .25 34 .47 32 .3 Eagle Lake 2JUN95 32 .35 32 .25 33 .30 Eagle Lake 3JUL95 30 .20 36 .45 Eagle Lake 4AUG95 33 .30 33 .28 34 .42 ; /* Transpose fishdata table in Teradata*/ OPTIONS SQLGENERATION=DBMS; proc transpose data=TDLib.fishdata out=TDLib.fishlength1 INDB=YES LET; var length1-length4; by location; id date; run;
The notable components of this code that help its execution in database are:
A peek at the log reveals a successful in-database execution.
In SAS 9.4M4, PROC TRANSPOSE push-down capability is available for Hadoop and Teradata platforms and requires SAS In-Database Code Accelerator or SAS In-Database Technologies (which includes SAS In-Database Code Accelerator) for the corresponding platform to be licensed.
Much more details on this capability, with examples and how-tos, can be found in the white paper The Future of Transpose: How SAS® Is Rebuilding Its Foundation by Making What Is Old New Again by Scott Mebust from SAS.
If you'd like to learn more about SAS In-Database Technologies in general, check out Befriend SAS In-Database Technologies to Accelerate SAS Integration with Your Data Platform white paper.