Understanding SAS: The Different Processing Engines


Let's look at the different processing engines in the SAS portfolio that perform the joins, aggregations, lookups, analytics, and other data processing behind the code and visual interfaces. Of course, "engine" is also a loaded term, so here we'll take it in the broadest sense.


SAS 9

The term "SAS 9" is generally used to refer to Base SAS and its many extensions: SAS/CONNECT, SAS/ACCESS, SAS/STAT, IML, SAS/GRAPH, etc. Together, these components are formally known as the SAS Compute Server. They form one of the most used data processing and analytics platforms in the world, as well as the back-end for many of SAS' visual interfaces.

The SAS processing engine has the following distinguishing characteristics:

SMP/Single Threaded Engine with Multi-Engine Scaling	Many SAS PROCs support multi-threading; however, certain analyses that depend on row order run single-threaded. To improve vertical scaling and to offer horizontal scaling, SAS/CONNECT and/or SAS Grid can spin up additional SAS 9 instances to spread the load.
Row Processing, SQL, PROCs	While SAS does support two implementations of SQL, PROC SQL and PROC FEDSQL, one of SAS' strengths is its unique and flexible implementation of row processing (DATA Step) as well as its library of data transformation and analysis procedures (PROCs).
Batch, Web Service, Back-End Service, ...	While originally written as a batch engine, SAS has been extended to support a wide range of usage patterns, including web services, client/server, and even transaction processing. Regardless of how it's called, its strengths are in bulk processing.
Some Dynamic I/O Capabilities	While SAS is primarily designed to read/write static data like files and database tables, it also has some abilities to handle dynamic sources/targets like pipes, message queues, sockets, and more.
Big Data Capable	With or without its Grid extensions, the SAS 9 engine can handle truly massive data loads.
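
To make the row-order point concrete, here is a minimal Python analogy (not SAS code) of a row-by-row pass that carries state from one row to the next, roughly in the spirit of a DATA Step with BY-group processing. The function name and the sample rows are made up for illustration. Because each output depends on the rows before it, this kind of pass is naturally sequential within a group.

```python
from itertools import groupby
from operator import itemgetter


def running_totals(rows):
    """Yield each row with a per-customer running total, in row order."""
    # Rows are assumed to be sorted by the grouping key already,
    # as an ordered, group-wise pass would require.
    for _customer, group in groupby(rows, key=itemgetter("customer")):
        total = 0.0  # state carried from one row to the next
        for row in group:
            total += row["amount"]
            yield {**row, "running_total": total}


# Hypothetical sample data, sorted by customer.
rows = [
    {"customer": "A", "amount": 10.0},
    {"customer": "A", "amount": 5.0},
    {"customer": "B", "amount": 7.5},
]

for out in running_totals(rows):
    print(out)
```

Different groups could be handed to different workers, but the rows inside one group still have to be visited in order, which is why order-sensitive steps tend to run on a single thread.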

CAS

CAS (Cloud Analytic Services) is the "next generation" SAS processing engine. While CAS functionality already goes beyond SAS 9 in many ways (image processing, for example), CAS is also meant to offer all the functionality of SAS 9 and more, but on a modern, open, high-performance, multi-machine, massively parallel processing architecture. CAS is multi-threaded by design and scales processing over multiple machines automatically. Designed to be cloud native, CAS can add or remove resources as needed. CAS forms the back-end for all of the Viya applications.

The CAS processing engine has the following distinguishing characteristics:

MPP Engine	CAS is massively parallel by design. It automatically scales both vertically and horizontally to optimally utilize hardware resources when solving analytics and performing data transformation.
SAS + more	Built on a paradigm of fine-grained "CAS Actions" that do single, specific operations like aggregate data or run a regression, CAS offers most of the functionality of the SAS engine plus more. Where CAS does not offer the specific functionality that SAS does, there is generally a way to get the equivalent result with CAS. Viya also includes a SAS 9 compute server to complement CAS where necessary. CAS offers a massively parallel version of DATA Step, its own library of procedures, as well as a programming language, CASL.
Back-End Service	Unlike SAS, CAS was built from the ground up as a back-end Viya service. As such, it offers a more robust platform for integration with client applications. CAS offers numerous APIs that allow for integration of CAS processing into SAS applications, web applications, Java, etc.
Static I/O	While CAS continues to get enhancements like pass-through SQL and SingleStore integration which enhance its dynamic I/O capabilities, CAS' data connector model offers fewer dynamic I/O options than SAS. CAS' usage model is generally: 1. Load data. 2. Process data.
Big Data Focused	With its massively parallel design, elastic capabilities, and optimized processing algorithms, CAS is truly a platform for big data analytics and data transformation.
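
The "load data, then process it in parallel" pattern can be sketched in plain Python. This is only a conceptual analogy: it uses threads on one machine rather than CAS workers on many machines, and the function names are invented for the example. Each worker reduces its own slice to a small partial result, and the partials are then combined into the final answer.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_workers):
    """Split rows into n_workers roughly equal slices."""
    return [rows[i::n_workers] for i in range(n_workers)]


def partial_sum_count(chunk):
    """Per-worker step: reduce one slice to a (sum, count) pair."""
    return sum(chunk), len(chunk)


def distributed_mean(rows, n_workers=4):
    """Compute a mean by combining per-slice partial results."""
    if not rows:
        raise ValueError("rows must not be empty")
    chunks = partition(rows, n_workers)  # step 1: load and distribute
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum_count, chunks))  # step 2: process
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count  # combine the partial results


print(distributed_mean(list(range(1, 101))))  # prints 50.5
```

The mean works well here because it decomposes into sums and counts that can be computed independently per slice. Order-dependent calculations do not split this cleanly.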

ESP

SAS Event Stream Processing transforms raw event streams (e.g. stock trades) into useful output event streams (e.g. stocks you should buy) in real time as the events come in. ESP is not meant to process a big data set and then return a result. It constantly reads input streams and streams results as it processes. So, while it can handle big data, it does so record by record (event by event).

The ESP processing engine has the following distinguishing characteristics:

SMP Engine with multi-engine scaling	ESP is multi-threaded at its "project" (~program) level and an ESP server can run multiple projects. Thus, vertical scaling is possible at the project level, across multiple projects, or by running multiple ESP servers. Horizontal scaling is possible using the ESP router or the ESP Cluster Manager to distribute events to different ESP engines on different machines. ESP can even be deployed en masse on edge devices.
Transformation and Analytics Windows	ESP offers a rich set of "windows" that offer everything from basic computations to pattern analysis, geofencing, and high-end analytics.
Back-End Service	ESP can either be deployed as a stand-alone server or via containers. ESP can also be called from the command line but this is mostly to support development and testing.
Dynamic I/O	ESP is designed specifically to read dynamic sources and write dynamic outputs via a rich set of connectors and adapters. However, for maximum integration, ESP does interface with some static sources and targets like HDFS and SASHDAT.
Low Latency, High Throughput	While SAS and CAS are primarily focused on processing static inputs and producing static results, ESP is focused on reading, processing, and streaming results as fast as possible as input events stream in.
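
As a rough illustration of event-by-event processing, the Python sketch below keeps only a small sliding window of recent events and emits one output event for every input event. It is an analogy, not ESP code: the field names, the window size, and the moving-average calculation are all assumptions chosen for the example.

```python
from collections import deque


def moving_average(events, size=3):
    """Emit one output event per input event, using the last `size` prices."""
    window = deque(maxlen=size)  # only a bounded slice of the stream is kept
    for event in events:
        window.append(event["price"])
        yield {
            "symbol": event["symbol"],
            "avg_price": sum(window) / len(window),
        }


def trade_stream():
    """Hypothetical input stream; a real one would never end."""
    for price in (10.0, 12.0, 14.0, 16.0):
        yield {"symbol": "XYZ", "price": price}


for out in moving_average(trade_stream()):
    print(out)
```

Nothing here waits for the full data set. Results stream out as events arrive, and memory use stays bounded by the window size rather than by the length of the stream.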


MAS

When deployed outside of ESP, the SAS Micro Analytic Service is a web application that hosts REST predictive modeling and decisioning web services, called "MAS modules." As REST services, MAS modules can be easily integrated into web applications. For example, you might create a module to calculate a customer's lifetime value and integrate it into a call center application. So, while SAS and CAS are focused on bulk processing and ESP is focused on streaming, MAS is focused on returning small result sets from small input sets (e.g. LTV for a customer, weather for a zip code, etc.) quickly.

The MAS processing engine has the following distinguishing characteristics:

SMP Engine	MAS is multi-threaded by design and scales vertically to handle numerous calls in parallel. Horizontal scaling is possible by deploying multiple MAS services, and modules can be moved between MAS instances via the Viya transfer service.
DS2 + Python with hierarchical data support	The MAS offers threaded programming via SAS' Data Step 2 language (PROC DS2) as well as Python for high-performance transaction processing in real time. While SAS and ESP both contain mechanisms to parse and construct hierarchical data structures, MAS contains considerable capabilities for reading and writing the hierarchical data that is native to the REST paradigm.
Back-End Service	The MAS is a Viya microservice.
POST request and response	MAS is designed for real-time I/O but in a different paradigm than ESP. Instead of inputs streaming in and outputs streaming out, request messages are posted to the MAS, which returns response messages.
Low Latency "Transactions"	Like ESP, MAS is focused on returning results in real time. However, instead of reading an event stream, MAS responds to individual requests with individual responses.
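
The request/response paradigm can be mocked up in a few lines of Python. This sketch does not use the MAS REST API: the module registry, the `handle_post` function, the JSON field names, and the lifetime value formula are all hypothetical stand-ins, and the real request and response schema should be taken from the MAS documentation. The point is only the shape of the interaction: one small JSON request in, one small JSON response out.

```python
import json


def lifetime_value(monthly_spend, expected_months):
    """Hypothetical scoring logic standing in for a deployed module."""
    return round(monthly_spend * expected_months, 2)


# Hypothetical registry mapping module names to scoring functions.
MODULES = {"ltv": lifetime_value}


def handle_post(module_name, body):
    """Take one JSON request body and return one JSON response body."""
    inputs = json.loads(body)  # hierarchical input arrives as JSON
    result = MODULES[module_name](**inputs)
    return json.dumps({"module": module_name, "output": result})


request_body = json.dumps({"monthly_spend": 42.5, "expected_months": 24})
print(handle_post("ltv", request_body))
```

Each call is independent of every other call, which is what lets a service like this handle many requests in parallel.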

All the Rest

Do you think that's a lot? Well, there's more. We haven't even discussed DM Server, In-Database, Federation Server, LASR, and others. Data Management Server was gained through the acquisition of DataFlux and has many of the same features as SAS as well as its own distinguishing functionality. Federation Server presents multiple data sources as a single seamless data model while adding an abstraction layer for additional security and obfuscation. In-Database embeds SAS functionality into or alongside third-party data servers via a few different mechanisms to minimize data movement. LASR is, of course, the predecessor to CAS in many ways.

The Broader Picture

All of these engines transform data -- turning observations into statistics, blending disparate data sources into integrated views, etc. -- but they are each focused on specific use cases. Together, they offer complete data transformation and analytics coverage from the desktop level to big data, from real-time streaming to real-time services.

SAS Communities Library