Working with SAS and Hadoop: Part 3 - SAS In-Memory Analytics and SAS Viya

Thanks to those of you who are back for the third installment of this article series. For anyone who missed the previous posts, we covered a quick overview of Hadoop and how it works with SAS technologies, as well as DS2 programming with Hadoop.

Let’s dive into the final post – SAS In-Memory Analytics and SAS Viya.

SAS In-Memory Analytics: SAS In-Memory Analytics software brings SAS statistics and machine learning computations into Hadoop so that you can quickly perform analyses in parallel on large volumes of data. As with SAS Code Accelerator for Hadoop, this is achieved by installing SAS analytical software components on each of the Hadoop machines where the data resides. From SAS client applications, you then send requests to execute parallel analytical processes on the data distributed across the machines in Hadoop. So, once again, the processing runs where the data resides rather than the data being moved to the machine where SAS is executing.
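To make the "run the process where the data resides" idea concrete, here is a minimal Python sketch under simplifying assumptions. It is not SAS code and uses no SAS API: the hypothetical `partitions` list stands in for rows stored on three Hadoop nodes, `node_summary` plays the role of the work done locally on each node, and `combine` plays the role of the client, which receives only small summaries (a count, a sum, and a sum of squares) instead of the raw rows.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Partial:
    """Small summary a node sends back instead of its raw rows."""
    n: int
    total: float
    total_sq: float


def node_summary(rows: Iterable[float]) -> Partial:
    # Runs "on the node": scans only the rows stored locally.
    n, total, total_sq = 0, 0.0, 0.0
    for x in rows:
        n += 1
        total += x
        total_sq += x * x
    return Partial(n, total, total_sq)


def combine(partials: List[Partial]) -> Tuple[float, float]:
    # Runs on the client: only the small summaries cross the network.
    n = sum(p.n for p in partials)
    if n < 2:
        raise ValueError("need at least two rows for a sample variance")
    total = sum(p.total for p in partials)
    total_sq = sum(p.total_sq for p in partials)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)  # sample variance
    return mean, variance


# Hypothetical table already split across three nodes.
partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]
mean, var = combine([node_summary(p) for p in partitions])
print(mean, var)  # prints: 5.0 7.5
```

The sum-of-squares formula keeps the sketch short, but it can lose precision on large or badly scaled data; a production engine would more likely merge partial results with a numerically stable update.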

When processing distributed data in parallel across multiple machines in Hadoop, the set of distributed processes must communicate with one another to coordinate their work and automatically handle the complexities involved. This set of interconnected processes is often referred to as a grid. SAS In-Memory Analytics products use either a SAS High Performance Grid or a SAS LASR grid.

These two types of grids are similar but have some distinct differences. The SAS High Performance Grid process is a single-user process: a separate grid process starts up within Hadoop for each request that a user makes against each distributed data source, and each process loads the data from disk storage in the Hadoop file system into memory. The SAS LASR grid, in contrast, is designed as a shared server process. A LASR grid service is started once and serves multiple users over time. Multiple tables can be loaded into memory once and then accessed repeatedly by multiple users.
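The difference between the two grid types can be sketched as a toy Python model. Everything in it is hypothetical: `FakeHdfs` stands in for the Hadoop file system and counts disk reads, `hp_style_request` mimics the single-user High Performance Grid pattern of loading data for every request, and `SharedInMemoryServer` mimics the LASR pattern of loading a table once and serving many users from memory. It illustrates the access pattern only, not how either grid is implemented.

```python
from typing import Dict, List


class FakeHdfs:
    """Stand-in for HDFS that counts how often a table is read from disk."""

    def __init__(self, tables: Dict[str, List[float]]):
        self._tables = tables
        self.disk_reads = 0

    def read(self, name: str) -> List[float]:
        self.disk_reads += 1
        return list(self._tables[name])


def hp_style_request(hdfs: FakeHdfs, table: str) -> float:
    # Single-user pattern: every request loads its own copy of the data.
    data = hdfs.read(table)
    return sum(data) / len(data)


class SharedInMemoryServer:
    """LASR-style pattern: a long-running server keeps loaded tables in memory."""

    def __init__(self, hdfs: FakeHdfs):
        self._hdfs = hdfs
        self._memory: Dict[str, List[float]] = {}

    def load(self, table: str) -> None:
        # Load from disk only if the table is not already in memory.
        if table not in self._memory:
            self._memory[table] = self._hdfs.read(table)

    def mean(self, table: str) -> float:
        data = self._memory[table]
        return sum(data) / len(data)


# Three hypothetical users each ask for the mean of the same table.
hdfs_a = FakeHdfs({"sales": [10.0, 20.0, 30.0]})
hp_results = [hp_style_request(hdfs_a, "sales") for _user in range(3)]

hdfs_b = FakeHdfs({"sales": [10.0, 20.0, 30.0]})
server = SharedInMemoryServer(hdfs_b)
server.load("sales")
lasr_results = [server.mean("sales") for _user in range(3)]

print(hdfs_a.disk_reads, hdfs_b.disk_reads)  # prints: 3 1
```

Both patterns return the same answers; the per-request model simply reads the table from disk once per request, while the shared server reads it once in total.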

In addition to co-locating the analytical grid software in the Hadoop cluster, it is also possible to deploy the analytical grid on a set of machines separate from the Hadoop cluster. In that configuration, when data is loaded into memory in the SAS analytical grid, it is moved in parallel from the Hadoop cluster into memory on the SAS analytical grid host machines.
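When the grid runs on separate hosts, the load step amounts to many block transfers running at once. The Python sketch below assumes a hypothetical table split into six blocks and three hypothetical grid hosts (`grid01` to `grid03`). The `transfer` function simply copies a block where a real system would read it over the network, and the round-robin assignment of blocks to hosts is an assumption made for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical table stored in Hadoop as six blocks of four rows each.
hdfs_blocks: Dict[int, List[int]] = {i: list(range(i * 4, i * 4 + 4)) for i in range(6)}
grid_hosts = ["grid01", "grid02", "grid03"]  # hypothetical grid host names


def transfer(block_id: int) -> Tuple[str, List[int]]:
    # Assign blocks to hosts round-robin and "move" the rows.
    # A real deployment would read the block over the network here.
    host = grid_hosts[block_id % len(grid_hosts)]
    return host, list(hdfs_blocks[block_id])


def parallel_load() -> Dict[str, List[int]]:
    memory: Dict[str, List[int]] = {h: [] for h in grid_hosts}
    # Run the block transfers concurrently, one thread per grid host.
    with ThreadPoolExecutor(max_workers=len(grid_hosts)) as pool:
        for host, rows in pool.map(transfer, sorted(hdfs_blocks)):
            memory[host].extend(rows)
    return memory


in_memory = parallel_load()
print({h: len(rows) for h, rows in in_memory.items()})
# prints: {'grid01': 8, 'grid02': 8, 'grid03': 8}
```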

The table below lists the SAS In-Memory Analytics products and briefly describes the type of SAS analytical grid each one employs, its user interface, and its purpose.

Product	Analytical Grid Type	User Interface	Purpose
SAS High Performance Analytics Solutions (Statistics, Data Mining, Text Mining, Econometrics, Forecasting, Optimization)	SAS High Performance Grid	SAS procedural programming	Develop statistical data models and machine learning systems
SAS Visual Analytics	SAS LASR Grid	Web application, point-and-click	Interactively generate graphs and tables displaying analyses of data distributions and descriptive statistics
SAS Visual Statistics	SAS LASR Grid	Web application, point-and-click	Interactively build analytical data models and machine learning systems, and generate graphs and tables displaying the results
SAS In-Memory Statistics	SAS LASR Grid	SAS procedural programming	Develop statistical data models and machine learning systems

SAS Viya: This discussion would not be complete without mentioning that SAS Viya includes the next generation of SAS analytical grid technology, which improves and extends SAS LASR technology in several ways. In SAS Viya, Cloud Analytic Services (CAS) is the in-memory engine that replaces SAS LASR. CAS is scalable, meaning that it can be deployed easily on a single machine or in a multi-machine environment, including hosted cloud environments such as Amazon Web Services. This allows you to scale your applications according to the size of your data. It also allows you to develop a program in a single-machine environment on smaller data volumes and then run that same program in a multi-machine grid environment, where your processes run in parallel on large amounts of distributed in-memory data. Other notable CAS capabilities include:
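The "develop small, run large" workflow depends on the analysis code not caring how many machines execute it. As a rough analogy only, this Python sketch runs the same hypothetical `run_analysis` function with one worker and with four. Threads stand in for grid machines, and the even-slice partitioning is an assumption, not how CAS distributes data.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def partition(data: Sequence[int], n: int) -> List[Sequence[int]]:
    # Split rows into at most n roughly equal slices, one per worker.
    size = -(-len(data) // n)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def count_and_sum(rows: Sequence[int]) -> Tuple[int, int]:
    return len(rows), sum(rows)


def run_analysis(data: Sequence[int], n_workers: int) -> float:
    """Mean of the data; the calling code is identical for any n_workers."""
    if not data or n_workers < 1:
        raise ValueError("need non-empty data and at least one worker")
    chunks = partition(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(count_and_sum, chunks))
    n = sum(c for c, _ in results)
    total = sum(s for _, s in results)
    return total / n


data = list(range(1, 1001))
print(run_analysis(data, 1), run_analysis(data, 4))  # prints: 500.5 500.5
```

Both calls return the same mean; only the degree of parallelism changes, which is the property that lets a program move from a single machine to a grid unchanged.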

Independent user sessions, which shield each user from the performance impact of other users' work.
The ability to control, at the user-session level, the number of grid machines used, so that you use only the computing resources a particular application needs.
Failover capability: a CAS user session can recover and continue automatically if a process fails.
Improved caching of in-memory data to disk.
Support for client access via open source Python, Java, or Lua programming interfaces.
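The session-level worker count and failover capabilities can be illustrated together with a small Python simulation. The `Session` class, its worker names, and the `WorkerFailed` exception are all hypothetical: each session is given its own list of workers, and if a worker fails while processing a partition, the session reassigns that partition to the next worker and continues. This shows the retry idea only; it is not how CAS implements failover.

```python
from typing import List, Sequence, Set


class WorkerFailed(Exception):
    """Raised by a simulated grid worker that goes down mid-task."""


class Session:
    """Hypothetical per-user session with its own workers and retry-on-failure."""

    def __init__(self, workers: List[str], failing: Set[str], max_attempts: int = 3):
        self.workers = workers            # session-level choice of how many machines to use
        self.failing = set(failing)       # workers that fail once (simulation only)
        self.max_attempts = max_attempts
        self.log: List[str] = []

    def _run_on(self, worker: str, rows: Sequence[int]) -> int:
        if worker in self.failing:
            self.failing.discard(worker)  # fail once, then recover
            raise WorkerFailed(worker)
        return sum(rows)

    def run(self, partitions: Sequence[Sequence[int]]) -> int:
        total = 0
        for i, rows in enumerate(partitions):
            for attempt in range(self.max_attempts):
                # On each retry, move the partition to the next worker in the list.
                worker = self.workers[(i + attempt) % len(self.workers)]
                try:
                    total += self._run_on(worker, rows)
                    break
                except WorkerFailed:
                    self.log.append(f"partition {i}: {worker} failed, retrying")
            else:
                raise RuntimeError(f"partition {i} failed after {self.max_attempts} attempts")
        return total


session = Session(workers=["w1", "w2", "w3"], failing={"w2"})
result = session.run([[1, 2], [3, 4], [5, 6]])
print(result)       # prints: 21
print(session.log)  # prints: ['partition 1: w2 failed, retrying']
```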

Training on SAS and Hadoop

Looking to learn more and start working with SAS technologies for Hadoop? Several training options are available.

Introduction to SAS and Hadoop will equip you with the knowledge you need to effectively implement Base SAS and SAS/ACCESS Interface to Hadoop programming methods.
DS2 Programming Essentials for Hadoop will teach you how to program with the DS2 procedure, how to take advantage of DS2 threading capabilities, and how to execute DS2 code in the Hadoop cluster.
If you are looking for a course more focused on open source Hadoop technology with a briefer overview of SAS data management programming techniques for Hadoop, you might be interested in taking Hadoop Data Management with Hive, Pig, and SAS.
For those looking for a simple, wizard-driven application interface to run Hadoop data management processes, we offer the course Working with SAS Data Loader for Hadoop.
Several courses are available for users and administrators of SAS Visual Analytics. We also offer a course on SAS Visual Statistics called SAS Visual Statistics: Interactive Model Building.
If you use SAS High Performance Analytics solutions for Hadoop, you may be interested in Predictive Modeling Using SAS High Performance Analytics Procedures and SAS Enterprise Miner High Performance Data Mining Nodes.
Users of our SAS In-Memory Statistics software can take Getting Started with SAS In-Memory Statistics and Predictive Modeling Using SAS In-Memory Statistics.

For more details on these course offerings, click here. For details on the courses available for SAS Visual Analytics, follow this link. And for Hadoop distribution resources, check out our online help here.
