Working with SAS and Hadoop: Part 3 - SAS In-Memory Analytics and SAS Viya

by SAS Employee DavidGhan 2 weeks ago - edited 2 weeks ago by Community Manager (1,214 Views)

Thanks to those of you who are back for the third installment of this article series. For anyone who missed the previous posts, we’ve covered a quick overview of Hadoop and how it works with SAS technologies, as well as DS2 programming with Hadoop.

 

Let’s dive into the final post – SAS In-Memory Analytics and SAS Viya.

SAS In-Memory Analytics: SAS In-Memory Analytics software brings SAS statistics and machine learning computations into Hadoop so that you can analyze large volumes of data quickly and in parallel. As with SAS Code Accelerator for Hadoop, this is achieved by installing SAS analytical software components on each of the Hadoop machines where the data resides. From SAS client applications, you then send requests to execute parallel analytical processes on the data distributed across those machines. Once again, the process runs where the data resides rather than the data being brought to the machine where SAS is executing.
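The idea of running the process where the data resides can be illustrated with a small sketch. This is a conceptual simulation in plain Python, not SAS or Hadoop code: the `node_blocks` dictionary, the node names, and the helper functions are all hypothetical stand-ins for data blocks stored on separate Hadoop machines. Each "node" reduces its own block to a small summary, and only those summaries travel to the client.

```python
# Conceptual simulation only: plain Python lists stand in for data blocks
# stored on separate Hadoop machines. No SAS or Hadoop API is used here.

def local_summary(block):
    """Runs 'on the node': reduce a local block to a small (count, sum) pair."""
    return len(block), sum(block)

def combine(summaries):
    """Runs 'on the client': merge per-node summaries into a global mean."""
    total_n = sum(n for n, _ in summaries)
    total_sum = sum(s for _, s in summaries)
    return total_sum / total_n

# Hypothetical cluster: three nodes, each holding part of one numeric column.
node_blocks = {
    "node1": [4.0, 8.0, 6.0],
    "node2": [10.0, 2.0],
    "node3": [5.0, 7.0, 9.0, 3.0],
}

# Each node summarizes its own data; the raw rows never leave the node.
summaries = [local_summary(block) for block in node_blocks.values()]
global_mean = combine(summaries)

rows_in_cluster = sum(len(block) for block in node_blocks.values())
values_moved = 2 * len(summaries)  # one (count, sum) pair per node

print(global_mean, rows_in_cluster, values_moved)
```

In this toy case the saving is small, but the number of values moved depends on the number of nodes rather than the number of rows, which is the point of pushing the computation to the data.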

 

When processing distributed data in parallel across multiple machines in Hadoop, the distributed parallel processes must communicate with one another to coordinate their work and automatically handle the complexities involved. This set of interconnected processes is often referred to as a grid. SAS In-Memory Analytics products use either a SAS High Performance Grid or a SAS LASR grid.

 

These two types of grids are similar but have some distinct differences. The SAS High Performance Grid process is a single-user process. A separate grid process starts up within Hadoop for each analysis that a user requests against a distributed data source, and each process loads the data from disk storage in the Hadoop file system into memory. The SAS LASR grid, in contrast, is designed as a shared server process. A LASR grid service is started once and serves multiple users over time. Tables can be loaded into memory once and then accessed many times by multiple users.
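The difference between the two grid styles comes down to how often data is read from disk into memory. The following sketch is a plain Python simulation under stated assumptions: `SimulatedHdfs`, `per_request_process`, and `SharedServer` are hypothetical names invented for illustration and do not correspond to any SAS interface. It simply counts disk reads under each pattern.

```python
# Conceptual simulation only; none of these names are SAS APIs.

class SimulatedHdfs:
    """Stand-in for disk storage that counts how often a table is read."""
    def __init__(self, tables):
        self.tables = tables
        self.reads = 0

    def read(self, name):
        self.reads += 1
        return list(self.tables[name])

def per_request_process(hdfs, table, analysis):
    """Single-user style: every request loads its own in-memory copy."""
    data = hdfs.read(table)
    return analysis(data)

class SharedServer:
    """Shared-server style: load a table once, then serve many requests."""
    def __init__(self, hdfs):
        self.hdfs = hdfs
        self.in_memory = {}

    def load(self, table):
        if table not in self.in_memory:
            self.in_memory[table] = self.hdfs.read(table)

    def run(self, table, analysis):
        return analysis(self.in_memory[table])

analyses = (sum, max, min)  # three requests against the same table

hdfs_a = SimulatedHdfs({"sales": [3, 5, 7]})
per_request_results = [per_request_process(hdfs_a, "sales", f) for f in analyses]

hdfs_b = SimulatedHdfs({"sales": [3, 5, 7]})
server = SharedServer(hdfs_b)
server.load("sales")
shared_results = [server.run("sales", f) for f in analyses]

print(hdfs_a.reads, hdfs_b.reads)
```

Both patterns return the same answers; the per-request pattern reads the table once per request, while the shared server reads it once in total.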

 

In addition to co-locating the analytical grid software in the Hadoop cluster, it is also possible to deploy the analytical grid on a set of machines separate from the Hadoop cluster. In that configuration, when data is loaded into memory in the SAS analytical grid, it is moved in parallel from the Hadoop cluster into memory on the SAS analytical grid host machines.

 

The table below lists the SAS In-Memory Analytics products and, for each one, the analytical grid type it uses, its user interface, and its purpose.

 

Product | Analytical Grid Type | User Interface | Purpose
SAS High Performance Analytics Solutions (Statistics, Data Mining, Text Mining, Econometrics, Forecasting, Optimization) | SAS High Performance Grid | SAS procedural programming | Develop statistical data models and machine learning systems
SAS Visual Analytics | SAS LASR Grid | Web application point-and-click | Interactive generation of graphs and tables displaying analyses of data distributions and descriptive statistics
SAS Visual Statistics | SAS LASR Grid | Web application point-and-click | Interactively build analytical data models and machine learning systems and generate graphs and tables displaying results
SAS In-Memory Statistics | SAS LASR Grid | SAS procedural programming | Develop statistical data models and machine learning systems

 

SAS Viya: This discussion would not be complete without mentioning that SAS Viya includes the next generation of SAS analytical grid technology. SAS Viya improves and extends the SAS LASR technology in several ways. In SAS Viya, Cloud Analytic Services (CAS) is the in-memory engine that replaces SAS LASR. CAS is scalable, meaning that it can be deployed easily on a single machine or in multi-machine environments, including hosted cloud environments (for instance, Amazon Web Services). This allows you to scale your applications according to the size of your data. It also allows you to develop a program in a single-machine environment on smaller data volumes and then run that same program in a multi-machine grid environment, where your processes run in parallel on large amounts of distributed in-memory data. Other notable CAS capabilities include:

  • Independent user sessions, which shield each user from the performance impact of other users.
  • The ability at the user-session level to control the number of grid machines used. This way you can use only the amount of computing resources needed for a particular application.
  • Failover capability: a CAS user session can recover and continue automatically if a process fails.
  • Improved caching of in-memory data to disk.
  • Support for client access via open source Python, Java, or Lua programming interfaces.
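The scaling idea described above, where the same program runs on one machine or on several, can be sketched in plain Python. This is an illustration only and does not use the CAS client interfaces: threads stand in for grid machines, and `partition`, `partial_stats`, `variance`, and the `n_workers` parameter are hypothetical names chosen for this example.

```python
# Conceptual simulation only: threads stand in for grid machines.
# This is not CAS code and does not use any SAS client package.
from concurrent.futures import ThreadPoolExecutor

def partition(rows, n_workers):
    """Distribute rows round-robin across the requested number of workers."""
    return [rows[i::n_workers] for i in range(n_workers)]

def partial_stats(chunk):
    """Per-worker step: count, sum, and sum of squares for one chunk."""
    return len(chunk), sum(chunk), sum(x * x for x in chunk)

def variance(rows, n_workers=1):
    """Population variance; the program is identical for any worker count."""
    chunks = partition(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(partial_stats, chunks))
    n = sum(p[0] for p in parts)
    total = sum(p[1] for p in parts)
    total_sq = sum(p[2] for p in parts)
    mean = total / n
    return total_sq / n - mean * mean

rows = list(range(1, 101))

single_machine = variance(rows, n_workers=1)  # development-sized run
small_grid = variance(rows, n_workers=4)      # same program, more workers

print(single_machine, small_grid)
```

Only the `n_workers` argument changes between the two calls, which mirrors the idea of choosing, per session, how much of the grid a given application should use.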

Training on SAS and Hadoop

Looking to learn more and start working with SAS technologies for Hadoop? Several training options are available.

For more details on these course offerings, click here. For details on the courses available for Visual Analytics, follow this link. And for Hadoop distribution resources, check out our online help here.

 
