Simplifying SAS Viya Part 1: Choose Your Server

Welcome to the first entry in my “Simplifying SAS Viya” series. In this post, we’ll focus on the servers available in SAS Viya, breaking down their core capabilities and explaining how they differ from traditional SAS 9.

What Is SAS Viya?

SAS Viya is a modern, cloud-enabled analytics platform that offers high-speed processing and scalability. It’s often described with terms like “in-memory,” “massively parallel processing,” and “cloud-native architecture.” Throughout this post, I’ll explain what these terms mean.

Traditional SAS 9: Background

Traditionally, SAS was installed on a single system with dedicated disk, RAM, and CPUs.

Let’s walk through an example of what happens when a program runs in SAS 9:

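/* Step 1: read sashelp.cars from disk, keep only SUVs, write work.cars_SUV to disk */
DATA work.cars_SUV;
     SET sashelp.cars;
     WHERE Type = "SUV";
RUN;

/* Step 2: read work.cars_SUV from disk again to summarize highway MPG */
PROC MEANS data=work.cars_SUV;
     VAR MPG_Highway;
RUN;

/* Step 3: read work.cars_SUV from disk a third time to count SUVs by make */
PROC FREQ data=work.cars_SUV;
     TABLES Make;
RUN;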

When this program runs in SAS 9:

The DATA step loads data from disk into memory, processes it, and writes it back to disk.

The PROC MEANS step loads the data from disk again, generates results, and clears the memory.

Finally, the PROC FREQ step also loads the data from disk, generates results, and clears the memory.

In this small program, the data is loaded and unloaded from memory three times, leading to repeated input/output (I/O) overhead. This is manageable for small programs but becomes time-consuming when working with large datasets or complex processes.
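
One way to see this overhead for yourself, offered here as a suggestion rather than as part of the original example, is the FULLSTIMER system option, which writes real time, CPU time, and memory details for every step to the log. With a table as small as sashelp.cars the numbers will be tiny, but the same approach works on larger data.

/* Write detailed timing and memory statistics for each step to the log */
OPTIONS FULLSTIMER;

DATA work.cars_SUV;
     SET sashelp.cars;
     WHERE Type = "SUV";
RUN;

PROC MEANS data=work.cars_SUV;
     VAR MPG_Highway;
RUN;

/* Return to the shorter default log notes */
OPTIONS NOFULLSTIMER;

As a rule of thumb, a step whose real time is much larger than its CPU time is spending most of its time waiting, often on I/O.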

The SAS Viya Difference: A Two-Server Model

In SAS Viya, there are two primary servers, providing options to programmers depending on the processing demands of their programs. The two servers are:

SAS Compute Server
CAS (Cloud Analytic Services) Server

SAS Compute Server

The Compute Server works much like the traditional SAS 9 server, which you may know as PC SAS or the SAS Workspace Server. Existing SAS programs written in standard SAS code run in SAS Viya on the Compute Server. It’s a great option for smaller datasets or for moving existing code into SAS Viya with minimal modifications. On the Compute Server, data is read into memory and cleared from memory with each step (as explained in Traditional SAS 9: Background). Some processes are single-threaded (completing one task at a time), while others are multi-threaded (dividing the work into tasks that run at the same time).

Threads Explained

Threads are a CPU concept. A CPU has cores, which are the physical processing units that execute instructions independently. A thread is a sequence of instructions that the operating system schedules to run on a core.

Single-threaded processes execute instructions one at a time. This can be less efficient for tasks that can be parallelized or require waiting for I/O operations.

Multi-threaded processes allow multiple parts of a program to run concurrently, so different threads can work on different tasks at the same time, which is more efficient. Threads within a process share memory and open files, and a multi-threaded program can remain responsive even if one of its threads is blocked or performing a lengthy operation.
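
On the Compute Server, threading is governed by system options. As a minimal sketch, assuming your administrator has not restricted these options, the following displays the current THREADS and CPUCOUNT settings in the log and then runs PROC SORT, one of the procedures that can use multiple threads. The output table name work.cars_sorted is chosen only for illustration.

/* Show the current threading-related system options in the log */
PROC OPTIONS OPTION=(THREADS CPUCOUNT);
RUN;

/* Allow threaded processing and let SAS use the number of CPUs it detects */
OPTIONS THREADS CPUCOUNT=ACTUAL;

/* PROC SORT can split its work across threads when THREADS is in effect */
PROC SORT data=sashelp.cars out=work.cars_sorted;
     BY Make;
RUN;

On a table this small you are unlikely to notice a difference; the point is only to show where the setting lives.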

CAS Server

The CAS Server is fully multi-threaded: work is divided into tasks that are distributed and run at the same time, which is what allows programs to run faster.

The real power of SAS Viya comes from the CAS Server, which offers several advantages over traditional SAS processing:

Massively Parallel Processing (MPP) Environment: CAS divides large datasets into chunks and processes them across multiple nodes simultaneously. This is like having several security lanes open at an airport, speeding up the entire process.

In-Memory Analytics Engine: Instead of loading and unloading data multiple times like in SAS 9, CAS allows you to load a dataset into memory once and keep it there until your work is finished. This greatly reduces I/O, making processing faster and more efficient.
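
To make these two ideas concrete, here is a rough sketch of how the earlier SUV program might look when it runs in CAS. It assumes you are allowed to start a CAS session and that the personal caslib CASUSER is available; the session name mysess and the libref mycas are made up for this example. PROC FREQTAB is used as a CAS-enabled counterpart to PROC FREQ, and PROC MEANS is one of the procedures that can hand its work to CAS when the input is a CAS table.

/* Start a CAS session (hypothetical name) and point a libref at in-memory tables */
CAS mysess;
LIBNAME mycas CAS caslib=casuser sessref=mysess;

/* Load the table into CAS memory once */
PROC CASUTIL;
     LOAD data=sashelp.cars outcaslib="casuser" casout="cars" replace;
QUIT;

/* Input and output are both CAS tables, so this DATA step runs in CAS */
DATA mycas.cars_SUV;
     SET mycas.cars;
     WHERE Type = "SUV";
RUN;

/* Both steps read the table that is already in memory */
PROC MEANS data=mycas.cars_SUV;
     VAR MPG_Highway;
RUN;

PROC FREQTAB data=mycas.cars_SUV;
     TABLES Make;
RUN;

The structure is the same as before; what changes is where the tables live and which server does the work.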

Why Does Server Choice in SAS Viya Matter?

Understanding SAS Viya’s servers is key to knowing when and how to leverage the capabilities of each processing environment. The CAS Server shines when you’re working with:

Large datasets (over 50GB)

Programs that require multiple reads of the same data

Long-running or computationally intensive tasks

By understanding when to use the Compute Server versus the CAS Server, you can optimize your code for better performance. Remember, the choice to enhance your programs to run in CAS is yours. You can still run traditional SAS code as-is using the Compute Server.

Quick Intro to Kubernetes

SAS Viya is deployed to a Kubernetes cluster. Typically, this is something only an admin needs to know about. However, as we learn how the CAS Server works, we refer to a controller and worker nodes, so here is my very simplified explanation. A Kubernetes cluster is a set of node machines for running containerized applications. It consists of a control plane, which is the “brain” of the cluster, and nodes, which are the worker machines. Among other things, the control plane schedules work onto the worker nodes. CAS follows a similar pattern inside the cluster: the CAS controller and CAS workers are CAS processes that run in containers, and the CAS controller sends tasks to the CAS workers. Keep in mind that the CAS controller is not the same thing as the Kubernetes control plane.

How CAS Works

The CAS Server can run on a single machine or be distributed across multiple machines, where each machine acts as a node. In a distributed deployment, you will typically have one controller node and several worker nodes. When using SAS Studio in SAS Viya, the Compute Server acts as the client to the CAS Server. For data to be processed in CAS, it must be loaded into memory from a physical location. We use caslibs to connect to data source files and load tables into memory for processing in CAS.

When a program is executed in CAS, the controller node distributes in-memory data in blocks to the multiple worker nodes. The workers execute the same actions at the same time on different blocks of data, which is called parallel processing. As the worker nodes finish processing, they send their results back to the controller, which combines them, and we see the results back in SAS Studio as expected.
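
If you are curious how a table is spread across the workers, the table.tableDetails action can report rows and blocks per node. This is only a sketch: the session name mysess and the table name cars are chosen for illustration, and it assumes the CASUSER caslib is available. On a single-machine CAS server you will see just one node, and a table as small as sashelp.cars may not be spread evenly.

/* Start a CAS session (hypothetical name); skip this line if one is already active */
CAS mysess;

/* Load a small table into memory so there is something to inspect */
PROC CASUTIL;
     LOAD data=sashelp.cars outcaslib="casuser" casout="cars" replace;
QUIT;

/* Report how the in-memory table is distributed, one row of output per node */
PROC CAS;
     table.tableDetails / caslib="casuser" name="cars" level="node";
QUIT;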

Data that is in memory will stay in memory until it is explicitly dropped or the CAS session ends. By default, an in-memory table is visible only to the session that loaded it. Once a table is promoted to global scope, it stays in memory beyond your session and can be accessed by multiple users, avoiding additional I/O. If you want to save changes made to in-memory tables, you must explicitly save them back to a physical storage location.
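
The lifecycle described above can be sketched with PROC CASUTIL. The session name mysess, the table name cars, and the file name cars.sashdat are all illustrative, and the example assumes the personal CASUSER caslib. Because CASUSER is personal, promoting a table there only makes it visible to your own other sessions; sharing with other users would require a caslib they can access.

/* Start a CAS session (hypothetical name); skip this line if one is already active */
CAS mysess;

PROC CASUTIL;
     /* Load the table into memory for this session */
     LOAD data=sashelp.cars outcaslib="casuser" casout="cars" replace;

     /* Write a permanent copy to the caslib's physical data source */
     SAVE casdata="cars" incaslib="casuser" outcaslib="casuser"
          casout="cars.sashdat" replace;

     /* Promote the table to global scope so it outlives this session */
     PROMOTE casdata="cars" incaslib="casuser" outcaslib="casuser";

     /* Remove the table from memory once it is no longer needed */
     DROPTABLE casdata="cars" incaslib="casuser" quiet;
QUIT;

/* End the session; any remaining session-scope tables are dropped with it */
CAS mysess TERMINATE;

In practice you would not promote and drop a table in the same step; the statements appear together here only to show each operation once.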

Conclusion

In summary, SAS Viya allows you to submit traditional SAS code on the Compute Server, just as you always have. It also introduces a more powerful, flexible architecture than traditional SAS 9 through its CAS Server and in-memory, massively parallel processing capabilities. While the Compute Server offers compatibility with your existing SAS programs, the CAS Server is the game-changer for handling larger data and more demanding computations.

In Part 2 of the series, we’ll explore how to connect and work with your data in SAS Viya using caslibs.

Find more articles from SAS Global Enablement and Learning here.