Welcome to the first entry in my “Simplifying SAS Viya” series. In this post, we’ll focus on the servers available in SAS Viya, breaking down their core capabilities and explaining how they differ from traditional SAS 9.
SAS Viya is a modern, cloud-enabled analytics platform that offers high-speed processing and scalability. It’s often described with terms like “in-memory,” “massively parallel processing,” and “cloud-native architecture.” Throughout this post, I will explain what these terms mean.
Traditionally, SAS was installed on a single system with dedicated disk, RAM, and CPUs.
Let’s walk through an example of what happens when a program runs in SAS 9:
DATA work.cars_SUV;
    SET sashelp.cars;
    WHERE Type = "SUV";
RUN;

PROC MEANS DATA=work.cars_SUV;
    VAR MPG_Highway;
RUN;

PROC FREQ DATA=work.cars_SUV;
    TABLES Make;
RUN;
When this program runs in SAS 9, each step reads its input data from disk into memory, processes it, and clears it from memory when the step finishes. In this small program, the data is loaded and unloaded from memory three times, leading to repeated input/output (I/O) overhead. This is manageable for small programs but becomes time-consuming when working with large datasets or complex processes.
SAS Viya has two primary servers, the Compute Server and the CAS (Cloud Analytic Services) Server, giving programmers options depending on the processing demands of their programs.
The Compute Server works much like the traditional SAS 9 server, which you may know as PC SAS or the SAS Workspace Server. You can run your existing SAS programs, written in normal SAS code, in SAS Viya, and they will execute on the Compute Server. It’s a great option for smaller datasets or if you want to transition existing code into SAS Viya with minimal modifications. On the Compute Server, data is read into memory and cleared from memory with each step (as described above for SAS 9). Some processes are single-threaded (completing one task at a time), while others are multi-threaded (the work is divided into tasks that run at the same time).
When we talk about threads, we are talking about how work runs on a CPU. CPUs have cores, which are the physical processing units that execute instructions independently. Threads are the virtual sequences of instructions that run on a core.
Single-threaded processes execute instructions one at a time. This can be less efficient for tasks that can be parallelized or require waiting for I/O operations.
Multi-threaded processes allow multiple parts of a program to run concurrently, so different threads can execute different tasks at the same time, which is more efficient. Threads share memory and files, and the program can remain responsive even if one thread is blocked or performing a lengthy operation.
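As a small sketch of how threading surfaces on the Compute Server: the THREADS and CPUCOUNT system options report whether thread-enabled procedures may use multiple threads and how many CPUs SAS assumes are available. PROC SORT is one thread-enabled procedure. The output table name work.cars_sorted is made up for illustration, and whether multiple threads are actually used depends on how your environment is configured.

```sas
/* Write the current threading-related system options to the log */
proc options option=(threads cpucount);
run;

/* PROC SORT is thread-enabled; the THREADS option asks it to use
   multiple threads if the environment allows it.
   work.cars_sorted is an illustrative table name. */
proc sort data=sashelp.cars out=work.cars_sorted threads;
    by Make;
run;
```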
The CAS Server is fully multi-threaded: tasks are distributed across threads and processed at the same time, which is what allows programs to run faster.
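The real power of SAS Viya comes from the CAS Server, which offers several advantages over traditional SAS processing: data is held in memory, work is distributed across multiple nodes and processed in parallel, and in-memory tables stay available for repeated use without additional I/O.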
Understanding SAS Viya’s servers is key to knowing when and how to leverage the capabilities of each processing environment. The CAS Server shines when you’re working with large data or demanding computations.
By understanding when to use the Compute Server versus the CAS Server, you can optimize your code for better performance. Remember, the choice to enhance your programs to run in CAS is yours. You can still run traditional SAS code as-is using the Compute Server.
SAS Viya is deployed to a Kubernetes cluster. Typically, this is something only an admin needs to know about. However, as we learn how the CAS Server works, we refer to a controller and worker nodes, which are part of a Kubernetes cluster, so here is my very simplified explanation. A Kubernetes cluster is a set of node machines for running containerized applications. It consists of a control plane, which is the “brain” of the cluster, and nodes, which are the worker machines. Among other things, the control plane or “controller node” can send tasks to the worker nodes.
The CAS Server is configured to run on one or more machines (nodes). In a distributed deployment, you will typically have one controller node and several worker nodes. When using SAS Studio in SAS Viya, the Compute Server acts as the client to the CAS Server. For data to be processed in CAS, it must be loaded into memory from a physical location. We use caslibs to connect to data source files and load tables into memory for processing in CAS.
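Here is a minimal sketch of that flow from SAS Studio, assuming you are allowed to start a CAS session and have the personal Casuser caslib. The session name mysess and the in-memory table name cars are arbitrary choices for illustration.

```sas
/* Start a CAS session from the Compute Server (the client).
   "mysess" is an arbitrary session name. */
cas mysess;

/* Assign a libref for each caslib you can access so the caslibs
   appear in the SAS Studio Libraries pane */
caslib _all_ assign;

proc casutil;
    /* Load a table from the Compute Server into memory in Casuser */
    load data=sashelp.cars outcaslib="casuser" casout="cars" replace;

    /* List the in-memory tables in that caslib */
    list tables incaslib="casuser";
quit;
```

Part 2 of the series covers caslibs in more detail; this only shows the general shape of getting a table into memory.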
When a program is executed in CAS, the controller node distributes in-memory data in blocks to the worker nodes. The workers execute the same actions at the same time on different blocks of data, which is called parallel processing. As the worker nodes finish processing, they send their results back to the controller, and we see the results in SAS Studio as expected.
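To make this concrete, here is one way the SAS 9 example from earlier could be written to run in CAS. This is a sketch under some assumptions: the session name mysess, the libref mycas, and the output table suv_mpg_summary are illustrative, and PROC MDSUMMARY and PROC FREQTAB are used as CAS-enabled counterparts to PROC MEANS and PROC FREQ rather than as the only option.

```sas
cas mysess;                              /* arbitrary session name */
libname mycas cas caslib="casuser";      /* CAS engine libref for Casuser */

/* Setup: load the source table into memory once */
proc casutil;
    load data=sashelp.cars outcaslib="casuser" casout="cars" replace;
quit;

/* Input and output are both CAS tables, so this DATA step runs in CAS */
data mycas.cars_suv;
    set mycas.cars;
    where Type = "SUV";
run;

/* Summary statistics computed in CAS; MDSUMMARY writes its results
   to an output table instead of printing a report */
proc mdsummary data=mycas.cars_suv;
    var MPG_Highway;
    output out=mycas.suv_mpg_summary;
run;

/* Frequency table computed in CAS */
proc freqtab data=mycas.cars_suv;
    tables Make;
run;
```

Unlike the SAS 9 version, cars_suv stays in memory between steps, so the second and third steps do not reread it from disk.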
Data that is in memory will stay in memory until it is explicitly dropped or the CAS session ends. While your data is in memory, the same table can be accessed by multiple users (once it has been promoted to global scope), avoiding additional I/O. If you want to save changes made to in-memory tables, you must explicitly save them back to a physical storage location.
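A short sketch of that lifecycle with PROC CASUTIL, again assuming a session named mysess and the Casuser caslib. The file name cars.sashdat is an illustrative choice.

```sas
cas mysess;                              /* arbitrary session name */

proc casutil;
    /* Load a table into memory */
    load data=sashelp.cars outcaslib="casuser" casout="cars" replace;

    /* Save the in-memory table to the caslib's physical storage */
    save casdata="cars" incaslib="casuser"
         casout="cars.sashdat" outcaslib="casuser" replace;

    /* Explicitly drop the in-memory copy; the saved file remains */
    droptable casdata="cars" incaslib="casuser" quiet;
quit;

/* Ending the session clears any remaining session-scope tables */
cas mysess terminate;
```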
In summary, SAS Viya allows you to submit traditional SAS code on the Compute Server, just as you always have. It also introduces a more powerful, flexible architecture than traditional SAS 9 through its CAS Server and in-memory, massively parallel processing capabilities. While the Compute Server offers compatibility with your existing SAS programs, the CAS Server is the game-changer for handling larger data and more demanding computations.
In Part 2 of the series, we’ll explore how to connect and work with your data in SAS Viya using caslibs.