Deploying SAS Viya on more servers


SAS software has always been flexible enough to meet the needs of different workloads, performance requirements, user expectations, and more. Accordingly, SAS Viya can work with deployments ranging from one server to hundreds. Determining the actual number of host machines for a SAS solution is not always easy. In my previous post, we looked at the possibilities when forced to fit Viya onto 1, 2, or 3 host machines.

What we learned is that while SAS Viya can scale down to run in that limited environment, we must give up some functionality to make it all fit. This is a valid choice, but one we must ensure the customer properly understands for the best results.

In this post, let's tackle this same concept from a different angle - this time, let's look at a large-scale implementation where hosts provide not just scalability and availability but are tuned specifically for the specialized software roles employed.

The cart is still before the horse

It's time to reiterate that we should not simply plan a Viya deployment around an arbitrary number of host machines, whether few or many. While we can get to a functional deployment that way, it's not going to provide the ideal experience.

Instead, discuss all aspects of requirements and expectations with your customer. And, of course, have a sizing of your customer's solution performed by the SAS Enterprise Excellence Center (EEC). The EEC will ask questions to determine the workloads involved and return a hardware recommendation specifying the CPU and RAM needed to meet typical performance expectations. The shape the hardware actually takes, based on those recommendations and other business considerations, is then something we'll need to work with.

With these formalities addressed, let's look at what we have to work with.

Go big or go home

Viya - and CAS in particular - is designed to accommodate massive scalability, running across many hosts and making it easy to increase compute capacity by adding even more hosts. A lot of machines implies a significant investment in hardware, ongoing operations, and administration, and even more so in software licensing, user training, data management, and much more.

When customers make that kind of significant investment, they want to protect it. So oftentimes there will be requirements to ensure the ongoing availability of the system and minimize unexpected outages. High availability considerations therefore go hand-in-hand with scalability considerations.

Furthermore, SAS Viya is composed of many disparate software technologies. Compare CAS, whose massively parallel processing depends heavily on having all working data available in RAM, with the SAS Programming Runtime Environment, which uses the more classic model of disk-based data storage and access. Both are considered computation engines, and yet they function very differently. Tuning a host ideally for one may mean that the other won't run as efficiently as it could. Fortunately, the Viya architecture allows us to separate these computation engines onto different hosts - if we choose. And we can extend that concept to other aspects of Viya as well. If you're familiar with multi-tier deployments of SAS 9, this is a similar concept applied to new technologies.

Scenario: Specialize Everything ⇒ 12 or more hosts

Viya offers the ability to deploy the SAS software in preset groupings referred to in Ansible technology as host groups. This allows us to break up the software deployment across a number of host machines. Understanding which software components populate each host group is therefore necessary to devise a Viya deployment where hosts can be tuned specifically for the software they will run.


In this scenario, we deploy the SAS Viya software to separate hosts which can be optimized for their specific tasks.

▷ 3 hosts for SAS Viya Stateful Services

[Host groups: consul, httpproxy, pgpoolc, rabbitmq, sasdatasvc]

Why 3 hosts? The SAS Configuration Server software is a specially packaged version of Consul from HashiCorp. Consul implements HA through a consensus methodology that relies on an odd number of hosts (that is, 1, 3, 5, not 2, 4, 6) to prevent the split-brain problem. The same concept also applies to RabbitMQ, the technology behind SAS Message Broker.
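To see why an odd count matters, here is a minimal Python sketch of the arithmetic, assuming a strict-majority quorum rule (the rule Consul's Raft-based servers use; the function names are hypothetical). A cluster of n servers needs n // 2 + 1 of them to agree, so adding a fourth server raises the quorum without letting the cluster survive any more failures than three servers can.

```python
def majority(n: int) -> int:
    """Smallest number of servers that forms a strict majority of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many servers can fail while the survivors still form a majority."""
    return n - majority(n)


for n in range(1, 7):
    print(f"{n} servers: quorum {majority(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Three servers tolerate one failure and five tolerate two; four and six tolerate no more than three and five do, and an even split during a network partition leaves neither side with a majority.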

Deployment: Place the Consul and RabbitMQ host groups on all 3 hosts. The other stateful services need only be placed on any 2 of the hosts for improved availability. So place sasdatasvc on 2 hosts and httpproxy on 2 hosts (with only 1 overlap). The pgpoolc software cannot be clustered yet (yes, it's still a single point of failure - to be addressed in Viya 3.5). Place pgpoolc on the host without a sasdatasvc instance.
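The stateful placement rules can be written down as simple checks. This is a hypothetical Python sketch, not part of any SAS tooling: the host names are invented, and it assumes the two-host pairing is sasdatasvc and httpproxy, since consul and rabbitmq already run on all three hosts.

```python
from typing import Dict, List, Set

Placement = Dict[str, Set[str]]

# Hypothetical host names; host group names come from the article.
placement: Placement = {
    "stateful1": {"consul", "rabbitmq", "sasdatasvc", "httpproxy"},
    "stateful2": {"consul", "rabbitmq", "sasdatasvc"},
    "stateful3": {"consul", "rabbitmq", "httpproxy", "pgpoolc"},
}


def hosts_with(placement: Placement, group: str) -> Set[str]:
    """Return the hosts that carry the given host group."""
    return {host for host, groups in placement.items() if group in groups}


def check_stateful(placement: Placement) -> List[str]:
    """Return rule violations; an empty list means the layout passes."""
    problems = []
    all_hosts = set(placement)
    for group in ("consul", "rabbitmq"):
        # Consensus-based services: every stateful host, and an odd count.
        hosts = hosts_with(placement, group)
        if hosts != all_hosts:
            problems.append(f"{group} should run on every stateful host")
        if len(hosts) % 2 == 0:
            problems.append(f"{group} needs an odd number of hosts")
    for group in ("sasdatasvc", "httpproxy"):
        if len(hosts_with(placement, group)) != 2:
            problems.append(f"{group} should run on exactly 2 hosts")
    overlap = hosts_with(placement, "sasdatasvc") & hosts_with(placement, "httpproxy")
    if len(overlap) != 1:
        problems.append("sasdatasvc and httpproxy should share exactly 1 host")
    pgpool = hosts_with(placement, "pgpoolc")
    if len(pgpool) != 1 or pgpool & hosts_with(placement, "sasdatasvc"):
        problems.append("pgpoolc should run on 1 host, the one without sasdatasvc")
    return problems


print(check_stateful(placement))  # → []
```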

▷ 2 hosts for SAS Microservices

[Host groups: AdminServices, CASServices, ComputeServices, CoreServices, DataServices, HomeServices, ReportServices, ThemeServices, configuratn, and many more]

In general, the SAS microservices are built using Spring Boot running on Java. This is not a requirement, but it is the common approach for Viya right now. Future microservices may be built in Go or any other technology well suited to RESTful HTTP services.

Why 2 hosts? To ensure availability in case one physical host goes down.

Deployment: Place all microservice host groups on both hosts.

▷ 2 hosts for SAS Programming Runtime Environment

[Host groups: Operations, ComputeServer, programming]

The SPRE provides a runtime environment for execution of classic SAS program code.

Why 2 hosts? To ensure availability in case one physical host goes down.

Deployment: Place all SPRE host groups on both hosts - with the exception of the operations microservice. It's currently not clusterable (again, a single point of failure) and can only be deployed to a single host. Add more hosts if needed at the time of initial deployment.

▷ 5 hosts for SAS Cloud Analytic Services

[Host groups: sas-casserver-primary, sas-casserver-secondary, sas-casserver-worker]

CAS is our flagship product for massively scalable processing of huge volumes of in-memory data.

Why 5 hosts? We need 2 hosts to provide improved availability of the CAS controller role (primary and secondary). While MPP CAS will function with a single worker, that's inefficient and doesn't provide any worker failover. Two workers give us that minimum redundancy. However, in my opinion, we should always have more worker hosts than controller hosts - so 3 workers, then. Add more if and when needed.

Deployment: Place each CAS Controller on a separate physical host. Likewise, place each CAS Worker on a separate physical host. Hosts can be added (or removed) at any time.
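Putting the 12-host scenario together, the following Python sketch inverts a host-to-host-group map into INI-style [group] sections, which is the general shape of group membership in an Ansible inventory. The host names are hypothetical and only a handful of the many microservice host groups are shown. The real inventory.ini for a Viya deployment also defines the hosts themselves and many more groups, so treat this as an illustration of the layout rather than a usable inventory.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical host names. Host group names come from the article; only a few
# of the many microservice host groups are shown.
layout: Dict[str, List[str]] = {
    "stateful1": ["consul", "rabbitmq", "sasdatasvc", "httpproxy"],
    "stateful2": ["consul", "rabbitmq", "sasdatasvc"],
    "stateful3": ["consul", "rabbitmq", "httpproxy", "pgpoolc"],
    "services1": ["CoreServices", "DataServices", "ReportServices"],
    "services2": ["CoreServices", "DataServices", "ReportServices"],
    "spre1": ["ComputeServer", "programming", "Operations"],  # Operations: 1 host only
    "spre2": ["ComputeServer", "programming"],
    "cascontroller1": ["sas-casserver-primary"],
    "cascontroller2": ["sas-casserver-secondary"],
    "casworker1": ["sas-casserver-worker"],
    "casworker2": ["sas-casserver-worker"],
    "casworker3": ["sas-casserver-worker"],
}


def render_inventory_groups(layout: Dict[str, List[str]]) -> str:
    """Invert host -> host groups into INI-style [group] sections of member hosts."""
    members: Dict[str, List[str]] = defaultdict(list)
    for host, groups in layout.items():
        for group in groups:
            members[group].append(host)
    lines: List[str] = []
    for group in sorted(members):
        lines.append(f"[{group}]")
        lines.extend(members[group])
        lines.append("")  # blank line between sections
    return "\n".join(lines)


print(render_inventory_groups(layout))
```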

12+ Hosts Points of Interest

Each group of hosts (stateful, microservices, SPRE, CAS) can be optimized for their specialized workloads:
- CAS: network, memory, and CPU intensive
- SPRE: disk and CPU intensive
- Microservices: memory and CPU intensive
- Stateful: network, disk, and CPU intensive
If one Consul server process goes down, then any microservices which are placed on the same host machine are automatically de-registered even if they're running okay. So separating Consul from all microservices improves the effective availability of the microservices even more.
Keep in mind that the SPRE is typically licensed to run on the same number of CPU cores as allocated for CAS. If CAS is licensed for 100 cores, then the SPRE can run on another 100 cores.
The secondary CAS controller host does not count toward the number of CPU cores utilized by CAS (until it takes over the primary role).
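The core-count arithmetic from the last two points can be sketched as follows. This Python snippet simply encodes those two statements as assumptions: the primary controller and the workers count toward CAS, the secondary controller does not while on standby, and the SPRE may use as many cores as CAS. The function names and host core counts are hypothetical.

```python
from typing import Sequence


def cas_counted_cores(primary_cores: int, worker_cores: Sequence[int]) -> int:
    # Assumption: the primary controller and every worker count toward CAS;
    # the secondary controller is excluded while it is on standby.
    return primary_cores + sum(worker_cores)


def spre_core_allowance(cas_cores: int) -> int:
    # Assumption: the SPRE may run on as many cores as are allocated to CAS.
    return cas_cores


# Hypothetical sizing: a 20-core primary controller and three 20-core workers.
cas = cas_counted_cores(20, [20, 20, 20])
spre_hosts = [40, 40]  # two hypothetical 40-core SPRE hosts
print(cas, spre_core_allowance(cas), sum(spre_hosts) <= spre_core_allowance(cas))
# → 80 80 True
```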

Scenario: Let's Compromise ⇒ 8 or more hosts

Perhaps your customer doesn't need that much specialization and optimization of hosts. Maybe they don't plan to use the SPRE very much. Or they may be new to Viya and understand the need for some specialization, scalability, and availability, but want to keep things conceptually simple. So let's compromise.

In this scenario, we place the infrastructure services (stateful and microservices) together on one set of hosts and the computational services (SPRE and CAS) on another set of hosts.

▷ 3 hosts for infrastructure services

This is a very common deployment for the Viya infrastructure services.

Deployment: Place the Consul host group on all 3 hosts. The other infrastructure services need only be placed on any 2 of the hosts for improved availability. For the stateful services, follow the guidance shown above for 12 hosts. For the microservices, try to spread them evenly based on their memory usage.
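One way to spread the microservices evenly by memory usage is a greedy heuristic: take the largest microservice first and place it on the two currently least-loaded hosts. Here is a minimal Python sketch, assuming hypothetical per-service memory figures and host names; it balances microservice memory only and ignores the stateful services that share those hosts.

```python
from typing import Dict, List, Tuple


def spread_by_memory(
    service_mem_gb: Dict[str, float], hosts: List[str], copies: int = 2
) -> Tuple[Dict[str, List[str]], Dict[str, float]]:
    """Greedy sketch: place each service, largest first, on the `copies`
    hosts that currently carry the least microservice memory."""
    load = {host: 0.0 for host in hosts}
    placement: Dict[str, List[str]] = {host: [] for host in hosts}
    for service, mem in sorted(service_mem_gb.items(), key=lambda kv: kv[1], reverse=True):
        # sorted() is stable, so ties go to the host listed first.
        targets = sorted(hosts, key=lambda host: load[host])[:copies]
        for host in targets:
            load[host] += mem
            placement[host].append(service)
    return placement, load


# Hypothetical memory footprints in GB; measure real values in your environment.
services = {"CoreServices": 12, "DataServices": 8, "ReportServices": 6,
            "AdminServices": 4, "HomeServices": 4, "ThemeServices": 2}
placement, load = spread_by_memory(services, ["infra1", "infra2", "infra3"])
print(load)  # → {'infra1': 24.0, 'infra2': 24.0, 'infra3': 24.0}
```

With these made-up numbers every host ends up with the same microservice memory. Greedy placement is not guaranteed to be optimal in general, but it is simple and usually lands close to even.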

▷ 5 hosts for SAS Cloud Analytic Services and SAS Programming Runtime Environment

Both CAS and the SPRE are considered runtime environments. But they work very differently in actual operations.

Deployment: Place the SPRE components on the CAS Controller hosts. The rationale is that the CAS Controller hosts are typically not used as heavily as the CAS Workers. Assuming identical host sizes for controllers and workers, the spare capacity on the controllers might be sufficient for the SPRE.

8+ Hosts Points of Interest

This approach is a compromise allowing Viya to run on fewer hosts, but with more potential conflict as services compete for resources.
If this kind of compromise is sufficient to meet your customer's requirements, great. If not, look at where changes must be made.
To ensure uniform efficiency, we try to keep the CAS Workers running on similar hardware with similar workloads. That means that not just the hardware (CPU, RAM, disk, etc.) is the same for all workers, but also the OS, background processes, and anything else as well. So avoid placing the SPRE on CAS Worker hosts. Even if you place the SPRE on all of the CAS Workers, the actual SPRE workload will vary significantly with each user, each job, etc.

Scenario: Constrained Can Work ⇒ 5 or more hosts

Suppose your customer wants a scalable CAS deployment with rudimentary availability improvements, and they expect a few users working on large volumes of data rather than a large number of users. We can tackle that, too.

▷ 5 hosts for CAS, SPRE, and all of the Viya infrastructure

Running everything together - but with some careful placement decisions to smooth out the kinks.

Deployment: Place two instances of the Viya infrastructure services (everything except Consul) on the two CAS Controller hosts, alongside the SPRE. In a crazy twist, let's place the Consul host group on 3 of the CAS Workers.

5+ Hosts Points of Interest

This approach is best for sites with only a few users who perform heavy work in CAS but don't burden the CAS controllers with many tasks.
The controller hosts are loaded even further, with more work as well as RAM-hungry microservices. Ensure those machines are beefy enough, with the resources to handle all of that.
The really unusual bit here is the idea of placing Consul on 3 of the CAS Workers. Consul is relatively lightweight, consuming little RAM, CPU, or network in normal operations. Placing all 3 Consul servers on the CAS Workers keeps those worker hosts similar in terms of workload. And keeping Consul separate from the microservices slightly improves their effective availability as well.
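The reasoning behind putting Consul on the workers can be checked mechanically: every CAS Worker should carry exactly the same set of host groups, and no Consul host should also run microservices. Here is a hypothetical Python sketch of the 5-host layout, with invented host names and an abbreviated set of infrastructure host groups.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical, abbreviated host-group sets for the 5-host scenario.
INFRA: Set[str] = {"sasdatasvc", "httpproxy", "CoreServices", "DataServices"}
SPRE: Set[str] = {"ComputeServer", "programming"}
MICROSERVICES: Set[str] = {"CoreServices", "DataServices"}

layout_5: Dict[str, Set[str]] = {
    "cas1": {"sas-casserver-primary"} | INFRA | SPRE,
    "cas2": {"sas-casserver-secondary"} | INFRA | SPRE,
    "cas3": {"sas-casserver-worker", "consul"},
    "cas4": {"sas-casserver-worker", "consul"},
    "cas5": {"sas-casserver-worker", "consul"},
}


def workers_uniform(layout: Dict[str, Set[str]]) -> bool:
    """True if every CAS Worker host carries exactly the same host groups."""
    worker_sets: Set[FrozenSet[str]] = {
        frozenset(groups) for groups in layout.values() if "sas-casserver-worker" in groups
    }
    return len(worker_sets) == 1


def consul_isolated(layout: Dict[str, Set[str]], microservices: Set[str]) -> bool:
    """True if no host running Consul also runs a microservice host group."""
    return all(not (groups & microservices) for groups in layout.values() if "consul" in groups)


print(workers_uniform(layout_5), consul_isolated(layout_5, MICROSERVICES))
# → True True
```

Removing consul from any one worker makes workers_uniform return False, which is the uneven-worker situation the scenario tries to avoid.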

Which scenario is right for me?

To be frank, I don't know. There's a good chance none of those shown here or in my previous post are exactly right for your customer's requirements and expectations. The point of these posts is to convey some of the architectural concepts behind different deployment decisions so that you can work with your customer to design your own scenario.

Even within a single customer project, you may need to make use of multiple scenarios at the same time. Production environments vs. Dev/Test environments. Or environments optimized for data and analytic processing as compared to environments which specialize in report delivery and consumption. Throw in ESP, MAS, other SAS solutions, and the possibilities are nigh endless.

Coordination of SAS Viya service operations

Contrary to popular belief, SAS Viya does not handle a random startup or shutdown order of its constituent services very well. When everything is installed on a single host machine, the sas-viya-all-services script correctly handles startup and shutdown order. That specific use case works so well that the default is to configure the operating system to start Viya services automatically at host startup.

But if you distribute the Viya software components across multiple host machines, there's a real problem. First, the sas-viya-all-services script does not communicate across machines. So it means "all services on this machine", not "all services on all machines". Furthermore, there are dependencies between Viya software services - some weak (they'll figure it out on their own) and some very strong (startup order matters! Otherwise it just breaks). So we need to disable the automatic startup of Viya services on those hosts and then ensure that each Viya service is started in the proper order to meet those dependencies.
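Getting the order right is a dependency-ordering (topological sort) problem. The Python sketch below uses the standard library's graphlib (Python 3.9+) to group services into start stages, where everything in a stage can start once the earlier stages are up, and reverses the stages for shutdown. The service names and dependency map are hypothetical and simplified for illustration; they are not the official Viya start order, nor how the VIRK playbooks are implemented.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set

# Hypothetical, simplified dependency map: service -> services that must be up first.
# Illustrative only; not the official Viya start order.
depends_on: Dict[str, Set[str]] = {
    "consul": set(),
    "sasdatasvc": {"consul"},
    "rabbitmq": {"consul"},
    "httpproxy": {"consul"},
    "microservices": {"consul", "sasdatasvc", "rabbitmq"},
    "cas": {"microservices"},
    "spre": {"microservices"},
}


def start_stages(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Group services into stages; each stage depends only on earlier stages."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies loop
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


def stop_stages(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Shut down in the reverse order of startup."""
    return list(reversed(start_stages(graph)))


for stage in start_stages(depends_on):
    print("start:", ", ".join(stage))
```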

To help ensure proper startup and shutdown order of SAS Viya services, we recommend implementing the VIRK's Viya Multi-Machine Services Utilities playbooks. In that GitHub repository, you can find a set of playbooks to start or stop the SAS Viya services gracefully across the 1 to n machines identified in the inventory.ini file. Share and enjoy!

Decision time

When deploying Viya, some decisions must be made at the time of initial deployment. For example, if your customer might want multi-tenancy later, then you must decide whether to enable multi-tenancy right now at initial deployment -or- deploy Viya as single-tenant now, but then re-deploy from scratch as multi-tenant later. Some aspects of Viya infrastructure clustering have similar considerations - what you decide at initial deployment can have an impact later.

Some quick examples of Viya software and when its clustering decisions need to be made:

Viya service	Decide now or later	Notes
RabbitMQ	Now	Clustering RabbitMQ after the initial deployment is a manual process with constraints.
SAS Cloud Analytic Services	Later	Scalable after initial deployment, even from SMP to MPP.
Apache HTTP Server	Later	Running multiple httpproxy instances requires a third-party load balancer, which is not provided with Viya.
SAS Programming Runtime Environment (multi-tenant)	Now	Adding ComputeServer hosts after initial deployment is not yet supported for multi-tenant deployments of SAS Viya.
SAS Programming Runtime Environment (single-tenant)	Later	Adding more ComputeServer hosts after initial deployment is supported for single-tenant deployments of SAS Viya.

To keep a simple rule in mind: the Viya stateful services are critical, and they are typically less forgiving of misconfiguration or other challenges. So try to determine their ultimate deployment topology as early as possible and then stick with it. The microservices are generally more accommodating of changes since they're already designed to be resilient in that way. And CAS has been designed with future scalability in mind, so scaling it out is relatively easy. As a matter of fact, hosts can be added to (or removed from) an existing MPP CAS deployment without any interruption in service.

The SPRE is a little complicated, as it's not a single component but many components spread across different host groups in the inventory.ini file. And its deployment considerations depend on several key factors. See my article, Deploying the SPRE in SAS Viya 3.4, for more details.

Coda

SAS Viya offers a dizzying range of options for architecture, deployment, and operation. A couple of articles are not sufficient to address all of the options. However, we can all improve our knowledge of how the components of Viya work together and then use that to benefit our customers with a deployment plan tailored to their specific needs. My next blog post will describe the considerations weighed for an actual customer implementation. See Deploying SAS Viya in the real world.