Deploying SAS Viya in the real world

In my earlier posts, I illustrated various deployment patterns and anti-patterns for SAS Viya across different numbers of server hosts. To refresh your memory, see Deploying SAS Viya on 1, 2, 3 servers and Deploying SAS Viya on more servers.

For this article, we will look at a real-world, multi-host deployment of SAS Viya 3.4, configured for higher availability, at a major European municipality.

The following diagram illustrates the major architectural components employed:

This deployment pattern isn't directly represented in my earlier articles. So here's what I would've written…

Scenario: A smart compromise ⇒ 6 or more hosts

In situations where the workload for Viya and its implications for the host machines are well understood, we can further combine roles onto the same hosts. Specifically, we can place the CAS controller(s) alongside other Viya infrastructure on the same machines and still provide improved availability. Only the CAS workers would be deployed to dedicated hosts.

▷ 3+ hosts for SAS Viya Stateful Services, Microservices, SPRE, and CAS Controllers

[Stateful services host groups: consul, httpproxy, pgpoolc, rabbitmq, sasdatasvc]

[Microservices host groups: AdminServices, CASServices, ComputeServices, CoreServices, DataServices, HomeServices, ReportServices, ThemeServices, configuratn, and many more]

[SPRE host groups: Operations, ComputeServer, programming]

[CAS host groups: sas-casserver-primary, sas-casserver-secondary]

Why 3 hosts? The SAS Configuration Server software is a specially packaged version of Consul from HashiCorp. Consul is designed to implement HA through a consensus methodology which relies on an odd number of hosts (that is, 1, 3, 5, not 2, 4, 6) to prevent the split-brain problem. The same concept also applies to RabbitMQ, which is the technology behind SAS Message Broker.
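
The odd-number rule comes down to majority arithmetic. Here's a minimal sketch, assuming a simple majority quorum; the function names are mine for illustration and are not part of Consul or SAS Viya:

```python
def quorum(n_servers: int) -> int:
    """Smallest majority of a cluster with n_servers voting members."""
    return n_servers // 2 + 1


def failures_tolerated(n_servers: int) -> int:
    """How many members can be lost while a majority still remains."""
    return n_servers - quorum(n_servers)


for n in range(1, 7):
    print(f"{n} servers: quorum={quorum(n)}, "
          f"tolerates {failures_tolerated(n)} failure(s)")
```

Going from 3 to 4 servers raises the quorum from 2 to 3 but still tolerates only one failure, so the fourth host adds cost without adding availability.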

Deployment:

Place the consul and rabbitmq host groups on all 3 hosts.
Place the sasdatasvc (SAS Infrastructure Data Server) host group on 2 hosts. The pgPool-II software cannot currently be clustered (to be addressed in Viya 3.5). Place the pgpoolc host group on the primary sasdatasvc host.
Place the host groups which provide the SPRE (Operations, ComputeServer, programming) on 2 (or more) hosts, with the exception of the Operations host group. The operations microservice is currently not clusterable (yes, a single point of failure) and can only be deployed to a single host. Add SPRE to more hosts if needed (at time of initial deployment only!).
Place 2 sets of the microservice host groups across all 3 hosts, distributed evenly by expected RAM usage.
Place sas-casserver-primary (CAS controller) host group on 1 host and sas-casserver-secondary (CAS backup controller) host group on a different host.
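
Spreading two instances of each microservice host group "evenly by expected RAM usage" can be approximated with a greedy pass. This is only a sketch: the host names and RAM figures are hypothetical placeholders, not measurements from this site, and the greedy heuristic is my own choice rather than anything the deployment tooling does for you.

```python
def place_two_instances(ram_by_group, hosts):
    """Assign each host group to the two currently least-loaded hosts.

    Groups are handled largest-first so the big consumers get spread
    out before the small ones fill in the gaps.
    """
    load = {host: 0 for host in hosts}
    placement = {}
    ordered = sorted(ram_by_group.items(), key=lambda kv: (-kv[1], kv[0]))
    for group, ram in ordered:
        # Two distinct hosts with the lowest load so far (ties by name).
        chosen = sorted(hosts, key=lambda h: (load[h], h))[:2]
        for host in chosen:
            load[host] += ram
        placement[group] = chosen
    return placement, load


# Hypothetical expected RAM (GB) per instance of each host group.
expected_ram_gb = {
    "CoreServices": 16,
    "ReportServices": 12,
    "DataServices": 10,
    "CASServices": 8,
    "AdminServices": 6,
    "HomeServices": 4,
}
viya_hosts = ["viya1", "viya2", "viya3"]  # hypothetical host names

placement, load = place_two_instances(expected_ram_gb, viya_hosts)
for group, chosen in placement.items():
    print(f"{group}: {', '.join(chosen)}")
print(load)
```

The resulting mapping would then be carried over by hand into the deployment's host group assignments.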

▷ 3+ hosts for CAS Workers

[CAS host groups: sas-casserver-worker]

In the illustration above, 13 hosts are provided for CAS workers.

Deployment: Place the sas-casserver-worker host group on separate physical hosts. Add (or remove) workers at any time.
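
The placement rules for this scenario can be written down as a quick sanity check. In this sketch a plain Python dictionary stands in for the host group assignments. The host names are hypothetical, and treating the first sasdatasvc entry as the primary is my assumption for illustration.

```python
# Hypothetical host names; host group names come from the scenario text.
inventory = {
    "consul":                  ["viya1", "viya2", "viya3"],
    "rabbitmq":                ["viya1", "viya2", "viya3"],
    "sasdatasvc":              ["viya1", "viya2"],  # assumption: first = primary
    "pgpoolc":                 ["viya1"],
    "Operations":              ["viya3"],
    "ComputeServer":           ["viya1", "viya2", "viya3"],
    "programming":             ["viya1", "viya2", "viya3"],
    "sas-casserver-primary":   ["viya1"],
    "sas-casserver-secondary": ["viya2"],
    "sas-casserver-worker":    ["cas1", "cas2", "cas3"],
}


def check_placement(inv):
    """Return a list of rule violations; an empty list means all rules pass."""
    problems = []
    for group in ("consul", "rabbitmq"):
        count = len(set(inv[group]))
        if count < 3 or count % 2 == 0:
            problems.append(f"{group}: expected an odd number of hosts (3+), got {count}")
    if len(set(inv["sasdatasvc"])) != 2:
        problems.append("sasdatasvc: expected 2 hosts")
    if inv["pgpoolc"] != inv["sasdatasvc"][:1]:
        problems.append("pgpoolc: expected only the primary sasdatasvc host")
    if len(set(inv["Operations"])) != 1:
        problems.append("Operations: expected exactly 1 host")
    primary = inv["sas-casserver-primary"]
    secondary = inv["sas-casserver-secondary"]
    if len(primary) != 1 or len(secondary) != 1 or primary == secondary:
        problems.append("CAS controllers: expected primary and secondary on different hosts")
    workers = set(inv["sas-casserver-worker"])
    others = {h for g, hs in inv.items() if g != "sas-casserver-worker" for h in hs}
    if workers & others:
        problems.append("CAS workers: expected dedicated hosts")
    return problems


print(check_placement(inventory))
```

For the layout shown, the check returns an empty list. Moving the secondary CAS controller onto the primary's host, for example, would be reported as a violation.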

6+ Hosts Points of Interest

The co-existence of Viya services plus SPRE plus CAS controllers may cause contention for limited resources in some situations - yet it provides a minimal footprint with effective improvement to overall availability.
The placement of the SPRE must be decided at initial deployment - and may be difficult to change later without a complete redeployment of Viya. So plan this carefully to ensure SPRE usage is in line with this deployment scenario. Consider keeping the SPRE separate from the primary CAS controller. See my post, Deploying the SPRE in SAS Viya 3.4, for more information.
Keep in mind that the SPRE is typically licensed to run on the same number of CPU cores as allocated for CAS. If CAS is licensed for 100 cores, then the SPRE can run on another 100 cores. Because a CAS controller and SPRE co-exist on 1 host, it's like those cores are counted twice (from a licensing/usage perspective).
The CAS backup controller host does not count towards the number of CPU cores utilized by CAS (until it takes over primary role).
At this site, the SPRE software as well as the full set of microservices were deployed onto each of the three Viya hosts. The singleton operations microservice was placed on the third Viya host, the one without a CAS controller (since it'd be less busy). Although not shown, the pgpoolc host group was also placed on that third Viya host, separate from the two sasdatasvc host group instances. This required manual post-deploy effort to finalize pgPool-II communication with the PostgreSQL servers - and the team would likely recommend against it in a fresh deployment, preferring instead to keep the pgpoolc host group assigned to the same primary host as the sasdatasvc host group.
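
To make the core-counting points above concrete, here is a small tally. It only restates the counting rule as described in this article: the hosts, core counts, and role labels are hypothetical, and your actual license terms are what govern.

```python
# Hypothetical hosts and core counts; role labels are illustrative only.
hosts = {
    "viya1": {"cores": 16, "roles": {"cas-controller", "spre"}},
    "viya2": {"cores": 16, "roles": {"cas-backup-controller", "spre"}},
    "viya3": {"cores": 16, "roles": {"spre"}},
    "cas1":  {"cores": 16, "roles": {"cas-worker"}},
    "cas2":  {"cores": 16, "roles": {"cas-worker"}},
    "cas3":  {"cores": 16, "roles": {"cas-worker"}},
}

# The backup controller is left out: per the article it does not count
# toward CAS cores until it takes over the primary role.
CAS_COUNTED_ROLES = {"cas-controller", "cas-worker"}


def cas_cores(hosts):
    """Cores on hosts running a counted CAS role."""
    return sum(h["cores"] for h in hosts.values() if h["roles"] & CAS_COUNTED_ROLES)


def spre_cores(hosts):
    """Cores on hosts running the SPRE."""
    return sum(h["cores"] for h in hosts.values() if "spre" in h["roles"])


def double_counted_cores(hosts):
    """Cores on hosts that appear in both the CAS and SPRE totals."""
    return sum(
        h["cores"]
        for h in hosts.values()
        if "spre" in h["roles"] and h["roles"] & CAS_COUNTED_ROLES
    )


print("CAS cores: ", cas_cores(hosts))
print("SPRE cores:", spre_cores(hosts))
print("In both:   ", double_counted_cores(hosts))
```

In this made-up layout, the 16 cores of the host that runs both the primary CAS controller and the SPRE show up in both totals, which is the "counted twice" effect described above.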

More Environmental Considerations

The great thing about contemplating real-world deployments is that we can see where other considerations are important to ensure success.

Data Loading:

For this site, data is only loaded into CAS by running it serially through a SAS 9 instance (run as part of SAS Grid Manager). While not the speediest approach for loading data into CAS, it was selected to make implementation simpler. Providing CAS with direct access to other data (both native and external sources) may be a future update.

Shared File System:

IBM Spectrum Scale (GPFS) is a powerful and flexible shared file system technology, and it integrates with SAS technologies very well. It's currently used here to provide shared storage between the primary and secondary CAS controllers as well as for SAS Viya backups.

GPFS is a natural fit as a shared file system to provide high-performance data transfer to multiple CAS workers (as seen with the DNFS type of caslib, and other data sources). Do not use GPFS for CAS_DISK_CACHE. It is not compatible with that use case. Instead, provision local disk resources for the CAS cache.
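
One way to catch a CAS cache directory that accidentally landed on GPFS is to look up the file system type behind the path. The sketch below assumes a Linux mount table in /proc/mounts format and assumes GPFS mounts are reported with the type string "gpfs". The paths and sample table are hypothetical.

```python
import os


def filesystem_type(path, mounts_text):
    """Return (mount_point, fs_type) for the longest mount point containing path."""
    path = os.path.normpath(path)
    best = ("", "")
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # Match whole path components only, so /gpfs does not match /gpfs2.
        if (path + "/").startswith(prefix) and len(mount_point) > len(best[0]):
            best = (mount_point, fs_type)
    return best


def cache_path_ok(path, mounts_text):
    """False when the path resolves to a mount of type 'gpfs' (assumed name)."""
    return filesystem_type(path, mounts_text)[1] != "gpfs"


# Hypothetical mount table in /proc/mounts format.
sample_mounts = """\
/dev/sda1 / xfs rw 0 0
/dev/sdb1 /sastmp xfs rw 0 0
gpfs01 /gpfs/shared gpfs rw 0 0
"""

print(cache_path_ok("/sastmp/cascache", sample_mounts))
print(cache_path_ok("/gpfs/shared/cascache", sample_mounts))
```

On a real host you could pass in the contents of /proc/mounts instead of the sample text.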

Load Balancer for Web Activity:

Multiple instances of the Apache HTTP Server can be deployed to improve availability and scalability. However, they generally are unaware of each other, operating statelessly. So a third-party load balancer is necessary to direct traffic to each HTTP Server (with session stickiness configured). There are many technologies which can fill this role - just make sure it's deployed with high availability, too.
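
To illustrate what session stickiness means, here is a toy routing function that always sends the same session to the same HTTP Server. It uses simple hash-based affinity, which is my choice for the illustration. Real load balancers often use cookies or other mechanisms. The host names and session IDs are hypothetical.

```python
import hashlib


def pick_backend(session_id, backends):
    """Map a session ID to one backend, always the same one for that ID."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


# Hypothetical Apache HTTP Server hosts behind the load balancer.
http_servers = ["viya1:443", "viya2:443", "viya3:443"]

for session in ("session-a", "session-b", "session-a"):
    print(session, "->", pick_backend(session, http_servers))
```

Note that with this naive scheme, adding or removing a backend changes where many existing sessions land, which is one reason production load balancers handle affinity differently.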

Coda

While there are many ways to deploy Viya, not all of them are appropriate for your site and use cases. Plan your Viya deployment carefully taking into account the various workloads and expectations for performance. SAS can help with this effort. Contact the SAS Enterprise Excellence Center for free hardware sizing estimation.

somasekhar · ‎11-27-2019

how to use SAS/connect or proc sql to pull the data in different unix paths to sas studio in SAS Viya?

RobCollum · ‎12-04-2019

@somasekhar, the simplest and most effective way to load data into the SAS Programming Runtime Environment from a location that is locally visible on the host machine is to use a libname statement to create a library reference that can be used in your SAS program code.

See: LIBNAME Statement in the SAS 9.4 and SAS Viya 3.5 Programming Documentation.

Example:

libname MyData "/path/to/my/SAS/data";

And then you can reference that libref in your SQL - like here when creating a table:

proc sql;
   /* Create an empty table in the MyData library */
   create table MyData.NewTable
      (IdNum   char(4),
       Gender  char(1),
       Jobcode char(3),
       Salary  num,
       Birth   num informat=date7. format=date7.,
       Hired   num informat=date7. format=date7.);
quit;

dwhit14 · ‎12-08-2020

What other options are there for Shared File System? GPFS is very expensive.

RobCollum · ‎12-08-2020

@dwhit14 - there are a lot of options to choose from - you're certainly not limited to GPFS - along with a myriad of considerations as to how each pertains to SAS.

For a quick overview of what shared file systems mean for SAS, check out:

Contemplating shared file systems for SAS

https://communities.sas.com/t5/SAS-Communities-Library/Contemplating-shared-file-systems-for-SAS/ta-...

And then for a deeper technical dive into the actual workings of shared file systems that affect their suitability for SAS applications, see:

Shared File Systems: Determining the Best Choice for your Distributed SAS® Foundation Applications

https://support.sas.com/resources/papers/proceedings17/SAS0569-2017.pdf