In my earlier posts, we discussed illustrations of various deployment patterns and anti-patterns for SAS Viya when working with different numbers of server hosts. To refresh your memory, see Deploying SAS Viya on 1, 2, 3 servers and Deploying SAS Viya on more servers.
For this article, we will look at a real-world, multi-host deployment of SAS Viya 3.4 that provides higher availability at a major European municipality.
The following diagram illustrates the major architectural components employed:
This deployment pattern isn't directly represented in my earlier articles. So here's what I would've written…
In some situations where the workload for Viya and its implications for the host machines are well understood, we can further combine roles onto the same hosts. Specifically, we can place the CAS controller(s) alongside other Viya infrastructure on the same machines and still provide improved availability. Only the CAS workers would be deployed to dedicated hosts.
[Stateful services host groups: consul, httpproxy, pgpoolc, rabbitmq, sasdatasvc]
[Microservices host groups: AdminServices, CASServices, ComputeServices, CoreServices, DataServices, HomeServices, ReportServices, ThemeServices, configuratn, and many more]
[SPRE host groups: Operations, ComputeServer, programming]
[CAS host groups: sas-casserver-primary, sas-casserver-secondary]
Why 3 hosts? The SAS Configuration Server software is a specially packaged version of Consul from HashiCorp. Consul implements HA through a consensus protocol that needs a majority of hosts to agree, so an odd number of hosts (that is 1, 3, 5, not 2, 4, 6) is used to prevent the split-brain problem. With 3 hosts, the remaining 2 still form a majority if 1 fails. The same concept also applies to RabbitMQ, which is the technology behind SAS Message Broker.
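To see why odd numbers are preferred, here is a minimal sketch of majority-quorum arithmetic. It is not taken from Consul or RabbitMQ; the function names `quorum` and `tolerated_failures` are made up for illustration, and the only assumption is that a cluster needs a strict majority of its members to keep making decisions.

```python
def quorum(cluster_size: int) -> int:
    """Smallest number of hosts that forms a strict majority."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """How many hosts can be lost while a majority still remains."""
    return cluster_size - quorum(cluster_size)


# Compare cluster sizes 1 through 6 (hypothetical sizes, for illustration).
for size in range(1, 7):
    print(f"hosts={size} quorum={quorum(size)} "
          f"tolerated failures={tolerated_failures(size)}")
```

Under that assumption, going from 3 hosts to 4 raises the quorum from 2 to 3 but still tolerates only 1 failure, so the extra even-numbered host adds cost without adding resilience.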
Deployment:
[CAS host groups: sas-casserver-worker]
In the illustration above, 13 hosts are provided for CAS workers.
Deployment: Place the sas-casserver-worker host group on separate physical hosts. Add (or remove) workers at any time.
The great thing about contemplating real-world deployments is that we can see where other considerations are important to ensure success.
For this site, data is only loaded into CAS by running it serially through a SAS 9 instance (run as part of SAS Grid Manager). While not the speediest approach for loading data into CAS, it was selected to make implementation simpler. Providing CAS with direct access to other data (both native and external sources) may be a future update.
IBM Spectrum Scale (GPFS) is a powerful and flexible shared file system technology, and it integrates with SAS technologies very well. It's currently used here to provide shared storage between the primary and secondary CAS controllers as well as for SAS Viya backups.
GPFS is a natural fit as a shared file system to provide high-performance data transfer to multiple CAS workers (as seen with the DNFS type of caslib, and other data sources). Do not use GPFS for CAS_DISK_CACHE. It is not compatible with that use case. Instead, provision local disk resources for the CAS cache.
Multiple instances of the Apache HTTP Server can be deployed to improve availability and scalability. However, they are generally unaware of each other and operate statelessly. So a third-party load balancer is necessary to direct traffic to each HTTP Server (with session stickiness configured). There are many technologies which can fill this role - just make sure it's deployed with high availability, too.
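As a rough illustration of what session stickiness means, the sketch below maps a session identifier to one backend deterministically, so repeat requests for the same session reach the same HTTP Server. This is only a conceptual model: real load balancers typically implement stickiness with cookies or similar mechanisms, and the function name `pick_backend` and the hostnames are hypothetical.

```python
import hashlib


def pick_backend(session_id: str, backends: list[str]) -> str:
    """Choose a backend for a session by hashing the session ID.

    The same session ID always maps to the same backend as long as
    the backend list itself does not change.
    """
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


# Hypothetical HTTP Server hosts sitting behind the load balancer.
http_servers = ["httpd1.example.com", "httpd2.example.com", "httpd3.example.com"]

first = pick_backend("session-abc123", http_servers)
second = pick_backend("session-abc123", http_servers)
print(first == second)  # the session sticks to one server
```

Note that simple modulo hashing reshuffles many sessions whenever a server is added or removed, which is one reason production balancers tend to rely on cookies or consistent hashing instead.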
While there are many ways to deploy Viya, not all of them are appropriate for your site and use cases. Plan your Viya deployment carefully, taking into account the various workloads and expectations for performance. SAS can help with this effort. Contact the SAS Enterprise Excellence Center for free hardware sizing estimation.
How can I use SAS/CONNECT or PROC SQL to pull data from different UNIX paths into SAS Studio in SAS Viya?
@somasekhar, the simplest and most effective way to load data into the SAS Programming Runtime Environment from a location that is locally visible on the host machine is to use a libname statement to create a library reference that can be used in your SAS program code.
See: LIBNAME Statement in the SAS 9.4 and SAS Viya 3.5 Programming Documentation.
Example:
libname MyData "/path/to/my/SAS/data";
And then you can reference that libref in your SQL - like here when creating a table:
proc sql;
   /* Define an empty table in the MyData library */
   create table MyData.NewTable
      (IdNum char(4),
       Gender char(1),
       Jobcode char(3),
       Salary num,
       Birth num informat=date7.
                 format=date7.,
       Hired num informat=date7.
                 format=date7.);
quit;
What other options are there for Shared File System? GPFS is very expensive.
@dwhit14 - there are a lot of options to choose from - you're certainly not limited to only GPFS - and also a myriad of considerations as to how each one pertains to SAS.
For a quick overview of what shared file systems mean for SAS, check out:
And then for a deeper technical dive into the actual workings of shared file systems that affect their suitability for SAS applications, see:
Shared File Systems: Determining the Best Choice for your Distributed SAS® Foundation Applications
https://support.sas.com/resources/papers/proceedings17/SAS0569-2017.pdf