Support for multiple Availability Zones in SAS Viya was announced with SAS Viya 2025.10, and a previous post presented the associated requirements, limits, and implications.
In this (pretty detailed and technical) post, we go one step further to discuss and illustrate the provisioning of a multi-zone infrastructure using the SAS-provided "IaC for Azure" project, and the deployment of SAS Viya across multiple Availability Zones in Azure Kubernetes Service (AKS).
As this post has become quite long, here is a little "ToC" to help you navigate its content:
The goal of customers deploying SAS Viya across multiple availability zones is to increase the reliability of their analytics platform, even in case of a major incident happening in a given availability zone's data center (e.g. a power, cooling or network outage).
Redundant infrastructure is provisioned to protect the SAS Viya platform and ensure as much continuity as possible during a zone failure.
SAS has developed and maintains IaC (Infrastructure as Code) tools available in GitHub for the major Cloud providers with Terraform scripts. These tools can be used to automatically provision the required infrastructure (the managed Kubernetes cluster itself, but also the Jump and NFS VMs, as well as the managed storage and managed PostgreSQL services if needed).
If we look more closely at the IaC project for Azure, we notice that several changes have been made toward the goal of standing up a suitable multi-zone AKS cluster for a SAS Viya deployment across multiple AZs:
However, the work to provision a true "production-grade" multi-zone deployment, with the infrastructure to support a zone failure, has not been completed yet; two things are currently missing in the tool (as of January 2026):
So while the IaC tool can be used as a starting point, additional configuration at the storage and external PostgreSQL levels is required for a customer looking for a SAS Viya platform that can survive an availability zone failure.
The IaC tool is just ONE way to set up the required infrastructure in the Cloud, but it's not the only way. It is likely that customers familiar with this kind of sophisticated infrastructure already rely on their own custom tools and processes to provision a true multi-AZ infrastructure and maintain precise control of all Cloud components.
Whatever solution is retained to provision the Kubernetes cluster and associated services in the Cloud, it is important to remember that customers remain responsible for their infrastructure and for ensuring that it meets their own requirements (in terms of cost, speed, and resilience) as well as SAS Viya's operational requirements.
While it had not been released at the time of writing, the IaC team is working on the changes to address the missing items listed above (multi-zone support for NetApp and Azure PostgreSQL Flexible Server).
As part of this work, a new Terraform example has been provided in a working branch (pscloud-382) and provides new variables for a more robust multi-zone setup.
Note that we’re looking at code that’s currently in a development branch, not yet released for supported use.
Let’s review these new variables!
First, for PostgreSQL to be configured with zone-redundant HA, there are three new variables that you can use in the postgres_server configuration block:
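As an illustration of what such variables configure underneath, here is a minimal sketch at the azurerm provider level. This is not the actual SAS IaC code: the resource and attribute values below are assumptions, and only the HA-relevant arguments are shown.

```terraform
# Sketch only: zone-redundant HA for an Azure PostgreSQL Flexible Server,
# expressed with the azurerm provider. Names and values are illustrative;
# the SAS IaC working branch exposes its own variables for these settings.
resource "azurerm_postgresql_flexible_server" "example" {
  name                = "viya-pgsql"        # hypothetical name
  resource_group_name = "viya-rg"           # hypothetical resource group
  location            = "eastus2"
  sku_name            = "GP_Standard_D4s_v3"
  version             = "15"

  zone = "1"                                # zone of the primary instance

  high_availability {
    mode                      = "ZoneRedundant" # standby in a different zone
    standby_availability_zone = "2"
  }
}
```

With mode set to "ZoneRedundant", Azure keeps a synchronously replicated standby in the second zone and handles the failover transparently for clients.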
Then, for NetApp to be configured with a replicated volume in a distinct zone, you can choose the "ha" storage type and set the NetApp zone and replication parameters as noted below:
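Again as a hedged illustration of the underlying mechanism (not the SAS IaC variables themselves), cross-zone replication of an Azure NetApp Files volume looks roughly like this with the azurerm provider; pool, subnet, and export settings are omitted for brevity.

```terraform
# Sketch only: a primary ANF volume pinned to zone 1, replicated to a
# secondary "data protection" volume in zone 2. Attribute names follow the
# azurerm provider; all values here are illustrative assumptions.
resource "azurerm_netapp_volume" "primary" {
  # ...account, pool, subnet and export settings omitted...
  zone = "1"
}

resource "azurerm_netapp_volume" "replica" {
  # ...matching settings in the replica capacity pool omitted...
  zone = "2"

  data_protection_replication {
    endpoint_type             = "dst"       # this volume is the destination
    remote_volume_location    = "eastus2"
    remote_volume_resource_id = azurerm_netapp_volume.primary.id
    replication_frequency     = "10minutes"
  }
}
```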
While we use Azure NetApp Files storage in this example, it may not currently be the best solution for ensuring complete continuity of the SAS Viya platform.
During SAS's validation of multi-zone storage solutions with Azure NetApp Files (ANF), it was observed that a storage failover event can change the IP address of the underlying volume. In this scenario, SAS Viya services might error out without automatically refreshing the mount path. If this occurs, restarting the relevant SAS Viya pods is required to reestablish access to the storage volume after failover. (An internal DNS FQDN record should also be provisioned and used to reference the NetApp volume mount IP address, instead of hard-coding IP addresses.) Azure Files NFS Premium is another zone-resilient storage option that could be explored.
A note about this issue has been added in the latest version of the SAS documentation:
Note that this limitation is also documented in the working branch of the IaC tool.
Customers should evaluate the Zone Redundant Storage and cross-zone replication options that best align with their operational and resiliency requirements. On the other hand, failover tests (from one zone to another) were also performed with Azure PostgreSQL Flexible Server configured for HA, and everything worked as intended: after the switch, no action was required for the SAS Viya platform to work with the PostgreSQL standby replica.
The point is, recovery from a zone failure might require the identification and development of processes to accommodate SAS Viya and third-party services that cannot automatically failover to the redundant infrastructure.
To benefit from the increased resilience of a multi-zone deployment, SAS Viya and its ecosystem should be configured for High Availability (HA). This typically means running multiple replicas of the pods.
Instructions to configure all the microservices HPAs with 2 replicas are documented there.
However, for a fully HA deployment, not only the microservices but also OpenSearch, CAS, and critical third-party services (NFS provisioner, ingress-nginx) should be configured for High Availability.
Instructions to configure OpenSearch and CAS for HA can be found in the SAS Documentation here and there.
In the SAS Viya platform, block storage is required or recommended for most StatefulSets (RabbitMQ, Redis, OpenSearch, Crunchy Data server) and their RWO ("ReadWriteOnce") volumes. These volumes should be provisioned on Zone Redundant Storage.
As noted in the Azure documentation, "starting with Kubernetes version 1.29, when you deploy Azure Kubernetes Service (AKS) clusters across multiple availability zones, AKS now utilizes zone-redundant storage (ZRS) to create managed disks within built-in storage classes. ZRS ensures synchronous replication of your Azure managed disks across multiple Azure availability zones in your chosen region."
This means that if you have provisioned a multi-zone AKS cluster, the "default" built-in storage class already relies on ZRS disks. In a multi-zone deployment, this is the storage class that you want to use for the StatefulSets: RabbitMQ, Redis, OpenSearch, and the Crunchy Data server. Note that for improved performance (with high-performance SSD disks), you could switch to the "managed-csi-premium" storage class, which relies on Azure Premium zone-redundant storage (ZRS).
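A quick way to confirm this on your own cluster is to look at the disk SKU in the storage class parameters; ZRS SKUs carry a "_ZRS" suffix. The kubectl call below (shown as a comment) is what you would run against a real cluster; since we cannot query a cluster here, the check uses an example value to show the pattern.

```shell
# On a real multi-zone AKS cluster (Kubernetes 1.29+) you would query the SKU:
#   kubectl get storageclass managed-csi -o jsonpath='{.parameters.skuName}'
# Example value below is an assumption, used only to demonstrate the check.
sku="StandardSSD_ZRS"
case "$sku" in
  *_ZRS) echo "storage class is zone-redundant" ;;
  *)     echo "WARNING: $sku is not zone-redundant" ;;
esac
```

The same check works for "managed-csi-premium", whose ZRS variant is Premium_ZRS.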
Finally, regarding the third-party services, the NFS provisioner (NFS CSI driver) and ingress-nginx can achieve High Availability with additional controller replicas, pod anti-affinity, and pod disruption budgets. Note that these services, as well as other required infrastructure, are not included with the SAS Viya platform; they are the responsibility of your site’s IT team to set up and maintain.
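As an illustration of that pattern, here is a minimal sketch of a PodDisruptionBudget plus a zone-spread constraint for an ingress-nginx controller. The label values are assumptions based on the standard ingress-nginx chart; match them to your actual deployment.

```yaml
# Illustrative only: keep at least one ingress controller available during
# voluntary disruptions. Labels must match your ingress-nginx deployment.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ingress-nginx-controller
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
---
# Fragment for the controller Deployment pod template, asking the scheduler
# to spread replicas across availability zones:
# topologySpreadConstraints:
#   - maxSkew: 1
#     topologyKey: topology.kubernetes.io/zone
#     whenUnsatisfiable: ScheduleAnyway
#     labelSelector:
#       matchLabels:
#         app.kubernetes.io/name: ingress-nginx
```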
Now, let’s have a look at the differences between a single zone cluster and a cluster which has been provisioned for a multi-zone deployment of SAS Viya.
In the Azure portal, you can look at the Node Pool properties to check that the node pools are configured with multiple Availability zones.
For more detail about which zone each of your Kubernetes nodes is running in, try the following kubectl command to make sure that your nodes are spread across different zones:

kubectl get nodes -L topology.kubernetes.io/zone
We have 2 NetApp volumes. The first volume shown on the screenshot is the primary volume. The second one (at the bottom) is the "Destination volume" and has the "Data protection" Volume type with the primary NetApp volume as the "source" volume.
Using the az CLI with the az netappfiles volume list command, we can make sure that each volume is in a different zone.
Just like for NetApp, we can also check in the Azure Portal the Azure Database for PostgreSQL flexible server properties to ensure it has been configured for High-availability, with a primary and a standby instance running in different zones.
Note that you can also check that with the az postgres flexible-server show command.
Example:
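For instance, you can query just the HA-related properties with a JMESPath filter (the resource group and server names below are placeholders). Since we cannot reach a real subscription here, the snippet simulates the expected JSON to show how you might assert on it in a script.

```shell
# Placeholder names. On a real subscription you would run:
#   az postgres flexible-server show -g myResourceGroup -n mypgserver \
#       --query "{ha:highAvailability.mode, standby:highAvailability.standbyAvailabilityZone}" -o json
# A zone-redundant server returns JSON like the simulated output below.
output='{"ha": "ZoneRedundant", "standby": "2"}'
case "$output" in
  *ZoneRedundant*) echo "HA mode OK" ;;
  *)               echo "server is NOT zone redundant" ;;
esac
```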
One important thing to check, in a multi-AZ deployment, is that your pods are spread across your availability zones (so that in case of a zone failure, there is a surviving replica in the other zone).
This command iterates through the "sas-logon-app" pods, finds their node, queries the node for the topology.kubernetes.io/zone label, and prints the result.
kubectl -n <namespace> get pods -l app=sas-logon-app -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.nodeName}{"\n"}{end}' | \
while read pod node; do
  zone=$(kubectl get node "$node" -o jsonpath='{.metadata.labels.topology\.kubernetes\.io/zone}')
  echo "$pod $node $zone"
done
The output should look like this:
This confirms that each sas-logon-app pod is running in a different availability zone!
If you remove the pod selector (-l app=sas-logon-app) option from the command, then you can get the information for all the pods.
It has been noted in our test environment that restarting all the SAS Viya services improves the evenness of the replica distribution. While, after the initial deployment, several pod replicas were scheduled in the same zone (or even on the same node!), after a restart almost every pod and its replica were in different zones.
Also keep in mind that in this situation, where two replicas of a service’s pod were in the same zone, a zone failure would only trigger a very short interruption of service, since the pods would be automatically restarted in a remaining zone by the Kubernetes scheduler to reconcile the state of the cluster with the defined topology. This "self-healing" behavior is provided by design by the Kubernetes platform.
"Your mileage may vary". This idiomatic phrase could not be more true for this multi-zone environment setup...Each cloud provider offers ways to leverage multiple distinct data centers in different geographic availability zones to improve the resilience of the applications that are running in their infrastructure.
While we have explored some aspects of a multi-AZ deployment of SAS Viya in Azure, SAS cannot document and test every possible scenario for every storage, database, and virtual machine service available in the Cloud. So we rely on the site’s IT team to evaluate and select the best available options for their infrastructure (redundant storage, redundant external PostgreSQL database, failover options) to ensure that they align with their operational and resiliency requirements.
That's all for today, we hope you'll find this post useful!
Find more articles from SAS Global Enablement and Learning here.