Hosting SAS Viya in the cloud will soon become more common than hosting it on bare OS, if it is not already. Some cloud providers and their partners offer tools and services to help you implement disaster recovery (DR) capabilities.
Plus, the fact that SAS Viya 2021.1 and later runs on Kubernetes introduces new considerations for DR beyond those we had for SAS Viya 3.x on bare OS. So while SAS Viya 2021.1 and later on Kubernetes is not significantly more or less vulnerable to disaster than any previous release of SAS, these two things merit an update on considerations for DR in SAS Viya.
The slide below outlines considerations that are specific to cloud hosting and to Kubernetes:
SAS Viya on Kubernetes has specific technical requirements for disaster recovery. Your DR site's Kubernetes environment must have the same SAS Viya products, at the same version of SAS Viya, as your production site. The documentation says you must also use the same namespace name in the DR cluster, but we think this is more of a preference than a requirement: it is quite straightforward to restore a Viya 4 backup to a different namespace name, and namespace names and DNS aliases are easily decoupled. While it may be common practice to include your Viya deployment's namespace name (production, test, gelcorp etc.) in its DNS alias, it is certainly not a requirement.
The DR deployment must also run on the same version of Kubernetes in the same cloud infrastructure. Failing over from one Azure AKS cluster to another Azure AKS cluster is okay, but we do not support failing over from an Azure AKS cluster to an Amazon EKS cluster as part of a DR process.
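To make these requirements concrete, here is a minimal Python sketch of a pre-failover check that compares a DR site against production. It is not a SAS tool: the `ClusterProfile` class, its field names and the `check_dr_site` function are all hypothetical, and you would need to populate the profiles from your own deployment records. A differing namespace name is reported as a warning rather than an error, reflecting our view above that it is more of a preference than a requirement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterProfile:
    """Hypothetical summary of one SAS Viya deployment (field names are illustrative)."""
    cloud_provider: str       # e.g. "azure-aks" or "amazon-eks" (made-up labels)
    kubernetes_version: str
    viya_version: str
    viya_products: frozenset  # names of the deployed SAS Viya products
    namespace: str


def check_dr_site(prod, dr):
    """Compare a DR profile against production.

    Returns (errors, warnings). Errors break the requirements described in
    the text; the only warning is a differing namespace name.
    """
    errors = []
    warnings = []
    if dr.cloud_provider != prod.cloud_provider:
        errors.append(
            f"cloud provider differs: {prod.cloud_provider} vs {dr.cloud_provider}")
    if dr.kubernetes_version != prod.kubernetes_version:
        errors.append(
            f"Kubernetes version differs: {prod.kubernetes_version} vs {dr.kubernetes_version}")
    if dr.viya_version != prod.viya_version:
        errors.append(
            f"SAS Viya version differs: {prod.viya_version} vs {dr.viya_version}")
    if dr.viya_products != prod.viya_products:
        missing = sorted(prod.viya_products - dr.viya_products)
        extra = sorted(dr.viya_products - prod.viya_products)
        errors.append(f"SAS Viya products differ: missing={missing}, extra={extra}")
    if dr.namespace != prod.namespace:
        warnings.append(
            f"namespace differs: {prod.namespace} vs {dr.namespace}")
    return errors, warnings
```

Calling `check_dr_site(prod, dr)` with an AKS production profile and an EKS DR profile would return one error for the cloud provider, which matches the unsupported AKS-to-EKS failover described above.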
Microsoft Azure Site Recovery's best practices for DR in AKS recommend using Azure Traffic Manager, a DNS-based load balancer, to route traffic to either your primary or secondary AKS cluster, hosted in different Azure regions. Azure can also interconnect the two clusters to enable communication between them for data replication. It has features for replicating container images between the Azure Container Registry in each Azure region where you have an AKS cluster, and the same best practice guide discusses several aspects of replicating storage. But it remains the technical architect's responsibility to figure out how to replicate SAS Viya state data between clusters in different regions in a way that will satisfy the customer's desired RPO and RTO. More on this below.
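As a rough illustration of the RPO side of that responsibility, the Python sketch below estimates the worst-case data loss for a simple periodic backup-and-copy approach. It assumes (my assumption, not something the providers or SAS documentation state) that a backup starts every `backup_interval` and only becomes usable at the DR site once both the backup and the copy have finished. In the worst case, disaster strikes just before the newest copy lands, so the newest usable copy reflects production as it was one interval plus one backup-and-copy duration ago. The function names and the example durations are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval, backup_duration, transfer_duration):
    """Worst-case age of the newest usable backup at the DR site.

    Assumes backups start every `backup_interval` and are usable at the DR
    site only after `backup_duration + transfer_duration` has elapsed.
    """
    return backup_interval + backup_duration + transfer_duration


def meets_rpo(rpo_target, backup_interval, backup_duration, transfer_duration):
    """True if the worst-case data loss is within the agreed RPO."""
    loss = worst_case_data_loss(backup_interval, backup_duration, transfer_duration)
    return loss <= rpo_target


# Illustrative figures only: a nightly backup that takes 1 hour to run
# and 30 minutes to copy to the DR region.
nightly_loss = worst_case_data_loss(
    timedelta(hours=24), timedelta(hours=1), timedelta(minutes=30))
```

With these made-up figures, `nightly_loss` is 25 hours 30 minutes, so a nightly backup alone would not satisfy a 24-hour RPO. The same arithmetic does not cover RTO, which depends on how long it takes to restore and restart services at the DR site.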
Start-up Arpio (see below) is also developing a capability for Azure, not yet released.
Arpio is a start-up based in Durham, NC (near Cary) that offers security-conscious replication of a production AWS environment in one region to a second 'recovery' region. SAS Cloud is moving towards using Arpio for DR capability, but at this point it's fairly new to me. See arpio.io/how-it-works/ for their marketing.
CloudEndure is Amazon AWS's DR offering for EKS-hosted SAS Viya deployments. Documentation for using CloudEndure with EKS clusters seems to be light, but the general principles for what needs to be replicated, and how traffic needs to be directed to whichever region is currently hosting your services, are the same as in Azure.
The unique selling point for CloudEndure seems to be cost reduction: it uses low-cost staging machines on the DR site while the production site is healthy. In the event of a disaster in production, it scales up the machines on the DR site to full size, ready for business use.
However, CloudEndure may not always meet the security requirements for some of our more data security-conscious customers.
We have not so far identified similar services specifically developed to support Disaster Recovery for Google Cloud Platform or OpenShift. If you know of any, please tell me, as we'd like to cover them in our content.
In addition to the Disaster Recovery practices outlined in the SAS Viya documentation, there is a SAS Disaster Recovery Policy for SAS Viya 3.4. There is not yet a SAS Disaster Recovery Policy specifically for SAS Viya on Kubernetes. Rob Collum's SAS Communities post on Contemplating disaster recovery for SAS from 19 July 2018 is also well worth reading.
The resources listed above aim to explain:
They also discuss how important it is that you agree with your business stakeholders:
Hosting SAS Viya on Kubernetes is significantly more complex than hosting SAS Viya on bare OS. Backing up and replicating to another environment is correspondingly more complex too.
Survey the data in YOUR SAS Viya environment which needs to be synchronised or replicated (e.g. by being periodically backed up and copied). These items are present in all SAS Viya deployments:
Here are some examples of data which may be present in some deployments:
Each separate data type will usually require its own element of your overall approach to data synchronisation from production to DR, and you will likely need to sync them all. Plan for the capability to synchronise data in both directions: following a disaster and failover to your DR site, that site temporarily hosts real business activity. When your main production site is available again, you must be able to synchronise all these types of data from your DR site back to your main production site, and perform what is sometimes called a 'fail back', better described as a switch of services back from DR to production.
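One way to keep track of this is to treat synchronisation as a plan with one job per data type, built from whichever site is currently active towards the standby site. The Python sketch below is only an illustration of that idea. The `SyncJob` class and both functions are hypothetical, the data type names are placeholders rather than a list of what your deployment contains, and the sketch says nothing about how each job would actually move data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncJob:
    """One element of the overall approach: one data type, one direction."""
    data_type: str
    source: str
    target: str


def build_sync_plan(data_types, active_site, standby_site):
    """Create one job per data type, flowing from the active site to the standby."""
    return [SyncJob(dt, active_site, standby_site) for dt in data_types]


def uncovered(data_types, plan):
    """Return the data types that no job in the plan synchronises."""
    covered = {job.data_type for job in plan}
    return [dt for dt in data_types if dt not in covered]


# Placeholder names: replace with the results of your own survey.
surveyed = ["data-type-a", "data-type-b", "data-type-c"]

# Normal operation: production is active, DR is on standby.
normal_plan = build_sync_plan(surveyed, "production", "dr")

# After failover the same data types must flow the other way,
# so that services can later be switched back to production.
failback_plan = build_sync_plan(surveyed, "dr", "production")
```

Running `uncovered(surveyed, plan)` against a hand-written plan is a simple way to spot a data type that has been surveyed but has no synchronisation job in one direction or the other.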
I have found fewer resources, case studies and examples of partially or wholly successful implementations of disaster recovery capabilities than I hoped to find, given how important we know DR is for many of our customers.
If you have designed, implemented or operated a DR capability for a SAS Viya deployment I would very much like to hear from you. There is no substitute for real-world experience, and we want to hear about yours, good or bad. Please leave a comment below, or email me at David.Stern@sas.com to share your stories, designs or case studies on this topic. Thank you in anticipation.
My thanks to Peter Muirhead, Gerry Nelson, Scott McCauley and Rob Collum for their review of earlier drafts of this post, and for their helpful feedback. Any errors or shortcomings are mine.