Validate a SAS Viya deployment

Someone validating a SAS Viya deployment usually aims to answer some combination of these questions:

Are basic elements of my SAS Viya deployment working?
Are the components of my SAS Viya deployment which will be used the most working acceptably well?
Do some specific non-functional performance tests of this SAS Viya deployment meet expectations?
Are access controls in SAS Viya granting and preventing access to data, content and application functionality in the intended way?

There are of course other possible aims for validation beyond these, such as tests of integration with data sources or other applications, usability and accessibility tests, tests of internationalization and more. There is no one 'best' way to validate a SAS Viya deployment; a validation procedure should be tailored to a specific deployment, and should answer specific questions relevant to the business' needs. Choosing to include fewer validation tests makes the procedure faster, cheaper and potentially easier to automate. Adding more validation steps to a procedure may mean it takes longer and costs more to identify, design, develop and run the tests, but when run they give a more complete view of a deployment's status.

Given all that, this post gives examples of validation steps that we use in the GEL team. It also includes some example tests which I personally don't have any need to run, but which we are sometimes asked about. The examples are not exhaustive; if you have experience of validating SAS Viya deployments and you check for something not mentioned here which you think others should check too, I would love to hear from you.

Edit 02 Oct 2023: In the context of life sciences and pharmaceuticals, Computer System Validation (CSV) is a rigorous testing and documentation process required to satisfy regulators that "the computer systems and their software operate as intended, meet user requirements, and comply with regulatory standards" (to quote Computer System Validation (CSV) in Life Sciences Part 1: Introduction to CSV - Verista). This article does not try to describe such rigorous validation in a regulated industry, and is limited to a more generally applicable set of confirmatory tests which may be useful to any SAS Viya administrator.

SAS Viya Validation Resources

SAS-provided tools documented in SAS Help Center

The SAS Readiness Service checks the status of the SAS Viya platform every 30 seconds, to determine whether it is ready for use. It checks whether SAS endpoints in Kubernetes are responding (e.g. with HTTP 200: OK responses to a REST call), and also whether the SAS Infrastructure Data Server database and OAuth authentication providers are responding. Since it is part of SAS Viya 'out of the box', it offers a universally-available way to check whether SAS Viya is fundamentally operational.
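As a minimal sketch of how an administrator might read that readiness signal from a script, the Python below inspects the JSON returned by kubectl get pods -o json and reports whether a pod's Ready condition is true. The namespace name viya and the label selector app=sas-readiness in the usage comment are assumptions; check the labels used in your own deployment.

```python
import json
import subprocess


def pod_is_ready(pod: dict) -> bool:
    """Return True if the pod's 'Ready' condition has status 'True'."""
    conditions = pod.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Ready" and c.get("status") == "True"
        for c in conditions
    )


def get_pods(namespace: str, selector: str) -> list:
    """Fetch pods matching a label selector as parsed JSON (needs kubectl on the PATH)."""
    result = subprocess.run(
        ["kubectl", "-n", namespace, "get", "pods", "-l", selector, "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)["items"]


# Example usage (namespace and selector are assumptions, not taken from this post):
# pods = get_pods("viya", "app=sas-readiness")
# print(bool(pods) and all(pod_is_ready(p) for p in pods))
```

Keeping the parsing in pod_is_ready separate from the kubectl call means the check can be exercised against saved JSON without a live cluster.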

The SAS® Viya® Platform: Deployment Validation section in the SAS Viya Platform Administration documentation in SAS Help Center is a useful resource. It describes how to run the SAS Viya Platform Deployment Report, which is one tool in the SAS Viya 4 Administration Resource Kit (viya4-ark), a toolset available on GitHub that is practically indispensable for many deployment consultants, architects and others who work with SAS Viya. We use tools from viya4-ark in most of our GEL workshop collections. Running the script which generates the SAS Viya Platform Deployment Report is both easy and quick.

The SAS Viya Command-line Interface (CLI) is an essential tool for validation, as it is for many other administration activities.

SAS Viya Administration Checklist

The SAS Viya Administration Checklist has a task called Validate your SAS Viya Deployment. It proposes that your validation first checks whether Kubernetes is fundamentally working and has the expected nodes and namespaces. Then, it suggests some tests to validate SAS Viya, such as application URLs returning a page, SAS Studio code submission working for a trivially simple SAS code statement, presence of loaded data in CAS, some authorization tests and some validation tests of the SAS Programming Run-time which can be done using the SAS Viya CLI.

It also suggests using...

The ValidateViya pyviyatool

ValidateViya is a pyviyatool that runs a series of tests on a Viya environment, validating that it is running as expected. ValidateViya has a modular design, allowing for the creation of custom tests, the alteration of existing tests, and the removal of unneeded tests. See the ValidateViya User Guide in the pyviyatools project in GitHub.

Basic validation

*Taps microphone* "Is this thing on?"

The aim of basic validation is to run a quick set of tests which will reveal fundamental issues with your deployment fast. We check for things like Kubernetes not working, SAS Viya not being deployed successfully, pods not running, or users being unable to sign in. If major issues prevent a SAS Viya deployment from working at all, you will want to know about it early.

Validate the right one

If you have more than one SAS Viya deployment, a possible pre-requisite is to determine which one to validate. Which hosts, which Kubernetes cluster, which is the URL for the web applications in the deployment to be validated? It is of no use to test the health of the wrong environment, and allow that mistake to give you a false sense of security. Confirm the hostname or IP address for your Kubernetes ingress host and any other hosts you will interact with directly. If you have front-end load balancers, reverse proxies or similar, bypassing these during basic validation will simplify your testing, especially if there is any ambiguity about which deployment will really respond to requests sent to an alias.
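One way to make this check explicit is to compare what you are actually pointed at with what you intended to test, before running anything else. The sketch below is illustrative only: the function name and the example context and host names are hypothetical, and current_context is assumed to be the text printed by kubectl config current-context.

```python
from urllib.parse import urlparse


def target_mismatches(expected_context, current_context, expected_host, base_url):
    """Return a list of human-readable mismatches; an empty list means the target looks right."""
    problems = []
    actual_context = current_context.strip()
    if actual_context != expected_context:
        problems.append(
            f"kubectl context is '{actual_context}', expected '{expected_context}'"
        )
    # urlparse lower-cases the hostname and drops any port or path.
    actual_host = urlparse(base_url).hostname or ""
    if actual_host != expected_host.lower():
        problems.append(f"URL host is '{actual_host}', expected '{expected_host}'")
    return problems


# Example usage with hypothetical names:
# target_mismatches("gel-prod", "gel-prod\n", "viya.example.com",
#                   "https://viya.example.com/SASDrive/")
```

Failing fast on a non-empty list avoids running the rest of the procedure against the wrong environment.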

Examples of steps in a basic 'is it on?' validation procedure

To give some examples, here are descriptions of some tests in a basic validation procedure for an environment I regularly rebuild - these tests are likely to be valid for nearly any SAS Viya deployment:

Can we make an SSH connection to a node in the Kubernetes cluster (the node we use to run kubectl and other administration tools)?
Is kubectl using the correct kube config file for the cluster on which the SAS Viya deployment is running?
Do we get the expected number of nodes, and are they all in the expected Ready state, when we run kubectl get nodes?
Are the Kubernetes namespaces for SAS Viya, and for supporting applications such as our observability toolset present, in the results of kubectl get ns?
How many pods are running in the Viya namespace? Is the number of running pods within an expected range (e.g. 170-190 pods for the SAS Viya software order I deployed)?
How many pods are pending in the Viya namespace? Is the number of pending pods larger than some acceptable maximum? (E.g. from past experience with this environment, I know that there are sometimes 2-3 pending pods out of 180 or so, and this does not necessarily mean there is a problem. But in this environment, more than this number of pending pods usually does indicate a problem.)
Is the sas-readiness pod itself reporting as being ready, indicating that its three types of check are all passing? See the SAS Readiness Service for more on this.
Can you sign in to SAS Environment Manager as an administrator account?
Does the SAS Viya license for this deployment expire on a date sufficiently far into the future?
Can you authenticate successfully against the SAS Viya Command-line interface?
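The node and pod checks in this list lend themselves to scripting. As a rough sketch, assuming nodes and pods are the items lists parsed from kubectl get nodes -o json and kubectl get pods -n <namespace> -o json, the Python below turns them into pass/fail results. The function names and the thresholds you pass in are illustrative; the right values depend on your own environment.

```python
from collections import Counter


def nodes_not_ready(nodes: list) -> list:
    """Return names of nodes whose 'Ready' condition is not 'True'."""
    bad = []
    for node in nodes:
        conditions = node.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            bad.append(node["metadata"]["name"])
    return bad


def pod_phase_counts(pods: list) -> Counter:
    """Count pods by status.phase (Running, Pending, Succeeded, Failed, Unknown)."""
    return Counter(p.get("status", {}).get("phase", "Unknown") for p in pods)


def basic_checks(nodes, pods, expected_nodes, running_range, max_pending):
    """Return a dict of check name -> bool for the node and pod checks."""
    counts = pod_phase_counts(pods)
    low, high = running_range
    return {
        "node_count": len(nodes) == expected_nodes,
        "nodes_ready": not nodes_not_ready(nodes),
        "running_in_range": low <= counts["Running"] <= high,
        "pending_acceptable": counts["Pending"] <= max_pending,
    }


# Example usage with thresholds like those described above:
# basic_checks(nodes, pods, expected_nodes=5, running_range=(170, 190), max_pending=3)
```

Note that a pod in the Running phase is not necessarily ready to serve requests, so this complements rather than replaces the sas-readiness check.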

Validate frequently-used functional components

If basic validation passes, we generally want to look just a little bit more closely at some applications, to check if the components of SAS Viya that end users will use the most are working. This next stage of validation still needs to be fast, and reveal commonly-occurring issues which might not be revealed by the basic validation tests above.

Examples of steps in a functional validation procedure

Your basic functional validation tests, in your environment, may differ significantly from the following list depending on what your users do most in SAS Viya. The following descriptions are the gist of what is tested, not how I test those things. They would typically be a mix of manual and scriptable tests. For the environment I look after, basic functional validation includes steps like these:

In SAS Environment Manager's Data page, or using the sas-viya CLI's cas plugin, are any CAS tables loaded into memory, and are they the tables we expect to be loaded into memory in CAS for this environment?
Open SAS Studio. Does a compute session under the SAS Studio compute context start successfully? Can we run a trivially simple SAS statement such as proc options; run; , and does it produce normal log output without errors?
Can we submit the same program successfully in SAS Batch, using the sas-viya CLI's batch plugin? Does the SAS batch job execute, and return sensible-looking SAS logs with no errors?
From SAS Drive, can we open the User Activity report? This is efficient in the sense that it tests several things at once:
- SAS Drive is working, and we can navigate SAS Viya folders for their contents
- SAS Visual Analytics is working and able to render charts
- SAS Cloud Analytic Services (CAS) is working and able to provide data to SAS Visual Analytics
- The Audit service is running. By default it only loads data into the AUDIT table in CAS once per hour, so we may have to wait for it to run before we see data in the User Activity report.
Can a simple predictive model be built, trained, published and validated using tools such as SAS Visual Analytics, SAS Model Studio and SAS Model Manager? A scripted process which passes through key functional steps in these applications should be completed successfully.
Can we open and sign in to our chosen log monitoring application (such as OpenSearch Dashboards), and see log messages and their contextual data flowing in from the SAS Viya deployment? Is there an unusual volume of ERROR-level log messages being produced?
Can we open and sign in to our metric monitoring application (such as Grafana), and see metric data flowing in from the SAS Viya deployment and its host Kubernetes cluster? Do a few example charts of metrics appear to display normal patterns of metric data, compared to what we would usually expect to see in those charts?
If there are any other supporting applications deployed in your Kubernetes cluster alongside your SAS Viya deployment, are they healthy and working?
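For the SAS Studio and SAS Batch steps, 'no errors in the log' can be checked mechanically once you have the log text, however you obtained it. The sketch below is one possible approach, not a SAS-provided tool: it relies on SAS log error lines starting with ERROR: or ERROR followed by a message number such as 180-322, and the function name is hypothetical.

```python
import re

# SAS log error lines begin with "ERROR:" or "ERROR <number>-<number>:".
ERROR_LINE = re.compile(r"^ERROR(?: \d+-\d+)?:")


def sas_log_errors(log_text: str) -> list:
    """Return the lines of a SAS log that start with an ERROR marker."""
    return [line for line in log_text.splitlines() if ERROR_LINE.match(line)]


# Example usage: treat the step as failed if any error lines are found.
# errors = sas_log_errors(open("job.log").read())
# print("PASS" if not errors else f"FAIL: {len(errors)} error line(s)")
```

Anchoring the pattern at the start of the line avoids false positives from NOTE lines that merely mention the word ERROR.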

Validate non-functional performance

Unless something unusual has gone wrong, the same specification of physical and virtual compute and storage resources, with the same versions of Kubernetes and its supporting components, running the same release of SAS Viya configured the same way, will typically perform about the same every time you deploy it. Non-functional performance validation aims to demonstrate that an environment performs adequately in a series of workload or stability tests, compared either to some benchmark result from a similar environment in the past, or to a business objective.

This kind of validation is likely to be performed less frequently; it may only need to be done once, or a handful of times, per environment. Therefore, administrators may have less need for this type of validation to be fast or automated. Non-functional testing for processing performance, responsiveness, resilience etc. certainly takes longer to design, and also typically takes longer to perform than basic validation.

This is the aspect of validation I have least need for with the classroom environments I support, so perhaps you would like to reply to this post in the comments below to describe what you need to cover with this type of validation. From my experience of consulting roles in SAS, the following test ideas could be a good starting point:

Can I load a specific-size data set into CAS in an acceptable amount of time?
How long does a specific typical program take to run?
How long does a data processing job flow take to run? Does it finish successfully within an acceptable time?
- Can the same data load test be performed at any of the usual times it might need to run, including when the system is under user load, and still complete within an acceptable time?
- The 'acceptable time' here may be defined by a contract or service level agreement, or may be in comparison with some benchmark result from a previous test.
How long does it take a user to sign in and open each of the applications they use most, be that a report, a programming session, a modelling tool etc.?
How long does a specific typical report take to open?
How long does a specific model flow take to complete?
How does responsiveness vary when the system is under high real or simulated end-user workload? (Can this testing be used to define what 'high end-user workload' is, for this environment, and is that enough capacity for the organization who owns it?)
Does a backup job complete successfully?
Does running a backup have an unacceptable impact on the rest of the system's performance?
Does a restore from backup complete successfully?
For components of the deployment which are highly available, does the system react in the correct way if I intentionally stop/delete/remove e.g. one pod in the replica set or equivalent? For example, does Kubernetes successfully re-start the failed pod and resume sending traffic to it when it is running again?
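Many of the questions above reduce to 'time this step and compare the result with an acceptable time'. A minimal sketch of that pattern in Python follows. The function names and the 20% default tolerance are assumptions made for illustration; the step being timed would be whatever callable runs your data load, program or report.

```python
import time


def timed(func, *args, **kwargs):
    """Run func and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def within_limit(elapsed, benchmark, tolerance=0.2):
    """True if elapsed is no more than benchmark * (1 + tolerance)."""
    return elapsed <= benchmark * (1 + tolerance)


# Example usage with a hypothetical load_table function and a 100-second benchmark:
# _, seconds = timed(load_table, "big_table")
# print("PASS" if within_limit(seconds, benchmark=100.0) else "FAIL")
```

Where the acceptable time comes from a contract or SLA rather than an earlier benchmark, a tolerance of 0 makes the threshold a hard limit.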

I expect I missed some things that people will consider important here. If your SAS environment is subject to a service level agreement (SLA), that should define metrics that can be objectively tested to validate whether the environment meets the SLA or not. So, define performance validation tests that measure those metrics and that can be automated.

Let me know in the comments below what other types of non-functional testing you think a validation procedure should include.

Validate access controls

The Security Policy and Authorization Model tasks in the SAS Viya Administration Checklist describe how access controls should be defined and documented for a SAS Viya deployment. Follow these to an extent that is proportionate in the context of how tightly or loosely your SAS Viya environment and the data it contains need to be protected from unauthorized access.

Then, again as appropriate for the degree of rigor required, you can define a series of validation tests to perform once, occasionally or regularly to prove that users who should have access to data, content and application functionality do have that access, and that users who should not, do not.

You may find some value in tools to help you automate this. pyviyatools that might help include:

explainaccess.py - a flexible tool that can explain access controls on a SAS Viya folder, report, or for another object or a service URI, for any specific user or group, or for all users and groups who have applicable access controls
getruleid.py - pass in a SAS Viya object or service URI, and a user or group, and get back the ID of a SAS Viya general authorization rule that applies to that object for that user
listcaslibsandeffectiveaccess.py - for a specific caslib, or for all caslibs on all CAS servers, return a list of all effective access controls in the CAS authorization system
listcastablesandeffectiveaccess.py - a somewhat flexible variation of the above for CAS tables: return a list of all effective access on all CAS tables (or a specific table) in all CAS libraries (or a specific CAS library) on all servers. Can optionally include row-level permissions and permissions on source tables.
listgroupsandmembers.py - return list of all groups and all their members, optionally including their email addresses (if available in the identity service's data)
listrules.py - list either all general authorization rules (typically for further processing) or rules that target a specific URI and/or mention a specific user or group
testfolderaccess.py - assert whether a folder is accessible to a user or not. This tool is intended to be called with a specific folder path, principal, permission and a sense (whether access with this permission is expected to be granted or not). It will respond 'TEST PASSED' or 'TEST FAILED' with the specifics of what was tested. We run a series of tests using this tool in one of our GEL workshops, to validate folder permissions for a set of folders for a set of users and groups, and show how one folder is 'incorrectly' protected (intentionally so) and causes one test in the suite to fail.
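If you run a series of such tests from a script, the 'TEST PASSED' and 'TEST FAILED' responses make it straightforward to summarize a suite. The sketch below assumes you have already captured each test's output as text, keyed by a description of your choosing. It does not assume anything about the tool's command-line options or about the rest of its output, and the function name is hypothetical.

```python
def summarize(results: dict) -> dict:
    """Classify each test's captured output as passed, failed, or unknown."""
    summary = {"passed": [], "failed": [], "unknown": []}
    for name, output in results.items():
        if "TEST FAILED" in output:
            summary["failed"].append(name)
        elif "TEST PASSED" in output:
            summary["passed"].append(name)
        else:
            # Neither phrase was found, e.g. the tool could not run at all.
            summary["unknown"].append(name)
    return summary


# Example usage with hypothetical test descriptions:
# summary = summarize({"Sales group can read /Sales": captured_output_1,
#                      "HR group cannot read /Sales": captured_output_2})
# print(f"{len(summary['failed'])} failed, {len(summary['unknown'])} inconclusive")
```

Treating output with neither phrase as 'unknown' rather than as a pass keeps an authentication or connection problem from being mistaken for a clean result.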

As an aside, the simpler and more regular your authorization model design is, the more easily validated it will be. If validating access controls is important in your organization, time spent in carefully designing and documenting the authorization model so that it is easily understood, clearly described and simple will be much more cost-effective to the business than a haphazardly designed and carelessly implemented authorization model. It will also be far easier to validate.

Between the four general types of validation tests described above, you should be able to get a good sense of whether a SAS Viya deployment is working, usable for its intended purpose, performant and correctly secured. There are certainly other kinds of validation that could be performed, but I think those are the main and most common ones. But what do you think - did I miss a kind of test that everyone ought to include when they validate SAS Viya? Let me know in the comments below.
