
Building a Grafana dashboard to monitor SAS Viya


I seem to be in a phase of working with Grafana and the SAS Viya Monitoring for Kubernetes framework. My journey started with monitoring SAS Container Runtime deployments. While I was creating a Grafana dashboard for that monitoring, I started to gather my thoughts around a possible dashboard to provide an overview for a SAS Viya deployment.

 

This was for use in one of the education workshops that I maintain, SAS Viya: Deployment on Azure Kubernetes Service.

 

In this post I will share my thoughts, experience and the sample dashboard.

 

I will start by stating that this work is not intended to replace any of the dashboards that SAS provides with the SAS Viya Monitoring for Kubernetes GitHub project.

 

As mentioned, it was originally developed for use in the SAS Viya: Deployment on Azure Kubernetes Service workshop. I’m posting this to make it available outside of the workshop.

 

The objective of the dashboard was to provide an overview of the SAS Viya deployment and a health check “on a page”, on a summary dashboard. I wanted to monitor the progress of a deployment, including the status of the deployment operator if it was used.

 

Let’s take a look…

 

Here is my “on a page” view. In a Grafana dashboard, objects (visualisations) can be grouped into rows. The images show the ‘Status Overview’ row.

 

This first screenshot is very early in the SAS Viya deployment startup.

 

01_MG_202605-01_Dashboard-startup1-v2.png


 

A bit later…

 

02_MG_202605-02_Dashboard-startup-v2.png

 

The ‘Status Overview’ row shows the state of the deployment and the key SAS Viya components. Starting from the top, the blue Stat gauge on the top left shows that the SAS Deployment Operator was ‘Not Detected’. This was an example of the manual deployment method, that is, a deployment using the Kubernetes commands. Later you will see an example where the deployment operator was used.

 

To the right of that you can see the 'Platform Readiness' panel. This uses a ‘Status History’ visualization, which provides a time-series plot of the state of the SAS Readiness probe (pod).

 

To the right of the Platform Readiness is the cadence information. I will discuss this in more detail later.

 

Next you can see the heading 'Health Overview'. This is a Text panel, which I used to help position and group the next set of visualizations. In the ‘Health Overview’ section you can see the various components being monitored. The SAS Viya startup is progressing, with many of the components already running or ready.

 

You can see that the sas-workload-orchestrator StatefulSet is in the process of starting and that 90% of the SAS Viya deployment objects have started.

 

This next image was taken once SAS Viya had fully started. The SAS Viya readiness probe is “Ready”, as shown by the green bars.

 

03_MG_202605-03_Dashboard-running-v2.png

 

I was working with SAS Viya Stable 2026.02 and 2026.03, testing with and without Contour as the ingress controller. In this image the ‘Proxy / Ingress’ gauge shows the state of the HTTPProxy objects. You can see that they are 100% ready.

 

I feel I should explain this visualisation, as it took a bit of effort to get it working the way I wanted. First off, I wanted the dashboard to work with both ingress-nginx and Contour. I should state here that I haven’t tested this with Red Hat OpenShift, which uses the OpenShift Ingress Operator. A query would most likely have to be added for the “route” metrics data.

 

The gauge has two queries: one to calculate (estimate) the readiness of the ingresses, and a second to get the status of the HTTPProxy objects.

 

The first challenge was that there isn’t a metric that lets you calculate the readiness of the ingress objects directly. The workaround for this is shown in the following query.

 

clamp_max(
sum(kube_service_created{namespace="$namespace", service=~"sas.*"}) /
sum(kube_ingress_created{namespace="$namespace"}) * 100, 100
)

 

As you can see the readiness is based on looking at the number of services in the SAS Viya namespace and comparing that to the number of ingress objects. I had to do this as there isn’t any metric to directly show the number of ingress objects created and ready.

 

In a SAS Viya deployment, not all services have an ingress associated with them. So, the only way I could calculate (estimate) readiness was to compare the number of services to ingress objects. If the percentage is greater than 100% it is most likely that all the ingress objects are ready. Hence, the “clamp_max” on the query.

 

The clamp_max function allows you to set a maximum value to be displayed, in this case 100. Displaying that a resource is “106% Ready” doesn’t really make sense.
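
To make the arithmetic concrete, here is a minimal shell sketch of the same calculation done outside Prometheus. The function name estimate_ready_pct and the sample counts are hypothetical and only for illustration; the dashboard itself relies on the PromQL query above. Printing "n/a" when there are no ingress objects is a choice made for this sketch, not something the query does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mirrors clamp_max(services / ingresses * 100, 100).
# $1 = number of services, $2 = number of ingress objects.
estimate_ready_pct() {
  local services=$1 ingresses=$2
  # Avoid dividing by zero when no ingress objects exist yet (sketch-only behaviour).
  if [ "${ingresses}" -eq 0 ]; then
    echo "n/a"
    return 0
  fi
  awk -v s="${services}" -v i="${ingresses}" 'BEGIN {
    pct = s / i * 100
    if (pct > 100) pct = 100   # the clamp_max step
    printf "%.0f\n", pct
  }'
}

estimate_ready_pct 45 50   # prints 90
estimate_ready_pct 53 50   # raw value would be 106, so this prints 100
```

The second call shows why the cap is there: without it the gauge would report 106%.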

 

The next problem was the way the HTTPProxy metric data is collected. Even though you can run kubectl get httpproxy -n viya_namespace to list all the proxies in the SAS Viya namespace, in the PromQL query you have to select on the “exported_namespace” label. If you just query for proxy data using the “namespace” label set to the SAS Viya namespace, no results are returned. This is because the Contour metric data is associated with (collected as part of) the Contour namespace.

 

sum(contour_httpproxy_valid{exported_namespace="$namespace"})
/
sum(contour_httpproxy{exported_namespace="$namespace"}) * 100

 

However, unlike the ingress metrics data, there are metrics for both the total number of proxies (contour_httpproxy) and the number that are in a valid state (contour_httpproxy_valid).

 

The nice thing about the way the Grafana visualisations work is that you can define multiple queries. The gauge will not display anything for a query that returns no results. Hence, I could use one object for both queries. It is using a Gauge visualisation.

 

Coming back to the SAS Viya cadence information.

 

04_MG_202605-04_Cadence.png

 

In the images you can see that I was deploying SAS Viya Stable 2026.03. The cadence information is held in a ConfigMap, in the data fields. The ConfigMap data is not natively available as collected metrics data.

 

So, how did I get it into the dashboard?

 

I used labels on the SAS Viya namespace, as the label information is natively collected. Once the deployment had reached the Ready state, I ran the following commands to add the labels.

 

# Get the name of the deployment metadata ConfigMap
NS=viya_namespace
cm=$(kubectl -n "${NS}" get cm -l orchestration.sas.com/lifecycle=metadata | grep sas-deployment-metadata | awk '{print $1}')
# Read the cadence fields from the ConfigMap, separated by '|'
IFS='|' read -r SHORT_NAME VERSION RELEASE <<< "$(
  kubectl -n "${NS}" get cm "${cm}" \
    -o jsonpath='{.data.SAS_CADENCE_DISPLAY_SHORT_NAME}{"|"}{.data.SAS_CADENCE_VERSION}{"|"}{.data.SAS_CADENCE_RELEASE}'
)"
# Replace any spaces in the short name with underscores (label values cannot contain spaces)
CNAME=${SHORT_NAME// /_}

# Add the labels to the namespace
kubectl label namespace "${NS}" sas_cadence_name="${CNAME}" --overwrite
kubectl label namespace "${NS}" sas_cadence_version="${VERSION}" --overwrite
kubectl label namespace "${NS}" sas_cadence_release="${RELEASE}" --overwrite

 

Once the labels were in place, the panel displayed the cadence information. While this was a manual step, it would be possible to have an automated job periodically update the labels. The cadence information is useful for diagnostic purposes, especially when using the Deployment Operator for automatic release updates.
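
If you do automate the labelling, it is worth guarding against values that Kubernetes would reject. Label values can be at most 63 characters, may only contain alphanumerics, '-', '_' and '.', and must begin and end with an alphanumeric character. The following is a minimal sketch of a sanitising helper; the function name to_label_value is hypothetical, and it goes further than the simple space replacement used in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn an arbitrary string into a valid Kubernetes label value.
to_label_value() {
  local value=$1
  # Replace every character outside the allowed set with an underscore.
  value=$(printf '%s' "${value}" | sed -e 's/[^A-Za-z0-9._-]/_/g')
  # Truncate to the 63-character limit for label values.
  value=${value:0:63}
  # Strip leading and trailing characters that are not alphanumeric.
  value=$(printf '%s' "${value}" | sed -e 's/^[^A-Za-z0-9]*//' -e 's/[^A-Za-z0-9]*$//')
  printf '%s\n' "${value}"
}

to_label_value "Stable 2026.03"        # prints Stable_2026.03
to_label_value " Long-Term Support "   # prints Long-Term_Support
```

A scheduled job could pass each ConfigMap field through a helper like this before calling kubectl label.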

 

In the images above you may have noticed that the dashboard has three rows, being:

 

  • Status Overview
  • ReplicaSets, PVC details…
  • Sessions and Jobs.

 

The ‘ReplicaSets, PVC details…’ and ‘Sessions and Jobs’ rows provide more detailed information on the SAS Viya deployment.

 

In the 'Sessions and Jobs' row you can get information on the current number of SAS Compute and CAS Server sessions. For example, I started a number of SAS Studio sessions and started some CAS sessions. The result is shown in the following image.

 

05_MG_202605-05_Sessions_and_Jobs.png

 

In the image you can also see that two jobs have completed. These are two of the standard platform batch jobs.

 

This also illustrates the approach of looking at other dashboards to build your custom dashboard (why reinvent the wheel?). The query for the CAS Sessions is taken from the SAS-provided “SAS CAS Overview” dashboard.

 

This uses the following query.

 

sum(cas_grid_sessions_current{namespace="$namespace", cas_server="$casServer", service=~".+-client"})

 

As you can see it is using the cas_grid_sessions_current metric as a way of calculating the current sessions.

 

In the query you can see that a couple of variables are used: $namespace and $casServer. The namespace variable provides a filtered list of namespaces. The following query is used to only select the namespaces containing SAS Viya pods.

 

06_MG_202605-06_namespace.png

 

The “sas.com/deployment=sas-viya” label is a standard label that is applied to the SAS Viya pods. Note that when Kubernetes labels are exposed as Prometheus labels, the dots and slashes in the label name are replaced with underscores.
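
As an illustration of that renaming, the following shell sketch converts a Kubernetes label key into the form you would expect to see in Prometheus. It assumes the kube-state-metrics convention of adding a "label_" prefix and replacing characters that are not valid in a Prometheus label name; the function name to_prom_label is hypothetical, and your monitoring setup may relabel things differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a Kubernetes label key to a Prometheus label name,
# assuming the kube-state-metrics "label_" prefix convention.
to_prom_label() {
  # Replace anything that is not a letter, digit or underscore with an underscore.
  printf 'label_%s\n' "$(printf '%s' "$1" | sed -e 's/[^A-Za-z0-9_]/_/g')"
}

to_prom_label "sas.com/deployment"   # prints label_sas_com_deployment
```

Under that assumption, a selector on the SAS Viya pods would use label_sas_com_deployment="sas-viya" rather than the original label key.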

 

Finally, since I mentioned the SAS Deployment Operator, here is an example of a deployment using the operator. Again, you can see the startup proceeding. This time for the 'discovery' namespace.

 

07_MG_202605-07_Dashboard-startup2-v2.png

 

In the ‘Deployment Operator Status’ panel you can see that the operator has successfully run (Succeeded) and the time that the operator was started.

 

 

Where to from here…

 

When creating a Grafana dashboard you should consider the following:

 

  • Who is the intended audience for the dashboard, and
  • What is the purpose or objective of the dashboard?

 

For example, is the dashboard for general status monitoring, or performance monitoring, system availability or problem determination?

 

You want to avoid the “kitchen sink” dashboard. This is a dashboard that has a lot of unrelated data (visualisation panels) and is a mashup of all possible types of data.

 

Understanding the target audience for the dashboard and the intended purpose will help you focus on the relevant information. Existing dashboards can also be a great source of inspiration.

 

As I stated earlier, I developed the dashboard for use in a workshop exercise. If you would like to try deploying SAS Viya Monitoring for Kubernetes and the dashboard, see the SAS Viya: Deployment on Azure Kubernetes Service workshop (the dashboard will be available with the LTS 2026.03 workshop update).

 

The dashboard is also available (now) from the sassoftware area on GitHub, see: sassoftware/sas-education

 

I hope you find this useful.

 

 

Michael Goddard

 

 

 

Find more articles from SAS Global Enablement and Learning here.
