SAS Container Runtime Observability – Part 2


In this post we will continue the look at observability for SAS Container Runtime. In Part 1 I discussed the base configuration for both Prometheus and the SAS Container Runtime pods. In this post we will look at the metrics in more detail and the Grafana dashboard details. I will also share a sample dashboard that you can download.

For the purposes of this post when I say “model” I’m talking about any model or decision image that has been published.

When creating a Grafana dashboard you should consider the following:

Who is the intended audience for the dashboard, and
What is the purpose or objective of the dashboard?
For example, is it for: general status monitoring, performance monitoring, system availability or problem determination?

You want to avoid the “kitchen sink” dashboard. This is a dashboard that has a lot of unrelated data (visualisation panels) and is a mashup of all possible types of data.

Understanding the target audience for the dashboard and the intended purpose will help you focus on the relevant information.

It is also important to understand the metrics data that is available to report on.

Understanding the metrics data

As described in Part 1, the metrics data is documented in the SAS Container Runtime Help Center; see: Monitoring SAS Container Runtime Metrics

While this lists the core metrics values, the values returned can vary depending on the published model. Therefore, the best approach to get a full list of the metrics values is to call the metrics endpoint.

You can use the following command to get the output from the endpoint:

kubectl -n namespace exec scr_pod_name -- curl http://localhost:8080/prometheus

It is best to run this command after a few transactions (calls to the model) have been completed, so that the output reflects the model's execution metrics.

Remember, you need to set the SAS_SCR_METRICS environment variable to expose the execution metrics data.
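Once Prometheus is scraping the pods, a query along the following lines can also list the SAS Container Runtime metric names that have been ingested. This is a minimal sketch: it assumes the metrics of interest share the scr_ prefix seen in scr_total_hit_count and scr_module_execution_count, and the JVM metrics would need a separate pattern such as jvm_.*

```promql
# One result row per metric name; the value is the number of series for that name.
count by (__name__) ({__name__=~"scr_.*"})
```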

SCR_TOTAL_HIT_COUNT vs SCR_MODULE_EXECUTION_COUNT

These two metrics sound like they would deliver the same result, but they return slightly different values.

The scr_total_hit_count metric returns a count of any hit on the endpoint, not just calls to run the model, whereas the scr_module_execution_count metric returns a count of calls to run the model.

Hence, over time the “total hit count” can diverge from the “execution count” and show a higher value. Therefore, for more accurate reporting on model execution, the scr_module_execution_count metric should be used.
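The gap between the two counters can be charted directly. The following is a sketch only; it assumes that both metrics carry the model label that the sample dashboard uses for scr_module_execution_count.

```promql
# Endpoint hits that were not model executions, per model.
sum by (model) (scr_total_hit_count)
-
sum by (model) (scr_module_execution_count)
```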

Histogram buckets

The histogram buckets are more for model tuning; you are more likely to look at this data when training a model. Therefore, it is probably less relevant to monitoring the production running models.

If you do plan to use the histogram data, there are several environment variables that can be set to control the input and output values. For example: SAS_SCR_INPUT_SCORE_MAX, SAS_SCR_OUTPUT_SCORE_MAX and SAS_SCR_SCORE_BINS

The SAS_SCR_INPUT_SCORE_MAX specifies the maximum value to display in the visualization histogram for the scoring input variable. The value that you assign should correlate to the input variable values. Likewise, the SAS_SCR_OUTPUT_SCORE_MAX specifies the maximum value to display in the visualization histogram for the scoring output variable.

SAS_SCR_SCORE_BINS specifies the number of bins (or buckets) that appear in a visualization histogram.

It is important to check the latest documentation for a list of the metrics environment variables. See the SAS Container Runtime Help Center: Monitoring SAS Container Runtime Metrics

Grafana dashboard example (basics)

To add some context to the sample dashboard and this discussion, I wanted to create a dashboard that is not tied to the Kubernetes pod and node names. As described in Part 1, I achieved this using Kubernetes labels. My hope is that this makes the sample dashboard portable with minimal environment updates.

To recap, the pods were given the following labels:

app.kubernetes.io/component
This is used as the top-level label to select all the model pods.
app.kubernetes.io/name
This is used for the model name.

In my environment I had nodes dedicated to running the SAS Container Runtime pods. These nodes had the label: workload/class=models

Even if you aren’t dedicating nodes to the model pods, applying a unique label to the nodes can be used to drive node affinity and pod anti-affinity. See the Kubernetes documentation for more information.

It also meant that in the Grafana dashboard I was able to create a filtered list of nodes based on the labels.

Making use of Grafana variables

To make the dashboard more interactive and dynamic you can use variables. As described above, these are used to create filtered lists, or to select a namespace or pod. In the sample dashboard I created the following variables:

Namespace ($namespace)
Node ($node): used to select a K8s node (filtered to the models node pool)
Model ($model): used to select a model by name
Model Instance ($application): used to select a specific model pod for the selected model.

For the namespace variable I used the app.kubernetes.io/component label to filter the list of namespaces, just to show the namespaces that are running the models. This configuration is shown in the following image.
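As an illustration of this kind of filtering, the variable queries could look something like the following, using Grafana's label_values() function for Prometheus data sources (a Grafana variable query function rather than PromQL proper). This is a sketch based on the kube_pod_labels metric and the "scr" component value that appear in the dashboard queries, not a copy of the sample dashboard's settings; the $model definition in particular is an assumption.

```promql
# $namespace: only namespaces that contain model pods
label_values(kube_pod_labels{label_app_kubernetes_io_component="scr"}, namespace)

# $model: model names within the selected namespace(s)
label_values(kube_pod_labels{label_app_kubernetes_io_component="scr", namespace=~"$namespace"}, label_app_kubernetes_io_name)
```

Each query goes into its own variable definition; the comment lines are only there for readability.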

Visualisations

A Grafana dashboard is comprised of one or more panels. A panel is the basic building block and is composed of a query and a visualization. The visualization is a graphical representation of the query results.

The nice thing about Grafana is there is a wealth of visualisations that can be used. This can be a good thing and a bad thing. There are many options for the type of visualisation, threshold mappings and colours, field overrides and much more. When new to Grafana dashboard development it can take some time to explore and settle on the right options. 😊

As all the metrics data is stored in a Prometheus database, the PromQL query language is used. For more information on PromQL, see the following link to the Prometheus documentation: Query basics

Before discussing some examples, let’s see the overview, specifically the “Running SCR Pods” row in the dashboard. Within the dashboard, a row is a mechanism for grouping objects.

Here are some examples from the sample dashboard.

The “Running Models” object is using a Stat gauge. I used this gauge to count the number of distinct running models across all namespaces. This does complicate things, as kube_pod_container_status_ready returns one result per container, so the right-hand side of the join can contain several results with the same pod and namespace labels. To fix this, the “max by (pod, namespace)” aggregation is required, which collapses them to a single result per pod.

This visualization used the following query:

count(
  count by (label_app_kubernetes_io_name) (
    kube_pod_labels{
      label_app_kubernetes_io_component="scr"
    }
    * on (pod, namespace) group_left()
    max by (pod, namespace) (
      kube_pod_container_status_ready == 1
    )
  )
)

As highlighted above, there can be multiple ways to get the same information. In the example above you can see that I used the kube_pod_container_status_ready metric. I could have also used “kube_pod_status_phase{phase="Running"}”.

The kube_pod_status_phase shows where a pod is in the scheduling lifecycle from the control plane’s perspective, while kube_pod_container_status_ready indicates whether the application itself is ready.
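For comparison, a version of the “Running Models” query based on the pod phase might look like the following. It is a sketch rather than a query taken from the sample dashboard, and it counts models whose pods are in the Running phase even if the application is not yet ready.

```promql
count(
  count by (label_app_kubernetes_io_name) (
    kube_pod_labels{
      label_app_kubernetes_io_component="scr"
    }
    * on (pod, namespace) group_left()
    # Keep one result per pod, in case duplicate series are returned.
    max by (pod, namespace) (
      kube_pod_status_phase{phase="Running"} == 1
    )
  )
)
```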

Looking at the “Transactions per Minute” time series object, this uses “sum by model” using the scr_module_execution_count metric. It is using the following query:

sum by (model) (rate(scr_module_execution_count[$__rate_interval]) * 30)

This also illustrates the need to understand the metrics data and things like the sample rate. The rate returned is a per-second value, so multiplying by 60 should give transactions per minute, but I found that the calculated rate was double the expected value.

This can be caused by several factors, including the scrape interval, an HA duplicate series being returned, or the metric existing with multiple label sets (for example, status or phase). After experimenting with a few options, I found the easiest fix was to multiply by 30 rather than 60 to get the transactions per minute (TPM) rate (possibly not a production-quality fix).
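One way to test the duplicate-series explanation is to count the series per pod and, if duplicates are found, collapse them before summing. This is a sketch under the assumption that the scraped series carry a pod label in addition to model; it is not the query used in the sample dashboard.

```promql
# Diagnostic: any result greater than 1 means a pod is reporting more than one series.
count by (model, pod) (scr_module_execution_count)

# Collapse duplicates per pod, then convert the per-second rate to per minute.
sum by (model) (
  max by (model, pod) (rate(scr_module_execution_count[$__rate_interval]))
) * 60
```

If the diagnostic query returns 1 for every pod, the doubling has a different cause and the second query will not correct it.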

You can see the resulting time series graph above where the results are summed by the running model types, not by the individual pods. So, we end up with a count for: homeloan, qsreg1, qstree1 and riskscore.

Again, this query hasn’t been limited to a single namespace.

To drill into the calls to the models, the next image shows that it is possible to graph the results for all models and/or to drill into a specific model and pod. In this case, the homeloan model is selected.

Total Calls by Model / Decision: The table on the top left shows a current count of the model executions (2,639) and on the top right you can see a time series view of the metrics data. This allows model execution to be tracked over time.

At the very top of the screenshot, you can see that the “homeloan” model has been selected. I used a Kubernetes deployment to run this model with 3 pod replicas. You can see this in the bottom two visualisations. On the bottom left you can see a time series of the ‘Total Calls by Pod’. On the right you can see the ‘Execution Time by Pod’ for the homeloan model.

Being able to select a model and see the distribution of calls allows you to confirm that the calls to a model are being evenly distributed across all available pods. Seeing the execution time by pod can help identify any unexpected behaviours.
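The distribution check can also be expressed as a query. The following sketch assumes the scr_module_execution_count series carry both the model and pod labels; it is an illustration and not necessarily how the sample dashboard panels are built.

```promql
# Per-pod call rate for the selected model.
sum by (pod) (rate(scr_module_execution_count{model="$model"}[$__rate_interval]))

# Each pod's share of the model's calls; values near 1/replicas suggest even balancing.
sum by (pod) (rate(scr_module_execution_count{model="$model"}[$__rate_interval]))
/
scalar(sum(rate(scr_module_execution_count{model="$model"}[$__rate_interval])))
```

With three replicas, as in the homeloan deployment, each pod would be expected to sit at roughly one third. The share is undefined (NaN) while the model is receiving no calls.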

The Resource View row has visualisations to show the pod Quality of Service (QoS), CPU and memory quotas and usage, and the network bandwidth being used. The image shows that it is possible to select multiple models for comparison. In the blue box you can see that the qstree1, riskscore and qsreg1 models have been selected.

In the image you can see that 3 pods are running on node 1 and one of the qstree1 pods is running on node 0. The ‘qsreg1-scr-model’ did not have any resource requests and limits set so it has a BestEffort QoS. You can also see this in the CPU Quota table.

Looking at the next image, the homeloan model was selected. We can now see the memory being used for each homeloan pod, the ‘User CPU Time’ and the receive and transmit bandwidth being used.

Finally, let’s look at what is possible with the Java Virtual Machine (JVM) metrics. Monitoring the JVM metrics is important as the JVM processes are supporting the running of the models.

When we think about JVM monitoring there are several metrics to consider, to understand if the system/application is under pressure. After some research the top four would be:

Memory and Heap health
Garbage collection (GC)
CPU usage
Threads

These are also important when identifying JVM Out-Of-Memory (OOM) conditions. JVM OOM can be caused by several conditions, including Java heap exhaustion, thread exhaustion, garbage collection overhead (spending too much time in GC) and native memory (non-heap) OOM. The result of this might be a pod that continues running, but becomes a zombie process that cannot process requests.

Memory pressure can lead to JVM failure. Therefore, it is important to monitor heap and non-heap memory. For the heap usage you need to look at space used by type (Eden, Survivor and Old or Tenured).

Eden space is where all new objects are initially allocated. It is typically collected using minor garbage collection (Minor GC). The Survivor spaces (S0 and S1) hold objects that have survived at least one GC. The Old / Tenured space is for long-lived objects.
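To put numbers on heap pressure, queries along the following lines may help. They assume the endpoint also exposes jvm_memory_used_bytes and jvm_memory_pool_used_bytes (with a pool label); these names are assumptions and should be confirmed against the output of the metrics endpoint before relying on them.

```promql
# Fraction of the maximum heap currently in use for the selected pod.
sum(jvm_memory_used_bytes{pod="$application", area="heap"})
/
sum(jvm_memory_max_bytes{pod="$application", area="heap"})

# Usage by memory pool (Eden, Survivor, Old / Tenured), if the pool metric is present.
sum by (pool) (jvm_memory_pool_used_bytes{pod="$application"})
```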

Garbage collection can impact the responsiveness of a system. Therefore, it is important to understand how much time is being spent in GC, particularly for any transaction where latency is important.
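As a rough indicator of GC overhead, the time spent collecting can be expressed as a rate. This sketch assumes the endpoint exposes a jvm_gc_collection_seconds_sum counter with a gc label, which should be checked against the metrics endpoint output.

```promql
# Approximate fraction of wall-clock time spent in garbage collection, per collector.
sum by (gc) (rate(jvm_gc_collection_seconds_sum{pod="$application"}[$__rate_interval]))
```

A value of 0.05, for example, would mean roughly 5% of each second is being spent in that collector.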

For the threads, the key states are: Runnable, Blocked and Waiting. Thread health is judged from the total number of threads and the balance between runnable, blocked and waiting threads.

The sample dashboard has a bit of a mash-up of the JVM metrics. I know I warned about creating a dashboard that is “unfocused”, the “kitchen sink anti-pattern”, but I wanted to show what is possible. 😊

The following image shows some of the possible JVM visualisations.

Looking at the ‘JVM Thread States’ time series visualisation, it is using the following PromQL query:

jvm_threads_state{model="$model", pod="$application"}

Here you can see that two variables are used. The legend that is shown is using the state ( {{state}} ) that is returned from the query.
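The same metric can be reduced to a single health indicator, such as the share of threads that are blocked. This is a sketch that assumes the state label uses upper-case values such as BLOCKED; check the legend values returned by the query above before using it.

```promql
# Share of the selected pod's threads that are currently blocked.
sum(jvm_threads_state{model="$model", pod="$application", state="BLOCKED"})
/
sum(jvm_threads_state{model="$model", pod="$application"})
```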

Some of the other visualisations combine multiple queries. For example, ‘JVM Heap’ also uses a time series, but with 3 queries:

sum(jvm_memory_init_bytes{pod="$application", area="heap"})
sum(jvm_memory_committed_bytes{pod="$application", area="heap"})
sum(jvm_memory_max_bytes{pod="$application", area="heap"})

Final thoughts

From this overview you can see that the dashboard uses a mixture of the standard Kubernetes metrics and the SAS Container Runtime metrics.

While the dashboard endeavours to make the reporting independent of the pod names, there are queries that rely on regular expression matching using: pod=~".*$model.*"

This was necessary as not all metrics data includes the Kubernetes labels. However, it does require that the model name is part of the pod name. For example, qsreg1-scr-model or riskscore-model.
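As an illustration of the pattern, a memory query using the regular expression match might look like the following. The container_memory_working_set_bytes metric is used here as a typical cAdvisor example and is an assumption, not necessarily the metric used in the sample dashboard.

```promql
# Working-set memory for every pod whose name contains the selected model name.
sum by (pod) (
  container_memory_working_set_bytes{namespace=~"$namespace", pod=~".*$model.*", container!=""}
)
```

The container!="" matcher excludes the pod-level aggregate series so that memory is not counted twice.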

When I started working on creating the Grafana dashboard I was initially using a very old model image. I found that this did not return the JVM metrics data. This is an example of why it is important to keep your model images up to date.

Here is a link to the example dashboard: sassoftware/sas-education

While this was created for the SAS Container Runtime workshop, I hope it helps with starting your observability monitoring for the Container Runtime models. Remember this is just an example and will most likely need to be updated to meet your needs and run in your environment.

For my testing I was working with images created from the SAS Model Manager Quick Start Tutorial, that were published using SAS Viya LTS 2025.09.

If you would like to learn more about SAS Container Runtime and monitoring, see the workshop on learn.sas.com: SAS Container Runtime: Architecture and Deployment on Azure Cloud. A new exercise using the example dashboard will be available with the LTS 2026.03 workshop update.

Find more articles from SAS Global Enablement and Learning here.