There is no list of metrics exposed by the components of SAS Viya 4. However, because SAS Viya has been instrumented using the Prometheus model, it is fairly straightforward to collect this information on your own. And, since the Prometheus format is self-documenting, the results will include information that helps clarify what the returned numbers mean.
The Prometheus metric model makes metrics available via an HTTP endpoint on each pod. So, the first step is to identify that endpoint. SAS Viya makes that easier by including the needed information in Kubernetes annotations on the pod. In the example below, we've used the kubectl describe pod command to list the relevant annotations for one of the SAS Viya pods.
$ kubectl -n viya describe pod sas-compute-c6d4dbfbc-tkx5k | grep prometheus
prometheus.io/path: /internal/metrics
prometheus.io/port: 8080
prometheus.io/scheme: https
prometheus.io/scrape: true
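If you would rather script this step than read the annotations by eye, the sketch below shows one possible approach. The metrics_url function is a hypothetical helper of my own naming, not something provided by SAS Viya or kubectl; it simply joins the scheme, port, and path values. The commented lines assume kubectl's JSONPath output option, where dots inside an annotation key have to be escaped with a backslash.

```shell
#!/bin/sh
# Hypothetical helper (not part of SAS Viya or kubectl): assemble a scrape URL
# from the scheme, port, and path annotation values. The host defaults to
# localhost, which is only correct when the request is made from inside the pod.
metrics_url() {
  scheme=$1
  port=$2
  path=$3
  host=${4:-localhost}
  printf '%s://%s:%s%s\n' "$scheme" "$host" "$port" "$path"
}

# Example using the annotation values listed above.
metrics_url https 8080 /internal/metrics

# Against a live cluster, each value could be read with JSONPath instead of
# being typed in (assumes the same namespace and pod name as the example):
#   pod=sas-compute-c6d4dbfbc-tkx5k
#   path=$(kubectl -n viya get pod "$pod" \
#     -o jsonpath='{.metadata.annotations.prometheus\.io/path}')
```

Run as written, the example call prints https://localhost:8080/internal/metrics.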
From that information, we can construct the URL needed to scrape the metrics from the pod. However, since that endpoint will only be available from within the Kubernetes cluster itself, we'll need to use either Kubernetes port-forwarding to make it externally accessible or access it from inside the cluster in some way. I think the kubectl exec command makes the latter approach fairly easy. Using the information obtained by the above command, we can construct the following command which runs a curl command inside the pod itself to scrape the metrics endpoint:
kubectl -n viya exec sas-compute-c6d4dbfbc-tkx5k -- curl https://localhost:8080/internal/metrics
And here's an excerpt of what we get back when we do that:
$ kubectl -n viya exec sas-compute-c6d4dbfbc-tkx5k -- curl https://localhost:8080/internal/metrics
# HELP go_cgo_go_to_c_calls_calls_total Count of calls made from Go to C by the current process.
# TYPE go_cgo_go_to_c_calls_calls_total counter
go_cgo_go_to_c_calls_calls_total 1.9507996e+07
# HELP go_cpu_classes_gc_mark_assist_cpu_seconds_total Estimated total CPU time goroutines spent performing GC tasks to assist the GC and prevent it from falling behind the application. This metric is an overestimate, and not directly comparable to system CPU time measurements. Compare only with other /cpu/classes metrics.
# TYPE go_cpu_classes_gc_mark_assist_cpu_seconds_total counter
go_cpu_classes_gc_mark_assist_cpu_seconds_total 0.27903992
# HELP go_cpu_classes_gc_mark_dedicated_cpu_seconds_total Estimated total CPU time spent performing GC tasks on processors (as defined by GOMAXPROCS) dedicated to those tasks. This metric is an overestimate, and not directly comparable to system CPU time measurements. Compare only with other /cpu/classes metrics.
# TYPE go_cpu_classes_gc_mark_dedicated_cpu_seconds_total counter
go_cpu_classes_gc_mark_dedicated_cpu_seconds_total 7.690484447
This output excerpt shows three metrics (there are quite a few more). You can see the self-documenting features I mentioned: each metric is preceded by a # HELP line and a # TYPE line. The HELP line provides the name of the metric (e.g. go_cpu_classes_gc_mark_assist_cpu_seconds_total) followed by a short text description of it (e.g. "Estimated total CPU time goroutines spent performing GC tasks to assist the GC and prevent it from falling behind the application. This metric is an overestimate, and not directly comparable to system CPU time measurements. Compare only with other /cpu/classes metrics."). The TYPE line repeats the metric name and identifies the type of metric. All three metrics in this excerpt are counters: a counter is a number that only ever increases while the process is running (something like elapsed time, for example). A gauge, by contrast, is a number that can move up or down (something like CPU usage). Refer to the Prometheus documentation for a complete list of metric types. The third line for each metric (the one that does not start with a #) gives the metric name and its actual value. I should mention that the set of metrics returned by a component may change over time or between releases.
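Since the full output can run to hundreds of lines, it can help to boil it down to one line per metric. The following is a minimal sketch of that idea, not an official tool: summarize_metrics is a hypothetical function name, and it assumes samples look like the ones in the excerpt above (a name, optional labels, and a value, with no trailing timestamp). Samples whose name has no matching TYPE line, such as the individual series of a histogram, are reported as "untyped".

```shell
#!/bin/sh
# Hypothetical helper: read Prometheus text-format output on stdin and print
# "name type value" for each sample line.
summarize_metrics() {
  awk '
    # "# TYPE <name> <type>" lines record the declared type of each metric.
    $1 == "#" && $2 == "TYPE" { type[$3] = $4; next }
    # Skip HELP lines and any other comment lines.
    $1 == "#" { next }
    # Skip blank or malformed lines.
    NF < 2 { next }
    {
      name = $1
      sub(/[{].*/, "", name)   # drop any {label="..."} suffix from the name
      t = (name in type) ? type[name] : "untyped"
      print name, t, $NF       # assumes the last field is the value
    }
  '
}
```

To use it, pipe the output of the kubectl exec command shown earlier into summarize_metrics. For the first metric in the excerpt, that yields the line "go_cgo_go_to_c_calls_calls_total counter 1.9507996e+07".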
Greg Smith
Principal Software Developer
SAS