Alertmanager is the de facto alerting solution that, together with Prometheus, forms a monitoring stack widely considered to be the industry standard. As highlighted in earlier posts, Alertmanager is deployed with SAS Viya Monitoring for Kubernetes and is primarily responsible for sending alerts when SAS Viya (and other) resources in the Kubernetes cluster aren't behaving as expected. When defined thresholds (based on metrics Prometheus has collected about things like resource consumption) are met, Prometheus notifies Alertmanager, which is then tasked with sending notifications to the appropriate channels. But Alertmanager isn't limited to working with Prometheus metrics. In fact, it has a REST API through which any client can submit alerts, and Alertmanager then sends the notifications. In this post, we'll demonstrate one specific example: how to trigger alerts in Alertmanager when the SAS Viya license is approaching its expiration date.
At a high level, this approach is made up of several components:
The sas-viya CLI has a licenses plug-in which can display the expiry dates of each licensed product, along with warning and grace period dates. The following command extracts only the expiry date for Base SAS from the CLI command's output.
/opt/sas/viya/home/bin/sas-viya --output text licenses products list | awk '/^Base SAS/'| awk '{print $8}'
This command will be used in the shell script.
Remember that the CLI first requires a user token to be obtained, which can also be done in the script, thanks to the loginviauthinfo.py program from the pyviyatools project.
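The awk field number in the extraction command depends on the column layout of the CLI's text output, so a defensive script might confirm that the extracted value really is a date before doing arithmetic on it. The following is a minimal sketch, assuming the expiry date is in YYYY-MM-DD form and that GNU date is available; the helper name is_iso_date is hypothetical and not part of the CLI or pyviyatools.

```shell
# Hypothetical helper: succeeds only if the argument has the shape
# YYYY-MM-DD and GNU date can parse it as a real calendar date.
is_iso_date() {
  case "$1" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *) return 1 ;;
  esac
  date -u -d "$1" "+%s" > /dev/null 2>&1
}

# Example: guard the value before using it.
licensedate="2021-10-28"   # stand-in for the value extracted by awk
if is_iso_date "$licensedate"; then
  echo "expiry date looks valid: $licensedate"
else
  echo "unexpected CLI output: '$licensedate'" >&2
fi
```

A check like this makes the script fail loudly if a CLI update changes the column order, rather than silently comparing against an empty or wrong value.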
The script also needs to extract the current date and compare it with the expiry date, which can be done by converting the two dates to seconds elapsed since the UNIX epoch and finding the difference. If the difference is 2,592,000 seconds (30 days) or less, the alert condition is considered to have been met. To instruct Alertmanager to trigger the alert, a cURL command is used to send a REST API call containing the necessary alert attributes to Alertmanager. Note that there's also a v2 of the Alertmanager API, which has some added features and improvements that can make the example below more elegant. Newer Alertmanager releases have dropped the v1 API in favor of v2, so check which version your deployment exposes.
# the quoted 'EOF' stops the shell from expanding $(...) and $variables while the file is written
tee /shared/gelcontent/batch/code/license-check.sh > /dev/null << 'EOF'
#!/bin/bash
url='http://alertmanager.myserver.race.sas.com/api/v1/alerts'
thirtydays_s=2592000
# login to CLI
/pyviyatools/loginviauthinfo.py -f ~/.authinfo_sasboot
# store license expiry date in variable
echo "Checking license..."
licensedate=$(/opt/sas/viya/home/bin/sas-viya --output text licenses products list | awk '/^Base SAS/'| awk '{print $8}')
licensedate_s=$(date -d "$licensedate" "+%s")
echo "SAS license expiry date: $licensedate"
currdate=$(date +%F)
currdate_s=$(date -d "$currdate" "+%s")
echo "Today's date: $currdate"
# seconds remaining until the license expires
diff_s=$(( licensedate_s - currdate_s ))
if [ "$diff_s" -le "$thirtydays_s" ]; then
  alertname="SAS license expiring"
  # send request
  curl -XPOST "$url" -d "[{
    \"status\": \"firing\",
    \"labels\": {
      \"alertname\": \"$alertname\",
      \"severity\": \"warning\",
      \"instance\": \"Gelcorp Dev environment\"
    },
    \"annotations\": {
      \"summary\": \"Base SAS license expiring within 30 days.\",
      \"runbook\": \"http://internal.gelcorp.com/wiki/alerts/license-expiry\"
    },
    \"generatorURL\": \"https://gelcorp.myserver.race.sas.com/SASEnvironmentManager/licenses\"
  }]"
  echo ""
fi
EOF
Ensure the script is executable by running:
chmod 755 /shared/gelcontent/batch/code/license-check.sh
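The script compares the two dates in seconds. The same check can be expressed in whole days, which some readers may find easier to reason about. This is only a sketch, assuming GNU date; the helper name days_between is hypothetical. Parsing both dates as UTC is a deliberate choice here, so that a daylight-saving change between the two dates cannot skew the result by an hour.

```shell
# Hypothetical helper: whole days from one YYYY-MM-DD date to another.
# Both dates are parsed as UTC midnight. Requires GNU date.
days_between() {
  start_s=$(date -u -d "$1" "+%s") || return 1
  end_s=$(date -u -d "$2" "+%s") || return 1
  echo $(( (end_s - start_s) / 86400 ))
}

# Example dates: 7 October 2021 as "today", 28 October 2021 as the expiry date.
days_left=$(days_between "2021-10-07" "2021-10-28")
if [ "$days_left" -le 30 ]; then
  echo "license expires in $days_left days: alert condition met"
fi
```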
The intention is to run this script inside a container on a daily schedule (at 2AM). To do so, a Kubernetes CronJob can be created as follows.
First, the CronJob definition should be created in a new YAML template. The definition should include a command to run the license-check.sh script created earlier. The CLI command in the script uses loginviauthinfo.py to authenticate, which needs access to the config file (config.json) and authentication token (credentials.json). These also need to be made available to the container, which is done below by mounting them from ConfigMaps.
This particular environment has a dedicated namespace, "batchns", where this CronJob will run.
# set variables
export current_namespace=gelcorp
export SAS_CLI_PROFILE=${current_namespace}
export SSL_CERT_FILE=~/.certs/${current_namespace}_trustedcerts.pem
export REQUESTS_CA_BUNDLE=${SSL_CERT_FILE}
NODE1FQDN=$(hostname -f)
REGISTRY_NAME=myregistry.sas.com
IMAGE_TAG=${REGISTRY_NAME}/admin-toolkit/sasadmincli4:latest
echo ${NODE1FQDN}
echo ${IMAGE_TAG}
# change to deploy directory
cd ~/project/deploy/${current_namespace}/site-config/admincli
# copy necessary files
cp ~/.sas/config.json ~/project/deploy/${current_namespace}/site-config/admincli/
cp ~/.sas/credentials.json ~/project/deploy/${current_namespace}/site-config/admincli/
cp ~/.certs/${current_namespace}_trustedcerts.pem ~/project/deploy/${current_namespace}/site-config/admincli/trustedcerts.pem
# create yaml template
tee ~/project/deploy/${current_namespace}/site-config/admincli/cronlicensecheck.yaml > /dev/null << EOF
---
# CronJob is served from batch/v1 since Kubernetes 1.21; batch/v1beta1 was removed in 1.25
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cronlicensecheck
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: cronlicensecheck
            image: ${IMAGE_TAG}
            env:
            - name: SAS_CLI_PROFILE
              value: "${current_namespace}"
            - name: SSL_CERT_FILE
              value: /home/sas/.certs/trustedcerts.pem
            - name: REQUESTS_CA_BUNDLE
              value: /home/sas/.certs/trustedcerts.pem
            command: ["/bin/sh"]
            args: ["-c", "/gelcontent/batch/code/license-check.sh"]
            volumeMounts:
            - name: sas-viya-mycode-volume
              mountPath: /gelcontent
            - name: sas-cli-profile
              mountPath: /home/sas/.sas/config.json
              subPath: config.json
            - name: secret-volume
              mountPath: /home/sas/.sas/credentials.json
              subPath: credentials.json
            - name: cert-volume
              mountPath: /home/sas/.certs/trustedcerts.pem
              subPath: trustedcerts.pem
          volumes:
          - name: sas-viya-mycode-volume
            nfs:
              server: ${NODE1FQDN}
              path: "/shared/gelcontent"
          - name: sas-cli-profile
            configMap:
              name: cli-config
              items:
              - key: config.json
                path: config.json
          - name: secret-volume
            configMap:
              name: cli-token
              items:
              - key: credentials.json
                path: credentials.json
          - name: cert-volume
            configMap:
              name: cert-file
              items:
              - key: trustedcerts.pem
                path: trustedcerts.pem
          restartPolicy: Never
EOF
A kustomization.yaml file is also required to include the CronJob definition in the manifest that will be applied to the cluster.
tee ~/project/deploy/${current_namespace}/site-config/admincli/kustomization.yaml > /dev/null << EOF
---
namespace: batchns
generatorOptions:
  disableNameSuffixHash: true
resources:
- cronlicensecheck.yaml
configMapGenerator:
- name: cli-config
  files:
  - config.json
- name: cli-token
  files:
  - credentials.json
- name: cert-file
  files:
  - trustedcerts.pem
EOF
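The configMapGenerator entries read config.json, credentials.json and trustedcerts.pem from the same directory as kustomization.yaml, so the build fails if any of them was not copied. A small pre-flight check could catch that earlier. This is a sketch only; the helper name missing_files is hypothetical, and the example uses a temporary directory as a stand-in for the site-config/admincli directory.

```shell
# Hypothetical pre-flight check: print the name of every expected file
# that is missing from the given directory.
missing_files() {
  dir="$1"
  shift
  for name in "$@"; do
    [ -f "$dir/$name" ] || echo "$name"
  done
}

# Example with a throwaway directory standing in for site-config/admincli.
workdir=$(mktemp -d)
touch "$workdir/config.json" "$workdir/credentials.json"
missing=$(missing_files "$workdir" config.json credentials.json trustedcerts.pem)
echo "missing: ${missing:-none}"
rm -rf "$workdir"
```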
The manifest can then be generated and applied.
cd ~/project/deploy/${current_namespace}/site-config/admincli
kustomize build -o ~/project/deploy/${current_namespace}/site-config/admincli/cronlicensecheck_job.yaml
kubectl -n batchns apply -f ~/project/deploy/${current_namespace}/site-config/admincli/cronlicensecheck_job.yaml
The CronJob will appear in the batchns namespace:
kubectl get cronjob -n batchns
NAME               SCHEDULE    SUSPEND   ACTIVE   LAST SCHEDULE   AGE
cronlicensecheck   0 2 * * *   False     0        <none>          52m
The CronJob is scheduled to run once a day. Create an ad hoc job from the CronJob to run it immediately:
kubectl -n batchns create job --from=cronjob/cronlicensecheck license-check-adhoc
The job launches a pod to run the script. A peek at the pod log provides some details about what happened when it ran.
kubectl -n batchns logs license-check-adhoc-zvdkc
Checking license...
SAS license expiry date: 2021-10-28
Today's date: 2021-10-07
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   370  100    20  100   350    769  13461 --:--:-- --:--:-- --:--:-- 14230
{"status":"success"}
The "success" message is a result of the cURL command, and it indicates the request to trigger the alert was made successfully (i.e. the expiry date is less than 30 days away).
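Rather than relying on someone reading the pod log, the script could capture the response body (for example with response=$(curl -s ...)) and test it. The sketch below shows one way to do that check, assuming the response has the {"status":"success"} shape seen in the log; the helper name alert_accepted is hypothetical. A plain pattern match keeps the sketch free of dependencies, although a JSON parser such as jq would be more robust.

```shell
# Hypothetical helper: succeed only when the response body reports success.
alert_accepted() {
  printf '%s' "$1" | grep -q '"status"[[:space:]]*:[[:space:]]*"success"'
}

response='{"status":"success"}'   # body returned in the log above
if alert_accepted "$response"; then
  echo "Alertmanager accepted the alert"
else
  echo "Alertmanager did not accept the alert: $response" >&2
fi
```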
Verify by logging on to the Alertmanager UI. The alert will appear (with labels and attributes defined earlier) in the list of firing alerts.
Typically, with out-of-the-box metric alerts, clicking on the Source button opens the Prometheus UI. For this alert, the Source button instead opens SAS Environment Manager's Licenses page, as directed by the value of the generatorURL property defined in the script's cURL command. Apart from that, the firing alert can be queried, grouped or silenced just like any other firing alert.
Note that alerts created using the v1 scheme of the Alertmanager API don't change state until explicitly instructed to. Consequently, an alert created using the process outlined here will need to be manually resolved. That can be done with a second cURL command that sets the value of the status label on the alert to "resolved". A link to a script that runs the command (or documentation with resolution steps) could be added to the runbook attribute of the alert, making it appear as a clickable link in the firing alert (as shown above).
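As an alternative to the v1 approach described above, the v2 API expresses resolution through an endsAt timestamp: posting an alert with the same label set and an endsAt that is not in the future marks it as resolved. The sketch below only builds and prints such a payload. It rests on several assumptions: the helper name resolve_payload is hypothetical, the v2 endpoint is assumed to be on the same host used in the script, and the label values are inserted as-is, so they must not contain double quotes.

```shell
# Hypothetical helper: print a v2-style payload that marks an alert as ended.
# The labels must match the firing alert exactly, because Alertmanager
# identifies an alert by its label set.
resolve_payload() {
  ends_at=$(date -u "+%Y-%m-%dT%H:%M:%SZ")
  printf '[{"labels":{"alertname":"%s","severity":"%s","instance":"%s"},"endsAt":"%s"}]\n' \
    "$1" "$2" "$3" "$ends_at"
}

payload=$(resolve_payload "SAS license expiring" "warning" "Gelcorp Dev environment")
echo "$payload"

# To send it (assumed v2 endpoint, not taken from the original script):
# curl -s -XPOST -H 'Content-Type: application/json' \
#   'http://alertmanager.myserver.race.sas.com/api/v2/alerts' -d "$payload"
```

A script along these lines is the kind of thing the runbook link could point to.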
The final piece of the puzzle is the routing, which determines how firing alerts are sent to the notification channels. Read my previous article to learn more about routing.
The example provided in this post can be used as a starting point for alerting on other aspects of a Viya platform. Consider, for example, an alert that fires when services inside Viya pods become unavailable, or when critical data becomes inaccessible. Administrators can benefit from adding these kinds of custom alerts to the collection of metric alerts included out-of-the-box with SAS Viya Monitoring for Kubernetes.
For additional information, please refer to the Alertmanager documentation page.
Thanks for reading.
Find more articles from SAS Global Enablement and Learning here.