Azure spot instances are a cost-effective option provided by Microsoft Azure for running virtual machines (VMs) and workloads. Under traditional Azure virtual machine pricing, you pay a fixed rate. Spot instances let you take advantage of spare capacity in Azure’s data centers at significantly reduced prices. However, there’s a trade-off: Azure can reclaim a spot instance whenever it needs the capacity back for standard, pay-as-you-go workloads. You get lower-cost compute resources, but with the possibility of interruption. Wouldn’t it be great if we could use spot instances to run low-priority SAS Viya workloads? The key is to find workloads that can be restarted after an interruption without causing a problem.
The idea
I wanted to test the idea of creating a “spot host type” within SAS Workload Management deployed on Azure Kubernetes Service (AKS). This new host type would be required by a “spot” queue. If workload is submitted to the spot queue, Azure spot instances should be launched via Azure autoscaling. In Kubernetes terms, these Azure spot instances are Kubernetes nodes running within an Azure spot node pool. We therefore start by creating an Azure spot node pool on the AKS cluster where we deployed SAS Viya.
Azure spot instances offer substantial cost savings, but they are not suitable for all workloads. Workloads that require uninterrupted, constant resources should run on regular on-demand instances to avoid potential interruptions. When using Azure spot instances with SAS Viya, careful workload planning, job monitoring, and resource management are essential to maximize the cost savings and scalability while managing the risk of interruption.
Azure Spot node pool
How to add an Azure Spot node pool is explained here. I used the command:
az aks nodepool add --resource-group hackinsas-viya202307-rg --cluster-name hackinsas-viya202307-aks -s Standard_L8s_v2 --name computespot8 \
  --priority Spot --eviction-policy Delete --spot-max-price -1 \
  --enable-cluster-autoscaler --min-count 0 --max-count 1 -c 1 \
  --labels wlm=spot k8s.azure.com/aks-local-ssd=true workload.sas.com/class=compute launcher.sas.com/prepullImage=sas-programming-environment \
  --node-taints=workload.sas.com/class=compute:NoSchedule --no-wait
If you set the max price to -1, the instance won't be evicted based on price. As long as there is capacity and quota available, the price for the instance will be the lower of the current Spot price and the standard price. The minimum node count is set to 0, while the initial node count (-c 1) creates one node right away. Shortly after you launch this command, the pool will therefore probably scale back to 0 nodes. That might sound surprising, especially if you were expecting SAS compute pods to use this node to schedule workloads. The node is correctly labeled with “workload.sas.com/class=compute”, so in theory SAS compute pods could land here. However, AKS automatically adds an extra taint to spot nodes, “kubernetes.azure.com/scalesetpriority=spot:NoSchedule”, which blocks any pod that does not tolerate it. So what do you need to do to be able to submit a SAS batch program to run on these instances?
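Once a spot node is running, you can check these labels and the extra taint yourself. As a quick sketch (it relies on the scalesetpriority label that AKS puts on spot nodes, mentioned above):
kubectl get nodes -l kubernetes.azure.com/scalesetpriority=spot -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints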
Installing Kyverno
We can use a tool called Kyverno to add the required toleration (“kubernetes.azure.com/scalesetpriority=spot”) to the SAS pods. Kyverno enhances Kubernetes by providing a policy-as-code framework that helps you enforce and manage various aspects of your workloads and resources in a Kubernetes cluster. Kyverno can be installed with these instructions. You can choose to install Kyverno with Helm or just download and apply an install.yaml:
kubectl create -f https://github.com/kyverno/kyverno/releases/download/v1.10.0/install.yaml
Adapting the SAS Workload Orchestrator and the SAS Image Staging Configuration DaemonSets
We can use Kyverno to apply an extra toleration to the SAS Workload Orchestrator (SWO) DaemonSet and to the SAS “prepull-ds” DaemonSets used by the SAS Image Staging process. An example yaml file is available in the manifests folder. When its preconditions are met, this policy mutates the selected resources: it adds the missing toleration to the sas-workload-orchestrator DaemonSet and to any DaemonSet whose name starts with prepull-ds.
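As an illustration of what such a policy could look like, here is a hedged sketch rather than the exact manifest from that folder: the policy and rule names are made up, it assumes the target DaemonSets already define a tolerations list (they do, for the compute taint), and a plain admission-time mutate rule like this only takes effect when the DaemonSets are created or updated after the policy exists.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: ds-add-toleration-scalesetpriority   # illustrative name
spec:
  rules:
    - name: add-scalesetpriority-toleration
      match:
        any:
          - resources:
              kinds:
                - DaemonSet
              names:
                - sas-workload-orchestrator
                - "prepull-ds*"
      mutate:
        # Append the spot toleration to the pod template of the matched DaemonSets
        patchesJson6902: |-
          - op: add
            path: "/spec/template/spec/tolerations/-"
            value:
              key: kubernetes.azure.com/scalesetpriority
              operator: Equal
              value: spot
              effect: NoSchedule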
kubectl apply -f ds-add-toleration-scalesetpriority.yaml -n <viya-namespace>
Once this toleration has been added, sas-workload-orchestrator pods can run on the spot nodes that were labeled with “workload.sas.com/class=compute”. If the Kubernetes cluster starts any spot nodes in the future, they will also automatically get sas-workload-orchestrator DaemonSet pods.
We have therefore used Kyverno to mutate the SWO DaemonSet so that it carries the “kubernetes.azure.com/scalesetpriority=spot” toleration it needs to get past the taint. Its pods can now follow the “workload.sas.com/class=compute” label that was applied when provisioning the spot nodes. The node is now eligible as a candidate to host SAS compute workloads, and is displayed as such in the workload orchestrator in SAS Environment Manager.
Looking at the details of the “computespot” host, all expected labels are set:
We added a toleration to any DaemonSet whose name starts with “prepull-ds-” because we want to let the SAS Image Staging process start the required pods on spot nodes. The SAS Image Staging process uses a DaemonSet, at a configurable interval, to ensure that relevant images have been pulled to hosts; relevant images mainly means the sas-programming container. This prepull only helps if at least one spot instance is already running before you schedule workload on it. In practice, if the Azure autoscaler scales between 0 and n nodes, you will probably start with 0 nodes. After you submit workloads intended to run on spot instances, you will therefore have to wait while the sas-programming container is pulled onto the new node. This reminds us again that spot instances should only be used for low-priority jobs, where waiting five minutes doesn’t matter.
Adapting the SAS Workload Management scaling pod
SAS Workload Orchestrator can integrate with the Kubernetes Cluster Autoscaler, as described in this documentation. This means that SAS Workload Orchestrator can use the Kubernetes Cluster Autoscaler to launch new compute nodes. To do this, a sas-workload-orchestrator-scaling-pod is scheduled on a compute node with the resource properties specified on the SAS Workload Management queue. We therefore need to know how to create a “spot queue”. Jobs submitted via the “spot” queue will only run on Azure spot instances, and if no spot nodes are available, they can be autoscaled via a sas-workload-orchestrator-scaling-pod. However, as before, the scaling pods requesting a spot node can’t run on our spot nodes. We know why by now: they do not tolerate the spot taint. We need to create an extra Kyverno cluster policy, pod-add-toleration-scalesetpriority.yaml, that adds that toleration to any scaling pod in our Viya namespace.
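A sketch of this second policy could mirror the DaemonSet policy shown earlier, this time matching Pods at admission. This is again a hedged example: the policy name, the Pod name pattern, and the namespace placeholder are assumptions based on the description above, and the rule assumes the scaling pods already carry a tolerations list.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: pod-add-toleration-scalesetpriority   # illustrative name
spec:
  rules:
    - name: add-scalesetpriority-toleration-to-scaling-pods
      match:
        any:
          - resources:
              kinds:
                - Pod
              names:
                - "sas-workload-orchestrator-scaling*"
              namespaces:
                - <viya-namespace>   # replace with your Viya namespace
      mutate:
        # Append the spot toleration to each matched scaling pod
        patchesJson6902: |-
          - op: add
            path: "/spec/tolerations/-"
            value:
              key: kubernetes.azure.com/scalesetpriority
              operator: Equal
              value: spot
              effect: NoSchedule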
kubectl apply -f pod-add-toleration-scalesetpriority.yaml -n <viya-namespace>
Creating a new sas-batch-pod-template-spot PodTemplate
To make sure sas-batch-server pods hosting workloads meant for spot instances can run, we need to add the same scalesetpriority toleration to the PodTemplate linked to the SAS Batch service launcher context. We need to create an extra sas-batch-pod-template for that: copy the original sas-batch-pod-template and modify the name and tolerations in the definition. An example yaml is available in the manifests folder.
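A minimal sketch of the relevant part of such a PodTemplate is shown below. Only the toleration section is spelled out; the remaining fields should be copied from the original sas-batch-pod-template in your deployment, and the exact tolerations already present there may differ.
apiVersion: v1
kind: PodTemplate
metadata:
  name: sas-batch-pod-template-spot
template:
  spec:
    # Copy the container spec and other fields from the original
    # sas-batch-pod-template, then extend its tolerations with the spot toleration:
    tolerations:
      - key: workload.sas.com/class
        operator: Equal
        value: compute
        effect: NoSchedule
      - key: kubernetes.azure.com/scalesetpriority
        operator: Equal
        value: spot
        effect: NoSchedule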
kubectl apply -f sas-batch-pod-template-spot.yaml -n <viya-namespace>
Adding spot host type to SAS WLM
Creating a new SAS WLM host type is described in this documentation link. The host type created in our example is named “spot”. Each host type definition specifies the tags used to identify its hosts; these tags can identify host characteristics that are relevant to SAS Workload Orchestrator. In the screenshot, the distinct tag used to identify Azure spot nodes is “wlm=spot”.
Adding spot queue to SAS WLM
Just as we created a host type, we can create a queue named “spot”. You can assign different hosts or host types to each queue. Each queue definition includes a list of the host types and tags that identify the hosts allowed to process jobs from the queue. If the queue definition includes tags, then when the SAS Workload Orchestrator manager evaluates which host to use to process a job from the queue, it checks the tag values on the queue and host, and sends the job to a host with a matching tag.
By assigning the “spot” host type we created before to this queue, you ensure that low-priority jobs are processed on the Azure spot nodes.
Creating a new SAS Batch service launcher context “spot”
The next step is to create a new SAS Batch service launcher context named “spot”. In SAS Environment Manager, click the new launcher context icon:
Fill in the details for the new launcher context as shown in the screenshot.
Creating a new “spot” Batch context
The final step is to create an extra Batch context. In the view where you defined the launcher context, switch to Batch contexts. Create a new one and name it ‘spot’. For the Launcher context, choose the one created in Step 7.
You can associate server contexts with SAS Workload Orchestrator queues. For information about associating a queue, see Contexts Page. Don’t forget to choose the dedicated “spot” queue as your SAS Workload Orchestrator queue.
Time for action
As a test, you can launch workloads via a simple test script. Here is an example script that you can run from a Linux session. You can submit any program; just make sure to use the -c parameter, which lets you specify the context, in this case “spot”. This sends the workload to the “spot” queue, which is supposed to run only on Azure spot instances.
export SSL_CERT_FILE=/<ANY_DIR>/trustedcerts.pem
INGRESS_URL=https://viya.sas.com
/home/ubuntu/sas-viya --profile default profile set-endpoint "${INGRESS_URL}"
/home/ubuntu/sas-viya --profile default profile toggle-color off
/home/ubuntu/sas-viya --profile default profile set-output text
/home/ubuntu/sas-viya --profile default auth login -u <user> -p <password>
for i in {1..50}
do
echo "Welcome $i times"
/home/ubuntu/sas-viya --profile default batch jobs submit-pgm -c spot --pgm prog1_hmeq.sas --restart-job
/home/ubuntu/sas-viya --profile default batch jobs submit-pgm -c spot --pgm prog2_baseball.sas --restart-job
/home/ubuntu/sas-viya --profile default batch jobs submit-pgm -c spot --pgm prog2_baseball.sas --restart-job
done
Before we kick off this shell script, we see four compute hosts. Looking closer, one is inactive. This is a spot compute host that is no longer part of the Kubernetes cluster; it was probably scaled down because no workload was submitted.
As soon as the SAS Workload Orchestrator re-scans the Kubernetes cluster, inactive nodes are no longer displayed:
Let’s launch the shell script and see what happens. As soon as the script starts, some simple SAS programs are submitted in batch:
If the “spot” context is set up correctly, you will see the number of pending jobs growing rapidly in the “spot” queue:
When no node is available with the right criteria (meaning a Kubernetes node with the label wlm=spot), SAS Workload Management launches a sas-workload-orchestrator-scaling pod, and the Kubernetes Cluster Autoscaler provisions a node with the right criteria to schedule that pod.
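If you want to watch this from the command line, the following (relying on the scalesetpriority label that AKS puts on spot nodes) shows the spot node appearing once the Cluster Autoscaler kicks in:
kubectl get nodes -l kubernetes.azure.com/scalesetpriority=spot -w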
The new node has the right criteria (it is tainted and labeled with “workload.sas.com/class=compute”), so SAS Workload Management recognizes it as a compute host. The sas-workload-orchestrator DaemonSet launches a sas-workload-orchestrator pod on the Azure spot node, which confirms the process: the host is displayed in the orchestration dashboard in SAS Environment Manager. Note that the status is OPEN-FULL, which is correct: immediately after the node starts, the maximum number of jobs permitted to start running is dispatched to it.
However, before the first programs can run, the sas-programming container image needs to be pulled onto the new compute node. This was explained and configured in Step 2, and in theory it is done by the SAS Image Staging process. However, that process has not yet had time to run on this brand-new node.
Shortly after the first sas-batch-server pods are scheduled, we can confirm that a prepull-ds pod is running on the aks-computespot node. The interval at which these prepull pods run can be configured.
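You can verify this with a quick pod listing in the Viya namespace, for example:
kubectl get pods -n <viya-namespace> -o wide | grep prepull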
Five minutes after the first SAS batch jobs were scheduled, they were running on the new compute node. Within those five minutes, the node was launched and all required pods were able to run on it. For a job that can “wait” and doesn’t require the highest priority, I think that’s a good result. This is especially true once you realize that these compute nodes can be discounted by up to 90 percent compared to pay-as-you-go prices.
The WLM dashboard confirms that the system is configured to run at most five jobs. All other jobs will stay in a pending state until a slot becomes available.
You can follow up on the status of the submitted workload. This is important because programs can be interrupted if Azure reclaims the spot instance. You can automate this check and also automatically restart the job if required.
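For example, with the same CLI profile used in the script above, the batch plug-in can list the submitted jobs and their states, which you could script to detect and resubmit interrupted programs (the exact options vary by CLI version):
/home/ubuntu/sas-viya --profile default batch jobs list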
When all goes well, all your submitted workloads are executed and all counters on the “spot” queue return to 0:
The autoscaler then scales the spot node pool back to its minimum; in our example, that’s 0. We are back at our starting position, where the single aks-computespot host in the WLM dashboard has a status of OPEN-INACTIVE.
Finally, if you check the Kubernetes nodes via kubectl or Lens, you will see confirmation that you have no aks-computespot hosts:
This also confirms that all your batch jobs have completed.
Spot instances are a great way to benefit from the flexibility offered by your cloud vendor and reduce your overall cloud spend. While the example above is built entirely on Azure, similar capabilities exist in both AWS and GCP. This post illustrates the concept; to make it work in a more customized production environment, additional testing and configuration might be needed. Unfortunately, using spot instances is not officially supported.