
SAS Viya – CAS Server Node Affinity And Toleration

Started ‎05-18-2023 by
Modified ‎05-18-2023 by

In larger SAS Viya deployments or in SAS Viya multi-tenancy deployments, an administrator may decide to create multiple CAS servers to handle different types of workloads. Some workloads may need GPU processing, some may need a lot of CPU and memory, and some may need only a small amount of CPU and memory. One solution would be to run all CAS servers on Kubernetes nodes that have enough resources to satisfy the largest possible processing requests, but that would waste money on the workloads that need only limited resources.


Ideally, the administrator could ensure that CAS servers run on specific Kubernetes nodes that provide only the required processing resources. This implies that the administrator must provision different kinds of Kubernetes nodes, one for each kind of workload, and create CAS servers that use specific nodes.

In this post, I will show you a new and easier way to start CAS server pods on a specific node pool...
 

About labels, node pools, affinity, and toleration

 

This post is not about these Kubernetes concepts, but it is important that you know them. At a high level:

  • A label is a key/value pair attached to a Kubernetes object, used to specify identifying attributes that are meaningful and relevant to users. Labels can be used to organize nodes.
  • A node pool is a group of nodes within a Kubernetes cluster that all have the same configuration and node label.
  • Affinity and toleration are pod attributes that influence on which nodes a pod can start in the Kubernetes cluster.
    • Affinity: attracts the pod to the nodes that carry a given label, either as a preference or as a hard requirement.
    • Toleration: allows the pod to be scheduled on nodes that carry a matching taint. A taint repels the pods that do not tolerate it.
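
To make these concepts concrete, here is a minimal sketch of how a node could be labeled and tainted so that affinity and toleration have something to match. The node name aks-casgpu-0 and the class value casgpu are hypothetical, and the sketch assumes that the taint uses the same workload.sas.com/class key as the label. By default the function only prints the kubectl commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch: label and taint one node for a dedicated CAS node pool.
# Node name and class value are hypothetical examples.

# Print the command when DRY_RUN is 1 (default); run it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

prepare_node() {
  node="$1"
  class="$2"
  # The label is what node affinity matches on.
  run kubectl label nodes "$node" "workload.sas.com/class=$class" --overwrite
  # The taint repels pods that do not carry a matching toleration.
  run kubectl taint nodes "$node" "workload.sas.com/class=$class:NoSchedule" --overwrite
}

prepare_node aks-casgpu-0 casgpu
```

In a managed Kubernetes service, labels and taints are usually set on the node pool itself rather than node by node, so treat the commands above as an illustration of the mechanism only.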


Create a CAS server with specific node affinities and tolerations

 

To create a new CAS server, SAS provides a create-cas-server.sh script that is delivered with your SAS Viya deployment assets. You will find this script in the sas-bases/examples/cas/create/ directory. Like SAS Viya itself, the create-cas-server.sh script is versioned, and it gains new options as SAS enhances it.

Since SAS Viya stable 2022.1.4, the create-cas-server.sh script provides us with options to manage the CAS server pods' affinity and toleration.

[myuser@myserver myviyadep]$ bash ./sas-bases/examples/cas/create/create-cas-server.sh --help
Flags:
  -h  --help     help
  -i, --instance CAS server instance name
  -o, --output   Output location. If undefined, default to working directory.
  -v, --version  CAS server creation utility version
  -w, --workers  Specify the number of CAS worker nodes. Default is 0 (SMP).
  -b, --backup   Set this to include a CAS backup controller. Disabled by default.
  -t, --tenant   Set the tenant name. default is shared.
  -r, --transfer Set this to enable support for state transfer between restarts. Disabled by default.
  -a, --affinity Specify the node affinity and toleration to use for this deployment.  Default is 'cas'.
  -q, --required-affinity Set this flag to have the node affinity be a required node affinity.  Default is preferred node affinity.

 

We now have two new options to directly manage the node affinity and toleration of the CAS server pods when we generate their set of manifests.

  • -a, --affinity: sets the node affinity and toleration. The value identifies the node pool (the Kubernetes nodes labeled with that value) on which the new CAS server pods start by preference.
  • -q, --required-affinity: makes the node affinity required, so that the new CAS server pods start only on the node pool identified by the label set with the previous option.


The issue with the previous versions of the create-cas-server.sh script is that they did not manage the CAS server pods' node affinity and toleration. Before SAS Viya stable 2022.1.4, as my colleague Raphaël Poumarede (@RPoumarede) described in his excellent post (Add a CAS “GPU-enabled” Node pool to boost your SAS Viya Analytics Platform !), you had to:

    1. Generate the new CAS server manifest files using the create-cas-server.sh script
    2. Navigate into the new CAS server manifests directory
    3. Search for the node-affinity.yaml file and edit it to modify the node affinity and toleration


As of SAS Viya stable 2022.1.4, you can define the CAS server pod's node affinity and toleration when you create the CAS server using the new options.

 

You define the label used for the affinity and toleration ("-a, --affinity" option) and decide whether the node affinity is required ("-q, --required-affinity" option).

 

  • If the "-q, --required-affinity" option is not used, the CAS server pods first try to start on the node pool (the Kubernetes nodes) labeled with the value you set with the "-a, --affinity" option, and otherwise try to start on other node pools.
bash ~/project/deploy/mycasdep/sas-bases/examples/cas/create/create-cas-server.sh --instance newcassrv --output ~/project/deploy/mycasdep/site-config --affinity cassmall
  • If the "-q, --required-affinity" option is used, the CAS server pods only try to start on the node pool labeled with the value you set with the "-a, --affinity" option. If no such node can host them, the CAS server pods never start.
bash ~/project/deploy/mycasdep/sas-bases/examples/cas/create/create-cas-server.sh --instance newcassrv --output ~/project/deploy/mycasdep/site-config --affinity casgpu --required-affinity
  • If you use only the "-q, --required-affinity" option, the CAS server pods only try to start on the node pool labeled "cas" (the SAS Viya default value for CAS server pod node affinity). If no such node can host them, the CAS server pods never start.
bash ~/project/deploy/mycasdep/sas-bases/examples/cas/create/create-cas-server.sh --instance newcassrv --output ~/project/deploy/mycasdep/site-config --required-affinity


Note that, most of the time, you will want to use the two options together, for example when the CAS server must run on a node pool that has GPUs.
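
As a sketch of how these options combine when several CAS servers target different node pools, the loop below builds one create-cas-server.sh call per server. The instance names, the label values, and the DEPLOY path are hypothetical; only flags listed in the help output above are used. The commands are printed rather than executed unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: one create-cas-server.sh call per CAS server.
DEPLOY="${DEPLOY:-$HOME/project/deploy/mycasdep}"   # hypothetical deployment directory

# Print the command when DRY_RUN is 1 (default); run it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Reads lines of the form: <instance> <affinity label> <required: yes|no>
create_servers() {
  while read -r instance affinity required; do
    [ -n "$instance" ] || continue
    set -- bash "$DEPLOY/sas-bases/examples/cas/create/create-cas-server.sh" \
      --instance "$instance" --output "$DEPLOY/site-config" --affinity "$affinity"
    if [ "$required" = "yes" ]; then
      set -- "$@" --required-affinity
    fi
    # Keep the called script from consuming the remaining input lines.
    run "$@" </dev/null
  done
}

create_servers <<EOF
cassmallsrv cassmall no
casgpusrv casgpu yes
EOF
```

Keeping the server list as data makes it easy to review which servers use a required affinity before any manifest is generated.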

 

The new version of the create-cas-server.sh script generates a new manifest: require-affinity.yaml.

 

[myuser@myserver myviyadep]$ ls -al ./site-config/cas-shared-newcassrv/
total 80
drwxrwxr-x 2 cloud-user cloud-user 4096 Oct 13 14:48 .
drwxr-xr-x 9 cloud-user cloud-user 4096 Oct 13 14:48 ..
-rw-rw-r-- 1 cloud-user cloud-user  203 Oct 13 14:48 annotations.yaml
-rw-rw-r-- 1 cloud-user cloud-user 3763 Oct 13 14:48 backup-agent-patch.yaml
-rw-rw-r-- 1 cloud-user cloud-user 2856 Oct 13 14:48 cas-consul-sidecar.yaml
-rw-rw-r-- 1 cloud-user cloud-user  359 Oct 13 14:48 cas-fsgroup-security-context.yaml
-rw-rw-r-- 1 cloud-user cloud-user 5635 Oct 13 14:48 cas-shared-newcassrv-cr.yaml
-rw-rw-r-- 1 cloud-user cloud-user 2282 Oct 13 14:48 cas-sssd-sidecar.yaml
-rw-rw-r-- 1 cloud-user cloud-user  263 Oct 13 14:48 configmaps.yaml
-rw-rw-r-- 1 cloud-user cloud-user  304 Oct 13 14:48 enable-binary-port.yaml
-rw-rw-r-- 1 cloud-user cloud-user  298 Oct 13 14:48 enable-http-port.yaml
-rw-rw-r-- 1 cloud-user cloud-user  304 Oct 13 14:48 kustomization.yaml
-rw-rw-r-- 1 cloud-user cloud-user 1267 Oct 13 14:48 kustomizeconfig.yaml
-rw-rw-r-- 1 cloud-user cloud-user 1353 Oct 13 14:48 node-affinity.yaml
-rw-rw-r-- 1 cloud-user cloud-user  293 Oct 13 14:48 provider-pvc.yaml
-rw-rw-r-- 1 cloud-user cloud-user  486 Oct 13 14:48 require-affinity.yaml
-rw-rw-r-- 1 cloud-user cloud-user  660 Oct 13 14:48 shared-newcassrv-pvc.yaml
-rw-rw-r-- 1 cloud-user cloud-user  433 Oct 13 14:48 state-transfer.yaml
-rw-rw-r-- 1 cloud-user cloud-user  400 Oct 13 14:48 transfer-pvc.yaml
[myuser@myserver myviyadep]$ 

 

Among the files listed above, three are used to set the new CAS server pods’ node affinity and toleration: node-affinity.yaml, require-affinity.yaml, and kustomization.yaml.

They are all generated automatically by the create-cas-server.sh script, regardless of the options that you use.

  • The node-affinity.yaml and require-affinity.yaml manifests are automatically generated using "cas" (default value), or the value you passed using the "-a, --affinity" option (shown as the labelName placeholder in the two manifests below).

 

---
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: cas-node-affinity
patch: |-
  - op: add
    path: /spec/controllerTemplate/spec/affinity
    value:
      nodeAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
            - key: workload.sas.com/class
              operator: In
              values:
              - labelName
        - weight: 1
          preference:
            matchExpressions:
            - key: workload.sas.com/class
              operator: NotIn
              values:
              - compute
              - stateless
              - stateful
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.azure.com/mode
              operator: NotIn
              values:
              - system
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                - sas-cas-server
            topologyKey: kubernetes.io/hostname

target:
  group: viya.sas.com
  kind: CASDeployment
  name: .*
  version: v1alpha1 
# PatchTransformer to make the labelName node label required
---
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: require-affinity-label
patch: |-
  - op: add
    path: /spec/controllerTemplate/spec/affinity/nodeAffinity/requiredDuringSchedulingIgnoredDuringExecution/nodeSelectorTerms/0/matchExpressions/-
    value:
      key: workload.sas.com/class
      operator: In
      values:
      - labelName
target:
  group: viya.sas.com
  kind: CASDeployment
  name: .*
  version: v1alpha1
  • The kustomization.yaml manifest differs depending on whether you use the "-q, --required-affinity" option: it includes the require-affinity.yaml manifest, which makes the node affinity required, only when the option is used.
resources:
- shared-newcassrv-pvc.yaml
- provider-pvc.yaml
- cas-shared-newcassrv-cr.yaml

generators:
- configmaps.yaml
configurations:
- kustomizeconfig.yaml
transformers:
- cas-fsgroup-security-context.yaml
- annotations.yaml
- backup-agent-patch.yaml
- cas-consul-sidecar.yaml
- node-affinity.yaml

- require-affinity.yaml # <-- Only if the "-q, --required-affinity" option is used
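
A quick way to tell which mode a generated CAS server directory is in is to look for require-affinity.yaml among the entries of its kustomization.yaml. The helper below is a minimal sketch: the function name affinity_mode is hypothetical, and it relies on a plain text match rather than a YAML parser, so it only recognizes an entry written as a list item at the start of a line.

```shell
#!/bin/sh
# Sketch: report whether a generated CAS server directory makes the node affinity required.
affinity_mode() {
  kfile="$1/kustomization.yaml"
  if [ ! -f "$kfile" ]; then
    echo "missing"
    return 1
  fi
  # Match an active list entry such as "- require-affinity.yaml".
  # A commented-out line (starting with "#") does not match.
  if grep -Eq '^[[:space:]]*-[[:space:]]*require-affinity\.yaml' "$kfile"; then
    echo "required"
  else
    echo "preferred"
  fi
}

# Usage (directory taken from the listing above):
#   affinity_mode ./site-config/cas-shared-newcassrv
```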


Changing the default CAS server pods node affinity and toleration

 

Why would you want to do that?

By default, CAS server pods preferably start on a node pool that is labeled "cas", but they are allowed to start on other node pools because the default node affinity is preferred rather than required.

If you want to force the default CAS server pods to start only on the "cas" labeled node pool, you must create a manifest named cas-shared-default-require-affinity.yaml and reference it in the transformers field of the SAS Viya deployment kustomization.yaml manifest.

 

# PatchTransformer to make the cas node label required
---
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: require-affinity-label
patch: |-
  - op: add
    path: /spec/controllerTemplate/spec/affinity/nodeAffinity/requiredDuringSchedulingIgnoredDuringExecution/nodeSelectorTerms/0/matchExpressions/-
    value:
      key: workload.sas.com/class
      operator: In
      values:
      - cas
target:
  group: viya.sas.com
  kind: CASDeployment
  labelSelector: "sas.com/cas-server-default"
  version: v1alpha1

 

The labelSelector: "sas.com/cas-server-default" filter ensures that this new PatchTransformer manifest is applied only to the cas-shared-default CAS server.
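
Once the manifests are applied, placement can be checked by listing the nodes that carry the expected label and comparing them with the nodes on which the CAS pods landed. The sketch below assumes that the CAS pods carry the app.kubernetes.io/name=sas-cas-server label (the one targeted by the pod anti-affinity rule in node-affinity.yaml) and uses a hypothetical namespace name, viya. The commands are printed unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: compare the labeled nodes with the nodes hosting the CAS pods.
NS="${NS:-viya}"   # hypothetical namespace name

# Print the command when DRY_RUN is 1 (default); run it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

check_placement() {
  class="$1"
  # Nodes that belong to the node pool.
  run kubectl get nodes -l "workload.sas.com/class=$class"
  # CAS pods and the node each one landed on (NODE column).
  run kubectl -n "$NS" get pods -l app.kubernetes.io/name=sas-cas-server -o wide
}

check_placement cas
```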

 

Last considerations

 

As Raphaël Poumarede described in his post, defining a specific node pool and the CAS server pods' node affinity and toleration is sometimes not enough.

The node pool certainly has to be defined according to the architecture requirements documentation, but specific CAS server configuration may also be required (for example, to allow the CAS servers to use the GPU).

The create-cas-server.sh script allows us to create/manage the CAS servers, but not to configure them.

CAS server configuration tasks are not covered in this post. There are several ways to configure the CAS servers. Please refer to the SAS® Viya® documentation.

  

I hope this article has been helpful to you.

 

Special thanks to Raphaël Poumarede (@RPoumarede) from SAS Global Enablement and Learning.

 

