
Setting up Preemption in SAS Workload Management

Started ‎10-08-2023 by
Modified ‎10-08-2023 by

In case you haven’t heard, SAS Workload Management is included by default as part of the SAS Viya platform beginning with SAS Viya 2023.08, and a separate license for it is no longer required. This makes many features for managing SAS compute server sessions in Viya available to everyone. While it’s new to some, Workload Management has been around for a while, and several earlier posts already cover it.

 

One of the items you can configure with Workload Orchestrator is queues.  Along with other components provided with Workload Management, queues help provide a way to manage compute server sessions across the nodes in your Viya environment. 

 

In this post I’m going to cover queue preemption. Preemption means stopping a job to allow another one, typically one considered to be higher priority, to run.

 

Maximum Jobs Allowed and Queue Priorities

 

Let’s start with an introduction to a couple of items you can configure with Workload Orchestrator that are important to this discussion. Workload Orchestrator provides an administrative user interface, accessed from the SAS Environment Manager application menu, for managing the Workload Management environment.

 

In every grid environment there is a limit on the number of jobs that can run at the same time. Each node has a Maximum Jobs Allowed property that limits the number of jobs that can execute on that node. Every node is associated with a Host Type, and Maximum Jobs Allowed is set on Host Types on the Configuration tab in Workload Orchestrator, as you can see below.

 

The default for Maximum Jobs Allowed, shown below, is one job per core available on the node. “So, what’s with the -1 below?” you ask. A negative value means the absolute value of the number is multiplied by the number of cores, so -1 means one job allowed per core, -2 means two jobs allowed per core, and so on. A value of zero or above is a literal, which means that value is the total number of jobs allowed on the host. The sum of Maximum Jobs Allowed across all nodes is the total for the environment. It is important to recognize that the number of jobs that can run is a finite resource.
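The rule for interpreting Maximum Jobs Allowed can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not code from SAS Workload Management; the function name `effective_max_jobs` and the example host list are made up for this sketch.

```python
def effective_max_jobs(max_jobs_allowed: int, cores: int) -> int:
    """Resolve a Maximum Jobs Allowed setting to a job count for one host.

    Hypothetical helper for illustration only.
    Negative value: abs(value) jobs per core.
    Zero or above:  the literal total for the host.
    """
    if max_jobs_allowed < 0:
        return abs(max_jobs_allowed) * cores
    return max_jobs_allowed


# Example: three hypothetical hosts given as (setting, cores) pairs.
example_hosts = [(-1, 8), (-2, 8), (12, 8)]
per_host = [effective_max_jobs(setting, cores) for setting, cores in example_hosts]
environment_total = sum(per_host)

print(per_host)           # [8, 16, 12]
print(environment_total)  # 36
```

The environment total is simply the sum of the per-host results, which is why the number of jobs that can run at once is a fixed, finite quantity for a given configuration.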

 

Edited_01_DRB_B2_pic2-1024x731.png


 

Every queue in a Workload Management environment has a priority, which is an integer value, and priority values are relative among the queues. Priorities help determine which jobs are run first: jobs in a higher priority queue are considered for scheduling before jobs from lower priority queues. You’ll find a queue’s priority under Queues on the Configuration tab in Workload Orchestrator. The default queue below has a priority of 10.

 

Edited_02_DRB_B2_pic3.png

 

Example Workload Management Configuration

 

Now that we understand Maximum Jobs Allowed and queue priority, let’s look at a scenario with the following attributes.

 

  • There are five nodes with eight cores each, which at the default of one job per core means the total Maximum Jobs Allowed is 40.
  • The default queue’s priority is 10.
  • The priority queue’s priority is 20.
  • The urgent queue’s priority is 30.
  • Jobs from all queues can be scheduled to all nodes.

 

No other queue configurations have been defined at this time.

 

Over time, more than 40 jobs have been submitted to the defined queues and scheduled for execution on our nodes. The hosts are full and there is no more capacity for new jobs. You can monitor the hosts in the Hosts tab of Workload Orchestrator.

  Edited_03_DRB_B2_pic4-1024x263.png

 

Any additional jobs that are submitted are left pending in the queue they are submitted to. Even a job submitted to the highest priority queue, the urgent queue, will have to wait until space is available. You can monitor the queues in the Queues tab of Workload Orchestrator.

 

Edited_04_DRB_B2_pic5-1024x298.png

 

When the running jobs complete and space is once again available, jobs from the urgent queue are submitted for execution first. All jobs from the urgent queue are processed before any jobs from lower priority queues are considered. Then jobs from the priority queue are processed before those from the default queue.
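The ordering just described can be sketched as a simple sort. This is an illustrative model, not the Workload Orchestrator scheduler: the queue priorities come from the scenario above, while the `Job` class, the job names, and the assumption that jobs within one queue are taken in submission order are mine.

```python
from dataclasses import dataclass

# Queue priorities from the example scenario.
QUEUE_PRIORITY = {"default": 10, "priority": 20, "urgent": 30}


@dataclass
class Job:
    name: str       # hypothetical job name
    queue: str      # queue the job was submitted to
    submitted: int  # submission sequence number (assumed tie-breaker)


def dispatch_order(pending):
    """Order pending jobs: highest queue priority first, then oldest first."""
    return sorted(pending, key=lambda j: (-QUEUE_PRIORITY[j.queue], j.submitted))


pending = [
    Job("d1", "default", 1),
    Job("u1", "urgent", 2),
    Job("p1", "priority", 3),
    Job("u2", "urgent", 4),
]
order = [job.name for job in dispatch_order(pending)]
print(order)  # ['u1', 'u2', 'p1', 'd1']
```

Note that the urgent jobs jump ahead of jobs that were submitted earlier, but nothing in this model frees a slot for them; they still wait until a running job finishes.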

 

The problem is that jobs from the urgent queue still have to wait to execute. Our urgent jobs are the most important, and we need to ensure they run immediately when they are submitted.

 

One Possible Solution

 

SAS Workload Management is very dynamic and flexible in its ability to distribute jobs across all nodes to ensure all jobs are processed in a timely manner. One possible solution for this scenario is to use queue preemption. This means that jobs from a higher priority queue, which are considered more important, can stop jobs in the lower priority queues named in the higher priority queue’s configuration.

 

It’s easy to implement in Workload Orchestrator. Simply add the queue or queues that can be preempted to the Preempts field on the higher priority queue’s configuration page. The urgent queue’s Preempts configuration might look like this.

 

Edited_05_DRB_B2_pic6.png

 

Now, when all possible slots in our environment for jobs are being used and a job is submitted to the urgent queue, a job from the default or priority queue will be stopped and the resources freed so the job from the urgent queue can run. The jobs that have been preempted will have a state of KILLED-PREEMPTION. In this Jobs monitoring view in Workload Orchestrator, you can see jobs on the default and priority queues that have been preempted by running jobs from the urgent queue. All jobs from the urgent queue will be processed before the pending job from the priority queue.
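To make the behavior concrete, here is a small Python model of preemption. It is a sketch under stated assumptions, not how Workload Orchestrator is implemented: the `QUEUES` dictionary mirrors the example configuration, `KILLED-PREEMPTION` is the state named above, and everything else (the `Job` class, the `submit` function, the `PENDING` and `RUNNING` labels, and the choice to preempt a job from the lowest priority queue first) is hypothetical.

```python
from dataclasses import dataclass

# Example configuration: only the urgent queue lists queues it may preempt.
QUEUES = {
    "default": {"priority": 10, "preempts": []},
    "priority": {"priority": 20, "preempts": []},
    "urgent": {"priority": 30, "preempts": ["default", "priority"]},
}


@dataclass
class Job:
    name: str
    queue: str
    state: str = "PENDING"  # illustrative state label


def submit(job, running, capacity):
    """Start the job if possible; return the preempted job, if any."""
    if len(running) < capacity:
        job.state = "RUNNING"
        running.append(job)
        return None
    # Hosts are full: look for running jobs in queues this queue may preempt.
    preemptable = QUEUES[job.queue]["preempts"]
    victims = [r for r in running if r.queue in preemptable]
    if not victims:
        return None  # nothing to preempt, the job stays pending
    # Assumption: stop a job from the lowest priority queue first.
    victim = min(victims, key=lambda r: QUEUES[r.queue]["priority"])
    victim.state = "KILLED-PREEMPTION"
    running.remove(victim)
    job.state = "RUNNING"
    running.append(job)
    return victim


# A tiny environment with two slots, both already in use.
d1 = Job("d1", "default", "RUNNING")
p1 = Job("p1", "priority", "RUNNING")
running = [d1, p1]

u1 = Job("u1", "urgent")
preempted = submit(u1, running, capacity=2)
print(preempted.name, preempted.state)  # d1 KILLED-PREEMPTION
print(u1.state)                         # RUNNING

p2 = Job("p2", "priority")
submit(p2, running, capacity=2)
print(p2.state)                         # PENDING
```

The last submission shows the other half of the configuration: the priority queue has an empty Preempts list, so its job waits for a free slot instead of stopping anything.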

 

Edited_06_DRB_B2_pic7.png

 

Comparison with SAS 9 Grid Manager

 

For anyone coming from the SAS 9 Grid Manager environment, things work differently in Workload Management in Viya. The architectural differences between SAS Viya and SAS 9 lead to one major difference in behavior between Workload Management and SAS Grid Manager. If you are migrating, or will be migrating, from SAS 9 to Viya, you need to understand it.

 

In SAS 9 the default behavior is that preempted jobs are suspended and can eventually continue from the point where they were stopped. SAS Viya compute servers run as Kubernetes pods, so when a job is preempted, the pod is terminated. The job isn't going to automatically resume where it was stopped. There are programming and Viya configuration steps you can take to better simulate the SAS 9 behavior. I'll cover that in an upcoming post.

 

What's next?

 

As I said earlier, SAS Workload Management is very dynamic and flexible in its ability to distribute jobs across nodes to ensure all jobs are processed in a timely manner. Terminating jobs may not be what you had in mind. As I just noted, jobs are terminated and not resumed by default, and interactive SAS Studio users will lose their work. Preemption is just one tool in the Workload Management toolbox to help manage things. Look for future blogs with additional Workload Management configuration options.

 

 

Find more articles from SAS Global Enablement and Learning here.

Comments

hello Darrell, thank you very much for this article. It is written in such a way that one can follow all the steps. I am looking forward to the next how to resume jobs from preemptions. One question: we have just added to kustomization.yaml cluster role bindings for SWO but still when looking at the jobs we do not see a name of the job like you have, " dowork10min", but we see the name of the pod. How can this be changed? If you could advise here or in the next blog, it would help a lot to see the name of the job especially for shared account jobs.

@touwen_k, thanks so much for the kind words.  I'm glad you found the blog useful.   I have changed roles at SAS and am no longer working on SAS Viya Workload Management.  I work exclusively with SAS 9 now.   

 

I can however answer one of your questions.  While there are some SAS Workload Management configuration items and system options that need to be set, the key to restarting/resuming the jobs in SAS Viya Workload Management is using programming techniques.  Specifically, the programs need to utilize checkpoint and restart.  

 

For the remainder of your questions and maybe a follow-up blog on the Workload Management configuration for resuming the jobs (that I did intend to write before the role change!!), I've forwarded your comments to my former colleagues that can hopefully provide details to resolve your other issues.  

Version history
Last update:
‎10-08-2023 02:44 PM