
SAS Workload Management shines with Quality and Performance


SAS Workload Management provides advanced workload orchestration to the SAS Viya platform. To deliver these capabilities, it makes intensive use of many platform and infrastructure services, which exposes it to a wide range of possible external failures and instabilities. Software releases throughout 2025 have focused on making it both faster and more stable, through performance optimizations and improved resilience. The result is a significant improvement in software quality and user experience.

 

Let’s review the enhancements across different areas.

 

 

Performance Optimization

 

Improvements have targeted key pain points where heavy system use could lead to noticeable delays. Work focused on the core mechanisms of job processing and orchestration, reducing bottlenecks under high load.

 

Job status updates have dropped from 75 ms to 75 μs (milliseconds to microseconds), a 1,000x improvement! Previously, in a test environment with 1,000 concurrent jobs, the SAS Workload Orchestrator manager service would be occupied for 75 seconds updating job status (persisted in the SAS Infrastructure Data Server). Now the same operation takes only 75 milliseconds in total, i.e. 75 μs per job. This improvement was achieved primarily by optimizing the SQL code used for the updates, creating new indexes on some database tables, and moving certain activities to dedicated background threads.
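The figures in the previous paragraph can be checked with a few lines of arithmetic. This minimal sketch uses only the numbers quoted there (75 ms and 75 μs per job, 1,000 concurrent jobs), expressed in integer microseconds to avoid floating-point rounding.

```python
# Per-job status-update latency, in microseconds (figures from this article).
BEFORE_US = 75_000  # 75 ms per job before the optimization
AFTER_US = 75       # 75 μs per job after the optimization
JOBS = 1_000        # concurrent jobs in the test environment


def total_update_time_us(per_job_us: int, jobs: int) -> int:
    """Total time the manager service is busy updating every job's status."""
    return per_job_us * jobs


before_total = total_update_time_us(BEFORE_US, JOBS)
after_total = total_update_time_us(AFTER_US, JOBS)
speedup = BEFORE_US // AFTER_US

print(f"before: {before_total / 1_000_000:.0f} s")  # before: 75 s
print(f"after:  {after_total / 1_000:.0f} ms")      # after:  75 ms
print(f"speedup: {speedup}x")                       # speedup: 1000x
```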

 

Another improvement concerns the time the SAS Workload Management services take to start up. Previously, startup time depended on the number of jobs that were still running when the services were last stopped: recovering the status of 100 jobs from the database could take up to 30 seconds, and during that time the service did not respond to client requests. Now, in the same test environment, SAS Workload Management services are ready in about 6 seconds, regardless of the number of jobs.
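The article does not say how startup was decoupled from the job backlog. One common pattern with that effect is to mark a service ready immediately and recover persisted job state on a background thread. The sketch below illustrates only that generic pattern; the `Service` class, `recover_job`, and the simulated 10 ms per-job delay are hypothetical stand-ins, not SAS Workload Orchestrator internals.

```python
import threading
import time


class Service:
    """Toy service whose readiness does not depend on how many jobs it must recover."""

    def __init__(self, running_jobs):
        self.running_jobs = list(running_jobs)  # hypothetical: job IDs persisted at last shutdown
        self.recovered = []
        self.ready = threading.Event()
        self._lock = threading.Lock()

    def recover_job(self, job_id):
        # Hypothetical stand-in for reading one job's status back from the database.
        time.sleep(0.01)
        with self._lock:
            self.recovered.append(job_id)

    def _recover_all(self):
        for job_id in self.running_jobs:
            self.recover_job(job_id)

    def start(self):
        # Run recovery in the background instead of blocking startup on it...
        worker = threading.Thread(target=self._recover_all, daemon=True)
        worker.start()
        # ...so the service can answer client requests right away.
        self.ready.set()
        return worker


svc = Service(running_jobs=range(100))
t0 = time.monotonic()
worker = svc.start()
startup_s = time.monotonic() - t0
worker.join()  # wait only so the example can show that recovery completed
print(f"ready after {startup_s:.3f} s; recovered {len(svc.recovered)} jobs")
```

With this design, readiness time stays flat as the backlog grows; only the background recovery takes longer.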

 

Additional optimizations include using separate threads to create pods in parallel instead of serially, and shortening the timeout for requests that retrieve resource information from launched pods, so that processing is not blocked for long in case of network issues.
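The difference between serial and parallel pod creation can be illustrated with Python's standard thread pool. In this sketch `create_pod` is a hypothetical stand-in that simulates 100 ms of Kubernetes API latency; it is not the actual SAS code.

```python
import concurrent.futures
import time


def create_pod(job_id: int) -> str:
    """Hypothetical stand-in for a call that asks Kubernetes to create a pod."""
    time.sleep(0.1)  # simulate API latency
    return f"pod-{job_id}"


def create_serially(job_ids):
    # Each request waits for the previous one to finish.
    return [create_pod(j) for j in job_ids]


def create_in_parallel(job_ids, max_workers=8):
    # Each pod creation runs on a worker thread; map() keeps results in input order.
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(create_pod, job_ids))


jobs = range(8)

t0 = time.monotonic()
serial = create_serially(jobs)
t_serial = time.monotonic() - t0

t0 = time.monotonic()
parallel = create_in_parallel(jobs)
t_parallel = time.monotonic() - t0

print(f"serial: {t_serial:.2f} s, parallel: {t_parallel:.2f} s")
```

The shorter timeout mentioned above would, in a client like this, typically be a per-request timeout argument (for example, `urllib.request.urlopen(url, timeout=...)` in Python), so that one unresponsive pod cannot stall the calling thread indefinitely.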

 

 

Improved Database Maintenance

 

Starting with SAS Viya stable release 2024.08, the SAS Workload Orchestrator manager service was re-engineered to be stateless, which included offloading its state information to the SAS Infrastructure Data Server.

 

This change has increased the overall utilization of the underlying database server.

 

Starting with SAS Viya stable release 2025.01, SAS Workload Management provides better database management, including the ability to delete old job records (only for completed jobs, never for active ones). Records can be deleted manually using the sas-viya CLI, or automatically based on parameters that SAS administrators can set in the SAS Workload Orchestrator configuration. By default, old job records are automatically deleted either after 60 days or when the database tables exceed 100,000 records.
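A retention policy like the one just described can be sketched as a selection function. Only the two defaults (60 days, 100,000 records) and the rule that active jobs are never deleted come from this article. The `JobRecord` fields, the `"COMPLETED"` state name, and the way the size limit is enforced (removing the oldest completed jobs until the table fits) are assumptions for illustration, not the documented SAS behavior.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class JobRecord:
    job_id: int
    state: str                       # hypothetical state names, e.g. "RUNNING", "COMPLETED"
    finished_at: Optional[datetime]  # None while the job is still active


MAX_AGE = timedelta(days=60)  # default age limit quoted in the article
MAX_RECORDS = 100_000         # default size limit quoted in the article


def records_to_delete(records, now, max_age=MAX_AGE, max_records=MAX_RECORDS):
    """Return IDs of completed jobs to purge; active jobs are never selected."""
    completed = sorted(
        (r for r in records if r.state == "COMPLETED"),
        key=lambda r: r.finished_at,  # oldest first
    )
    # Rule 1: anything that completed more than max_age ago.
    doomed = {r.job_id for r in completed if now - r.finished_at > max_age}
    # Rule 2 (assumed semantics): if the table is still over the limit,
    # remove the oldest remaining completed jobs until it fits.
    excess = len(records) - len(doomed) - max_records
    for r in completed:
        if excess <= 0:
            break
        if r.job_id not in doomed:
            doomed.add(r.job_id)
            excess -= 1
    return doomed


now = datetime(2025, 10, 1)
records = [
    JobRecord(1, "COMPLETED", now - timedelta(days=90)),
    JobRecord(2, "COMPLETED", now - timedelta(days=10)),
    JobRecord(3, "RUNNING", None),
    JobRecord(4, "COMPLETED", now - timedelta(days=5)),
]
print(sorted(records_to_delete(records, now)))                 # [1]
print(sorted(records_to_delete(records, now, max_records=2)))  # [1, 2]
```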

 

This reduces the number of records in the database's history table.

 

Even when migrating from older releases with many old records stored in the database tables, software optimizations prevent a single, massive deletion operation that could block service startup for a few minutes. In those cases, SAS Workload Orchestrator submits ‘delete’ commands in batches of a few thousand records, so that the initial cleanup is spread out across multiple hours, without overwhelming the system.
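Batched deletion of this kind can be sketched with Python's built-in `sqlite3` module standing in for PostgreSQL. The `jobs` table schema, the batch size of 2,000 ("a few thousand"), and the pause between batches are illustrative assumptions. Because PostgreSQL's `DELETE` has no `LIMIT` clause, the sketch limits each batch through a subquery on the primary key, a form that also works in PostgreSQL.

```python
import sqlite3
import time

BATCH_SIZE = 2_000  # "a few thousand" records per batch (illustrative value)
PAUSE_S = 0.0       # pause between batches; a real service might wait much longer


def purge_in_batches(conn, cutoff, batch_size=BATCH_SIZE, pause_s=PAUSE_S):
    """Delete completed jobs finished before `cutoff`, one small transaction at a time."""
    total = 0
    while True:
        with conn:  # each batch is committed as its own short transaction
            cur = conn.execute(
                """
                DELETE FROM jobs
                WHERE id IN (
                    SELECT id FROM jobs
                    WHERE state = 'COMPLETED' AND finished_at < ?
                    LIMIT ?
                )
                """,
                (cutoff, batch_size),
            )
        if cur.rowcount == 0:
            return total
        total += cur.rowcount
        time.sleep(pause_s)  # yield to other work between batches


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, state TEXT, finished_at INTEGER)")
conn.executemany(
    "INSERT INTO jobs (state, finished_at) VALUES (?, ?)",
    [("COMPLETED", i) for i in range(10_000)] + [("RUNNING", None)] * 5,
)
conn.commit()

deleted = purge_in_batches(conn, cutoff=5_000)
remaining = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
print(deleted, remaining)  # 5000 5005
```

Short transactions keep locks brief, so other queries against the table can proceed between batches.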

 

Administrators can use the sas-viya CLI to delete jobs manually and to configure the parameters that control automatic deletion. Starting with SAS Viya 2025.09, these capabilities are also available on the SAS Workload Orchestrator page in SAS Environment Manager.

 

The SQL optimizations discussed in the performance section have the additional benefit of reducing table fragmentation, which decreases the amount of space that SAS Workload Management tables consume.

 

None of these improvements removes the need to perform routine maintenance on the PostgreSQL database: a healthy platform needs periodic maintenance to achieve optimum performance!

 

 

Quality and Resilience to External Issues

 

SAS Workload Orchestrator is closely integrated with various services: it publishes events through sas-arke and RabbitMQ, interacts continuously with PostgreSQL, and performs frequent read/write operations against the Kubernetes API. As a result, SAS Workload Orchestrator is often among the first services affected during periods of environmental instability. Disabling SAS Workload Orchestrator may temporarily resolve certain issues (e.g., SAS Studio users can connect to a backend session that otherwise could not be started), but at the cost of losing the advanced functionality that SAS Workload Management provides. In these scenarios, SAS Workload Orchestrator serves as an early indicator of broader system instability: it is affected by these disruptions rather than causing them. The appropriate solution is to address the underlying issues in the external services, rather than to disable SAS Workload Orchestrator.

 

To enhance its resilience, SAS Workload Orchestrator now incorporates additional checks and retry mechanisms to better manage and withstand instabilities in external services.

 

Examples:

 

  • If the PostgreSQL server is temporarily unresponsive (such as during a restart), SAS Workload Orchestrator maintains status updates in memory and commits them to the database once connectivity is restored.
  • Should the sas-arke service stop and restart, SAS Workload Orchestrator automatically establishes a new connection.
  • SAS Workload Orchestrator has improved handling for situations when the Kubernetes API is overloaded and slow to respond.
  • In cases where a SAS Workload Orchestrator manager fails over and another instance becomes primary, other system components now better accommodate this internal transition.
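The first example in the list above, buffering status updates while PostgreSQL is unavailable, can be sketched as a small writer that keeps the latest unsaved status per job. The `StatusWriter` and `DatabaseUnavailable` names, the `write_status` method, and the idea that `flush()` is re-invoked on reconnect (for example, from a timer or a reconnect callback) are hypothetical; this is a generic illustration, not SAS Workload Orchestrator code.

```python
from collections import OrderedDict


class DatabaseUnavailable(Exception):
    """Raised by the (hypothetical) database client while PostgreSQL is down."""


class StatusWriter:
    """Keeps the latest status per job in memory until the database accepts it."""

    def __init__(self, db):
        self.db = db                  # hypothetical client exposing write_status(job_id, status)
        self.pending = OrderedDict()  # job_id -> latest unsaved status

    def update(self, job_id, status):
        self.pending[job_id] = status  # a newer status replaces an older unsaved one
        self.pending.move_to_end(job_id)
        self.flush()

    def flush(self):
        """Try to persist everything buffered; return True if nothing is left."""
        while self.pending:
            job_id, status = next(iter(self.pending.items()))
            try:
                self.db.write_status(job_id, status)
            except DatabaseUnavailable:
                return False  # keep the buffer; retry on the next flush
            del self.pending[job_id]
        return True


class FakeDB:
    """In-memory stand-in that can simulate an outage."""

    def __init__(self):
        self.up = True
        self.rows = {}

    def write_status(self, job_id, status):
        if not self.up:
            raise DatabaseUnavailable()
        self.rows[job_id] = status


db = FakeDB()
writer = StatusWriter(db)
db.up = False                  # e.g. PostgreSQL restarting
writer.update(1, "RUNNING")
writer.update(1, "COMPLETED")
writer.update(2, "RUNNING")
print(len(writer.pending), db.rows)  # 2 {}
db.up = True                   # connectivity restored
writer.flush()
print(db.rows)                 # {1: 'COMPLETED', 2: 'RUNNING'}
```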

 

SAS Workload Orchestrator has also improved the user experience by handling failures more robustly when starting submitted jobs. Now, if Kubernetes or the execution host fails to start a job (a condition reported with the error code HOST_FAILED), SAS Workload Orchestrator automatically resubmits the job to a different node. Note that this is different from the automatic requeuing of restartable jobs.
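The resubmit-to-another-node behavior can be sketched as a loop over candidate nodes that never retries the node that just failed. The `HostFailed` exception, the `start_on` callback, and the node names are hypothetical; how SAS Workload Orchestrator actually selects the next node is not described in this article.

```python
class HostFailed(Exception):
    """Hypothetical error for a job that Kubernetes or the host could not start (HOST_FAILED)."""


def submit_with_failover(job, nodes, start_on):
    """Try to start `job`, moving to a different node each time a start fails."""
    tried = []
    for node in nodes:
        tried.append(node)
        try:
            return start_on(job, node)
        except HostFailed:
            continue  # do not retry on the same node; move to the next candidate
    raise HostFailed(f"{job} could not start on any of {tried}")


def fake_start(job, node):
    # Simulate a host that cannot start pods.
    if node == "node-a":
        raise HostFailed(node)
    return f"{job} started on {node}"


result = submit_with_failover("job-42", ["node-a", "node-b", "node-c"], fake_start)
print(result)  # job-42 started on node-b
```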

 

Finally, SAS Workload Orchestrator and the SAS launcher service include better retry logic to handle cases when called services return an error.
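Retry logic of this kind is commonly built on capped exponential backoff with jitter. The sketch below shows that generic technique; `TransientError`, the attempt count, and the delay values are assumptions, not the parameters SAS Workload Orchestrator or the SAS launcher service actually use. The `sleep` argument lets the usage example skip real waiting.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical error from a called service that may recover (e.g. HTTP 503)."""


def call_with_retry(fn, attempts=5, base_delay_s=0.1, max_delay_s=2.0, sleep=time.sleep):
    """Call fn(); on TransientError wait with capped exponential backoff and retry.

    `attempts` must be at least 1.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(random.uniform(0, delay))  # "full jitter" spreads out retries


calls = {"n": 0}


def flaky():
    # Fails twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("service unavailable")
    return "ok"


outcome = call_with_retry(flaky, sleep=lambda s: None)
print(outcome, calls["n"])  # ok 3
```

Jitter keeps many clients from retrying in lockstep after a shared outage, which would otherwise hit the recovering service all at once.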

 

 

Better Documentation

 

SAS Workload Management documentation has been enhanced, too. Improvements include:

 

 

These enhancements complement existing documentation, such as the SAS Workload Management page of the troubleshooting guide: https://go.documentation.sas.com/doc/en/sasadmincdc/default/calts/p04x9ud5y3wec3n1l8dxpwvtbdl2.htm

 

 

Conclusion

 

With these latest enhancements, SAS Workload Orchestrator continues to advance reliability, efficiency, and user experience for organizations leveraging the SAS Viya platform. These improvements are designed to ensure your operations run more smoothly and with greater transparency. For more details and hands-on guidance, be sure to explore the updated documentation linked above.

 

 

Find more articles from SAS Global Enablement and Learning here.
