One of the enhancements in SAS Viya 2025.09 (September 2025) is the introduction of the SAS Container Runtime Batch Agent. This is a significant update to the way you can deploy and run the published images for models and decisions.
The Batch Agent addresses the need to process large batches of data using a model or decision that has been published as a SAS Container Runtime container. It provides additional oversight, tracking, and control features that enhance reliability and operational transparency during batch execution.
This post will provide an overview and introduction to the Batch Agent.
I should start by saying that the SAS Container Runtime container images are Open Container Initiative (OCI) compliant images, so you have a lot of flexibility as to where and how you deploy and run the published images. The Batch Agent provides another option.
The Batch Agent supplements the SAS Container Runtime batch API to process multi-transaction payloads.
To dive straight into the details, the Batch Agent uses the Kubernetes Indexed Job pattern. The published Container Runtime image runs as a sidecar container to the Batch Agent. The following image provides an overview of the sidecar pattern applied to the Batch Agent implementation.
This allows you to run the published model or decision as a batch job, rather than keeping the model running in the Kubernetes cluster (for example, as a pod or deployment) and calling it through the REST API.
With SAS Viya 2025.09 there are now two options to execute batch payloads:
The Batch REST API allows single batch requests to be made to the SAS Container Runtime API, whereas the Batch Agent coordinates the processing, allowing a large batch of input data to be broken into manageable pieces and sent to the SAS Container Runtime Batch REST API for processing.
This approach provides a flexible framework to run the published images in batch. As described earlier, the Batch Agent provides oversight, tracking, and control features that enhance reliability and operational transparency during batch execution.
Two key features of using the Indexed Job pattern are:
The following diagram provides an overview of the process flow for running a published model or decision using the Batch Agent. It shows the high-level steps that must be performed.
With the initial release of the Batch Agent, the following data sources, data types, and databases are supported:
Notes:
Applying these requirements to the batch orchestration flow shown above means that Google Cloud Storage must be used for steps A and E.
Given the above context, the following architecture overview diagram provides a more detailed view of the Batch Agent runtime.
Note, as stated above, the use of a job tracking database is optional. The use of the Prometheus Pushgateway and other monitoring tools is also optional.
The Batch Agent is configured using a YAML document (manifest) that defines the Indexed Job. Within the manifest you need to define the following:
A template example for the above is provided in the SAS Container Runtime Help Center. See: About the SAS Container Runtime Batch Agent
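To make the structure more concrete, here is a minimal sketch of what such a manifest might look like. The job name, container names, image references, and the environment variable shown are placeholders and assumptions, and settings such as the cloud storage credentials are omitted; use the Help Center template as the authoritative starting point.

apiVersion: batch/v1
kind: Job
metadata:
  name: scr-batch-model-x                      # hypothetical job name
spec:
  completionMode: Indexed                      # the Indexed Job pattern used by the Batch Agent
  completions: 1
  parallelism: 1
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: batch-agent                      # the SAS Container Runtime Batch Agent
        image: registry.example.com/sas-scr-batch-agent:<tag>    # tag obtained with mirrormgr (see below)
        env:
        - name: SAS_SCR_BATCH_INPUT_FILE_NAME
          value: "batch_input"                 # input file name, specified without the file extension
      - name: model-x                          # the published model or decision image runs as a sidecar
        image: registry.example.com/model-x:1.0.0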
Once the manifest has been created, you start the batch job using the standard ‘kubectl apply’ command. This is step C in the orchestration flow shown above.
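For example, assuming the manifest is saved as a file called scr-batch-job.yaml and is being applied to a namespace called scr-batch (both names are arbitrary):

kubectl apply -f scr-batch-job.yaml -n scr-batch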
The Batch Agent image name can be obtained using SAS Mirror Manager (mirrormgr). The following command should be used (see the SAS documentation):
mirrormgr list remote docker tags --deployment-data order_certificates.zip --latest | grep sas-scr-batch-agent
The Kubernetes cluster where the Batch Agent is running must be able to pull the Batch Agent image either from the SAS registry (cr.sas.com) or from a private registry. The private registry, where the published (SAS Container Runtime) images are stored, must also be accessible. This is shown in the architecture overview diagram.
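If the private registry requires authentication, one common approach is a standard Kubernetes image pull secret that is then referenced from the Job template via imagePullSecrets. This is generic Kubernetes configuration rather than anything specific to the Batch Agent, and the registry address, secret name, and namespace below are placeholders:

kubectl create secret docker-registry scr-registry-cred \
  --docker-server=registry.example.com \
  --docker-username=<username> \
  --docker-password=<password> \
  -n scr-batch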
This approach also means that the standard Kubernetes orchestration tools can be used to control the batch processing. For example, using a Git repository for the manifests and/or tools like Argo CD.
When the batch job is run, the ‘SAS_SCR_BATCH_JOBS’ environment variable is used to specify the number of concurrent jobs (batch execution pods) that are used to process the batch file. When setting the environment variable to a value greater than 1, you must also set the spec.completions and spec.parallelism fields in the Indexed Job definition to the same value.
When the ‘SAS_SCR_BATCH_JOBS’ environment variable is configured to a value greater than 1, multiple pods are started; each pod contains the Batch Agent and the SAS Container Runtime container for the model or decision being executed. This is illustrated in the following diagram. Here you can see “Model X” is being used.
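As an illustrative excerpt (building on the hypothetical manifest sketch shown earlier, not a complete definition), running the three batch execution pods shown in the diagram means keeping these three settings in step:

spec:
  completionMode: Indexed
  completions: 3                 # must match SAS_SCR_BATCH_JOBS
  parallelism: 3                 # must match SAS_SCR_BATCH_JOBS
  template:
    spec:
      containers:
      - name: batch-agent
        env:
        - name: SAS_SCR_BATCH_JOBS
          value: "3"             # number of concurrent batch execution pods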
When splitting the batch input and running multiple ‘batch execution pods’ there are two key environment variables:
The SAS_SCR_BATCH_JOBS variable is used to specify the number of concurrent jobs that can be used to process the batch file. In the diagram you can see that this was set to ‘3’.
The SAS_SCR_BATCH_INPUT_FILE_SPLIT variable specifies whether the input file has been split into multiple files.
Finally, the SAS_SCR_BATCH_INPUT_FILE_NAME is a required variable, and as the name suggests, it is used to specify the input file name. You specify the file name without the file extension.
The first two variables interact to provide the following runtime options:
The SAS_SCR_BATCH_JOBS variable is set to ‘1’ or not set (the default value is ‘1’) and ‘SAS_SCR_BATCH_INPUT_FILE_SPLIT=FALSE’. In this case a single pod runs and processes a single input file. The input file name is provided by the SAS_SCR_BATCH_INPUT_FILE_NAME variable.
In this case the SAS_SCR_BATCH_JOBS variable is set to a value greater than ‘1’ and ‘SAS_SCR_BATCH_INPUT_FILE_SPLIT=FALSE’. With this combination, the Batch Agent (in each batch execution pod) will automatically split the input data (as evenly as possible) and send an indexed split to the SAS Container Runtime container (the model or decision).
The recommended approach when using multiple execution pods is to manually split the input data into multiple files, as the Batch Agent then doesn’t have the overhead of splitting the input data and the user retains control over how the data is split. This is Option 3.
The SAS_SCR_BATCH_JOBS variable is set to a value greater than ‘1’, and ‘SAS_SCR_BATCH_INPUT_FILE_SPLIT=TRUE’. In this scenario, the user has split the input data into multiple files. The number of input files must match the number set with the SAS_SCR_BATCH_JOBS variable.
For example, using our diagram above, if there are 3 ‘batch execution pods’, the Batch Agent uses the job index number to select the input file for each batch execution pod. Assuming the input file name variable was set to ‘batch_input’, you would provide the following input files: batch_input0.csv, batch_input1.csv, and batch_input2.csv.
The job index numbering always starts at ‘0’.
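Putting that together, the environment variable combination for Option 3 with three batch execution pods could look like the following excerpt (using the same hypothetical ‘batch_input’ file name):

        env:
        - name: SAS_SCR_BATCH_JOBS
          value: "3"
        - name: SAS_SCR_BATCH_INPUT_FILE_SPLIT
          value: "TRUE"
        - name: SAS_SCR_BATCH_INPUT_FILE_NAME
          value: "batch_input"   # index 0 reads batch_input0.csv, index 1 reads batch_input1.csv, index 2 reads batch_input2.csv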
In addition to the current requirement for Google Cloud Storage, there are further considerations regarding the input and output files.
The Batch Agent assumes the following:
If the input and output column names do not match the SAS Container Runtime variable names, it is possible to define a mapping file. The mapping file is in JSON format and maps the storage column names to the variable names.
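Conceptually, the mapping pairs each storage column name with the corresponding SAS Container Runtime variable name. The fragment below is purely illustrative, with hypothetical column and variable names; refer to the Help Center for the actual file schema.

{
  "CUST_ID": "customer_id",
  "CR_SCORE": "credit_score"
}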
To cater for job processing errors, retry functionality is supported. When a connection request fails, the Batch Agent enters retry mode. By default, it attempts to connect three times, waiting ten seconds between each retry attempt. When a retry attempt succeeds, the batch job continues.
There are two environment variables to control the retry functionality. You can specify the maximum number of retry attempts (SAS_SCR_BATCH_MAX_RETRY_ATTEMPTS), and the delay between each retry attempt (SAS_SCR_BATCH_RETRY_DELAY).
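For example, to allow five attempts with a 30-second delay between them (illustrative values; the delay is assumed to be specified in seconds, in line with the ten-second default described above):

        env:
        - name: SAS_SCR_BATCH_MAX_RETRY_ATTEMPTS
          value: "5"
        - name: SAS_SCR_BATCH_RETRY_DELAY
          value: "30"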
To support job restart from a given point you must configure an external database. This is the “Job Tracking” database shown in the architecture overview diagram. When the “Job Tracking” database is configured, the Batch Agent restarts from where it left off, that is, from the last known good point in the batch job.
Given the variable nature of batch processing, jobs can complete in a matter of milliseconds or run for minutes or longer. Therefore, using a pull model for the job execution metrics will not work; in these cases a Pushgateway can be used. The Prometheus Pushgateway is an intermediary service that allows you to push metrics from jobs which cannot be scraped. This is shown in the architecture overview diagram above.
The Batch Agent uses the push model, using a specified Pushgateway URL. The URL is configured by using the SAS_SCR_BATCH_PROMETHEUS_PUSHGATEWAY_URL environment variable.
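For example, pointing the Batch Agent at a Pushgateway running in the cluster might look like this (the service address is a hypothetical in-cluster address; 9091 is the default Pushgateway port):

        env:
        - name: SAS_SCR_BATCH_PROMETHEUS_PUSHGATEWAY_URL
          value: "http://prometheus-pushgateway.monitoring.svc.cluster.local:9091"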
It is important to note the following when collecting job metrics:
There is a range of metrics variables available to collect information on job duration, read and write counts, as well as item skip counts for both input and output.
Once the metrics have been pushed to Prometheus, monitoring tools such as Grafana can be used. If the batch jobs are running in the same Kubernetes cluster as the SAS Viya platform and the SAS Viya Monitoring for Kubernetes framework has been deployed, it can be used for monitoring.
However, it is important to understand that it is a customer responsibility to develop any required Grafana dashboards.
While the SAS Container Runtime Batch REST API was available prior to SAS Viya 2025.09, the Batch Agent extends and enriches the batch processing capability.
Please refer to the SAS Container Runtime Help Center for the details, see: Executing and Managing Batch Jobs
This is the first of a planned series of posts looking at the Batch Agent.
Find more articles from SAS Global Enablement and Learning here.