Welcome back to the SAS Agentic AI Accelerator series! In the previous posts, we deployed Large Language Models (LLMs) as secure, scalable services on Azure, using containers, managed apps, and Kubernetes. We then built agentic AI workflows.
Now it’s time to deploy something smarter.
In this post, we’ll take a SAS agent (an agentic AI workflow built in SAS Intelligent Decisioning), publish it as a container image to Azure Container Registry (ACR), deploy it to Kubernetes, and score it through a secure HTTPS endpoint.
By the end of this post, you will have a SAS agent running in production.
Before you start deploying, let’s look at what’s actually running.
How the Pieces Fit Together:
From the outside, it’s just a REST API.
Inside, it’s a fully governed decisioning system.
Suppose you developed the following agentic AI workflow: it assesses a loan request, formulates an approval or rejection message using LLM calls, assesses the sentiment of the response, and decides whether a human should review the message:
Assumption: you have configured a container publishing destination, such as Azure, in SAS Viya.
Start by publishing the Agentic AI workflow as a container image.
Steps in SAS Intelligent Decisioning:
After publishing completes, a new container image appears in your Azure Container Registry.
At this point, the SAS agent logic lives inside the container image.
Assumption: TLS certificates are already configured as Kubernetes secrets.
Rather than repeating those setup steps here, we’ll treat them as prerequisites.
Start here: SAS Agentic AI – Deploy and Score Models – Kubernetes, and complete the steps from “TLS Certificates Briefly” through “Create Your Deployment YAML”.
The deployment YAML:
The YAML is almost identical to the LLM deployment to Kubernetes from the previous post. The key difference: traffic remains encrypted end-to-end (Ingress → Service → Pod), eliminating the clear-text hop on port 8080. This matters because sensitive scoring requests and responses stay protected all the way to the container.
# Variables
RG=Resource_group
INGRESS_HOST=SAS_Viya_Ingress
echo $INGRESS_HOST
az login
ACR_NAME=Your_Azure_Container_Registry
# LLM image must be stored here as a container image
az acr login --name $ACR_NAME
## List ACR repositories or container images
az acr repository list --name $ACR_NAME --output table
## The variables are called LLM_, but they refer to the sas-agent.
## We're just reusing the template published in the previous post.
LLM=sas_agent1_0
LLMDASH=${LLM//_/-}
echo $LLM && echo $LLMDASH
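The underscore-to-dash substitution above is not cosmetic: Kubernetes object names must be valid DNS-1123 labels (lowercase alphanumerics and "-" only), so a name like sas_agent1_0 would be rejected. A quick local check, sketched with the same variable names as above:

```shell
# Kubernetes object names must be DNS-1123 labels (lowercase
# alphanumerics and '-'), which is why underscores are replaced.
LLM=sas_agent1_0
LLMDASH=${LLM//_/-}
if [[ $LLMDASH =~ ^[a-z0-9]([-a-z0-9]*[a-z0-9])?$ ]]; then
  echo "$LLMDASH is a valid Kubernetes name"
else
  echo "$LLMDASH is NOT a valid Kubernetes name"
fi
```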
# Create the deployment YAML file
cat > ~/project/deploy/models/${LLMDASH}-tls-deployment.yaml <<EOF
# ${LLMDASH} model deployment
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: ${LLMDASH}
    workload/class: models
  name: ${LLMDASH}
spec:
  # modify replicas to support the requirements
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: ${LLMDASH}
  template:
    metadata:
      labels:
        app: ${LLMDASH}
        app.kubernetes.io/name: ${LLMDASH}
        workload/class: models
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.azure.com/mode
                operator: NotIn
                values:
                - system
              - key: node.kubernetes.io/name
                operator: In
                values:
                - llm
      containers:
      - name: ${LLMDASH}
        image: ${ACR_NAME}.azurecr.io/${LLM}:latest
        imagePullPolicy: Always # IfNotPresent or Always
        resources:
          requests: # Minimum amount of resources requested
            cpu: 1
            memory: 8Gi
          limits: # Maximum amount of resources allowed
            cpu: 4
            memory: 16Gi
        ports:
        - containerPort: 8080
          name: http # Name the port "http"
        - containerPort: 8443
          name: https # Name the port "https"
        env:
        - name: SAS_SCR_SSL_ENABLED
          value: "true"
        - name: SAS_SCR_SSL_CERTIFICATE
          value: /secrets/tls.crt
        - name: SAS_SCR_SSL_KEY
          value: /secrets/tls.key
        - name: SAS_SCR_LOG_LEVEL_SCR_IO
          value: TRACE
        volumeMounts:
        - name: tls
          mountPath: /secrets
      volumes:
      - name: tls
        secret:
          secretName: scr-certificate
          items: # Explicitly define the keys to mount
          - key: tls.crt
            path: tls.crt
          - key: tls.key
            path: tls.key
      tolerations:
      - key: workload/class
        operator: Equal
        value: models
        effect: NoSchedule
      - key: workload
        operator: Equal
        value: llm
        effect: NoSchedule
---
# TLS service definition
apiVersion: v1
kind: Service
metadata:
  name: ${LLMDASH}-tls-svc
  labels:
    app.kubernetes.io/name: ${LLMDASH}-tls-svc
spec:
  selector:
    app.kubernetes.io/name: ${LLMDASH}
    workload/class: models
  ports:
  - name: ${LLMDASH}-https
    port: 443
    protocol: TCP
    targetPort: 8443
  type: ClusterIP
---
# TLS ingress definition
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ${LLMDASH}-ingress
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: HTTPS
  labels:
    app.kubernetes.io/name: ${LLMDASH}-ingress
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - ${INGRESS_HOST}
    secretName: scr-certificate
  rules:
  - host: ${INGRESS_HOST}
    http:
      paths:
      - path: /${LLM}
        pathType: Prefix
        backend:
          service:
            name: ${LLMDASH}-tls-svc
            port:
              number: 443
EOF
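A subtlety worth verifying: the `${...}` placeholders inside the heredoc only expand when the delimiter is unquoted (`<<EOF`); with a quoted delimiter (`<<'EOF'`) the literal placeholders land in the file and kubectl rejects the manifest. A minimal local sketch of the check, using a hypothetical registry name and no cluster:

```shell
# Demonstrate heredoc expansion, then verify that no literal ${...}
# placeholders remain in the generated file.
LLM=sas_agent1_0
LLMDASH=${LLM//_/-}
ACR_NAME=myregistry   # hypothetical registry name
FILE=$(mktemp)
cat > "$FILE" <<EOF
name: ${LLMDASH}
image: ${ACR_NAME}.azurecr.io/${LLM}:latest
EOF
if grep -q '\${' "$FILE"; then
  echo "WARNING: unexpanded variables remain in $FILE"
else
  echo "Manifest fully expanded"
fi
```

The same `grep` check can be run against the real manifest under ~/project/deploy/models/ before applying it.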
Deploy your model:
# Deploy (apply)
kubectl apply -f ~/project/deploy/models/${LLMDASH}-tls-deployment.yaml -n llm
kubectl get pods -n llm -o wide
kubectl get pods -n ingress-nginx
kubectl get svc -n llm
kubectl get ingress -n llm
LLM_POD_NAME=$(kubectl get pods -n llm --no-headers | awk '$1 ~ /^sas-agent/ {print $1; exit}')
# Loop until the pod is Ready
while true; do
STATUS=$(kubectl get pod $LLM_POD_NAME -n llm -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')
if [ "$STATUS" == "True" ]; then
echo "Pod $LLM_POD_NAME is Ready."
break
else
echo "Waiting for pod $LLM_POD_NAME to become Ready..."
sleep 5
fi
done
# Check logs when ready
kubectl logs $LLM_POD_NAME -n llm
Once the pod is ready, the SAS agent is live and ready to take scoring requests!
The SAS agent is now available as a secure HTTPS endpoint.
With everything live, you can send HTTPS requests to your Kubernetes ingress endpoint and watch your SAS agent produce its magic!
# Score the SAS Agent
echo "Scoring URL: https://${INGRESS_HOST}/${LLM}"
curl --location --request POST "https://${INGRESS_HOST}/${LLM}" --header 'Content-Type: application/json' --header 'Accept: application/json' --data-raw '{"inputs":
[
{"name":"customer_id","value": 1012},
{"name":"customer_name","value": "Robert Little"},
{"name":"customer_language","value": "EN"},
{"name":"BAD","value": 0},
{"name":"LOAN","value": 2000},
{"name":"CLAGE","value": 147.133},
{"name":"CLNO","value": 9},
{"name":"DEBTINC","value": 19},
{"name":"DELINQ","value": 0},
{"name":"DEROG","value": 0},
{"name":"JOB","value": "Office"},
{"name":"MORTDUE","value": 64536},
{"name":"NINQ","value": 1},
{"name":"REASON","value": "HomeImp"},
{"name":"VALUE","value": 87400},
{"name":"YOJ","value": 11},
{"name":"high_value","value": 1}
]}' | jq
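Hand-writing JSON inside a shell string is easy to get wrong (quoting, commas, escaping). As an alternative sketch, assuming `jq` is available (it is already used above to pretty-print the response), the payload can be assembled programmatically; the names and values here mirror three of the hypothetical inputs from the request:

```shell
# Build the "inputs" payload with jq so quoting and escaping are
# handled automatically (abbreviated to three of the input variables).
PAYLOAD=$(jq -n \
  --argjson customer_id 1012 \
  --arg customer_name "Robert Little" \
  --argjson LOAN 2000 \
  '{inputs: [
     {name: "customer_id",   value: $customer_id},
     {name: "customer_name", value: $customer_name},
     {name: "LOAN",          value: $LOAN}
   ]}')
echo "$PAYLOAD"
```

The scoring call then becomes `curl ... --data-raw "$PAYLOAD"`, with no manual escaping of the string values.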
The response includes the agent’s outputs: the approval or rejection decision, the generated customer message, its sentiment, and whether a human should review it. All from a single API call.
If you get a response, such as the following sample, congratulations! You’ve just deployed a secure, scalable SAS agent using Kubernetes.
With this deployment, you’ve achieved something important:
This is the natural evolution from experimenting with LLMs to operationalizing Agentic AI.
SAS Viya handles the intelligence.
Kubernetes handles the scale and security.
You get a clean, auditable API that can power real business decisions.
In the world of Agentic AI, this is what production looks like.
Happy deploying!
Thanks to Michael Goddard for sharing his time and resources.
SAS offers a full workshop in the SAS Decisioning Learning Subscription, with step-by-step exercises for deploying and scoring models using Agentic AI and SAS Viya on Azure. Access it on learn.sas.com, where you can also book an environment for creating agentic AI workflows.
If you need further guidance, reach out for assistance.
Find more articles from SAS Global Enablement and Learning here.
The rapid growth of AI technologies is driving an AI skills gap and demand for AI talent. Ready to grow your AI literacy? SAS offers free ways to get started for beginners, business leaders, and analytics professionals of all skill levels. Your future self will thank you.