Welcome back to the SAS Agentic AI Accelerator series! We’ve already cooked up LLM deployments with Docker and Azure’s managed services. Now, it’s time to turn up the heat with Kubernetes—the espresso machine of the cloud world. Sure, it has a few extra knobs and steam valves, but it gives you barista-level control.
If you crave fine-tuned control, serious scalability, and rock-solid HTTPS security, Kubernetes is your playground. Let’s roll up our sleeves and get an LLM running—with plenty of focus on keeping it secure and scalable. For simpler setups, Azure’s managed options work great, but for ultimate power and flexibility, Kubernetes is where magic happens!
Where We Are In The Series
In Part 1, Register and Publish Models, we introduced code-wrapped LLMs and showed how you can register them in SAS Model Manager, then how to publish them as Docker images using SAS Container Runtime (SCR).
In Part 2, SAS Agentic AI – Deploy and Score Models – The Big Picture, we compared deployment options, costs, and performance trade-offs in Azure.
In Part 3.1, SAS Agentic AI – Deploy and Score Models – Containers, we got our hands dirty deploying Azure Container Instances.
In Part 3.2, SAS Agentic AI – Deploy and Score Models – Apps, we discovered Azure Container Apps and Web Apps for scalable, secure LLM deployments.
TLS Certificates Briefly
In our example, we'll securely deploy an LLM (the open-source Qwen2.5-0.5B model from Alibaba Cloud) behind an HTTPS endpoint on Kubernetes. Why HTTPS? Because you and your security officer will both sleep better at night.
You need a TLS certificate for HTTPS endpoints. Think of it as a VIP badge for secure web traffic. Here’s the concise version:
Generate a private key and certificate signing request (CSR).
Get the CSR signed by your internal or trusted certificate authority (CA).
Combine the certificate and full chain.
Load this into your Linux trust store (so tools like curl trust it).
Create a Kubernetes secret from the key and certificate.
# Set up secrets directory
secrets_dir=~/project/deploy/models/secrets
mkdir -p "$secrets_dir" && cd "$secrets_dir"
# Variables
RG=resource_group
INGRESS_SAN="${RG}.gelenable.sas.com" # SAS Viya URL or LLM deployment DNS
GELEnvRootCA=my_folder # location of certificates and private key required for signing
# Generate private key and CSR
openssl req -newkey rsa:2048 -sha256 -nodes -keyout scr_key.pem -extensions v3_ca \
-config <(echo "[req]"; echo "distinguished_name=req"; echo "[v3_ca]"; \
echo "extendedKeyUsage=serverAuth"; \
echo "subjectAltName=DNS:${INGRESS_SAN}, DNS:*.${INGRESS_SAN}") \
-subj "/C=US/ST=NC/L=North Carolina/O=SAS/CN=${INGRESS_SAN}" \
-out scr_models.csr
# Sign CSR with Intermediate CA
# These options tell OpenSSL to use the Intermediate CA's certificate and private key to sign the new certificate, rather than creating a self-signed certificate.
echo "01" > scr_models.srl
openssl x509 -req -sha256 -extensions v3_ca \
-extfile <(echo "[v3_ca]"; echo "extendedKeyUsage=serverAuth"; \
echo "subjectAltName=DNS:${INGRESS_SAN}, DNS:*.${INGRESS_SAN}") \
-days 820 -in scr_models.csr \
-CA $GELEnvRootCA/intermediate.cert.pem \
-CAkey $GELEnvRootCA/intermediate.key.pem \
-CAserial scr_models.srl -out scr_cert.pem
# Append full certificate chain
cat $GELEnvRootCA/intermediate.cert.pem >> scr_cert.pem
cat $GELEnvRootCA/ca_cert.pem >> scr_cert.pem
# Remove temporary files
rm scr_models.*
# Optional: Review the certificate
openssl x509 -text -noout -in scr_cert.pem
# Trust the CA certificate system-wide (for cURL etc.)
sudo cp $GELEnvRootCA/ca_cert.pem /etc/pki/ca-trust/source/anchors/
sudo update-ca-trust
The above block assumes you have access to the intermediate CA's certificate and private key, so you can sign the new certificate yourself rather than creating a self-signed one. For production, always use certificates signed by a trusted Certificate Authority (CA), such as Let's Encrypt, DigiCert, or your organization's enterprise CA. This ensures secure, trusted, and verifiable connections for all clients.
That's it, no need to get lost in a cryptographic jungle. I am simply reproducing a very reliable "TLS jungle trekking guide" produced by our SAS colleague, @MichaelGoddard. @StuartRogers is an authoritative source on TLS for SAS Viya and has plenty of trustworthy articles on SAS Communities.
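Before moving on, it is worth confirming that the signed certificate actually chains back to your CA and carries the DNS names you expect. Here is a minimal check, assuming the same file names and $GELEnvRootCA location used above:
# Verify the leaf certificate against the intermediate and root CA
openssl verify -CAfile $GELEnvRootCA/ca_cert.pem \
  -untrusted $GELEnvRootCA/intermediate.cert.pem scr_cert.pem
# Confirm the subject alternative names match your planned ingress DNS name
openssl x509 -noout -ext subjectAltName -in scr_cert.pem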
Prepare Your Kubernetes Cluster
Clean Up and Create a Namespace
Clear any coffee spills and set up a clean playground for your models:
kubectl delete ns llm --ignore-not-found
kubectl create ns llm
Add a Dedicated Node Pool
Large Language Models (LLMs) can be quite resource hungry. Open-source LLMs need lots of storage for model files, plus plenty of CPU and memory for processing. To keep everything running smoothly (and avoid stepping on other workloads’ toes), it’s best to give your LLMs their own dedicated node pool. Remember: choose the size of your node pool carefully, based on the specific LLMs you want to deploy and their technical requirements.
az aks nodepool add \
--resource-group $RG \
--cluster-name $AKS_NAME \
--name llmnp \
--node-count 1 \
--node-vm-size Standard_D16as_v5 \
--max-count 1 \
--min-count 0 \
--enable-cluster-autoscaler \
--node-taints workload=llm:NoSchedule \
--labels workload=llm node.kubernetes.io/name=llm workload/class=models
Check that your node is ready and properly labeled:
kubectl get nodes --show-labels
You should see labels like workload=llm and node.kubernetes.io/name=llm
Think of these node labels as 'Reserved for LLMs' parking spots.
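If the full label listing gets noisy, you can filter on the labels you just set. This is only a quick sanity check, using the same label keys as the node pool command above:
# Show only the nodes in the dedicated LLM pool
kubectl get nodes -l workload=llm,node.kubernetes.io/name=llm
# Confirm the taint that keeps other workloads off these nodes
kubectl get nodes -l workload=llm -o jsonpath='{.items[*].spec.taints}'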
Deploy the LLM to Your Kubernetes Cluster
Add Your TLS Secret
Load your certificate and key into Kubernetes as a secret:
kubectl -n llm create secret tls scr-certificate \
--key="scr_key.pem" \
--cert="scr_cert.pem"
# Check it’s there
kubectl -n llm get secrets
kubectl -n llm describe secret scr-certificate
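To double-check that the secret holds the certificate you just signed (and not an older one), you can decode it and read the subject and expiry. A small sketch, assuming the secret and namespace names used above:
# Decode the certificate stored in the secret and inspect it
kubectl -n llm get secret scr-certificate -o jsonpath='{.data.tls\.crt}' \
  | base64 -d | openssl x509 -noout -subject -enddate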
Create Your Deployment YAML
The Big Three: Pod, Service, and Ingress (What Do They Do?)
Pod
Think of a pod as the smallest shipping box in Kubernetes. Inside that box is your running application, in our case, the containerized LLM model. The pod wraps it up with the resources, environment variables, and storage it needs. If the pod isn’t running, your LLM isn’t either.
Service
A service is like the shipping label on the box. It makes sure traffic can find and reach your pod, even if the pod moves around, inside the cluster. In our YAML manifest, the service listens on port 443 (HTTPS) and forwards traffic to your LLM’s container, running inside the pod.
Ingress
Ingress is the front desk or receptionist of your Kubernetes office building. It’s the entry point for outside traffic. Ingress decides which service gets what request, handles HTTPS/TLS, and acts as a secure gateway from the internet to your application.
YAML
# Variables
RG=Resource_group
INGRESS_HOST=SAS_Viya_Ingress # DNS name covered by your TLS certificate (the SAN from earlier)
echo $INGRESS_HOST
az login
ACR_NAME=Your_Azure_Container_Registry
# The LLM must already be stored in this registry as a container image
az acr login --name $ACR_NAME
# LLM name (underscores for the image, dashes for Kubernetes resource names)
LLM=qwen_25_05b
LLMDASH=${LLM//_/-}
echo $LLM; echo $LLMDASH
# Create the deployment YAML file
tee ~/project/deploy/models/${LLMDASH}-tls-deployment.yaml > /dev/null <<EOF
# ${LLMDASH} model deployment
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: ${LLMDASH}
    workload/class: models
  name: ${LLMDASH}
spec:
  # modify replicas to support the requirements
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: ${LLMDASH}
  template:
    metadata:
      labels:
        app: ${LLMDASH}
        app.kubernetes.io/name: ${LLMDASH}
        workload/class: models
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.azure.com/mode
                operator: NotIn
                values:
                - system
              - key: node.kubernetes.io/name
                operator: In
                values:
                - llm
      containers:
      - name: ${LLMDASH}
        image: ${ACR_NAME}.azurecr.io/${LLM}:latest
        imagePullPolicy: Always # IfNotPresent or Always
        resources:
          requests: # Minimum amount of resources requested
            cpu: 1
            memory: 8Gi
          limits: # Maximum amount of resources allowed
            cpu: 4
            memory: 16Gi
        ports:
        - containerPort: 8080
          name: http # Name the port "http"
        - containerPort: 8443
          name: https # Name the port "https"
        env:
        - name: SAS_SCR_SSL_ENABLED
          value: "true"
        - name: SAS_SCR_SSL_CERTIFICATE
          value: /secrets/tls.crt
        - name: SAS_SCR_SSL_KEY
          value: /secrets/tls.key
        - name: SAS_SCR_LOG_LEVEL_SCR_IO
          value: TRACE
        volumeMounts:
        - name: tls
          mountPath: /secrets
      volumes:
      - name: tls
        secret:
          secretName: scr-certificate
          items: # Explicitly define the keys to mount
          - key: tls.crt
            path: tls.crt
          - key: tls.key
            path: tls.key
      tolerations:
      - key: workload/class
        operator: Equal
        value: models
        effect: NoSchedule
      - key: workload
        operator: Equal
        value: llm
        effect: NoSchedule
---
# TLS service definition
apiVersion: v1
kind: Service
metadata:
  name: ${LLMDASH}-tls-svc
  labels:
    app.kubernetes.io/name: ${LLMDASH}-tls-svc
spec:
  selector:
    app.kubernetes.io/name: ${LLMDASH}
    workload/class: models
  ports:
  - name: ${LLMDASH}-https
    port: 443
    protocol: TCP
    targetPort: 8080
  type: ClusterIP
---
# TLS ingress definition
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ${LLMDASH}-ingress
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: HTTP
  labels:
    app.kubernetes.io/name: ${LLMDASH}-ingress
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - ${INGRESS_HOST}
    secretName: scr-certificate
  rules:
  - host: ${INGRESS_HOST}
    http:
      paths:
      - path: /${LLM}
        pathType: Prefix
        backend:
          service:
            name: ${LLMDASH}-tls-svc
            port:
              number: 443
EOF
What’s Happening in the YAML?
You’ll see three main sections in the YAML file. Here’s what each one does:
Deployment
Spins up your LLM as a container inside a pod.
Makes sure it runs on a special node reserved for LLMs (using labels and taints).
Mounts the TLS certificate and key (so your app can do HTTPS).
Sets environment variables to tell SAS Container Runtime (SCR) where to find the TLS certificates and how to behave.
Requests and limits resources (CPU and memory) so your LLM has enough “brainpower” to run but can’t block the whole cluster.
Service
Exposes your pod inside the cluster on port 443.
Acts as a stable “in-cluster” address for your LLM, so other components (like ingress) can always find it, even if pods are replaced or moved.
Ingress
Sets up a public HTTPS endpoint using your DNS name and TLS certificate.
Routes incoming requests for a path like the one below to the service, which then forwards them to your LLM pod:
https://your-dns/qwen_25_05b
Uses annotations to tell the NGINX ingress controller to expect HTTP traffic behind the scenes, even though users connect over HTTPS.
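Before applying anything, you can also let Kubernetes validate the generated manifest without creating resources. A quick dry run, using the file path from the tee command above:
kubectl apply --dry-run=client -f ~/project/deploy/models/${LLMDASH}-tls-deployment.yaml -n llm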
Apply and Go!
Deploy your model:
# Deploy
kubectl apply -f ~/project/deploy/models/${LLMDASH}-tls-deployment.yaml -n llm
# Wait for the pod to be ready (watch for the "Ready" status)
kubectl get pods -n llm
# Check the logs of the LLM pod
kubectl logs -n llm -l app.kubernetes.io/name=${LLMDASH}
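Before scoring, it also helps to confirm that the service and ingress came up as expected. A quick check, using the resource names from the YAML above:
# Confirm the service and ingress exist and the ingress has picked up an address
kubectl -n llm get svc,ingress
kubectl -n llm describe ingress ${LLMDASH}-ingress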
Score the LLM
With everything live, you can send HTTPS requests to your Kubernetes ingress endpoint and watch your LLM do its magic.
curl --location --request POST "https://${INGRESS_HOST}/qwen_25_05b" \
--header 'Content-Type: application/json' \
--header 'Accept: application/vnd.sas.microanalytic.module.step.output+json' \
--data-raw '{
"inputs": [
{"name":"userPrompt","value":"customer_name: Xin Little; loan_amount: 20000.0; customer_language: EN"},
{"name":"systemPrompt","value":"You are tasked with drafting an email to respond to a customer whose mortgage loan application has been accepted by the SAS AI Bank. You will be provided with customer_name, loan_amount, customer_language. Follow the guidelines for a professional, friendly response."},
{"name":"options","value":"{temperature:1,top_p:1,max_tokens:800}"}
]
}' | jq
If you get a smart response, such as the following sample, congratulations! You’ve just deployed a secure, scalable LLM using Kubernetes.
Performance and Scaling Notes
Large Language Models (LLMs) are heavy weightlifters. They need generous CPU, memory, and storage, especially when running open-source versions. For best results, give LLMs their own dedicated node pool (or multiple node pools). This ensures your models won’t compete for resources with other workloads, keeping everything running smoothly.
When it comes to scaling, Kubernetes shines. You can adjust the number and size of nodes in your pool to match your workload. Just remember: the bigger the LLM, the beefier your node needs to be. Choose your node pool size based on the technical requirements of your models, don’t try to squeeze a heavyweight model into a tiny node!
For ultra-responsive performance, monitor CPU and memory usage and scale up as needed. And if you’re aiming for production-grade speed, keep an eye on response times as you adjust resources.
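As a starting point for that monitoring and scaling loop, the commands below can help. The replica and node counts are placeholders to tune for your own models, and kubectl top assumes the metrics server is installed in your cluster:
# Watch live CPU and memory usage of the LLM pods
kubectl -n llm top pods
# Add replicas of the model if one pod cannot keep up
kubectl -n llm scale deployment ${LLMDASH} --replicas=2
# Allow the dedicated node pool to grow when more pods need room
az aks nodepool update --resource-group $RG --cluster-name $AKS_NAME \
  --name llmnp --update-cluster-autoscaler --min-count 0 --max-count 3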
Security Corner
Security isn't just an add-on; it's essential. Always use HTTPS to protect data in transit. This means securing both your public endpoints and the internal traffic between your ingress, service, and pod. For extra peace of mind, forward traffic from the ingress to your pod over port 8443 (HTTPS), not just 8080 (HTTP); a sketch of the required changes follows the checklist below.
Make sure:
Your container exposes containerPort: 8443 in the YAML.
Your ingress annotation is set to nginx.ingress.kubernetes.io/backend-protocol: HTTPS.
Certificates are properly managed, and secrets are stored securely in Kubernetes.
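Here is a minimal sketch of those two changes, assuming the service and ingress names from the deployment YAML above; you can just as well edit the YAML and re-apply it:
# Point the service at the container's HTTPS port instead of 8080
kubectl -n llm patch service ${LLMDASH}-tls-svc --type json \
  -p '[{"op":"replace","path":"/spec/ports/0/targetPort","value":8443}]'
# Tell the NGINX ingress controller to talk HTTPS to the backend pod
kubectl -n llm annotate ingress ${LLMDASH}-ingress \
  nginx.ingress.kubernetes.io/backend-protocol=HTTPS --overwrite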
Recommendations
Always give your LLMs a dedicated node pool, sized appropriately for their needs. This avoids resource conflicts and keeps things running smoothly.
If it feels like your LLM is trying to eat your entire cluster, it probably is. Time to beef up those nodes.
Watch your CPU, memory, and response times. Scale up resources as needed and adjust node pool sizes to meet demand.
Use HTTPS end-to-end, not just at the edge. Enable secure backend communication by forwarding traffic on port 8443 and setting the right ingress annotations.
For anything beyond a quick test, secure your endpoints, manage certificates properly, and keep secrets safe.
Summary
Deploying LLMs in Kubernetes gives you flexibility, scalability, and strong security, if you set things up right. With these best practices in place, your LLMs will run smoothly, securely, and ready for whatever comes next.
And remember: in the world of Kubernetes, a little resource planning goes a long way. Happy deploying!
Thanks for following along! If you find this post helpful, give it a thumbs up, share your stories or questions in the comments, and let’s keep building better AI workflows together. Stay tuned for more!
Acknowledgment
Thanks to @MichaelGoddard for sharing his time and resources.
Additional Resources
SAS Agentic AI Accelerator – Register and Publish Models.
SAS Agentic AI – Deploy and Score Models – The Big Picture.
SAS Agentic AI – Deploy and Score Models – Containers.
SAS Agentic AI – Deploy and Score Models – Apps
SAS Container Runtime – SAS Documentation.
Want More Hands-On Guidance?
SAS offers a full workshop in the SAS Decisioning Learning Subscription with step-by-step exercises for deploying and scoring models using Agentic AI and SAS Viya on Azure.
Access it on learn.sas.com, where you can book an environment and follow guided exercises for creating agentic AI workflows.
For further guidance, reach out for assistance.
Find more articles from SAS Global Enablement and Learning here.