Welcome back to the SAS Agentic AI Accelerator series! Today we’ll explore how to deploy and score code-wrapped Large Language Models (LLMs) in Azure, then call them from Agentic AI workflows inside SAS Viya.
To keep things clear, the topic is split into two parts:
In Part 1, Register and Publish Models, we introduced code-wrapped LLMs and showed how to publish them with the SAS Container Runtime (SCR). The end result was Docker images in a container registry.
Part 2 — this post — covers the deployment options.
After registering and publishing an LLM code wrapper, you can deploy it as a Docker image in various environments. Once deployed, you can score using the SAS Container Runtime API.
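As a sketch of what that scoring call can look like, the snippet below builds the inputs/outputs-style JSON payload that SCR modules expect and posts it to the deployed container. The URL, module ID (`llmscore`), and variable names (`prompt`, `answer`) are hypothetical placeholders; substitute the values from your own code wrapper:

```python
import json
from urllib import request

# Hypothetical endpoint: <host> is your container's IP or DNS name,
# "llmscore" is the module ID of your published code wrapper.
SCR_URL = "http://<host>:8080/llmscore"

def build_payload(**inputs):
    """Build an SCR-style scoring payload: {"inputs": [{"name": ..., "value": ...}]}."""
    return {"inputs": [{"name": k, "value": v} for k, v in inputs.items()]}

def parse_outputs(response_json):
    """Flatten an SCR-style {"outputs": [...]} response into a plain dict."""
    return {o["name"]: o["value"] for o in response_json.get("outputs", [])}

def score(prompt, url=SCR_URL, timeout=120):
    """POST a prompt to the deployed code-wrapped LLM and return its outputs."""
    body = json.dumps(build_payload(prompt=prompt)).encode("utf-8")
    req = request.Request(url, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req, timeout=timeout) as resp:
        return parse_outputs(json.load(resp))

# Usage (against a running container):
#   result = score("Summarize the quarterly report in two sentences.")
#   print(result["answer"])
```

The same payload shape works from any HTTP client, which is what makes the deployed image callable from an Agentic AI workflow in SAS Viya.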
Here’s a quick overview of the deployment options in the Azure cloud. The list isn’t exhaustive; it reflects only what I tested:
| Deployment Option | Use Case | Scalability | Ease of Setup | Security |
|---|---|---|---|---|
| Azure Container Instances | Lightweight, quick starts | Low | Simple | Public or private IP (HTTP only) |
| Azure Container Apps | Event-driven, auto-scaling | Medium | Managed | Public IP (HTTPS) |
| Azure Web Apps | Managed container hosting | Medium | Managed | Public IP (HTTPS) |
| Kubernetes Pods | Large-scale, fully orchestrated | High | Complex (requires YAML, must manage node resources) | Flexible: private/public IP (HTTP or HTTPS) |
| Containers on Virtual Machines | Legacy or custom configurations | Medium | Moderate | Flexible (private/public) |
With so many options available in the Azure cloud, the following table should help you choose the one that fits your needs.
| Deployment Option | Feasibility | Advantages | Limitations |
|---|---|---|---|
| Azure Container Instances | Ideal for small open-source LLMs like phi-3-mini-4k (4 vCPUs, 16 GB RAM). | Easiest to launch. | Limited to 4 CPUs, 16 GB RAM. |
| Azure Container Apps | Middle ground between Container Instances and Kubernetes clusters. | Built-in HTTPS ingress and auto-scale. | Capped at 2 CPUs and 8 GB RAM; may cause out-of-memory errors. |
| Azure Web Apps | Simple to deploy and scale. | Works for lighter workloads. Supports deployment slots. | Resource limits can bottleneck performance. Adding resources doesn’t always improve performance. |
| Kubernetes Pods | Best fit for production-grade workloads. | Fine-grained control over resources, scaling, and isolation. | Requires Kubernetes skills (that’s what customers are always telling us). |
| Containers on Virtual Machines | Highly flexible for legacy systems or custom configurations. | Complete control over CPU, RAM, and disk. | Higher cost and operational effort. |
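To illustrate the "requires YAML" note in the Kubernetes row, here is a minimal sketch of a Deployment and Service for an SCR image. The image name, labels, and resource figures are placeholders for illustration, not values from this post; adapt them to your registry and sizing:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: phi3-scr            # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: phi3-scr
  template:
    metadata:
      labels:
        app: phi3-scr
    spec:
      containers:
        - name: phi3-scr
          image: myregistry.azurecr.io/phi3-mini-4k:latest   # your SCR image
          ports:
            - containerPort: 8080   # SCR container port
          resources:
            requests:
              cpu: "4"
              memory: 16Gi
            limits:
              cpu: "4"
              memory: 16Gi
---
apiVersion: v1
kind: Service
metadata:
  name: phi3-scr
spec:
  selector:
    app: phi3-scr
  ports:
    - port: 8080
      targetPort: 8080
  type: ClusterIP
```

This is where you pay the complexity cost from the first table: you must size the node pool yourself so the requested CPU and memory actually fit on a node.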
Every deployment choice in a cloud comes with a cost.
To help you evaluate the cost of each deployment option, the table below summarizes typical daily costs based on Azure pricing estimates. These values may vary depending on region, container size, and configuration. The estimates assume low, infrequent traffic: a few requests per day from your Agentic AI workflow to your deployed LLM.
| Deployment Option | Estimated Daily Cost | Details |
|---|---|---|
| Azure Container Instances | ~$3–$10/day | Cost depends on CPU and memory allocation (e.g., 2 CPUs and 8 GB of memory). |
| Azure Container Apps | ~$5–$12/day | Includes management costs, ingress, and scalability features. |
| Azure Web Apps | ~$8–$15/day | Managed service costs include app hosting and container runtime fees. |
| Kubernetes Pods | ~$10–$20/day | Costs vary based on cluster size, node configuration, and resource requirements. |
| Containers on Virtual Machines | ~$15–$25/day | Includes VM hosting fees, container runtime costs, and storage costs for legacy systems. |
I ran my experiments for one week using one of our own Azure tenants. Here’s the actual cost per day for a phi-3-mini-4k LLM deployment in Azure Container Instances, Container Apps, Web Apps, and Kubernetes Service, alongside the response time in seconds.
| Deployment Type | Cost Components | Estimated Cost Per Day (USD) | Response Time (sec) |
|---|---|---|---|
| Container Instances | Compute costs | $5.60 | 48.31 |
| Container Apps | Base pricing | $3.22 | 45 |
| App Service Plans (Web Apps) | Premium v3 P1mv3 | $4.32 | 90 |
| | Premium v3 P3mv3 | $18.48 | 25 |
| | Premium v3 P4mv3 | $35.52 | 15 |
| | Premium v3 P5mv3 | $71.24 | 10 |
| Kubernetes Deployment - Extra Node | Compute costs (Standard_D4as_v5, 4 vCPUs, 16 GB) | $14.40 (approx.) | 42 |
| | Disk costs | $2.16 | |
| | Total (compute + disk) | $16.56 (approx.) | 42 |
| | Compute costs (Standard_D16as_v5, 16 vCPUs, 64 GB) | $57.60 (approx.) | 43 |
| | Disk costs | $2.16 | |
| | Total (compute + disk) | $59.76 (approx.) | 43 |
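The Kubernetes totals above are simply per-day compute plus disk costs; a quick sanity check in Python, using the figures from the table:

```python
# Daily cost figures from the one-week experiment (USD/day).
k8s_nodes = {
    "Standard_D4as_v5 (4 vCPUs, 16 GB)": {"compute": 14.40, "disk": 2.16},
    "Standard_D16as_v5 (16 vCPUs, 64 GB)": {"compute": 57.60, "disk": 2.16},
}

# Total per-day cost = compute + disk, matching the table's totals.
totals = {name: round(c["compute"] + c["disk"], 2) for name, c in k8s_nodes.items()}
print(totals)  # 16.56 and 59.76 USD/day
```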
Findings:
Some customers avoid proprietary LLMs from OpenAI, Google, or Azure because their data would leave their premises (or their cloud). They ask for a way to use open-source, on-premises LLMs instead.
After a month of testing, I’ve learned the options are far from equal.
The SAS Agentic AI Accelerator lets you deploy code-wrapped LLMs almost anywhere: Azure services, Kubernetes, or standalone VMs. Use the tables above to balance cost, performance, and operational effort.
Stay tuned for Part 3, where we’ll dig into deployment scripts, scoring calls, and security tips.
Thanks to Mike Goddard (@MichaelGoddard) for guidance on SAS Container Runtime Kubernetes deployments.
The Agentic AI – How to with SAS Viya workshop is now available on learn.sas.com to SAS customers in the SAS Decisioning Learning Subscription and to SAS employees. This workshop provides step-by-step guidance and a bookable environment for creating agentic AI workflows.
If you liked the post, give it a thumbs up! Please comment and tell us what you think. If you need further guidance, reach out for assistance. Let us know how this solution works for you!
Find more articles from SAS Global Enablement and Learning here.
The rapid growth of AI technologies is driving an AI skills gap and demand for AI talent. Ready to grow your AI literacy? SAS offers free ways to get started for beginners, business leaders, and analytics professionals of all skill levels. Your future self will thank you.