
SAS Agentic AI – Deploy and Score Models – The Big Picture


Welcome back to the SAS Agentic AI Accelerator series! Today we’ll explore how to deploy and score code-wrapped Large Language Models (LLMs) in Azure, then call them from Agentic AI workflows inside SAS Viya.

 

To keep things clear, the topic is split into two parts:

  1. The Big Picture – a high-level overview with a short video and comparison tables that help you choose a deployment method. Azure is our example cloud.
  2. The Nitty-Gritty – a hands-on guide with deployment and scoring scripts.

 

 

Where We Are In The Series

 

In Part 1, Register and Publish Models, we introduced code-wrapped LLMs and showed how to publish them with the SAS Container Runtime (SCR). The end result was Docker images in a container registry.

Part 2 — this post — covers the deployment options.

 

 

Deployment and Scoring Overview

 

After registering and publishing an LLM code wrapper, you can deploy it as a Docker image in various environments. Once deployed, you can score using the SAS Container Runtime API.
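Before we compare the options, here’s what a scoring call looks like once a container is running. This is a minimal sketch, assuming the standard SCR JSON scoring interface; the host, module ID, and input variable name are placeholders, and your code wrapper defines the real variable names (check the container’s generated API metadata).

```python
import requests

# Placeholders: substitute the endpoint of your deployed container and the
# module ID assigned when the model was registered and published (Part 1).
SCR_HOST = "http://<container-host>:8080"   # ACI / Web App / ingress address
MODULE_ID = "phi3wrapper"                   # hypothetical module ID

# SCR modules score via a JSON POST of named input variables. The variable
# name "promptText" is an assumption; use the names your code wrapper defines.
payload = {"inputs": [{"name": "promptText",
                       "value": "Summarize this claim in one sentence."}]}

response = requests.post(f"{SCR_HOST}/{MODULE_ID}", json=payload, timeout=120)
response.raise_for_status()
print(response.json())   # output variables defined by the code wrapper
```

The same call works regardless of where the container runs; only the host and the transport (HTTP vs. HTTPS) change between the deployment options below.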

 


 

 

Deployment Options

 

Here’s a quick overview of deployment options in the Azure cloud. This isn’t an exhaustive list; it reflects only what I tested:

 

| Deployment Option | Use Case | Scalability | Ease of Setup | Security |
|---|---|---|---|---|
| Azure Container Instances | Lightweight, quick starts | Low | Simple | Public or private IP (HTTP only) |
| Azure Container Apps | Event-driven, auto-scaling | Medium | Managed | Public IP (HTTPS) |
| Azure Web Apps | Managed container hosting | Medium | Managed | Public IP (HTTPS) |
| Kubernetes Pods | Large-scale, fully orchestrated | High | Complex (requires YAML, must manage node resources) | Flexible: private/public IP (HTTP or HTTPS) |
| Containers on Virtual Machines | Legacy or custom configurations | Medium | Moderate | Flexible (private/public) |
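To make the first option concrete, here’s a minimal sketch of deploying the published SCR image to Azure Container Instances by driving the Azure CLI from Python. The resource group, registry, image, and credential values are placeholders, not the names from my environment, and port 8080 is the assumed scoring port.

```python
import subprocess

# Placeholders: substitute your own resource group, registry, and image.
# The image is the one pushed to the container registry in Part 1.
RESOURCE_GROUP = "my-rg"
ACR = "myregistry.azurecr.io"
IMAGE = f"{ACR}/phi-3-mini-4k:latest"

# 4 vCPUs / 16 GB is the ACI ceiling noted above, and what small LLMs need.
subprocess.run(
    [
        "az", "container", "create",
        "--resource-group", RESOURCE_GROUP,
        "--name", "phi3-scr",
        "--image", IMAGE,
        "--os-type", "Linux",
        "--cpu", "4",
        "--memory", "16",
        "--ports", "8080",              # scoring port (8080 assumed; match your image)
        "--ip-address", "Public",       # HTTP only; see the security column above
        "--registry-login-server", ACR,
        "--registry-username", "<acr-username>",
        "--registry-password", "<acr-password>",
    ],
    check=True,
)
```

Once the container group reports a public IP, that IP plus the port becomes the `SCR_HOST` in the scoring sketch shown earlier.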

 

 

Key Considerations for Each Deployment Option

 

As you can see, Azure offers many options. The following table should help you choose the one that fits your needs.

 

| Deployment Option | Feasibility | Advantages | Limitations |
|---|---|---|---|
| Azure Container Instances | Ideal for small open-source LLMs like phi-3-mini-4k (4 vCPUs, 16 GB RAM). | Easiest to launch. | Limited to 4 vCPUs, 16 GB RAM. |
| Azure Container Apps | Middle ground between Container Instances and Kubernetes clusters. | Built-in HTTPS ingress and auto-scaling. | Capped at 2 vCPUs and 8 GB RAM; may cause out-of-memory errors. |
| Azure Web Apps | Simple to deploy and scale; works for lighter workloads. | Supports deployment slots. | Resource limits can bottleneck performance; adding resources doesn’t always improve it. |
| Kubernetes Pods | Production-grade option for complex, large-scale workloads. | Fine-grained control over resources, scaling, and isolation. | Requires Kubernetes skills (a point customers raise again and again). |
| Containers on Virtual Machines | Highly flexible for legacy systems or custom configurations. | Complete control over CPU, RAM, and disk. | Higher cost and operational effort. |

 

 

Pricing Comparison

 

Any deployment choice in the cloud comes at a cost.

 

To help you evaluate the cost of each option, the table below summarizes typical daily costs based on Azure pricing estimates. Actual values vary by region, container size, and configuration. The estimates assume low, infrequent traffic: a few requests per day from your Agentic AI workflow to the deployed LLM.

 

| Deployment Option | Estimated Daily Cost | Details |
|---|---|---|
| Azure Container Instances | ~$3–$10/day | Depends on CPU and memory allocation (e.g., 2 vCPUs and 8 GB of memory). |
| Azure Container Apps | ~$5–$12/day | Includes management costs, ingress, and scalability features. |
| Azure Web Apps | ~$8–$15/day | Managed-service costs include app hosting and container runtime fees. |
| Kubernetes Pods | ~$10–$20/day | Varies with cluster size, node configuration, and resource requirements. |
| Containers on Virtual Machines | ~$15–$25/day | Includes VM hosting fees, container runtime costs, and storage costs for legacy systems. |

 

 

Observed Price Comparison

 

I ran my experiments for one week in one of our own Azure tenants. Here’s the actual daily cost for a phi-3-mini-4k LLM deployed in Azure Container Instances, Container Apps, Web Apps, and Azure Kubernetes Service, alongside the observed response time in seconds.

 

| Deployment Type | Cost Components | Estimated Cost Per Day (USD) | Response Time (sec) |
|---|---|---|---|
| Container Instances | Compute costs | $5.60 | 48.31 |
| Container Apps | Base pricing | $3.22 | 45 |
| App Service Plans (Web Apps) | Premium v3 P1mv3 | $4.32 | 90 |
| | Premium v3 P3mv3 | $18.48 | 25 |
| | Premium v3 P4mv3 | $35.52 | 15 |
| | Premium v3 P5mv3 | $71.24 | 10 |
| Kubernetes Deployment (extra node) | Compute costs (Standard_D4as_v5, 4 vCPUs, 16 GB) | ~$14.40 | 42 |
| | Disk costs | $2.16 | |
| | Total (compute + disk) | ~$16.56 | 42 |
| | Compute costs (Standard_D16as_v5, 16 vCPUs, 64 GB) | ~$57.60 | 43 |
| | Disk costs | $2.16 | |
| | Total (compute + disk) | ~$59.76 | 43 |

 

Findings:

 

  1. Container Instances: Listed first due to its simplicity and lower cost for isolated deployments:
    1. For the small open-source LLMs qwen-25-05-b and phi-3-mini-4k, we found that 4 vCPUs and 16 GB of memory are more appropriate. And these are small models.
    2. A larger open-source LLM, such as LLaMA 2-7B (Large Language Model Meta AI), may require 16–32 GB of RAM, 4–8 vCPUs, and plenty of disk space (20–50 GB). You could deploy it on 4 vCPUs and 16 GB of RAM, but the model would be severely constrained and the response time very high, if you get a response at all.
  2. Container Apps: Second, as it offers scalable, event-driven microservices at a competitive cost. The scalability feature is interesting: the app can spin up more replicas to handle concurrent scoring requests. Built-in ingress (HTTPS endpoints) and auto-scaling make this an excellent choice for lightweight open-source model deployments.
  3. Web Apps: Listed next, reflecting managed hosting options with varying performance tiers. Choose configurations that allow at least 4 vCPUs and 16 GB of RAM. We scaled up the App Service Plan gradually; as you can observe, throwing more resources at a model doesn’t proportionally reduce the response time. There’s a fine balance between cost and performance, and you can only find it by experimenting.
  4. Kubernetes Deployment: Last, as it is best suited for complex workflows requiring high scalability and orchestration. For the phi-3-mini-4k LLM we added a dedicated node and deployed the container in a pod (see the sketch after this list). Scaling up the node size didn’t seem to influence the response time; perhaps other parameters, such as disk type and IOPS, need fine-tuning. More work is needed.
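For reference, here’s a minimal sketch of an equivalent Kubernetes Deployment, written with the official kubernetes Python client to keep the examples in one language; in practice you’d typically apply the same thing as a YAML manifest with kubectl. The image, namespace, node label, and resource figures are placeholders, not my exact test configuration.

```python
from kubernetes import client, config

config.load_kube_config()  # local kubeconfig (e.g., from `az aks get-credentials`)

# Placeholder values; substitute your own image, namespace, and node label.
container = client.V1Container(
    name="phi3-scr",
    image="myregistry.azurecr.io/phi-3-mini-4k:latest",
    ports=[client.V1ContainerPort(container_port=8080)],
    resources=client.V1ResourceRequirements(
        requests={"cpu": "4", "memory": "16Gi"},   # sized like the D4as_v5 node above
        limits={"cpu": "4", "memory": "16Gi"},
    ),
)
template = client.V1PodTemplateSpec(
    metadata=client.V1ObjectMeta(labels={"app": "phi3-scr"}),
    spec=client.V1PodSpec(
        containers=[container],
        node_selector={"workload": "llm"},  # hypothetical label for the dedicated node
    ),
)
deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="phi3-scr"),
    spec=client.V1DeploymentSpec(
        replicas=1,
        selector=client.V1LabelSelector(match_labels={"app": "phi3-scr"}),
        template=template,
    ),
)
client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```

Exposing the pod then takes a Service (and optionally an ingress), which is where the HTTP/HTTPS flexibility noted in the options table comes from.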

 

Discussion

 

Some customers avoid proprietary LLMs from OpenAI, Google, or Azure because their data would leave their premises (or their cloud). They ask for a way to use open-source, on-premises LLMs.

 

After a month of testing, I’ve learned that the contest isn’t an even one:

 

  • Proprietary cloud LLMs almost always win on cost, latency, and accuracy.
  • Self-hosting shifts all compute costs to you, so each request costs more and takes longer.
  • High latency limits daily throughput, pushing the per-request price even higher.
  • That premium is the trade-off for keeping data inside your own walls.

 

 

Summary

 

The SAS Agentic AI Accelerator lets you deploy code-wrapped LLMs almost anywhere: Azure services, Kubernetes, or standalone VMs. Use the tables above to balance cost, performance, and operational effort.

 

Stay tuned for Part 3, where we’ll dig into deployment scripts, scoring calls, and security tips.

 

 

Acknowledgements

 

Thanks to Mike Goddard (@MichaelGoddard) for guidance on SAS Container Runtime Kubernetes deployments.

 

 

Additional Resources

 

 

Workshop Environment

 

The Agentic AI – How to with SAS Viya workshop is now available on learn.sas.com to SAS customers in the SAS Decisioning Learning Subscription and to SAS employees. The workshop provides step-by-step guidance and a bookable environment for creating agentic AI workflows.


 

 

If you liked the post, give it a thumbs up! Please comment and tell us what you think, and reach out if you need further guidance. Let us know how this solution works for you!

 

 

Find more articles from SAS Global Enablement and Learning here.
