
SAS Agentic AI – Deploy and Score Models – Containers


Welcome back to the SAS Agentic AI Accelerator series! If you’ve made it this far in the series, you’ve survived the high-level overviews and cost comparisons—give yourself a pat on the back (or at least a fresh cup of coffee). Now, let’s roll up our sleeves and actually deploy something.

 

Ever tried to deploy a Large Language Model (LLM) and felt like you were assembling IKEA furniture with missing instructions? Today, I’ll walk you through the nuts and bolts—so you can get your models running in Azure with minimal head-scratching.

 

We’re going to take all that theory and put it into practice: deploying a code-wrapped LLM as a container in Azure. We’ll start simple (public IP), then get secure (private IP), and make sure you know what to watch out for at every step.

 

 

Where We Are In The Series

 

  • In Part 1, Register and Publish Models, we introduced code-wrapped LLMs and showed how you can register them in SAS Model Manager, then how to publish them as Docker images using SAS Container Runtime (SCR).
  • In Part 2, SAS Agentic AI – Deploy and Score Models – The Big Picture, we compared deployment options, costs, and performance trade-offs in Azure.
  • In Part 3.1 (that’s this post), we get hands-on: actual deployment and scoring scripts, with extra tips on security.

 

 

Example

 

As an example, let’s walk through deploying an open-source model: the phi-3-mini-4k LLM code wrapper. Code wrappers ship with the SAS Agentic AI Accelerator code repository; they standardize LLM inputs and outputs, can be swapped easily in agentic AI workflows, and are easy to deploy thanks to SCR.

 

This phi-3-mini-4k model, developed and released by Microsoft, is a lightweight large language model designed for efficiency and quick responses—think of it as a compact, agile AI that doesn’t need a supercomputer to run.

 

If you’re wondering about the quirky name “phi-3-mini-4k,” you’re not alone—it sounds like it could be R2-D2’s distant cousin from the Star Wars universe! There’s just something about AI and robotics that inspires these metallic, alphanumeric names. Maybe it’s a subtle nod to our sci-fi dreams, or perhaps it’s just because “Bob the Bot” doesn’t sound as futuristic or impressive.

 

Either way, let’s see how to get our own “phi” up and running in the cloud—no droids, no bots required!

 

 

Deploy to Azure Container Instances (Public IP)

 

Azure Container Instances (ACI) with a public IP is the fastest way to test or demo your LLM in Azure. It’s not meant for production or anything sensitive.

 

Michael Goddard wrote about Deploying SAS Container Runtime models on Azure Container Instances. The code-wrapped LLMs follow the same guidelines.

 

Deployment Script

 

For Azure deployment scripts, you can use the Azure Command Line Interface (CLI).
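Prerequisites first: you’ll need the Azure CLI installed and an active az login session, plus the SCR image from Part 1 pushed to your Azure Container Registry. A quick sanity check (ACR_NAME and IMAGE are set in the deployment script below):

# Log in to Azure (opens a browser for authentication)
az login

# Verify the code-wrapped LLM image and its tags exist in your registry
az acr repository show-tags --name $ACR_NAME --repository $IMAGE --output table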

 

# Variables to set
CONT="myprefix-phi"        # Name/DNS label for your container
RG="my-resource-group"     # Azure Resource Group
ACR_NAME="myacr"           # Azure Container Registry name
IMAGE="phi_3_mini_4k"      # Image name
IMAGE_TAG="latest"         # Image tag/version
LOCATION="westus3"         # Azure region – choose one that suits you

# ACR password (or use a service principal instead)
ACR_PASS=$(az acr credential show -n $ACR_NAME --query "passwords[0].value" -o tsv)

az container create -n $CONT -g $RG \
    --image "${ACR_NAME}.azurecr.io/${IMAGE}:${IMAGE_TAG}" \
    --registry-username $ACR_NAME \
    --registry-password $ACR_PASS \
    --ports 80 8080 \
    --protocol TCP \
    --dns-name-label $CONT \
    --location $LOCATION \
    --cpu 4 \
    --memory 16
 

 

What’s Happening?

 

You’re spinning up a container with your LLM, exposing ports for API access (SCR needs 8080), and giving it enough juice (4 CPUs, 16 GB RAM) to keep things snappy. But it’s public! Anyone with the endpoint can poke your model.
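Provisioning can take a few minutes while the image is pulled from ACR. Before you start scoring, a quick sanity check (using the same variables as above) confirms the container is running and shows its public FQDN:

# Check the provisioning state and the public FQDN of the container
az container show -n $CONT -g $RG \
    --query "{state:instanceView.state, fqdn:ipAddress.fqdn}" -o table

# Tail the container logs if something looks off
az container logs -n $CONT -g $RG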

 

01_BT_LLM_AzureContainerInstance_Public_Deployment.png

 


 

Scoring Script

 

Time to see if your LLM is awake! Here’s a sample curl command to send a scoring request:

 

# Score using CONT, LOCATION, and IMAGE from the deployment script above
curl -X POST "http://${CONT}.${LOCATION}.azurecontainer.io:8080/${IMAGE}" \
    -H 'Content-Type: application/json' \
    -d '{
      "inputs": [
        {"name":"userPrompt","value":"customer_name: X Y; loan_amount: 20000.0; customer_language: EN"},
        {"name":"systemPrompt","value":"You are tasked with drafting an email to respond to a customer whose mortgage loan application has been accepted by the SAS AI Bank. You will be provided with three pieces of information: customer_name, loan_amount, customer_language. Use the provided customer name and loan amount to personalize the email."},
        {"name":"options","value":"{temperature:0.7,top_p:1,max_tokens:800}"}
      ]
    }' | jq
 

 

What’s Going On Here?

 

  • userPrompt: The data you want your LLM to use (customer name, loan amount, etc.).
  • systemPrompt: The “instructions manual” for your LLM. Think of it like setting your model’s GPS—so it doesn’t drive off into the weeds and start talking about quantum computing when you just wanted an email template.
  • options: Controls for creativity, length, and other LLM behaviors.

 

These three inputs are defined by the LLM code wrapper from the SAS Agentic AI Accelerator.
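The response typically mirrors the request format: an outputs array of name/value pairs. If you only want the generated text rather than the full payload, you can filter it with jq. Here’s a minimal sketch, assuming the wrapper returns its result in an output variable named llmOutput (the actual name depends on your code wrapper’s output schema, so check it first); the prompts are shortened for brevity:

# Score, then extract only the generated text from the SCR response
# NOTE: "llmOutput" is an assumed output variable name - check your wrapper's schema
RESPONSE=$(curl -s -X POST "http://${CONT}.${LOCATION}.azurecontainer.io:8080/${IMAGE}" \
    -H 'Content-Type: application/json' \
    -d '{"inputs":[
          {"name":"userPrompt","value":"customer_name: X Y; loan_amount: 20000.0; customer_language: EN"},
          {"name":"systemPrompt","value":"Draft a short loan acceptance email."},
          {"name":"options","value":"{temperature:0.7,top_p:1,max_tokens:800}"}
        ]}')
echo "$RESPONSE" | jq -r '.outputs[] | select(.name=="llmOutput") | .value'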

 

02_BT_LLM_AzureContainerInstance_Public_Scoring.png

 

 

Deploy with Private IP (VNET Integration)

 

This option is suitable for internal, secure deployments—perfect for real workflows where you care about data privacy.

 

Deployment Script

 

# New variables
CONTP="myprefix-phi-private"   # Name/DNS label for your container
VNET="SAS-Viya-azure-vnet"     # Virtual Network name
SUBNET="llm-subnet"            # Subnet name

# Step 1: Create a dedicated subnet for containers within your existing VNET
# (adapt the address prefix to match your VNET IP range)
az network vnet subnet create \
    --resource-group $RG \
    --vnet-name $VNET \
    --name $SUBNET \
    --address-prefix 192.168.3.0/26 \
    --delegations Microsoft.ContainerInstance/containerGroups

# Step 2: Deploy the container with a private IP
az container create \
    --resource-group $RG \
    --name $CONTP \
    --image "${ACR_NAME}.azurecr.io/${IMAGE}:${IMAGE_TAG}" \
    --registry-username $ACR_NAME \
    --registry-password $ACR_PASS \
    --ports 80 8080 \
    --protocol TCP \
    --location $LOCATION \
    --vnet $VNET \
    --subnet $SUBNET \
    --ip-address Private \
    --cpu 4 \
    --memory 16

# Step 3: Retrieve the private IP of your container
az container show --resource-group $RG --name $CONTP --query "ipAddress.ip" --output tsv

 

Make sure your SAS Viya deployment and the container’s subnet are on the same VNET! Otherwise, your scoring requests will be like postcards sent to a house with no mailbox.
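A quick way to verify: list the subnets on the VNET and check that the Viya subnet and the new llm-subnet both appear, with non-overlapping address ranges:

# List all subnets on the VNET - the Viya subnet and llm-subnet should both appear
az network vnet subnet list --resource-group $RG --vnet-name $VNET \
    --query "[].{name:name, prefix:addressPrefix}" -o table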

 

Scoring Script (Private IP Version)

 

The only thing that changes in the scoring request is the use of the container’s private IP instead of the DNS label (FQDN) used for the public container:

 

CONT_IP=$(az container show --resource-group $RG --name $CONTP --query "ipAddress.ip" --output tsv)
echo "CONT_IP=${CONT_IP}" 
curl -X POST "http://${CONT_IP}:8080/${IMAGE}" ...

 

It’s the same scoring request as before; just use the container’s private IP address instead of the container’s DNS name from the public example.

 

 

Security Corner

 

  • Don’t send PII (that stands for Personally Identifiable Information) over public endpoints. Ever. (Seriously. Somebody might be listening to your traffic.)
  • No HTTPS by default: Traffic isn’t encrypted—even inside a VNET. For extra-sensitive data, put additional controls in place (private endpoints, network security groups, etc.; see the NSG sketch after this list).
  • Lock down your ACR credentials: Treat them like your Netflix password. Or better.
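For example, a network security group (NSG) can restrict scoring traffic on port 8080 to your Viya subnet only. A minimal sketch, assuming a hypothetical NSG named llm-nsg and a Viya subnet range of 192.168.1.0/24 (replace both with your own values):

# NOTE: "llm-nsg" and 192.168.1.0/24 are placeholders - use your own NSG name and Viya subnet CIDR
az network nsg create --resource-group $RG --name llm-nsg

# Allow scoring traffic on port 8080 only from the Viya subnet
az network nsg rule create \
    --resource-group $RG \
    --nsg-name llm-nsg \
    --name allow-viya-scoring \
    --priority 100 \
    --direction Inbound \
    --access Allow \
    --protocol Tcp \
    --source-address-prefixes 192.168.1.0/24 \
    --destination-port-ranges 8080

# Deny scoring traffic from everywhere else
az network nsg rule create \
    --resource-group $RG \
    --nsg-name llm-nsg \
    --name deny-other-scoring \
    --priority 200 \
    --direction Inbound \
    --access Deny \
    --protocol Tcp \
    --source-address-prefixes '*' \
    --destination-port-ranges 8080

# Associate the NSG with the container subnet so the rules take effect
az network vnet subnet update \
    --resource-group $RG \
    --vnet-name $VNET \
    --name $SUBNET \
    --network-security-group llm-nsg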

 

 

Summary

 

  • Public IP deployments are fast for testing, but risky for anything sensitive.
  • VNET-integrated (private) deployments are a much better fit for real-world use.
  • Security matters—enforce it from the start.
  • Troubleshooting is part of the journey. Don’t let the first error stop you.
  • Prompt engineering is your secret weapon for great LLM responses.

 

 

What Should You Do Next?

 

  • Test out both deployment methods—see which fits your needs.
  • Experiment with prompt engineering to get the best responses from your LLM.
  • Share your best (or worst!) LLM deployment stories in the comments below. If it’s an embarrassing story (keep it technical, please), you can use the Anonymous option.
  • Read the next post in the series (coming soon).

 

 

Additional Resources

 

 

 

Want More Hands-On Guidance?

 

SAS offers a full workshop with step-by-step exercises for deploying and scoring models using Agentic AI and SAS Viya on Azure.

 

Access it on learn.sas.com through the SAS Decisioning Learning Subscription. The workshop provides step-by-step guidance and a bookable environment for creating agentic AI workflows.

 

03_BT_AgenticAI_Workshop-1024x496.png

 

 

If you liked the post, give it a thumbs up! Please comment and tell us what you think about the SAS Agentic AI Accelerator, and reach out if you need further guidance. Let us know how this solution works for you!

 

 

Find more articles from SAS Global Enablement and Learning here.
