
SAS Agentic AI – Deploy and Score Models – Apps


Welcome back to the SAS Agentic AI Accelerator series! If you’ve followed the journey so far—through registration, publishing, and container deployments—then you’re ready for the next stop: deploying your Large Language Models (LLMs) in Azure using Container Apps and Web Apps.

 

If deploying with Docker containers felt like making espresso, get ready for an espresso macchiato. A bit more involved (and maybe some extra foam). Let’s get brewing!

 

 

Where We Are In The Series

 

 

 

Example

 

We’ll deploy the open-source Qwen2.5-0.5B LLM (by Alibaba Cloud), included as a code-wrapper in the SAS Agentic AI Accelerator repo. “Qwen” is short for 'Tongyi Qianwen' in Chinese, meaning “comprehensive understanding of a thousand questions.” The “0.5B” means 500 million parameters—think of them as brain cells.

 

Code-wrappers ship with the SAS Agentic AI Accelerator code repository. They standardize LLM inputs and outputs, can be swapped easily in and out of agentic AI workflows, and are easy to deploy thanks to SCR (SAS Container Runtime).

 

 

Deploying to Azure Container Apps

 

Set Up Your Environment

 

First, make sure you have the Azure CLI (Command Line Interface) installed, along with the containerapp extension:

 

az extension add --name containerapp --upgrade
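
 

If you haven’t authenticated yet, sign in and point the CLI at the right subscription first. A quick sketch—"&lt;your-subscription&gt;" is a placeholder for your own subscription name or ID:

 

az login                                              # Opens a browser for authentication
az account set --subscription "<your-subscription>"   # Select the target subscription
az account show --query name -o tsv                   # Confirm you're in the right place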

 

 

Define Your Variables

 

# Variables to set
CONT="my-qwen-app"         # Name/DNS label for your app
RG="my-resource-group"     # Azure Resource Group
ACR_NAME="myacr"           # Azure Container Registry name
IMAGE="qwen_25_05b"        # Image name
IMAGE_TAG="latest"         # Image tag/version
ACR_PASS=$(az acr credential show -n $ACR_NAME --query "passwords[0].value" -o tsv)  # ACR password or service principal
LOCATION="westus3"         # Azure region – choose one that suits you
ENVIRONMENT="llms"         # Azure Container App environment

 

You’re defining basic settings—like the names of your Azure Container Registry, resource group, container app, and image. You’re also grabbing your registry password securely.
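
 

Before creating anything, it’s worth confirming that the image from the previous post actually landed in your registry. A quick sanity check, assuming you pushed qwen_25_05b with the latest tag as defined above:

 

# List the tags available for your LLM image in ACR
az acr repository show-tags --name $ACR_NAME --repository $IMAGE --output table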

 

 

Create the Container Apps Environment

 

az containerapp env create --name $ENVIRONMENT --resource-group $RG --location $LOCATION

 

This creates a secure environment in Azure where your container apps will live. Think of it as making a “home” for your deployments.
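
 

Environment creation can take a minute or two. If you want to be sure it’s ready before deploying, you can check the provisioning state—a small optional check:

 

# Should print "Succeeded" once the environment is ready
az containerapp env show --name $ENVIRONMENT --resource-group $RG \
  --query properties.provisioningState -o tsv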

 

 

Deploy Your LLM as a Container App

 

az containerapp create \
  --name $CONT \
  --resource-group $RG \
  --cpu 4.0 \
  --memory 8.0Gi \
  --environment $ENVIRONMENT \
  --registry-server ${ACR_NAME}.azurecr.io \
  --registry-username $ACR_NAME \
  --registry-password $ACR_PASS \
  --image "${ACR_NAME}.azurecr.io/$IMAGE:$IMAGE_TAG" \
  --target-port 8080 \
  --ingress external \
  --query properties.configuration.ingress.fqdn

 

This command launches your LLM model as a container app in Azure. It connects to your Docker image (containing SAS Container Runtime code), gives it CPU and memory, and makes it accessible over the web (via HTTPS).
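
 

The first start can be slow, because the app pulls the image and the code-wrapper downloads the model weights. Streaming the console logs is the easiest way to watch that happen—optional, but handy for troubleshooting:

 

# Stream the container's console output while it starts up
az containerapp logs show --name $CONT --resource-group $RG --follow --tail 50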

 

 

Check If It’s Alive

 

az containerapp list --resource-group $RG --output table
# Or to find your specific app
az containerapp show --name $CONT --resource-group $RG --query properties.configuration.ingress.fqdn -o tsv

 

This lists the deployed apps and retrieves the Fully Qualified Domain Name (FQDN) of your app, which we’ll use during scoring.
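
 

With the FQDN in hand, a bare-bones reachability check confirms that ingress and TLS work before you send a real scoring request. (The root path may well return a 404 from SCR; the point is simply getting any HTTP response back.)

 

FQDN=$(az containerapp show --name $CONT --resource-group $RG --query properties.configuration.ingress.fqdn -o tsv)
# Any HTTP status code back means ingress and TLS are up
curl -s -o /dev/null -w "%{http_code}\n" "https://${FQDN}/"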

 

01_BT_LLM_AzureContainerApp_Deployment-2048x824.png

 


 

 

Scoring the Container App

 

Azure Container Apps give you HTTPS endpoints out of the box, auto-scaling, and a slick way to isolate your resources. Think of this as your go-to for agile, event-driven workloads and secure external access. It’s a good middle ground between Container Instances and a full Kubernetes cluster.

 

Now, let’s wake up your deployed LLM:

 

FQDN=$(az containerapp show --name $CONT --resource-group $RG --query properties.configuration.ingress.fqdn -o tsv); echo $FQDN
curl --location --request POST "https://${FQDN}/${IMAGE}" \
  --header 'Content-Type: application/json' \
  --header 'Accept: application/vnd.sas.microanalytic.module.step.output+json' \
  --data-raw '{
    "inputs": [
      {"name":"userPrompt","value":"customer_name: Xin Little; loan_amount: 20000.0; customer_language: EN"},
      {"name":"systemPrompt","value":"You are tasked with drafting an email to respond to a customer whose mortgage loan application has been accepted by the SAS AI Bank. You will be provided with customer_name, loan_amount, customer_language. Follow the guidelines for a professional, friendly response."},
      {"name":"options","value":"{temperature:1,top_p:1,max_tokens:800}"}
    ]
  }' | jq

 

This curl command sends a request to your deployed LLM, providing text and instructions. The LLM reads your input, processes it, and returns a response (such as a drafted email).
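
 

If you only want the generated text rather than the full JSON envelope, you can filter the response with jq. This is a sketch that prints every output variable as "name: value", since the exact output names depend on the code-wrapper; the shortened systemPrompt here is just for brevity:

 

# Score and print each output variable, whatever the wrapper calls them
curl -s --location --request POST "https://${FQDN}/${IMAGE}" \
  --header 'Content-Type: application/json' \
  --data-raw '{"inputs":[{"name":"userPrompt","value":"customer_name: Xin Little; loan_amount: 20000.0; customer_language: EN"},{"name":"systemPrompt","value":"Draft a short acceptance email."},{"name":"options","value":"{temperature:1,top_p:1,max_tokens:800}"}]}' \
  | jq -r '.outputs[]? | "\(.name): \(.value)"'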

 

02_BT_LLM_AzureContainerApp_Scoring.png

 

 

Scaling Up: The Easy Way

 

Need to handle more requests? Azure Container Apps make it effortless:

 

az containerapp update \
  --name $CONT \
  --resource-group $RG \
  --min-replicas 1 \
  --max-replicas 10 \
  --scale-rule-name http-scale \
  --scale-rule-http-concurrency 1

 

You’re telling Azure to automatically add more app instances if traffic increases (auto-scaling). This helps your LLM handle multiple requests at once.
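
 

To confirm the rule took effect, and to watch replicas come and go under load, two quick optional checks:

 

# Inspect the scale settings on the app
az containerapp show --name $CONT --resource-group $RG --query properties.template.scale -o json

# List the replicas currently running
az containerapp replica list --name $CONT --resource-group $RG -o table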

 

03_BT_LLM_AzureContainerApp_Scaling-1024x528.png

 

 

Deploying to Azure Web Apps

 

Azure Web Apps are a managed way to host your containerized LLMs. These shine when you need robust hosting for production APIs, HTTPS, deployment slots, and more “set-it-and-forget-it” operations.

 

 

Set Up Your Web App

 

  • First, create an App Service Plan (this controls the steam pressure of your espresso machine).
  • Second, create the Web App for Containers.
  • Third, set the SCR scoring port: 8080 or 8443 for TLS.
  • Lastly, get your app’s endpoint to start scoring.

 

# Create an App Service Plan
az appservice plan create \
  --name my-llm-plan \
  --resource-group $RG \
  --location $LOCATION \
  --sku P1mv3 \
  --is-linux

# Create the Web App for Containers
az webapp create \
  --resource-group $RG \
  --plan my-llm-plan \
  --name my-qwen-app \
  --container-image-name ${ACR_NAME}.azurecr.io/$IMAGE:$IMAGE_TAG \
  --container-registry-user $ACR_NAME \
  --container-registry-password "$ACR_PASS" \
  --https-only true

# Set the port
az webapp config appsettings set \
  --name my-qwen-app \
  --resource-group $RG \
  --settings WEBSITES_PORT=8080

# Get your app's endpoint
APP_FQDN=$(az webapp show --resource-group $RG --name my-qwen-app --query defaultHostName -o tsv)
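
 

As with Container Apps, the first start is dominated by the image pull and model download. If the app seems stuck, enable container logging and tail it—a small troubleshooting sketch:

 

# Turn on Docker container logging, then stream it
az webapp log config --name my-qwen-app --resource-group $RG --docker-container-logging filesystem
az webapp log tail --name my-qwen-app --resource-group $RG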

 

 

Scoring the Web App

 

The scoring call is nearly identical to the one for Azure Container Apps:

 

curl --location --request POST "https://${APP_FQDN}/${IMAGE}" \
  --header 'Content-Type: application/json' \
  --header 'Accept: application/vnd.sas.microanalytic.module.step.output+json' \
  --data-raw '{
    "inputs": [
      {"name":"userPrompt","value":"customer_name: Xin Little; loan_amount: 20000.0; customer_language: EN"},
      {"name":"systemPrompt","value":"You are tasked with drafting an email to respond to a customer whose mortgage loan application has been accepted by the SAS AI Bank."},
      {"name":"options","value":"{temperature:1,top_p:1,max_tokens:800}"}
    ]
  }' | jq

 

The response times, however, couldn’t be more different.
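
 

An easy way to compare the two deployments is to time a few identical scoring calls with curl’s built-in timer. A rough sketch using the Container App endpoint; swap ${FQDN} for ${APP_FQDN} to time the Web App (the shortened systemPrompt is just for brevity):

 

# Time three scoring calls; %{time_total} reports seconds per request
for i in 1 2 3; do
  curl -s -o /dev/null -w "attempt $i: %{time_total}s\n" \
    --request POST "https://${FQDN}/${IMAGE}" \
    --header 'Content-Type: application/json' \
    --data-raw '{"inputs":[{"name":"userPrompt","value":"customer_name: Xin Little; loan_amount: 20000.0; customer_language: EN"},{"name":"systemPrompt","value":"Draft a short acceptance email."},{"name":"options","value":"{temperature:1,top_p:1,max_tokens:800}"}]}'
done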

 

 

Performance and Scaling Notes

 

In my testing, Azure Web Apps did not perform well for scoring open-source LLMs. They may well suit proprietary LLM code-wrappers where the deployed image is light, but with open-source LLMs that pull gigabytes of model data into a container, they don’t really shine. For now, I prefer Azure Container Apps for both cost and response time.

 

  • Container Apps: Fast deployment, auto-scaling, and HTTPS. Great for lightweight/medium models and variable workloads.
  • Web Apps: Managed, reliable API hosting—good for high-availability, but expect higher costs and slower responses for large LLM images unless you scale up resources. Example:
    • 2 CPU, 16 GB: ~90s/response
    • 8 CPU, 64 GB: ~25s
    • 16 CPU: ~15s
    • 32 CPU: ~10s

 

 

Security Corner

 

Don’t skip this: your data (and your Security Officer) will thank you.

 

  • Always use HTTPS for requests.
  • Protect sensitive data and ACR credentials.
  • For sensitive/internal workloads: use VNET, private endpoints, or internal-only ingress (see the sketch after this list).
  • Regularly check logs:
    • Container Apps:
      az containerapp logs show --name $CONT --resource-group $RG

 

    • Web Apps:
      az webapp log download --name my-qwen-app --resource-group $RG
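
 

For the internal-only ingress option mentioned above, one command flips your Container App so it’s reachable only from inside its environment’s virtual network. A sketch, assuming your consumers run in, or are peered with, that VNET:

 

# Restrict ingress to the Container Apps environment's internal network
az containerapp ingress update --name $CONT --resource-group $RG --type internal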

 

Recommendations

 

  • For quick tests and demos, use Container Apps with public HTTPS—fast, simple, and safe for non-sensitive data.
  • For internal production needs, use VNET (Virtual Network) integration or internal ingress, and plan scaling based on your LLM’s requirements.
  • For robust, high-availability APIs, Web Apps deliver managed hosting and scaling but be prepared to adjust resources and costs.
  • Monitor and tune performance: more CPUs and RAM usually help, but test and adjust for best results. Note that Container Apps consumption workloads cap out at 4 vCPU and 8 GB RAM (the ceiling we used in the deployment command above).

 

 

Summary

 

Azure Container Apps and Web Apps both give you flexible, secure options to deploy SAS Agentic AI Accelerator LLMs. Container Apps are ideal for experimentation and lighter workloads; Web Apps offer managed, stable hosting for API-heavy use cases, though faster responses come with a higher price tag.

 

Choose based on your needs, balance security and cost, and don’t forget to experiment and optimize as you go.

 

Thanks for following along! If you find this post helpful, give it a thumbs up, share your stories or questions in the comments, and let’s keep building better AI workflows together. Stay tuned for more!

 

 

Additional Resources

 

 

 

Want More Hands-On Guidance?

 

SAS offers a full workshop with step-by-step exercises for deploying and scoring models using Agentic AI and SAS Viya on Azure.

 

Access it on learn.sas.com in the SAS Decisioning Learning Subscription. This workshop environment provides step-by-step guidance and a bookable environment for creating agentic AI workflows.

 

04_BT_AgenticAI_Workshop-1024x496.png

 

For further guidance, reach out for assistance.

 

 

Find more articles from SAS Global Enablement and Learning here.
