Welcome back to the SAS Agentic AI Accelerator series! If you’ve followed the journey so far—through registration, publishing, and container deployments—then you’re ready for the next stop: deploying your Large Language Models (LLMs) in Azure using Container Apps and Web Apps.
If deploying with Docker containers felt like making espresso, get ready for an espresso macchiato. A bit more involved (and maybe some extra foam). Let’s get brewing!
We’ll deploy the open-source Qwen-25-05b LLM (from Alibaba Cloud), included as a code-wrapper in the SAS Agentic AI Accelerator repo. “Qwen” is short for “Tongyi Qianwen” in Chinese, meaning “comprehensive understanding of a thousand questions.” The “05b” stands for 0.5 billion (500 million) parameters; think of them as brain cells.
Code-wrappers ship with the SAS Agentic AI Accelerator code repository. They standardize LLM inputs and outputs, can be swapped easily in and out of agentic AI workflows, and are easy to deploy thanks to the SAS Container Runtime (SCR).
First, make sure you have the Azure CLI (Command Line Interface) with the containerapp extension:
az extension add --name containerapp --upgrade
# Variables to set
CONT="my-qwen-app" # Name/DNS label for your app
RG="my-resource-group" # Azure Resource Group
ACR_NAME="myacr" # Azure Container Registry name
IMAGE="qwen_25_05b" # Image name
IMAGE_TAG="latest" # Image tag/version
ACR_PASS=$(az acr credential show -n $ACR_NAME --query "passwords[0].value" -o tsv)  # ACR password (or use a service principal)
LOCATION="westus3" # Azure region – choose one that suits you
ENVIRONMENT="llms" # Azure Container App environment
You’re defining basic settings—like the names of your Azure Container Registry, resource group, container app, and image. You’re also grabbing your registry password securely.
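One gotcha worth noting: the credential lookup above only works if the admin user is enabled on your registry. If it comes back empty, here's a quick fix, assuming you're fine using admin credentials rather than a service principal:
# Enable the registry's admin user so the credential lookup above succeeds
az acr update --name $ACR_NAME --admin-enabled true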
az containerapp env create --name $ENVIRONMENT --resource-group $RG --location $LOCATION
This creates a secure environment in Azure where your container apps will live. Think of it as making a “home” for your deployments.
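Provisioning can take a minute or two. If you'd like to confirm the environment is ready before deploying, this optional check does the trick:
# Optional: wait for "Succeeded" before creating the container app
az containerapp env show --name $ENVIRONMENT --resource-group $RG \
  --query properties.provisioningState -o tsv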
az containerapp create \
--name $CONT \
--resource-group $RG \
--cpu 4.0 \
--memory 8.0Gi \
--environment $ENVIRONMENT \
--registry-server ${ACR_NAME}.azurecr.io \
--registry-username $ACR_NAME \
--registry-password $ACR_PASS \
--image "${ACR_NAME}.azurecr.io/$IMAGE:$IMAGE_TAG" \
--target-port 8080 \
--ingress external \
--query properties.configuration.ingress.fqdn
This command launches your LLM model as a container app in Azure. It connects to your Docker image (containing SAS Container Runtime code), gives it CPU and memory, and makes it accessible over the web (via HTTPS).
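Before scoring, you can optionally verify that the app is actually running and see which revision is serving traffic:
# Optional: confirm the app is running and inspect its revisions
az containerapp show --name $CONT --resource-group $RG \
  --query properties.runningStatus -o tsv
az containerapp revision list --name $CONT --resource-group $RG --output table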
az containerapp list --resource-group $RG --output table
# Or to find your specific app
az containerapp show --name $CONT --resource-group $RG --query properties.configuration.ingress.fqdn -o tsv
This lists the deployed apps and retrieves the Fully Qualified Domain Name (FQDN) of your app, which we’ll use for scoring.
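A quick reachability probe is worth it before the first real scoring call, since that first request can be slow while the model loads. Any HTTP status back (even a 404 on the root path) proves the endpoint is up:
# Probe the endpoint -- any HTTP status code means the app is reachable
FQDN=$(az containerapp show --name $CONT --resource-group $RG \
  --query properties.configuration.ingress.fqdn -o tsv)
curl -s -o /dev/null -w "%{http_code}\n" "https://${FQDN}/"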
Azure Container Apps give you HTTPS endpoints out of the box, auto-scaling, and a slick way to isolate your resources. Think of this as your go-to for agile, event-driven workloads and secure external access. It’s a good compromise between Container Instances and a full Kubernetes cluster.
Now, let’s wake up your deployed LLM:
FQDN=$(az containerapp show --name $CONT --resource-group $RG --query properties.configuration.ingress.fqdn -o tsv); echo $FQDN
curl --location --request POST "https://${FQDN}/${IMAGE}" \
--header 'Content-Type: application/json' \
--header 'Accept: application/vnd.sas.microanalytic.module.step.output+json' \
--data-raw '{
"inputs": [
{"name":"userPrompt","value":"customer_name: Xin Little; loan_amount: 20000.0; customer_language: EN"},
{"name":"systemPrompt","value":"You are tasked with drafting an email to respond to a customer whose mortgage loan application has been accepted by the SAS AI Bank. You will be provided with customer_name, loan_amount, customer_language. Follow the guidelines for a professional, friendly response."},
{"name":"options","value":"{temperature:1,top_p:1,max_tokens:800}"}
]
}' | jq
This curl command sends a request to your deployed LLM, providing text and instructions. The LLM reads your input, processes it, and returns a response (such as a drafted email).
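If you only want the generated text, you can tighten the jq filter. A sketch, assuming the response follows the MAS step-output convention of an outputs array; the output name "answer" is a placeholder, so check your wrapper's actual schema:
# Append this filter to the curl call above to print just the model's text;
# "answer" is a placeholder output name -- use the one your wrapper defines
jq -r '.outputs[] | select(.name == "answer") | .value'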
Need to handle more requests? Azure Container Apps make it effortless:
az containerapp update \
--name $CONT \
--resource-group $RG \
--min-replicas 1 \
--max-replicas 10 \
--scale-rule-name http-scale \
--scale-rule-http-concurrency 1
You’re telling Azure to automatically add more app instances if traffic increases (auto-scaling). This helps your LLM handle multiple requests at once.
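If cost matters more than latency, you can also let the app scale to zero while idle. The trade-off is a cold start on the next request, which can be lengthy for an LLM image:
# Cost-saving variant: scale to zero when idle (expect a cold start afterwards)
az containerapp update --name $CONT --resource-group $RG --min-replicas 0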
Azure Web Apps are a managed way to host your containerized LLMs. They shine when you need robust hosting for production APIs, HTTPS, deployment slots, and more “set-it-and-forget-it” operations.
# Create an App Service Plan
az appservice plan create \
--name my-llm-plan \
--resource-group $RG \
--location $LOCATION \
--sku P1mv3 \
--is-linux
# Create the Web App for Containers:
az webapp create \
--resource-group $RG \
--plan my-llm-plan \
--name my-qwen-app \
--container-image-name ${ACR_NAME}.azurecr.io/$IMAGE:$IMAGE_TAG \
--container-registry-user $ACR_NAME \
--container-registry-password "$ACR_PASS" \
--https-only true
# Set the port
az webapp config appsettings set \
--name my-qwen-app \
--resource-group $RG \
--settings WEBSITES_PORT=8080
# Get your app’s endpoint
APP_FQDN=$(az webapp show --resource-group $RG --name my-qwen-app --query defaultHostName -o tsv)
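One optional tweak before scoring: Web Apps idle out by default, which means the model gets re-initialized after quiet periods. Enabling Always On (available on Basic tier and above) keeps the container warm:
# Keep the container warm between requests
az webapp config set --name my-qwen-app --resource-group $RG --always-on true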
The scoring call is nearly identical to the one for Azure Container Apps:
curl --location --request POST "https://${APP_FQDN}/${IMAGE}" \
--header 'Content-Type: application/json' \
--header 'Accept: application/vnd.sas.microanalytic.module.step.output+json' \
--data-raw '{
"inputs": [
{"name":"userPrompt","value":"customer_name: Xin Little; loan_amount: 20000.0; customer_language: EN"},
{"name":"systemPrompt","value":"You are tasked with drafting an email to respond to a customer whose mortgage loan application has been accepted by the SAS AI Bank."},
{"name":"options","value":"{temperature:1,top_p:1,max_tokens:800}"}
]
}' | jq
The response times, however, couldn’t be more different.
In my testing, Azure Web Apps did not perform well for scoring open-source LLMs. Web Apps are great, and they might suit proprietary LLM code-wrappers where the deployed image is light; with open-source LLMs that pull gigabytes of model weights into the container, they don't really shine. For now, I prefer Azure Container Apps for both cost and response time.
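If you want numbers of your own, curl's built-in timing makes a rough comparison easy. Run the same scoring payload against both endpoints and compare:
# Rough latency check -- swap ${FQDN} for ${APP_FQDN} to time the Web App
curl -s -o /dev/null -w "HTTP %{http_code} in %{time_total}s\n" \
--request POST "https://${FQDN}/${IMAGE}" \
--header 'Content-Type: application/json' \
--data-raw '{
"inputs": [
{"name":"userPrompt","value":"customer_name: Xin Little; loan_amount: 20000.0; customer_language: EN"},
{"name":"systemPrompt","value":"You are tasked with drafting an email to respond to a customer whose mortgage loan application has been accepted by the SAS AI Bank."},
{"name":"options","value":"{temperature:1,top_p:1,max_tokens:800}"}
]
}'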
Finally, lock down access and keep an eye on your deployments. Don’t skip this: your data (and your Security Officer) will thank you.
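One easy win on the security side is restricting Container App ingress to addresses you trust. A sketch; the rule name and IP range below are placeholders for your own values:
# Allow scoring calls only from a known address range (placeholder values)
az containerapp ingress access-restriction set \
--name $CONT --resource-group $RG \
--rule-name office-only --ip-address 203.0.113.0/24 --action Allow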
# View recent Container App logs
az containerapp logs show --name $CONT --resource-group $RG
# Download the Web App log files
az webapp log download --name my-qwen-app --resource-group $RG
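Both platforms can also stream logs live, which is handy while you fire scoring requests from another terminal:
# Stream Container App logs live
az containerapp logs show --name $CONT --resource-group $RG --follow
# Stream Web App logs live
az webapp log tail --name my-qwen-app --resource-group $RG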
Azure Container Apps and Web Apps both give you flexible, secure options to deploy SAS Agentic AI Accelerator LLMs. Container Apps are ideal for experimentation and lighter workloads; Web Apps offer managed, stable hosting for API-heavy use cases, though faster responses come with a higher price tag.
Choose based on your needs, balance security and cost, and don’t forget to experiment and optimize as you go.
Thanks for following along! If you find this post helpful, give it a thumbs up, share your stories or questions in the comments, and let’s keep building better AI workflows together. Stay tuned for more!
SAS offers a full workshop with step-by-step exercises for deploying and scoring models using Agentic AI and SAS Viya on Azure.
Access it on learn.sas.com in the SAS Decisioning Learning Subscription, which includes a bookable environment for creating agentic AI workflows.
For further guidance, reach out for assistance.
Find more articles from SAS Global Enablement and Learning here.
The rapid growth of AI technologies is driving an AI skills gap and demand for AI talent. Ready to grow your AI literacy? SAS offers free ways to get started for beginners, business leaders, and analytics professionals of all skill levels. Your future self will thank you.