
SAS Agentic AI – Deploy and Score Models – Apps


Welcome back to the SAS Agentic AI Accelerator series! If you’ve followed the journey so far—through registration, publishing, and container deployments—then you’re ready for the next stop: deploying your Large Language Models (LLMs) in Azure using Container Apps and Web Apps.

 

If deploying with Docker containers felt like making espresso, get ready for an espresso macchiato. A bit more involved (and maybe some extra foam). Let’s get brewing!

 

 

Where We Are In The Series

 

 

 

Example

 

We’ll deploy the open-source Qwen2.5-0.5B LLM (by Alibaba Cloud), included as a code-wrapper in the SAS Agentic AI Accelerator repo. “Qwen” is short for 'Tongyi Qianwen' in Chinese, meaning “comprehensive understanding of a thousand questions.” The “0.5B” means 500 million parameters—think of them as brain cells.

 

Code-wrappers ship with the SAS Agentic AI Accelerator code repository. They standardize LLM inputs and outputs, can be swapped easily in and out of agentic AI workflows, and are easy to deploy thanks to SCR (SAS Container Runtime).

 

 

Deploying to Azure Container Apps

 

Set Up Your Environment

 

First, make sure you have the Azure CLI (Command Line Interface) installed, along with the containerapp extension:

 

az extension add --name containerapp --upgrade
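
 

If you haven’t authenticated yet, sign in and point the CLI at the right subscription first. A quick sketch—"&lt;your-subscription&gt;" is a placeholder for your own subscription name or ID:

 

az login                                              # Opens a browser for authentication
az account set --subscription "<your-subscription>"   # Select the target subscription
az account show --query name -o tsv                   # Confirm you're in the right place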

 

 

Define Your Variables

 

# Variables to set
CONT="my-qwen-app"         # Name/DNS label for your app
RG="my-resource-group"     # Azure Resource Group
ACR_NAME="myacr"           # Azure Container Registry name
IMAGE="qwen_25_05b"        # Image name
IMAGE_TAG="latest"         # Image tag/version
ACR_PASS=$(az acr credential show -n $ACR_NAME --query "passwords[0].value" -o tsv)  # ACR password or service principal
LOCATION="westus3"         # Azure region – choose one that suits you
ENVIRONMENT="llms"         # Azure Container App environment

 

You’re defining basic settings—like the names of your Azure Container Registry, resource group, container app, and image. You’re also grabbing your registry password securely.
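
 

Before creating anything, it’s worth confirming that the image from the previous post actually landed in your registry. A quick sanity check, assuming you pushed qwen_25_05b with the latest tag as defined above:

 

# List the tags available for your LLM image in ACR
az acr repository show-tags --name $ACR_NAME --repository $IMAGE --output table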

 

 

Create the Container Apps Environment

 

az containerapp env create --name $ENVIRONMENT --resource-group $RG --location $LOCATION

 

This creates a secure environment in Azure where your container apps will live. Think of it as making a “home” for your deployments.
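
 

Environment creation can take a minute or two. If you want to be sure it’s ready before deploying, you can check the provisioning state—a small optional check:

 

# Should print "Succeeded" once the environment is ready
az containerapp env show --name $ENVIRONMENT --resource-group $RG \
  --query properties.provisioningState -o tsv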

 

 

Deploy Your LLM as a Container App

 

az containerapp create \
  --name $CONT \
  --resource-group $RG \
  --cpu 4.0 \
  --memory 8.0Gi \
  --environment $ENVIRONMENT \
  --registry-server ${ACR_NAME}.azurecr.io \
  --registry-username $ACR_NAME \
  --registry-password $ACR_PASS \
  --image "${ACR_NAME}.azurecr.io/$IMAGE:$IMAGE_TAG" \
  --target-port 8080 \
  --ingress external \
  --query properties.configuration.ingress.fqdn

 

This command launches your LLM model as a container app in Azure. It connects to your Docker image (containing SAS Container Runtime code), gives it CPU and memory, and makes it accessible over the web (via HTTPS).
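
 

The first start can be slow, because the app pulls the image and the code-wrapper downloads the model weights. Streaming the console logs is the easiest way to watch that happen—optional, but handy for troubleshooting:

 

# Stream the container's console output while it starts up
az containerapp logs show --name $CONT --resource-group $RG --follow --tail 50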

 

 

Check If It’s Alive

 

az containerapp list --resource-group $RG --output table
# Or to find your specific app
az containerapp show --name $CONT --resource-group $RG --query properties.configuration.ingress.fqdn -o tsv

 

This lists the deployed apps and retrieves the Fully Qualified Domain Name (FQDN) of your app, which we’ll use during scoring.
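
 

With the FQDN in hand, a bare-bones reachability check confirms that ingress and TLS work before you send a real scoring request. (The root path may well return a 404 from SCR; the point is simply getting any HTTP response back.)

 

FQDN=$(az containerapp show --name $CONT --resource-group $RG --query properties.configuration.ingress.fqdn -o tsv)
# Any HTTP status code back means ingress and TLS are up
curl -s -o /dev/null -w "%{http_code}\n" "https://${FQDN}/"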

 

01_BT_LLM_AzureContainerApp_Deployment-2048x824.png

 


 

 

Scoring the Container App

 

Azure Container Apps give you HTTPS endpoints out of the box, auto-scaling, and a slick way to isolate your resources. Think of this as your go-to for agile, event-driven workloads and secure external access. It’s a good middle ground between Container Instances and a full Kubernetes cluster.

 

Now, let’s wake up your deployed LLM:

 

FQDN=$(az containerapp show --name $CONT --resource-group $RG --query properties.configuration.ingress.fqdn -o tsv); echo $FQDN
curl --location --request POST "https://${FQDN}/${IMAGE}" \
  --header 'Content-Type: application/json' \
  --header 'Accept: application/vnd.sas.microanalytic.module.step.output+json' \
  --data-raw '{
    "inputs": [
      {"name":"userPrompt","value":"customer_name: Xin Little; loan_amount: 20000.0; customer_language: EN"},
      {"name":"systemPrompt","value":"You are tasked with drafting an email to respond to a customer whose mortgage loan application has been accepted by the SAS AI Bank. You will be provided with customer_name, loan_amount, customer_language. Follow the guidelines for a professional, friendly response."},
      {"name":"options","value":"{temperature:1,top_p:1,max_tokens:800}"}
    ]
  }' | jq

 

This curl command sends a request to your deployed LLM, providing text and instructions. The LLM reads your input, processes it, and returns a response (such as a drafted email).
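
 

If you only want the generated text rather than the full JSON envelope, you can filter the response with jq. This is a sketch that prints every output variable as "name: value", since the exact output names depend on the code-wrapper; the shortened systemPrompt here is just for brevity:

 

# Score and print each output variable, whatever the wrapper calls them
curl -s --location --request POST "https://${FQDN}/${IMAGE}" \
  --header 'Content-Type: application/json' \
  --data-raw '{"inputs":[{"name":"userPrompt","value":"customer_name: Xin Little; loan_amount: 20000.0; customer_language: EN"},{"name":"systemPrompt","value":"Draft a short acceptance email."},{"name":"options","value":"{temperature:1,top_p:1,max_tokens:800}"}]}' \
  | jq -r '.outputs[]? | "\(.name): \(.value)"'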

 

02_BT_LLM_AzureContainerApp_Scoring.png

 

 

Scaling Up: The Easy Way

 

Need to handle more requests? Azure Container Apps make it effortless:

 

az containerapp update \
  --name $CONT \
  --resource-group $RG \
  --min-replicas 1 \
  --max-replicas 10 \
  --scale-rule-name http-scale \
  --scale-rule-http-concurrency 1

 

You’re telling Azure to automatically add more app instances if traffic increases (auto-scaling). This helps your LLM handle multiple requests at once.
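
 

To confirm the rule took effect, and to watch replicas come and go under load, two quick optional checks:

 

# Inspect the scale settings on the app
az containerapp show --name $CONT --resource-group $RG --query properties.template.scale -o json

# List the replicas currently running
az containerapp replica list --name $CONT --resource-group $RG -o table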

 

03_BT_LLM_AzureContainerApp_Scaling-1024x528.png

 

 

Deploying to Azure Web Apps

 

Azure Web Apps are a managed way to host your containerized LLMs. These shine when you need robust hosting for production APIs, HTTPS, deployment slots, and more “set-it-and-forget-it” operations.

 

 

Set Up Your Web App

 

  • First, create an App Service Plan (this controls the steam pressure of your espresso machine).
  • Second, create the Web App for Containers.
  • Third, set the SCR scoring port: 8080 or 8443 for TLS.
  • Lastly, get your app’s endpoint to start scoring.

 

# Create an App Service Plan
az appservice plan create \
  --name my-llm-plan \
  --resource-group $RG \
  --location $LOCATION \
  --sku P1mv3 \
  --is-linux

# Create the Web App for Containers
az webapp create \
  --resource-group $RG \
  --plan my-llm-plan \
  --name my-qwen-app \
  --container-image-name ${ACR_NAME}.azurecr.io/$IMAGE:$IMAGE_TAG \
  --container-registry-user $ACR_NAME \
  --container-registry-password "$ACR_PASS" \
  --https-only true

# Set the port
az webapp config appsettings set \
  --name my-qwen-app \
  --resource-group $RG \
  --settings WEBSITES_PORT=8080

# Get your app's endpoint
APP_FQDN=$(az webapp show --resource-group $RG --name my-qwen-app --query defaultHostName -o tsv)
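
 

As with Container Apps, the first start is dominated by the image pull and model download. If the app seems stuck, enable container logging and tail it—a small troubleshooting sketch:

 

# Turn on Docker container logging, then stream it
az webapp log config --name my-qwen-app --resource-group $RG --docker-container-logging filesystem
az webapp log tail --name my-qwen-app --resource-group $RG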

 

 

Scoring the Web App

 

The scoring call is nearly identical to the one for Azure Container Apps:

 

curl --location --request POST "https://${APP_FQDN}/${IMAGE}" \
  --header 'Content-Type: application/json' \
  --header 'Accept: application/vnd.sas.microanalytic.module.step.output+json' \
  --data-raw '{
    "inputs": [
      {"name":"userPrompt","value":"customer_name: Xin Little; loan_amount: 20000.0; customer_language: EN"},
      {"name":"systemPrompt","value":"You are tasked with drafting an email to respond to a customer whose mortgage loan application has been accepted by the SAS AI Bank."},
      {"name":"options","value":"{temperature:1,top_p:1,max_tokens:800}"}
    ]
  }' | jq

 

The response times, however, couldn’t be more different.
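
 

An easy way to compare the two deployments is to time a few identical scoring calls with curl’s built-in timer. A rough sketch using the Container App endpoint; swap ${FQDN} for ${APP_FQDN} to time the Web App (the shortened systemPrompt is just for brevity):

 

# Time three scoring calls; %{time_total} reports seconds per request
for i in 1 2 3; do
  curl -s -o /dev/null -w "attempt $i: %{time_total}s\n" \
    --request POST "https://${FQDN}/${IMAGE}" \
    --header 'Content-Type: application/json' \
    --data-raw '{"inputs":[{"name":"userPrompt","value":"customer_name: Xin Little; loan_amount: 20000.0; customer_language: EN"},{"name":"systemPrompt","value":"Draft a short acceptance email."},{"name":"options","value":"{temperature:1,top_p:1,max_tokens:800}"}]}'
done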

 

 

Performance and Scaling Notes

 

In my testing, Azure Web Apps did not perform well for scoring open-source LLMs. They may well suit proprietary LLM code-wrappers where the deployed image is light, but with open-source LLMs that pull gigabytes of model data into a container, they don’t really shine. For now, I prefer Azure Container Apps for both cost and response time.

 

  • Container Apps: Fast deployment, auto-scaling, and HTTPS. Great for lightweight/medium models and variable workloads.
  • Web Apps: Managed, reliable API hosting—good for high-availability, but expect higher costs and slower responses for large LLM images unless you scale up resources. Example:
    • 2 CPU, 16 GB: ~90s/response
    • 8 CPU, 64 GB: ~25s
    • 16 CPU: ~15s
    • 32 CPU: ~10s

 

 

Security Corner

 

Don’t skip this: your data (and your Security Officer) will thank you.

 

  • Always use HTTPS for requests.
  • Protect sensitive data and ACR credentials.
  • For sensitive/internal workloads: use VNET, private endpoints, or internal-only ingress (see the sketch after this list).
  • Regularly check logs:
    • Container Apps:
      az containerapp logs show --name $CONT --resource-group $RG

 

    • Web Apps:
      az webapp log download --name my-qwen-app --resource-group $RG
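
 

For the internal-only ingress option mentioned above, one command flips your Container App so it’s reachable only from inside its environment’s virtual network. A sketch, assuming your consumers run in, or are peered with, that VNET:

 

# Restrict ingress to the Container Apps environment's internal network
az containerapp ingress update --name $CONT --resource-group $RG --type internal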

 

Recommendations

 

  • For quick tests and demos, use Container Apps with public HTTPS—fast, simple, and safe for non-sensitive data.
  • For internal production needs, use VNET (Virtual Network) integration or internal ingress, and plan scaling based on your LLM’s requirements.
  • For robust, high-availability APIs, Web Apps deliver managed hosting and scaling but be prepared to adjust resources and costs.
  • Monitor and tune performance: more CPUs and RAM usually help, but test and adjust for best results. Note that Container Apps consumption workloads cap out at 4 vCPU and 8 GB RAM (the ceiling we used in the deployment command above).

 

 

Summary

 

Azure Container Apps and Web Apps both give you flexible, secure options to deploy SAS Agentic AI Accelerator LLMs. Container Apps are ideal for experimentation and lighter workloads; Web Apps offer managed, stable hosting for API-heavy use cases, though faster responses come with a higher price tag.

 

Choose based on your needs, balance security and cost, and don’t forget to experiment and optimize as you go.

 

Thanks for following along! If you find this post helpful, give it a thumbs up, share your stories or questions in the comments, and let’s keep building better AI workflows together. Stay tuned for more!

 

 

Additional Resources

 

 

 

Want More Hands-On Guidance?

 

SAS offers a full workshop with step-by-step exercises for deploying and scoring models using Agentic AI and SAS Viya on Azure.

 

Access it on learn.sas.com in the SAS Decisioning Learning Subscription. This workshop environment provides step-by-step guidance and a bookable environment for creating agentic AI workflows.

 

04_BT_AgenticAI_Workshop-1024x496.png

 

For further guidance, reach out for assistance.

 

 

Find more articles from SAS Global Enablement and Learning here.
