LLMOps with SAS Viya and Azure DevOps: A Parameter Change, Not a Project

Last week, a colleague asked me how long it takes to deploy a new LLM wrapper into our SAS Viya environment. I said about ten minutes. He laughed, because three months ago the honest answer was "half a day, if the right person is available."

That gap, between ten minutes and half a day, is the story I want to tell.

The Problem Nobody Talks About

There's plenty written about what agentic AI can do. The demos are impressive. An agent inside SAS Intelligent Decisioning calls multiple Large Language Models (LLMs), orchestrates their responses, and delivers a result that no single model could produce alone. Think of it as a team lead who delegates tasks to “specialists” and assembles the final answer.

But here's the part that rarely makes it into the keynote: somebody has to hire and onboard those specialists first.

And in this world, "onboarding" doesn't mean clicking a button. It means setting up a Python environment. Cloning a repository. Running registration scripts against SAS Model Manager. Publishing container images, which, for open-source models like Qwen or Phi, involves downloading gigabytes of model weights and packaging them into something a server can actually run. Then configuring Azure networking, spinning up containers, waiting for private IPs, and finally testing whether everything actually responds.

Every one of those steps is manual. Every one depends on someone who knows the SAS Agentic AI Accelerator well: the CLI flags, the environment variables, and the order of operations, plus Azure CLI and administration and the SAS Viya on Azure architecture. If that person is busy, or on vacation, or simply working on something else, nobody gets their models deployed.

That's the bottleneck. Not the AI. It’s the “plumbing”.

What If It Were Just a Parameter List?

The SAS Agentic AI Accelerator, now a public GitHub repository, already provides the building blocks. It includes Python-based LLM wrappers that standardize how different models are called by a SAS agentic workflow. Think of wrappers as universal adapters: whether the model behind them is GPT-4o mini from Azure OpenAI or an open-source Qwen running locally, the agent talks to all of them the same way. The Accelerator also ships scripts for registration, publishing, deployment, and more. The pieces exist.

What was missing was the glue. A single, repeatable process that anyone on the team could trigger without understanding every moving part underneath.

That’s why we built an Azure DevOps pipeline.

The idea is simple. You open the pipeline and see a YAML parameter list: a short, readable configuration block that describes what you want. Each entry is a model: a name, CPU and memory allocation, a test prompt, and model-specific options. You add a model, remove a model, adjust resources. Then you click Run.

A recording of the full pipeline run, from parameter list to working endpoints, takes under 5 minutes.

That's the entire interface. It's like ordering from a library catalog: pick what you want, specify the format, click submit. No need to know how the library works.

yml:

parameters:
  - name: models
    type: object
    default:
      - name: gpt_4o_mini_az_2024_07_18
        cpu: 2
        memory: 4
        userPrompt: 'customer_name: David W.; ...'
        systemPrompt: 'You are tasked with drafting...'
        options: '{temperature:1,top_p:1,max_tokens:800,API_KEY:$(AZURE_OAI_KEY)}'
      - name: qwen_25_05b
        cpu: 4
        memory: 16
        # ...
      - name: phi_3_mini_4k
        cpu: 4
        memory: 16
        # ...

The pipeline also pulls in a variable group (sas-viya-credentials) and a secure certificate file, both stored in Azure DevOps Library. Think of these as the master key ring: SAS Viya URL, credentials, Azure OpenAI keys, and the trusted certificate for secure connections. You set them up once, and every pipeline run grabs them automatically.
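
Because the model list is plain data, it can be sanity-checked before a run starts. The sketch below is a minimal Python illustration, not part of the Accelerator or the pipeline: `validate_models` is a hypothetical helper, and the rules it enforces (unique names, positive CPU and memory) are our assumptions rather than documented pipeline behavior. The last entry in the demo is deliberately broken to show what a failure looks like.

```python
from typing import Any


def validate_models(models: list[dict[str, Any]]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the entries look sane.

    Hypothetical helper: the checks are assumptions, not rules taken from the pipeline.
    """
    problems: list[str] = []
    seen: set[str] = set()
    for index, model in enumerate(models):
        name = model.get("name")
        if not isinstance(name, str) or not name:
            problems.append(f"entry {index}: missing or empty 'name'")
            continue
        if name in seen:
            problems.append(f"{name}: duplicate entry")
        seen.add(name)
        for field in ("cpu", "memory"):
            value = model.get(field)
            # bool is a subclass of int in Python, so reject it explicitly.
            if not isinstance(value, (int, float)) or isinstance(value, bool) or value <= 0:
                problems.append(f"{name}: '{field}' must be a positive number")
    return problems


if __name__ == "__main__":
    models = [
        {"name": "gpt_4o_mini_az_2024_07_18", "cpu": 2, "memory": 4},
        {"name": "qwen_25_05b", "cpu": 4, "memory": 16},
        {"name": "phi_3_mini_4k", "cpu": 4, "memory": 0},  # deliberately invalid
    ]
    for problem in validate_models(models):
        print(problem)
```

A check like this could run as the first step of a pipeline so that a typo fails in seconds instead of after a container build.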

Four Stages, One Run

The pipeline has four stages: Register, Publish, Deploy, and Test. Let me walk through each one.

Register

The pipeline starts by setting up its workspace: installing Python packages, downloading the SAS certificate from Azure DevOps Secure Files, and cloning the Accelerator repository. It then calls two Accelerator scripts. The first, Model-Manager-Setup.py, creates the LLM Model Project in SAS Model Manager if it doesn't already exist. It’s like creating a folder before filing documents. The second, register-LLMs.py, registers each wrapper from the parameter list.

After this stage, the models exist in SAS Model Manager. They're versioned and tracked. But they're not yet runnable. If deploying a model is like opening a restaurant, this stage just filed the business license.

Publish

Now the pipeline builds the actual containers, the runnable packages that will serve each model. But before it builds anything, it checks Azure Container Registry (ACR), which is essentially a warehouse for container images. For each model, it asks: is there already a package on the shelf? If yes, it moves on. No rebuild, no wasted time.

For models that do need publishing, the pipeline calls the Accelerator's publish-LLMs.py script, which triggers SAS Viya's container publishing mechanism. The images are built and pushed to ACR. For proprietary wrappers that call external APIs, like GPT-4o mini reaching out to Azure OpenAI, publishing is fast because the container is lightweight. Open-source models carry their own weights baked in, so publishing them is slower: imagine downloading a small library's worth of books and binding them into a single volume. Twenty to thirty minutes is normal.

The pipeline then polls ACR, up to forty checks, thirty seconds apart, until every required image is confirmed present. Patient, but not infinitely so.
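
The polling behavior can be sketched as a bounded loop. This is a minimal Python illustration, not the pipeline's actual code: `wait_for_images` and the `image_exists` callback are hypothetical names, and in practice the callback would wrap whatever registry query you use. Only the limits (forty checks, thirty seconds apart) come from the pipeline described above.

```python
import time
from typing import Callable, Iterable


def wait_for_images(
    required: Iterable[str],
    image_exists: Callable[[str], bool],
    max_checks: int = 40,
    interval_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every required image exists, or give up after max_checks rounds."""
    pending = set(required)
    for attempt in range(1, max_checks + 1):
        # Re-check only the images that have not been confirmed yet.
        pending = {image for image in pending if not image_exists(image)}
        if not pending:
            return True
        if attempt < max_checks:
            sleep(interval_seconds)  # no point sleeping after the final check
    return False
```

With the defaults, the loop gives up after roughly twenty minutes of waiting. Passing `sleep` as a parameter keeps the function testable without real delays.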

This skip-if-exists behavior is what makes the pipeline safe to rerun. Changed one model? The pipeline publishes only that one. Everything else passes through untouched. It's like a smart printer that only reprints the pages you edited.
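
The skip-if-exists decision boils down to a set comparison. The sketch below is a simplified Python illustration with a hypothetical `plan_publish` helper. It assumes an image counts as "already published" when its name is present in the registry; how the real pipeline identifies a changed model is not shown here.

```python
def plan_publish(requested: list[str], already_in_registry: set[str]) -> tuple[list[str], list[str]]:
    """Split requested models into (to_publish, skipped), preserving the requested order."""
    to_publish = [name for name in requested if name not in already_in_registry]
    skipped = [name for name in requested if name in already_in_registry]
    return to_publish, skipped


if __name__ == "__main__":
    requested = ["gpt_4o_mini_az_2024_07_18", "qwen_25_05b", "phi_3_mini_4k"]
    in_registry = {"gpt_4o_mini_az_2024_07_18", "qwen_25_05b"}
    to_publish, skipped = plan_publish(requested, in_registry)
    print("publish:", to_publish)
    print("skip:", skipped)
```

Running the same plan again after a successful publish yields an empty `to_publish` list, which is what makes a rerun cheap.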

Deploy

With images in ACR, the pipeline moves to infrastructure. It creates or reuses a delegated subnet, which is basically a reserved lane inside the virtual network (VNET). It then deploys one Azure Container Instance (ACI) per model. ACI is Azure's simplest container hosting option: you give it an image, some CPU and memory, and it runs. No cluster to manage, no orchestration layer to configure. Each container gets a private IP, meaning it's reachable only from inside the network where SAS Viya lives. No public exposure.

There's another skip here: if a container from a previous run is already healthy with a valid IP, the pipeline leaves it alone. It only replaces what's broken or missing.
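
The reuse check can be expressed as a small predicate. This is a hedged Python sketch, not the pipeline's implementation: `ContainerStatus` and `can_reuse` are hypothetical names, and treating "healthy" as a state of `"Running"` plus a private IPv4 address is our assumption about what the check involves.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContainerStatus:
    name: str
    state: Optional[str]  # assumed value "Running" when healthy; None if the container is missing
    ip: Optional[str]     # private IP reported for the container, if any


def can_reuse(status: ContainerStatus) -> bool:
    """Reuse a container only if it is running and reports a valid private IPv4 address."""
    if status.state != "Running" or not status.ip:
        return False
    try:
        address = ipaddress.ip_address(status.ip)
    except ValueError:
        return False  # not a parseable IP address
    return address.version == 4 and address.is_private


if __name__ == "__main__":
    print(can_reuse(ContainerStatus("prefix-qwen-25-05b", "Running", "10.0.3.12")))
    print(can_reuse(ContainerStatus("prefix-phi-3-mini-4k", "Running", None)))
```

Anything that fails the predicate would be deleted and redeployed; everything else is left alone.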

Once all containers are running, the pipeline assembles a simple JSON file that maps each model to its private address:

[
  {
    "name": "qwen_25_05b",
    "container": "prefix-qwen-25-05b",
    "ip": "10.0.3.12",
    "endpoint": "http://10.0.3.12:8080/qwen_25_05b"
  }
]

That file becomes a pipeline artifact, a deliverable attached to the pipeline run that downstream processes can pick up. A second pipeline that deploys the SAS agent, or a person configuring one manually, grabs this file and plugs the endpoints right in. No copying IPs from terminal output. No Teams messages to the Azure administrator asking "what's the endpoint for Qwen again?"
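
Assembling that file is mostly string formatting. The Python sketch below reproduces the shape of the example entry; `build_endpoint_entry` and `build_manifest` are hypothetical helpers. The naming rule (underscores become hyphens after a prefix) and port 8080 are inferred from the single example above, so treat them as assumptions.

```python
import json


def build_endpoint_entry(model_name: str, ip: str, prefix: str = "prefix", port: int = 8080) -> dict[str, str]:
    """Build one manifest entry in the shape of the example above."""
    # Assumed naming rule: container names use hyphens where model names use underscores.
    container = f"{prefix}-{model_name.replace('_', '-')}"
    return {
        "name": model_name,
        "container": container,
        "ip": ip,
        "endpoint": f"http://{ip}:{port}/{model_name}",
    }


def build_manifest(ips_by_model: dict[str, str]) -> str:
    """Serialize one entry per model as a JSON array."""
    entries = [build_endpoint_entry(name, ip) for name, ip in ips_by_model.items()]
    return json.dumps(entries, indent=2)


if __name__ == "__main__":
    print(build_manifest({"qwen_25_05b": "10.0.3.12"}))
```

Generating the file from the deployment results, rather than typing it by hand, is what keeps the endpoints and the running containers in sync.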

Test

The final stage is optional, controlled by a RUN_TESTS variable. When enabled, it downloads the endpoints artifact and sends a real scoring request to each model using the prompts defined in the pipeline parameters. Each request hits the private endpoint with a system prompt, a user prompt, and model options (SAS Agentic AI Accelerator style), exactly the way the SAS agent would call it in production.

If every model responds, the full path, from registration through deployment to scoring, is confirmed in a single run. No SSH-ing into a jump box. No manual curl commands. The pipeline tells you whether it worked, or exactly where it didn't.
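
A smoke test of this kind can be sketched with the Python standard library alone. This is an illustration under loud assumptions: `build_scoring_request` and `model_responds` are hypothetical helpers, and the JSON field names are placeholders borrowed from the pipeline parameters, not the request schema the wrappers actually expect. Check the Accelerator documentation for the real payload format.

```python
import json
import urllib.request
from typing import Any


def build_scoring_request(endpoint: str, system_prompt: str, user_prompt: str, options: str) -> urllib.request.Request:
    """Prepare a JSON POST request. Field names are placeholders, not the Accelerator's schema."""
    payload: dict[str, Any] = {
        "systemPrompt": system_prompt,
        "userPrompt": user_prompt,
        "options": options,
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def model_responds(request: urllib.request.Request, timeout: float = 60.0) -> bool:
    """Return True if the endpoint answers with HTTP 2xx and a non-empty body."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300 and bool(response.read())
    except OSError:  # URLError, HTTPError, and socket timeouts are all OSError subclasses
        return False
```

Looping over the entries in the endpoints artifact and calling `model_responds` for each one gives a pass or fail per model, which is all this stage needs to report.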

What This Changes in Practice

Before this pipeline, deploying three LLM wrappers was a morning's work for someone experienced and a multi-day adventure for someone learning the ropes. The dependency wasn't on technology (the Accelerator scripts worked fine); it was on coordination. You needed the right person, the right environment, and uninterrupted time to run a chain of steps in the right order. It was like needing a specific mechanic at a specific garage on a specific day just to change your tires.

Now a data scientist who wants to try a different model edits a YAML list and runs a pipeline. Need Phi instead of Qwen? Swap the entry. Need another one? Add a line. The pipeline figures out what's already deployed, skips the redundant work, and delivers private endpoints ready for use.

The separation into two pipelines matters here too. Pipeline 1, this one, handles LLM infrastructure. Pipeline 2, which we will discuss later, handles the SAS agent itself. They have different lifecycles. You might swap or add LLMs every day or every hour but change your agent logic once a month. Keeping them separate means you debug them separately, test them separately, and evolve them at their own pace. It's the same reason you don't rebuild your entire house just because you want to repaint the kitchen.

A Note on Infrastructure

The Azure DevOps pipeline runs on a self-hosted agent, in our case the SAS Viya deployment’s jump VM. And yes, there is some serious configuration behind the scenes. This was a deliberate choice: SAS Viya sits behind a private network and load balancer, and Microsoft-hosted agents, the default runners Azure DevOps provides, cannot reach it without additional configuration. Using a self-hosted agent inside the same network is like having a delivery driver who already has a badge to get past the security gate.

For deployment targets, we chose Azure Container Instances (ACI) deliberately. ACI is the "just run my container" option: fast to set up, no cluster to manage, costs only while it's running. For a Minimum Viable Product, a proof of concept, or a workshop, it's exactly the right tool.

As requirements grow (auto-scaling, rolling updates, production-grade resilience), the same pipeline logic can move to Azure Kubernetes Service (AKS). The pipeline design anticipates that evolution without forcing it prematurely.

Step-by-Step

Start with ACI. Graduate to AKS when you're ready.

The SAS Agentic AI Accelerator is public on GitHub. The pipeline YAML is available below for the ACI approach (and soon for the AKS approach).

If you're running SAS Viya on Azure and you want repeatable, governed LLM deployment, start with one model. Get the pipeline running. Add a second model. Watch it skip the first and publish only what's new.

Then hand it to your team and get out of the way.

Wrapping Up

We took a manual, sequential, specialist-dependent deployment process and turned it into a four-stage Azure DevOps pipeline. Register, Publish, Deploy, Test. It skips work that's already done. It reuses healthy containers. It produces a clean artifact that the next pipeline picks up without asking questions.

The most important outcome isn't technical. It's organizational. The people who need LLMs deployed no longer wait in line behind the people who know how to deploy them. The knowledge is in the pipeline. The decisions (which models to deploy, what prompts to test) stay with the domain experts where they belong.

Deploying a new LLM is a parameter change, not a project.

That's LLMOps with SAS Viya and Azure DevOps.

LLM Wrapper Azure DevOps Deployment Pipeline for SAS Viya

The SAS Agentic AI Accelerator GitHub repository includes an end-to-end Azure DevOps pipeline for deploying SAS Viya LLM wrappers: llm-deployment-aci.yml.

Prerequisites and setup details are provided in the accompanying README.md.

Additional Resources

SAS Agentic AI Accelerator – GitHub public repository.
SAS Agentic AI Accelerator – Register and Publish Models.
SAS Agentic AI – Deploy and Score Models – The Big Picture.
SAS Agentic AI – Deploy and Score Models – Containers.
SAS Agentic AI – Deploy and Score Models – Apps.
SAS Agentic AI – Build Workflows in SAS Intelligent Decisioning.
SAS Container Runtime – SAS Documentation.

Want More Hands-On Guidance?

SAS offers a full workshop with step-by-step exercises for deploying and scoring models using Agentic AI and SAS Viya on Azure.

Access it on learn.sas.com in the SAS Decisioning Learning Subscription. The workshop includes a bookable environment for creating agentic AI workflows.
