Welcome back to the SAS Agentic AI Accelerator series! We’ve already cooked up LLM deployments with Docker and Azure’s managed services. Now, it’s time to turn up the heat with Kubernetes—the espresso machine of the cloud world. Sure, it has a few extra knobs and steam valves, but it gives you barista-level control.
If you crave fine-tuned control, serious scalability, and rock-solid HTTPS security, Kubernetes is your playground. Let’s roll up our sleeves and get an LLM running—with plenty of focus on keeping it secure and scalable. For simpler setups, Azure’s managed options work great, but for ultimate power and flexibility, Kubernetes is where magic happens!
Where We Are In The Series
In Part 1, Register and Publish Models, we introduced code-wrapped LLMs and showed how you can register them in SAS Model Manager, then how to publish them as Docker images using SAS Container Runtime (SCR).
In Part 2, SAS Agentic AI – Deploy and Score Models – The Big Picture, we compared deployment options, costs, and performance trade-offs in Azure.
In Part 3.1, SAS Agentic AI – Deploy and Score Models – Containers, we got our hands dirty deploying Azure Container Instances.
In Part 3.2, SAS Agentic AI – Deploy and Score Models – Apps, we discovered Azure Container Apps and Web Apps for scalable, secure LLM deployments.
TLS Certificates Briefly
In our example, we'll securely deploy an LLM (the open-source Qwen2.5-0.5B model from Alibaba Cloud) behind an HTTPS endpoint on Kubernetes. Why HTTPS? Because you and your security officer will both sleep better at night.
You need a TLS certificate for HTTPS endpoints. Think of it as a VIP badge for secure web traffic. Here’s the concise version:
Generate a private key and certificate signing request (CSR).
Get the CSR signed by your internal or trusted certificate authority (CA).
Combine the certificate and full chain.
Load this into your Linux trust store (so tools like curl trust it).
Create a Kubernetes secret from the key and certificate.
# Set up secrets directory
secrets_dir=~/project/deploy/models/secrets
mkdir -p "$secrets_dir" && cd "$secrets_dir"
# Variables
RG=resource_group
INGRESS_SAN="${RG}.gelenable.sas.com" # SAS Viya URL or LLM deployment DNS
GELEnvRootCA=my_folder # location of certificates and private key required for signing
# Generate private key and CSR
openssl req -newkey rsa:2048 -sha256 -nodes -keyout scr_key.pem -extensions v3_ca \
-config <(echo "[req]"; echo "distinguished_name=req"; echo "[v3_ca]"; \
echo "extendedKeyUsage=serverAuth"; \
echo "subjectAltName=DNS:${INGRESS_SAN}, DNS:*.${INGRESS_SAN}") \
-subj "/C=US/ST=NC/L=North Carolina/O=SAS/CN=${INGRESS_SAN}" \
-out scr_models.csr
# Sign CSR with Intermediate CA
# These options tell OpenSSL to use the Intermediate CA's certificate and private key to sign the new certificate, rather than creating a self-signed certificate.
echo "01" > scr_models.srl
openssl x509 -req -sha256 -extensions v3_ca \
-extfile <(echo "[v3_ca]"; echo "extendedKeyUsage=serverAuth"; \
echo "subjectAltName=DNS:${INGRESS_SAN}, DNS:*.${INGRESS_SAN}") \
-days 820 -in scr_models.csr \
-CA $GELEnvRootCA/intermediate.cert.pem \
-CAkey $GELEnvRootCA/intermediate.key.pem \
-CAserial scr_models.srl -out scr_cert.pem
# Append full certificate chain
cat $GELEnvRootCA/intermediate.cert.pem >> scr_cert.pem
cat $GELEnvRootCA/ca_cert.pem >> scr_cert.pem
# Remove temporary files
rm scr_models.*
# Optional: Review the certificate
openssl x509 -text -noout -in scr_cert.pem
# Trust the CA certificate system-wide (for cURL etc.)
sudo cp $GELEnvRootCA/ca_cert.pem /etc/pki/ca-trust/source/anchors/
sudo update-ca-trust
The above block assumes you have access to the intermediate CA's certificate and private key, so you can sign the new certificate yourself rather than creating a self-signed one. For production, always use certificates signed by a trusted Certificate Authority (CA), such as Let's Encrypt, DigiCert, or your organization's enterprise CA. This ensures secure, trusted, and verifiable connections for all clients.
That's it, no need to get lost in a cryptographic jungle. I am simply reproducing a very reliable "TLS jungle trekking guide" produced by our SAS colleague, @MichaelGoddard. @StuartRogers is an authoritative source on TLS for SAS Viya and has plenty of trustworthy articles on SAS Communities.
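Before moving on, it is worth confirming that the signed certificate actually chains back to your CA and carries the DNS names you expect. Here is a minimal check, assuming the same file names and $GELEnvRootCA location used above:
# Verify the leaf certificate against the intermediate and root CA
openssl verify -CAfile $GELEnvRootCA/ca_cert.pem \
  -untrusted $GELEnvRootCA/intermediate.cert.pem scr_cert.pem
# Confirm the subject alternative names match your planned ingress DNS name
openssl x509 -noout -ext subjectAltName -in scr_cert.pem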
Prepare Your Kubernetes Cluster
Clean Up and Create a Namespace
Clear any coffee spills and set up a clean playground for your models:
kubectl delete ns llm --ignore-not-found
kubectl create ns llm
Add a Dedicated Node Pool
Large Language Models (LLMs) can be quite resource hungry. Open-source LLMs need lots of storage for model files, plus plenty of CPU and memory for processing. To keep everything running smoothly (and avoid stepping on other workloads’ toes), it’s best to give your LLMs their own dedicated node pool. Remember: choose the size of your node pool carefully, based on the specific LLMs you want to deploy and their technical requirements.
az aks nodepool add \
--resource-group $RG \
--cluster-name $AKS_NAME \
--name llmnp \
--node-count 1 \
--node-vm-size Standard_D16as_v5 \
--max-count 1 \
--min-count 0 \
--enable-cluster-autoscaler \
--node-taints workload=llm:NoSchedule \
--labels workload=llm node.kubernetes.io/name=llm workload/class=models
Check that your node is ready and properly labeled:
kubectl get nodes --show-labels
You should see labels like workload=llm and node.kubernetes.io/name=llm
Think of these node labels as 'Reserved for LLMs' parking spots.
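If the full label listing gets noisy, you can filter on the labels you just set. This is only a quick sanity check, using the same label keys as the node pool command above:
# Show only the nodes in the dedicated LLM pool
kubectl get nodes -l workload=llm,node.kubernetes.io/name=llm
# Confirm the taint that keeps other workloads off these nodes
kubectl get nodes -l workload=llm -o jsonpath='{.items[*].spec.taints}'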
Deploy the LLM to Your Kubernetes Cluster
Add Your TLS Secret
Load your certificate and key into Kubernetes as a secret:
kubectl -n llm create secret tls scr-certificate \
--key="scr_key.pem" \
--cert="scr_cert.pem"
# Check it’s there
kubectl -n llm get secrets
kubectl -n llm describe secret scr-certificate
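To double-check that the secret holds the certificate you just signed (and not an older one), you can decode it and read the subject and expiry. A small sketch, assuming the secret and namespace names used above:
# Decode the certificate stored in the secret and inspect it
kubectl -n llm get secret scr-certificate -o jsonpath='{.data.tls\.crt}' \
  | base64 -d | openssl x509 -noout -subject -enddate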
Create Your Deployment YAML
The Big Three: Pod, Service, and Ingress (What Do They Do?)
Pod
Think of a pod as the smallest shipping box in Kubernetes. Inside that box is your running application, in our case, the containerized LLM model. The pod wraps it up with the resources, environment variables, and storage it needs. If the pod isn’t running, your LLM isn’t either.
Service
A service is like the shipping label on the box. It makes sure traffic can find and reach your pod, even if the pod moves around, inside the cluster. In our YAML manifest, the service listens on port 443 (HTTPS) and forwards traffic to your LLM’s container, running inside the pod.
Ingress
Ingress is the front desk or receptionist of your Kubernetes office building. It’s the entry point for outside traffic. Ingress decides which service gets what request, handles HTTPS/TLS, and acts as a secure gateway from the internet to your application.
YAML
# Variables
RG=Resource_group
INGRESS_HOST=SAS_Viya_Ingress # DNS name covered by your TLS certificate (the SAN from earlier)
echo $INGRESS_HOST
az login
ACR_NAME=Your_Azure_Container_Registry
# The LLM must already be stored in this registry as a container image
az acr login --name $ACR_NAME
# LLM name (underscores for the image, dashes for Kubernetes resource names)
LLM=qwen_25_05b
LLMDASH=${LLM//_/-}
echo $LLM; echo $LLMDASH
# Create the deployment YAML file
tee ~/project/deploy/models/${LLMDASH}-tls-deployment.yaml > /dev/null <<EOF
# ${LLMDASH} model deployment
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: ${LLMDASH}
    workload/class: models
  name: ${LLMDASH}
spec:
  # modify replicas to support the requirements
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: ${LLMDASH}
  template:
    metadata:
      labels:
        app: ${LLMDASH}
        app.kubernetes.io/name: ${LLMDASH}
        workload/class: models
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.azure.com/mode
                operator: NotIn
                values:
                - system
              - key: node.kubernetes.io/name
                operator: In
                values:
                - llm
      containers:
      - name: ${LLMDASH}
        image: ${ACR_NAME}.azurecr.io/${LLM}:latest
        imagePullPolicy: Always # IfNotPresent or Always
        resources:
          requests: # Minimum amount of resources requested
            cpu: 1
            memory: 8Gi
          limits: # Maximum amount of resources allowed
            cpu: 4
            memory: 16Gi
        ports:
        - containerPort: 8080
          name: http # Name the port "http"
        - containerPort: 8443
          name: https # Name the port "https"
        env:
        - name: SAS_SCR_SSL_ENABLED
          value: "true"
        - name: SAS_SCR_SSL_CERTIFICATE
          value: /secrets/tls.crt
        - name: SAS_SCR_SSL_KEY
          value: /secrets/tls.key
        - name: SAS_SCR_LOG_LEVEL_SCR_IO
          value: TRACE
        volumeMounts:
        - name: tls
          mountPath: /secrets
      volumes:
      - name: tls
        secret:
          secretName: scr-certificate
          items: # Explicitly define the keys to mount
          - key: tls.crt
            path: tls.crt
          - key: tls.key
            path: tls.key
      tolerations:
      - key: workload/class
        operator: Equal
        value: models
        effect: NoSchedule
      - key: workload
        operator: Equal
        value: llm
        effect: NoSchedule
---
# TLS service definition
apiVersion: v1
kind: Service
metadata:
  name: ${LLMDASH}-tls-svc
  labels:
    app.kubernetes.io/name: ${LLMDASH}-tls-svc
spec:
  selector:
    app.kubernetes.io/name: ${LLMDASH}
    workload/class: models
  ports:
  - name: ${LLMDASH}-https
    port: 443
    protocol: TCP
    targetPort: 8080
  type: ClusterIP
---
# TLS ingress definition
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ${LLMDASH}-ingress
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: HTTP
  labels:
    app.kubernetes.io/name: ${LLMDASH}-ingress
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - ${INGRESS_HOST}
    secretName: scr-certificate
  rules:
  - host: ${INGRESS_HOST}
    http:
      paths:
      - path: /${LLM}
        pathType: Prefix
        backend:
          service:
            name: ${LLMDASH}-tls-svc
            port:
              number: 443
EOF
What’s Happening in the YAML?
You’ll see three main sections in the YAML file. Here’s what each one does:
Deployment
Spins up your LLM as a container inside a pod.
Makes sure it runs on a special node reserved for LLMs (using labels and taints).
Mounts the TLS certificate and key (so your app can do HTTPS).
Sets environment variables to tell SAS Container Runtime (SCR) where to find the TLS certificates and how to behave.
Requests and limits resources (CPU and memory) so your LLM has enough “brainpower” to run but can’t block the whole cluster.
Service
Exposes your pod inside the cluster on port 443.
Acts as a stable “in-cluster” address for your LLM, so other components (like ingress) can always find it, even if pods are replaced or moved.
Ingress
Sets up a public HTTPS endpoint using your DNS name and TLS certificate.
Routes incoming requests for a path like the one below to the service, which then forwards them to your LLM pod:
https://your-dns/qwen_25_05b
Uses annotations to tell the NGINX ingress controller to expect HTTP traffic behind the scenes, even though users connect over HTTPS.
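Before applying anything, you can also let Kubernetes validate the generated manifest without creating resources. A quick dry run, using the file path from the tee command above:
kubectl apply --dry-run=client -f ~/project/deploy/models/${LLMDASH}-tls-deployment.yaml -n llm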
Apply and Go!
Deploy your model:
# Deploy
kubectl apply -f ~/project/deploy/models/${LLMDASH}-tls-deployment.yaml -n llm
# Wait for the pod to be ready (watch for the "Ready" status)
kubectl get pods -n llm
# Check the logs of the LLM pod
kubectl logs -n llm -l app.kubernetes.io/name=${LLMDASH}
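Before scoring, it also helps to confirm that the service and ingress came up as expected. A quick check, using the resource names from the YAML above:
# Confirm the service and ingress exist and the ingress has picked up an address
kubectl -n llm get svc,ingress
kubectl -n llm describe ingress ${LLMDASH}-ingress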
Score the LLM
With everything live, you can send HTTPS requests to your Kubernetes ingress endpoint and watch your LLM do its magic.
curl --location --request POST "https://${INGRESS_HOST}/qwen_25_05b" \
--header 'Content-Type: application/json' \
--header 'Accept: application/vnd.sas.microanalytic.module.step.output+json' \
--data-raw '{
"inputs": [
{"name":"userPrompt","value":"customer_name: Xin Little; loan_amount: 20000.0; customer_language: EN"},
{"name":"systemPrompt","value":"You are tasked with drafting an email to respond to a customer whose mortgage loan application has been accepted by the SAS AI Bank. You will be provided with customer_name, loan_amount, customer_language. Follow the guidelines for a professional, friendly response."},
{"name":"options","value":"{temperature:1,top_p:1,max_tokens:800}"}
]
}' | jq
If you get a smart response, such as the following sample, congratulations! You’ve just deployed a secure, scalable LLM using Kubernetes.
Performance and Scaling Notes
Large Language Models (LLMs) are heavy weightlifters. They need generous CPU, memory, and storage, especially when running open-source versions. For best results, give LLMs their own dedicated node pool (or multiple node pools). This ensures your models won’t compete for resources with other workloads, keeping everything running smoothly.
When it comes to scaling, Kubernetes shines. You can adjust the number and size of nodes in your pool to match your workload. Just remember: the bigger the LLM, the beefier your node needs to be. Choose your node pool size based on the technical requirements of your models, don’t try to squeeze a heavyweight model into a tiny node!
For ultra-responsive performance, monitor CPU and memory usage and scale up as needed. And if you’re aiming for production-grade speed, keep an eye on response times as you adjust resources.
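As a starting point for that monitoring and scaling loop, the commands below can help. The replica and node counts are placeholders to tune for your own models, and kubectl top assumes the metrics server is installed in your cluster:
# Watch live CPU and memory usage of the LLM pods
kubectl -n llm top pods
# Add replicas of the model if one pod cannot keep up
kubectl -n llm scale deployment ${LLMDASH} --replicas=2
# Allow the dedicated node pool to grow when more pods need room
az aks nodepool update --resource-group $RG --cluster-name $AKS_NAME \
  --name llmnp --update-cluster-autoscaler --min-count 0 --max-count 3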
Security Corner
Security isn't just an add-on; it's essential. Always use HTTPS to protect data in transit. This means securing both your public endpoints and the internal traffic between your ingress, service, and pod. For extra peace of mind, forward traffic from the ingress to your pod over port 8443 (HTTPS), not just 8080 (HTTP); a sketch of the required changes follows the checklist below.
Make sure:
Your container exposes containerPort: 8443 in the YAML.
Your ingress annotation is set to nginx.ingress.kubernetes.io/backend-protocol: HTTPS.
Certificates are properly managed, and secrets are stored securely in Kubernetes.
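Here is a minimal sketch of those two changes, assuming the service and ingress names from the deployment YAML above; you can just as well edit the YAML and re-apply it:
# Point the service at the container's HTTPS port instead of 8080
kubectl -n llm patch service ${LLMDASH}-tls-svc --type json \
  -p '[{"op":"replace","path":"/spec/ports/0/targetPort","value":8443}]'
# Tell the NGINX ingress controller to talk HTTPS to the backend pod
kubectl -n llm annotate ingress ${LLMDASH}-ingress \
  nginx.ingress.kubernetes.io/backend-protocol=HTTPS --overwrite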
Recommendations
Always give your LLMs a dedicated node pool, sized appropriately for their needs. This avoids resource conflicts and keeps things running smoothly.
If it feels like your LLM is trying to eat your entire cluster, it probably is. Time to beef up those nodes.
Watch your CPU, memory, and response times. Scale up resources as needed and adjust node pool sizes to meet demand.
Use HTTPS end-to-end, not just at the edge. Enable secure backend communication by forwarding traffic on port 8443 and setting the right ingress annotations.
For anything beyond a quick test, secure your endpoints, manage certificates properly, and keep secrets safe.
Summary
Deploying LLMs in Kubernetes gives you flexibility, scalability, and strong security, if you set things up right. With these best practices in place, your LLMs will run smoothly, securely, and ready for whatever comes next.
And remember: in the world of Kubernetes, a little resource planning goes a long way. Happy deploying!
Thanks for following along! If you find this post helpful, give it a thumbs up, share your stories or questions in the comments, and let’s keep building better AI workflows together. Stay tuned for more!
Acknowledgment
Thanks to @MichaelGoddard for sharing his time and resources.
Additional Resources
SAS Agentic AI Accelerator – Register and Publish Models.
SAS Agentic AI – Deploy and Score Models – The Big Picture.
SAS Agentic AI – Deploy and Score Models – Containers.
SAS Agentic AI – Deploy and Score Models – Apps
SAS Container Runtime – SAS Documentation.
Want More Hands-On Guidance?
SAS offers a full workshop in the SAS Decisioning Learning Subscription with step-by-step exercises for deploying and scoring models using Agentic AI and SAS Viya on Azure.
Access it on learn.sas.com, where you can book an environment and follow guided exercises for creating agentic AI workflows.
For further guidance, reach out for assistance.
Find more articles from SAS Global Enablement and Learning here.