SAS Studio flows are a staple for data engineers working with the SAS Viya platform. Enter generative AI, specifically GPT-5 from Azure OpenAI: can it create entire SAS Studio flows (.flw files) from prompts? And just how well does it understand the structure, logic, and metadata behind your flows?
In this post, I’ll walk you through a pair of hands-on demos that probe the limits of GPT-5’s code generation for SAS Studio flows. You’ll see where it shines, where it stumbles, and what this means for the future of SAS Studio flows generation.
Let’s start with a fundamental question: Can generative AI build an entire SAS Studio flow, not just snippets or code fragments?
Over the past seven months, I experimented with generating SAS Studio custom steps using the GPT-4 model family. This worked well, as long as you included a few examples in the prompt.
I tried using the same model family to generate an entire SAS Studio flow with a few steps. The output was often truncated or missing key elements. Bottom line: I never got a SAS Studio flow that could be generated and run without errors.
A few months later, GPT-5 was released. I decided to try again, this time generating entire flows.
SAS Studio flows are stored as .flw files, SAS Studio's native flow file format.
Technically, it’s a JSON (JavaScript Object Notation) document, specifically designed to describe the entire structure of a SAS Studio flow in a way that both humans and software can read and understand.
But this isn’t “flat” JSON. It’s highly nested. Think of it like a set of Russian dolls: at the top, you have the outer structure describing the overall flow, and inside you’ll find arrays and objects representing nodes, connections, properties, parameters, and even sub-flows. Each node (such as a data source, query, filter, join, or output) is represented as its own JSON object, sometimes with further nested children if, say, it’s a container for steps.
These are JSON documents that can easily stretch to thousands of lines. Defining each node, connection, parameter, and property by hand is a labor of love or, depending on your patience, a labor of frustration.
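To get a feel for just how nested these documents are, here is a minimal sketch that measures the nesting depth of any parsed JSON value. The sample structure is purely illustrative, not the real .flw schema:

```python
import json

def max_depth(obj: object) -> int:
    """Return the maximum nesting depth of a parsed JSON value."""
    if isinstance(obj, dict):
        return 1 + max((max_depth(v) for v in obj.values()), default=0)
    if isinstance(obj, list):
        return 1 + max((max_depth(v) for v in obj), default=0)
    return 0

# A tiny, illustrative stand-in for a .flw document -- a real flow
# is thousands of lines; only the nesting pattern matters here.
flow = json.loads(
    '{"flow": {"nodes": [{"properties": {"name": "query1"}}], "connections": []}}'
)
print(max_depth(flow))  # even this toy structure is 5 levels deep
```

Running the same function against a real .flw file (`max_depth(json.load(open("myflow.flw")))`) makes the "Russian dolls" analogy very concrete.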
To put GPT-5 to the test, I built a custom Python program that sends a prompt to Azure OpenAI’s GPT-5 and writes the response to a new .flw file. The prompt is made up of a system message that sets the output rules and a user message that contains an example flow, the source table column metadata, and the task instructions.
We started with a simple ask: Generate a SAS Studio Flow that selects a few columns, applies a filter, and writes the result to a target table.
Magic? Not quite. Just the right model and the right prompt. But it sure feels like it.
Next, we cranked up the difficulty: build a flow that joins a fact table with a dimension table. We asked the model to figure out the join keys from the column metadata provided.
We wanted all columns from the fact table, only the unique columns from the dimension table, and the output written to a new table.
Here’s what went into the prompt:
After about 90 seconds (yes, this was a long prompt), GPT-5 generated a new .flw file.
When we opened the flow, here’s what we found:
One minor hiccup: SAS Studio warned about duplicate columns after the join, but the data was accurate, and the flow executed successfully.
GPT-5 is better suited than the GPT-4 models for generating SAS Studio flows because of its significantly larger context window (up to 272,000 input tokens) and a very large output limit (up to 128,000 tokens).
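Those limits matter because the prompt carries an entire example .flw file. A crude way to sanity-check that a prompt will fit the input window before sending it is the rough four-characters-per-token heuristic; this is only an approximation for English-like text, not the model's real tokenizer:

```python
# Rough token estimate: ~4 characters per token for English-like text.
# This is a heuristic only; the true count depends on the model's tokenizer.
INPUT_TOKEN_LIMIT = 272_000  # GPT-5 input window mentioned above

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

prompt = "Generate a SAS Studio flow that filters SASHELP.PRDSALE." * 100
estimated = estimate_tokens(prompt)
print(f"~{estimated} tokens; fits: {estimated < INPUT_TOKEN_LIMIT}")
```

For an exact count you would use the model's own tokenizer, but for a quick "will this huge example flow fit?" check, the heuristic is enough.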
Curious about the mechanics? Here’s a quick summary of the Python program powering these demos. You can find the full program at the end of the post:
No smoke, no mirrors, just good prompt engineering and the remarkable language modeling of GPT-5.
What do these demos reveal?
Put simply: AI is your tireless assistant, not your replacement. Your review and common sense are required.
While this approach is reliable for generating safe, similar flows, its “narrowness” can become a bottleneck for innovation or advanced automation. Potential issues:
For many organizations, especially those focused on repeatability and governance, this method is practical enough, robust, and well-suited to real-world SAS Studio flows generation. It’s a great way to bootstrap reliable flow generation, especially if you incrementally expand your set of prompt examples as needs evolve. On the positive side:
With the right setup and inputs (sample flows "injected" into the prompt, source table column metadata, and clear instructions), GPT-5 can generate simple SAS Studio flows (simple, for now).
Ready to generate new flows? Here’s how to get started:
%let mylib=sashelp;
%let mytable=prdsale;
/* Generate metadata for &mylib..&mytable and display key attributes */
proc contents data=&mylib..&mytable out=meta_temp noprint;
run;
proc sort data=meta_temp;
by varnum;
run;
data meta_report;
set meta_temp;
Obs = _N_;
keep Obs NAME TYPE LENGTH LABEL FORMAT INFORMAT;
run;
proc print data=meta_report label noobs;
var Obs NAME TYPE LENGTH LABEL FORMAT INFORMAT;
run;
That will create something like:
Obs NAME TYPE LENGTH LABEL FORMAT INFORMAT
1 ACTUAL 1 8 Actual Sales DOLLAR
2 COUNTRY 2 15 Country $CHAR
3 DIVISION 2 15 Division $CHAR
4 MONTH 1 8 Month MONNAME
5 PREDICT 1 8 Predicted Sales DOLLAR
6 PRODTYPE 2 15 Product type $CHAR
7 PRODUCT 2 15 Product $CHAR
8 QUARTER 1 8 Quarter F
9 REGION 2 15 Region $CHAR
10 YEAR 1 8 Year F
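If you are automating prompt construction, the same metadata block can be rendered from code instead of pasted by hand. A minimal sketch, with a few rows copied from the PROC CONTENTS report above (truncated to three columns for brevity):

```python
# Each tuple mirrors one row of the PROC CONTENTS report:
# (NAME, TYPE, LENGTH, LABEL, FORMAT)
columns = [
    ("ACTUAL", 1, 8, "Actual Sales", "DOLLAR"),
    ("COUNTRY", 2, 15, "Country", "$CHAR"),
    ("REGION", 2, 15, "Region", "$CHAR"),
]

lines = ["Obs NAME TYPE LENGTH LABEL FORMAT"]
for obs, (name, ctype, length, label, fmt) in enumerate(columns, start=1):
    lines.append(f"{obs} {name} {ctype} {length} {label} {fmt}")

metadata_block = "\n".join(lines)
print(metadata_block)
```

The resulting string can be dropped straight into the user message between the '---' markers.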
Instructions:
Create a new SAS Studio flow, modeled on the examples provided.
Flow name: Prdsale.flw
Input table: SASHELP.PRDSALE
Select columns: PRODUCT, ACTUAL
Filter: REGION = 'WEST' and COUNTRY = 'U.S.A.'
Output table: SASDM.PRDSALE_W
Ensure the JSON is well-formed (balanced braces, no trailing commas, consistent IDs).
Create a user prompt (user_message file) by:
Example (Demo 1):
Example flow:
Flow name: Car_Make_SimpleQuery.flw
Input table: SASHELP.CARS
Select columns: All + one new calculated column: Diff
New calculated column:
"name": "Diff",
"expressionText": "t1.MSRP - t1.Invoice"
Filter: t1.Origin = 'Asia'
Output table: SASDM.CARS_INFO
Full Car_Make_SimpleQuery.flw file:
--- {
"creationTimeStamp": null,
... flw_file_content_here ...
"stickyNotes": []
}
---
Column metadata input table.
SASHELP.PRDSALE:
---
Obs NAME TYPE LENGTH LABEL FORMAT INFORMAT
1 ACTUAL 1 8 Actual Sales DOLLAR
2 COUNTRY 2 15 Country $CHAR
3 DIVISION 2 15 Division $CHAR
4 MONTH 1 8 Month MONNAME
5 PREDICT 1 8 Predicted Sales DOLLAR
6 PRODTYPE 2 15 Product type $CHAR
7 PRODUCT 2 15 Product $CHAR
8 QUARTER 1 8 Quarter F
9 REGION 2 15 Region $CHAR
10 YEAR 1 8 Year F
---
Instructions:
Create a new SAS Studio flow, modeled on the examples provided.
Flow name: Prdsale.flw
Input table: SASHELP.PRDSALE
Select columns: PRODUCT, ACTUAL
Filter: REGION = 'WEST' and COUNTRY = 'U.S.A.'
Output table: SASDM.PRDSALE_W
Ensure the JSON is well-formed (balanced braces, no trailing commas, consistent IDs).
Insert the full sample .flw file content between '---' markers. For example: --- { flw_file_content_here } ---.
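Assembling the user_message from its parts is straightforward to script. A minimal sketch, where the short strings stand in for the real example .flw content, metadata report, and instructions:

```python
# Assemble the user_message by placing each section between '---' markers.
# The snippets below are placeholders for the real file contents.
example_flw = '{"creationTimeStamp": null, "stickyNotes": []}'
metadata = (
    "Obs NAME TYPE LENGTH LABEL FORMAT INFORMAT\n"
    "1 ACTUAL 1 8 Actual Sales DOLLAR"
)
instructions = (
    "Create a new SAS Studio flow, modeled on the examples provided.\n"
    "Flow name: Prdsale.flw"
)

user_message = (
    "Full example .flw file:\n"
    f"---\n{example_flw}\n---\n"
    "Column metadata input table.\nSASHELP.PRDSALE:\n"
    f"---\n{metadata}\n---\n"
    f"Instructions:\n{instructions}\n"
)
print(user_message)
```

Writing `user_message` to the user_message file then gives you exactly the prompt layout shown in the example above.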
Create a system prompt (system_message file) with output rules (keep it minimal):
You are a SAS Studio Flows expert. When given an example .flw JSON and task requirements, generate a new SAS Studio flow as a single, complete .flw JSON that mirrors the example’s schemaVersion, node types, structure, and property names. Change only what’s necessary to meet the task. Use the minimum number of nodes required. Do not start a CAS session or upload to CAS unless explicitly requested. Use only columns present in the provided metadata. If an essential detail is missing, ask one concise clarifying question and wait. Output raw JSON only. No explanations, no markdown/code fences, no truncation. Ensure the JSON is well-formed (balanced braces, no trailing commas, consistent IDs).
Create an Azure OpenAI resource. Deploy one GPT-5 model.
Set up a .env file with your API keys (and add it to .gitignore to keep secrets safe).
AZURE_OAI_ENDPOINT='https://myuser.openai.azure.com/'
AZURE_OAI_KEY='key_here' # Azure OpenAI key here
AZURE_OAI_DEPLOYMENT='gpt-5'
AZURE_OAI_API_VERSION='2025-01-01-preview'
Use your Python program to load prompts, call GPT-5, and save the generated .flw file.
# Import required libraries
import os
import time

from dotenv import load_dotenv
from openai import AzureOpenAI

# Change to the directory where the .env file is stored.
# Replace this with the correct path if needed.
# os.chdir('/gelcontent/llm')

# Prompt files
system_messages_file = 'system_message.txt'
user_messages_file = 'user_message.txt'
output_file = 'generated_with_gpt5.flw'

# Read the LLM system and user messages
with open(system_messages_file, 'r', encoding='utf8') as file:
    system_message = file.read()
with open(user_messages_file, 'r', encoding='utf8') as file:
    user_message = file.read()

# Load Azure OpenAI credentials from the .env file
load_dotenv()
endpoint = os.getenv("AZURE_OAI_ENDPOINT")
deployment = os.getenv("AZURE_OAI_DEPLOYMENT", "gpt-5")
subscription_key = os.getenv("AZURE_OAI_KEY")
api_version = os.getenv("AZURE_OAI_API_VERSION")

# Initialize the Azure OpenAI client with key-based authentication
client = AzureOpenAI(
    azure_endpoint=endpoint,
    api_key=subscription_key,
    api_version=api_version
)

# Prepare the chat prompt (system message first, then the user message)
messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_message}
]

# Send the request to Azure OpenAI
try:
    start_time = time.time()  # Record the start time
    completion = client.chat.completions.create(
        model=deployment,
        messages=messages,
        max_completion_tokens=33999,
        reasoning_effort="minimal",
        # verbosity="low",
        stop=None,
        stream=False
    )
    end_time = time.time()  # Record the end time
    elapsed_time = end_time - start_time

    # Print token usage
    print("Prompt tokens:", completion.usage.prompt_tokens)
    print("Completion tokens:", completion.usage.completion_tokens)
    print("Total tokens:", completion.usage.total_tokens)
    print(f"Time elapsed: {elapsed_time:.2f} seconds")

    # Extract the generated flow file content
    flow_file_content = completion.choices[0].message.content

    # Write the content to the output file
    with open(output_file, 'w', encoding='utf-8') as f:
        f.write(flow_file_content)
    print(f"Flow file successfully written to: {output_file}")
except Exception as e:
    print(f"An error occurred: {e}")
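Before opening the result in SAS Studio, a quick well-formedness check on the generated file catches truncated output (or stray markdown fences) early. A minimal sketch; the helper name is mine, and the file name matches the script above:

```python
import json

def validate_flw(text: str) -> tuple[bool, str]:
    """Return (ok, message) after checking the text parses as JSON."""
    # Models sometimes wrap output in markdown fences despite instructions;
    # strip them before parsing.
    cleaned = text.strip()
    cleaned = cleaned.removeprefix("```json").removeprefix("```")
    cleaned = cleaned.removesuffix("```").strip()
    try:
        json.loads(cleaned)
        return True, "well-formed JSON"
    except json.JSONDecodeError as e:
        return False, f"invalid JSON: {e}"

# In practice:
# ok, msg = validate_flw(open('generated_with_gpt5.flw', encoding='utf-8').read())
print(validate_flw('{"creationTimeStamp": null, "stickyNotes": []}'))
```

This only proves the JSON parses; whether the flow is semantically valid is still up to SAS Studio (and your review).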
Run your script, test the generated flow in SAS Studio, and tweak your prompts for better results.
That’s it! Version your flows in Git, automate testing, and let AI do the heavy lifting. Give it a try, and share what you discover!
Thanks for reading. If you liked this guide, give it a thumbs up!
Now go and generate something amazing!
For further guidance, reach out for assistance.
Find more articles from SAS Global Enablement and Learning here.
Awesome😍
It now takes me 15 minutes to provide all the context details for the prompt + additional time for validating the AI result for a simple task that i could have clicked together in the UI in 2 minutes.
@AndreasMenrath You're welcome. Not to mention you will have costs for your LLM... 😁 The point of the post is the prompt and the prompting technique. The post is definitely not intended for users who want to click; it's intended for automated processes that can use code to replicate a SAS Studio flow. Obviously, I failed to explain that. I should do a better job.