SAS Studio flows are a staple for data engineers working with the SAS Viya platform. Enter generative AI, specifically GPT-5 from Azure OpenAI: can it create entire SAS Studio flows (.flw files) from prompts? And just how well does it understand the structure, logic, and metadata behind your flows?
In this post, I’ll walk you through a pair of hands-on demos that probe the limits of GPT-5’s code generation for SAS Studio flows. You’ll see where it shines, where it stumbles, and what this means for the future of SAS Studio flow generation.
The Challenge: Generating a Complete SAS Studio Flow
Let’s start with a fundamental question: Can generative AI build an entire SAS Studio flow, not just snippets or code fragments?
Over the past seven months, I experimented with generating SAS Studio custom steps using the GPT-4 model family. This worked well, as long as you included a few examples in the prompt.
I tried using the same model family to generate an entire SAS Studio flow with a few steps. The output was often truncated or missing key elements. Bottom line: I never got a SAS Studio flow that could be generated and run without errors.
A few months later, GPT-5 was released. I decided to try again, this time generating entire flows.
SAS Studio Flows Anatomy
SAS Studio flows are stored as .flw files, the native file format for SAS Studio flows.
Technically, it’s a JSON (JavaScript Object Notation) document, specifically designed to describe the entire structure of a SAS Studio flow in a way that both humans and software can read and understand.
But this isn’t “flat” JSON. It’s highly nested. Think of it like a set of Russian dolls: at the top, you have the outer structure describing the overall flow, and inside you’ll find arrays and objects representing nodes, connections, properties, parameters, and even sub-flows. Each node (such as a data source, query, filter, join, or output) is represented as its own JSON object, sometimes with further nested children if, say, it’s a container for steps.
These are JSON documents that can easily stretch to thousands of lines. Defining each node, connection, parameter, and property by hand is a labor of love or, depending on your patience, a labor of frustration.
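To get a feel for that nesting before prompting anything, you can inspect an existing .flw file with a few lines of Python. This is a minimal sketch that assumes nothing about the schema beyond it being JSON; it simply lists the top-level keys and measures how deeply the document nests (the file name is just an example).

import json

def max_depth(obj, depth=1):
    """Recursively measure how deeply a JSON structure nests."""
    if isinstance(obj, dict):
        return max((max_depth(v, depth + 1) for v in obj.values()), default=depth)
    if isinstance(obj, list):
        return max((max_depth(v, depth + 1) for v in obj), default=depth)
    return depth

# Point this at any flow exported from SAS Studio
with open("Car_Make_SimpleQuery.flw", encoding="utf-8") as f:
    flow = json.load(f)

print("Top-level keys:", sorted(flow.keys()))
print("Maximum nesting depth:", max_depth(flow))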
The Approach
To put GPT-5 to the test, I built a custom Python program that sends a prompt to Azure OpenAI’s GPT-5 and writes the response to a new .flw file. The prompt is made of:
The user message, which uses a one-shot, custom prompt engineering technique:
The one-shot component uses a complete .flw file as an example.
The custom component includes clear input and output requirements and the relevant source table column metadata. Precise, tailored instructions are an extremely effective method for structured code/flow generation with large language models.
The system message, which is minimal and describes only the desired output rules.
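In chat completions terms, these become just two messages. The snippet below is a simplified sketch of that structure, with the message contents abbreviated to placeholders; the complete program that reads them from text files appears at the end of the post.

# Sketch of the two-message prompt structure (contents abbreviated)
system_message = "You are a SAS Studio Flows expert. ... Output raw JSON only."

user_message = (
    "Example flow: Car_Make_SimpleQuery.flw\n"
    "--- { ...full .flw JSON of the example flow... } ---\n"      # one-shot example
    "Column metadata for the input table SASHELP.PRDSALE:\n"
    "--- Obs NAME TYPE LENGTH LABEL FORMAT INFORMAT ... ---\n"    # source metadata
    "Instructions: Create a new SAS Studio flow, modeled on the example provided. ..."
)

messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_message},
]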
Demo 1: Simple Flow Generation
We started with a simple ask: Generate a SAS Studio Flow that selects a few columns, applies a filter, and writes the result to a target table.
We injected the entire .flw JSON of a similar flow into the prompt, along with source table column metadata and a description of the desired output.
After a short wait, GPT-5 responded with a shiny new .flw file.
We loaded it into SAS Studio. The query node had the exact columns we wanted, the filter logic was correct, and best of all, the flow actually ran!
Magic? Not quite. Just the right model and the right prompt. But it sure feels like it.
Demo 2: Auto-Joining Fact and Dimension Tables
Next, we cranked up the difficulty: build a flow that joins a fact table with a dimension table. We asked the model to figure out the join keys from the column metadata provided.
We wanted all columns from the fact table, only the unique columns from the dimension table, and the result written to a new table.
Here’s what went into the prompt:
The .flw code of an example flow that joins two tables, including node structure, swim lanes, and query logic.
Clear column metadata for both source tables.
Instructions for the LLM to deduce the join keys and to keep only unique columns from the dimension.
After about 90 seconds (yes, this was a long prompt), GPT-5 generated a new .flw file.
When we opened the flow, here’s what we found:
The swim lane structure was intact, and both input tables were present.
The query node had the expected join, correctly matching customer_rk from both tables.
All columns from the fact table were included, as requested.
All columns from the dimension table were also present, including those duplicated in the fact table. Not perfect, not exactly what we asked for, but technically correct.
One minor hiccup: SAS Studio warned about duplicate columns after the join, but the data was accurate, and the flow executed successfully.
How It Works: Under the Hood
GPT-5 is more appropriate than the GPT-4 models for generating SAS Studio flows due to its significantly larger context window (up to 272,000 input tokens) and a very large output limit (up to 128,000 tokens).
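A full .flw example plus metadata can easily run to tens of thousands of tokens, so it is worth estimating the prompt size before sending it. The sketch below uses the tiktoken library with the o200k_base encoding as an approximation for GPT-5-class models; treat the count as an estimate rather than an exact figure.

import tiktoken

# Rough token estimate for the assembled user prompt
encoding = tiktoken.get_encoding("o200k_base")

with open("user_message.txt", encoding="utf-8") as f:
    user_message = f.read()

print("Estimated prompt tokens:", len(encoding.encode(user_message)))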
Curious about the mechanics? Here’s a quick summary of the Python program powering these demos. You can find the full program at the end of the post:
It reads the system prompt (rules for the AI) and user prompt (the example .flw, requirements, and metadata) from text files.
It loads Azure OpenAI credentials from a .env file.
The Azure OpenAI client sends the combined prompt to GPT-5, with a high completion-token limit and minimal reasoning effort for focused, raw JSON output.
The response (the new .flw JSON) is written directly to a .flw file, ready to be opened in SAS Studio.
No smoke, no mirrors, just good prompt engineering and the remarkable language modeling of GPT-5.
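One optional safeguard, not part of the program described here: even with a “raw JSON only” rule, a model can occasionally wrap its answer in markdown fences, so stripping them before saving keeps the .flw loadable. A small sketch:

def clean_flow_json(raw: str) -> str:
    """Remove accidental markdown code fences around generated .flw JSON."""
    lines = raw.strip().splitlines()
    if lines and lines[0].startswith("```"):
        lines = lines[1:]          # drop the opening fence (possibly ```json)
    if lines and lines[-1].startswith("```"):
        lines = lines[:-1]         # drop the closing fence
    return "\n".join(lines)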
Lessons Learned: AI Learns Fast, But You’re Still the Boss
What do these demos reveal?
Generative AI can generate working, valid SAS Studio Flows from examples, metadata, and requirements.
The model deduced the correct join keys (customer_rk) from metadata alone, no handholding required.
While the model sometimes included extra columns (or failed to drop duplicates), the results were generally usable and easy to tweak.
Prompt engineering is critical. The more context you give, the better the results, especially if you provide sample .flw files and clear metadata.
Put simply: AI is your tireless assistant, not your replacement. Your review and common sense are required.
Critical Points
While this approach is reliable for generating safe, similar flows, its “narrowness” can become a bottleneck for innovation or advanced automation. Potential issues:
Overfitting: Using a detailed example can make the model mimic that flow, limiting creativity.
Limited Generalization: If only one flow type is used, the model struggles to handle more varied or complex flows.
Subset Bias: Generated flows may rely only on nodes present in the example, ignoring features not shown, even when required by instructions.
Instruction Drift: If instructions differ from the example, the model might default to the example’s pattern instead of adapting.
Positive Points
For many organizations, especially those focused on repeatability and governance, this method is practical, robust, and well suited to real-world SAS Studio flow generation. It’s a great way to bootstrap reliable flow generation, especially if you incrementally expand your set of prompt examples as needs evolve. On the positive side:
High structural fidelity: Providing a real-world example ensures the generated .flw is almost always well-formed, schema-valid, and compatible with SAS Studio, reducing time lost to syntax or structural errors.
Standardization: When you want flows to follow a corporate standard or template, this “anchoring” guarantees consistency across generated artifacts (great for compliance and maintainability).
Prompt engineering simplicity: For new users or teams, the approach is easy to adopt: just swap in your example flow and metadata, and update your instructions. No deep LLM prompt tweaking required.
Reduced Ambiguity for LLMs: The detailed, example-driven prompt leaves little room for the model to “hallucinate” invalid node types, property names, or relationships. You get predictable, robust outputs.
Compatible with modern DevOps practices: By combining generative AI with Git, you can store each generated SAS Studio flow (.flw file) in version control. Trigger an automated job to test if the flow runs successfully. If it fails, simply rerun with a different prompt and generate a new candidate.
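Here is a minimal sketch of that loop, assuming Git is available on the machine running the generator; the flow-execution test itself would be a separate SAS Viya job and is not shown.

import json
import subprocess

flow_path = "generated_with_gpt5.flw"

# Gate 1: the generated file must at least be well-formed JSON.
with open(flow_path, encoding="utf-8") as f:
    json.load(f)   # raises an exception if the model returned malformed JSON

# Gate 2: version the candidate so every generation attempt is traceable.
subprocess.run(["git", "add", flow_path], check=True)
subprocess.run(["git", "commit", "-m", f"Add generated flow {flow_path}"], check=True)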
Conclusion
With the right setup and inputs (sample flows "injected" into the prompt, source table column metadata, and clear instructions), GPT-5 can generate simple SAS Studio flows (simple, for now).
Quickstart: AI-Generated SAS Studio Flows
Ready to generate new flows? Here’s how to get started:
Prepare Your Inputs
Choose a sample flow at the right level of complexity.
Collect column metadata for your input tables. You can run:
%let mylib=sashelp;
%let mytable=prdsale;

/* Generate metadata for &mylib..&mytable and display key attributes */
proc contents data=&mylib..&mytable out=meta_temp noprint;
run;

proc sort data=meta_temp;
    by varnum;
run;

data meta_report;
    set meta_temp;
    Obs = _N_;
    keep Obs NAME TYPE LENGTH LABEL FORMAT INFORMAT;
run;

proc print data=meta_report label noobs;
    var Obs NAME TYPE LENGTH LABEL FORMAT INFORMAT;
run;
That will create something like:
Obs NAME TYPE LENGTH LABEL FORMAT INFORMAT
1 ACTUAL 1 8 Actual Sales DOLLAR
2 COUNTRY 2 15 Country $CHAR
3 DIVISION 2 15 Division $CHAR
4 MONTH 1 8 Month MONNAME
5 PREDICT 1 8 Predicted Sales DOLLAR
6 PRODTYPE 2 15 Product type $CHAR
7 PRODUCT 2 15 Product $CHAR
8 QUARTER 1 8 Quarter F
9 REGION 2 15 Region $CHAR
10 YEAR 1 8 Year F
Write clear output instructions (e.g., target table name and location, filters, columns to keep).
Instructions:
Create a new SAS Studio flow, modeled on the examples provided.
Flow name: Prdsale.flw
Input table: SASHELP.PRDSALE
Select columns: PRODUCT, ACTUAL
Filter: REGION = 'WEST' and COUNTRY = 'U.S.A.'
Output table: SASDM.PRDSALE_W
Ensure the JSON is well-formed (balanced braces, no trailing commas, consistent IDs).
Build Your Prompts
User Prompt
Create a user prompt (user_message file) by:
Briefly describing the sample flow.
Adding the raw .flw file.
Adding the source input table column metadata.
Providing your instructions.
Example (Demo 1):
Example flow:
Flow name: Car_Make_SimpleQuery.flw
Input table: SASHELP.CARS
Select columns: All + one new calculated column: Diff
New calculated column:
"name": "Diff",
"expressionText": "t1.MSRP - t1.Invoice"
Filter: t1.Origin = 'Asia'
Output table: SASDM.CARS_INFO
Full Car_Make_SimpleQuery.flw file:
--- {
"creationTimeStamp": null,
... flw_file_content_here ...
"stickyNotes": []
}
---
Column metadata for the input table SASHELP.PRDSALE:
---
Obs NAME TYPE LENGTH LABEL FORMAT INFORMAT
1 ACTUAL 1 8 Actual Sales DOLLAR
2 COUNTRY 2 15 Country $CHAR
3 DIVISION 2 15 Division $CHAR
4 MONTH 1 8 Month MONNAME
5 PREDICT 1 8 Predicted Sales DOLLAR
6 PRODTYPE 2 15 Product type $CHAR
7 PRODUCT 2 15 Product $CHAR
8 QUARTER 1 8 Quarter F
9 REGION 2 15 Region $CHAR
10 YEAR 1 8 Year F
---
Instructions:
Create a new SAS Studio flow, modeled on the examples provided.
Flow name: Prdsale.flw
Input table: SASHELP.PRDSALE
Select columns: PRODUCT, ACTUAL
Filter: REGION = 'WEST' and COUNTRY = 'U.S.A.'
Output table: SASDM.PRDSALE_W
Ensure the JSON is well-formed (balanced braces, no trailing commas, consistent IDs).
Insert the full sample .flw file content between the '---' markers. For example: --- { flw_file_content_here } ---.
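If you prefer not to paste the example flow by hand, a few lines of Python can assemble user_message.txt from its parts. This is a sketch; apart from user_message.txt, which the generation program expects, the file names are illustrative.

# Assemble user_message.txt from its parts
example_flw = open("Car_Make_SimpleQuery.flw", encoding="utf-8").read()
metadata = open("prdsale_metadata.txt", encoding="utf-8").read()     # proc contents listing
instructions = open("instructions.txt", encoding="utf-8").read()     # the task description above

user_message = (
    "Example flow: Car_Make_SimpleQuery.flw\n"
    "Full Car_Make_SimpleQuery.flw file:\n"
    f"--- {example_flw} ---\n"
    "Column metadata for the input table SASHELP.PRDSALE:\n"
    f"--- {metadata} ---\n"
    f"{instructions}\n"
)

with open("user_message.txt", "w", encoding="utf-8") as f:
    f.write(user_message)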
System Prompt
Create a system prompt (system_message file) with output rules (keep it minimal):
You are a SAS Studio Flows expert.
When given an example .flw JSON and task requirements, generate a new SAS Studio flow as a single, complete .flw JSON that mirrors the example’s schemaVersion, node types, structure, and property names. Change only what’s necessary to meet the task.
Use the minimum number of nodes required. Do not start a CAS session or upload to CAS unless explicitly requested.
Use only columns present in the provided metadata. If an essential detail is missing, ask one concise clarifying question and wait.
Output raw JSON only. No explanations, no markdown/code fences, no truncation. Ensure the JSON is well-formed (balanced braces, no trailing commas, consistent IDs).
Set Up the Environment
Create an Azure OpenAI resource. Deploy one GPT-5 model.
Set up a .env file with your API keys (and add it to .gitignore to keep secrets safe).
AZURE_OAI_ENDPOINT='https://myuser.openai.azure.com/'
AZURE_OAI_KEY='key_here'   # Azure OpenAI key here
AZURE_OAI_DEPLOYMENT='gpt-5'
AZURE_OAI_API_VERSION='2025-01-01-preview'
Write Your Python Generation Program
Use your Python program to load prompts, call GPT-5, and save the generated .flw file.
# Import required libraries
import os
import time

from dotenv import load_dotenv
from openai import AzureOpenAI

# Change to the directory where the .env file is stored
# Replace this with the correct path if needed
# os.chdir('/gelcontent/llm')

# Prompt files
system_messages_file = 'system_message.txt'
user_messages_file = 'user_message.txt'
output_file = 'generated_with_gpt5.flw'

# Read the LLM system and user messages
with open(system_messages_file, 'r', encoding='utf-8') as file:
    system_message = file.read()

with open(user_messages_file, 'r', encoding='utf-8') as file:
    user_message = file.read()

# Load Azure OpenAI credentials from the .env file
load_dotenv()
endpoint = os.getenv("AZURE_OAI_ENDPOINT")
deployment = os.getenv("AZURE_OAI_DEPLOYMENT", "gpt-5")
subscription_key = os.getenv("AZURE_OAI_KEY")
api_version = os.getenv("AZURE_OAI_API_VERSION")

# Initialize the Azure OpenAI client with key-based authentication
client = AzureOpenAI(
    azure_endpoint=endpoint,
    api_key=subscription_key,
    api_version=api_version
)

# Prepare the chat prompt (system message first, then the user message)
messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_message}
]

# Send the request to Azure OpenAI and generate the completion
try:
    start_time = time.time()  # Record the start time
    completion = client.chat.completions.create(
        model=deployment,
        messages=messages,
        max_completion_tokens=33999,
        reasoning_effort="minimal",
        # verbosity="low",
        stop=None,
        stream=False
    )
    end_time = time.time()  # Record the end time
    elapsed_time = end_time - start_time

    # Print token usage
    print("Prompt tokens:", completion.usage.prompt_tokens)
    print("Completion tokens:", completion.usage.completion_tokens)
    print("Total tokens:", completion.usage.total_tokens)
    print(f"Time elapsed: {elapsed_time:.2f} seconds")

    # Extract the generated flow file content
    flow_file_content = completion.choices[0].message.content

    # Write the content to the output file
    with open(output_file, 'w', encoding='utf-8') as f:
        f.write(flow_file_content)
    print(f"Flow file successfully written to: {output_file}")

except Exception as e:
    print(f"An error occurred: {e}")
Automate & Iterate
Run your script, test the generated flow in SAS Studio, and tweak your prompts for better results.
That’s it! Version your flows in Git, automate testing, and let AI do the heavy lifting. Give it a try, and share what you discover!
Thanks for reading. If you liked this guide, give it a thumbs up!
Now go and generate something amazing!
For further guidance, reach out for assistance.
Find more articles from SAS Global Enablement and Learning here.