In today’s data-driven world, organizations rely heavily on well-documented workflows for governance, compliance, and collaboration. However, manually documenting flows, such as those in SAS Studio, can be time-consuming and prone to human error. This is where automation steps in. By leveraging Azure OpenAI and a SAS Studio custom step, you can generate clear, detailed, and governance-focused documentation in seconds. Let’s explore how it works.
Watch the Demo Video
Before diving into the details, watch this demo video showcasing how to document SAS Studio flows using a SAS Studio custom step calling an Azure OpenAI large language model.
This custom step allows you to document not only SAS Studio Flows, but also SAS and Jupyter notebooks, SAS code files and potentially other files containing code.
Required Components
To replicate this example, you’ll need the following:
SAS Viya with Python
Ensure you have a working SAS Viya environment and Python installed and configured for the environment.
The custom step uses a Python program.
Python packages required: ‘python-dotenv’ and ‘requests’.
Azure OpenAI Resource
Create an Azure OpenAI resource and deploy a model (e.g., GPT-4o).
.env File
Create a configuration file named .env with the following content:
AZURE_OAI_ENDPOINT='https://my_endpoint.openai.azure.com/' # change my_endpoint
AZURE_OAI_KEY='my_api_key' # change my_api_key
AZURE_OAI_DEPLOYMENT='gpt-4o'
Display Hidden Files
.env is a hidden file. To view it in SAS Studio, go to Options > Preferences > General and check Display hidden files.
System Message File
The system_message.txt file guides the model: it defines the structure, level of detail, and focus areas (e.g., summaries, transformations, compliance) so that the output matches your requirements. Create a file named system_message.txt with the following content:
You are an AI assistant specialized in documenting SAS Studio flows (flw files) for Governance and Compliance purposes.
Your task is to analyze a SAS Studio flow, including its visual representation (image) and the underlying code, to generate detailed and precise documentation.
Follow these steps:
Summary: Start with a high-level summary of the SAS Studio flow. Include:
The purpose of the flow.
Key inputs (datasets or files used).
Key outputs (datasets or files generated).
A brief description of the transformations or processes applied.
Step-by-Step Explanation: Break down the flow into individual steps and explain:
The purpose of each step.
Inputs and outputs for the step.
Any transformations, joins, filters, or aggregations applied.
Detailed Column Mapping Table: Create a table titled "Detailed Column Mapping for Each Step in the Flow". For each step:
List all columns involved.
Specify their names before and after the step.
Highlight any changes applied to the columns (e.g., renaming, transformations, additions, deletions).
Use the following table format:
| Step | Column Name | Changes (e.g., renamed, transformed, added, deleted) | Description of Change (if applicable) |
|---|---|---|---|
| Step Name/ID | Column_Name_1 | Renamed to New_Column_Name_1 | Column renamed for consistency |
| Step Name/ID | Column_Name_2 | Transformed | Applied log transformation |
| Step Name/ID | Column_Name_3 | Deleted | Column removed as it is no longer needed |
Governance and Compliance Notes: Add a section at the end to highlight:
Any potential compliance concerns (e.g., PII data transformations, data lineage issues).
Suggestions for improving documentation or flow design for better governance.
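With the .env file and the system message in place, it can help to confirm that every setting the program reads is actually defined. The following is a minimal sketch, not part of the custom step: `missing_settings` is a hypothetical helper, and it inspects a plain mapping so that it can be tried without a real Azure resource.

```python
# The three variable names come from the .env file shown above.
REQUIRED_SETTINGS = ("AZURE_OAI_ENDPOINT", "AZURE_OAI_KEY", "AZURE_OAI_DEPLOYMENT")


def missing_settings(env):
    """Return the required setting names that are absent or blank in `env`."""
    return [name for name in REQUIRED_SETTINGS if not str(env.get(name, "")).strip()]


# Hypothetical values, for illustration only.
example_env = {
    "AZURE_OAI_ENDPOINT": "https://my_endpoint.openai.azure.com/",
    "AZURE_OAI_KEY": "",
    "AZURE_OAI_DEPLOYMENT": "gpt-4o",
}
print(missing_settings(example_env))  # ['AZURE_OAI_KEY']
```

In practice, one would call `load_dotenv()` first and then pass `os.environ` to the helper.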
Steps to Document
Step 1: Open SAS Studio
Launch SAS Studio and navigate to the Develop Code and Flows section.
Step 2: Create the Custom Step
In SAS Studio, create a custom step, then paste the Prompt UI and Program content provided at the end of this post into the step's Prompt UI and Program tabs. Save it as LLM - Document Flows with Azure OpenAI.step.
Step 3: Create a Flow Using the Custom Step
From the Control Library, drag and drop the custom step titled LLM - Document Flows with Azure OpenAI.step into your flow canvas.
Step 4: Configure the Inputs
Fill in the required fields in the custom step configuration panel:
Select the flow to be documented (.flw file): Choose the .flw file you want to document (e.g., Car_Make_with_SubFlows.flw).
Folder where the .env file is stored: Provide the path to the folder containing your .env file (e.g., /azuredm/code).
File where the LLM system message is stored: Provide the path to the system_message.txt file.
Write the output to (.txt file): The file where the documentation will be saved (e.g., Car_Make_with_SubFlows.txt).
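The program strips a `sasserver:` marker from each of these path values before passing them to Python. A minimal sketch of that clean-up follows; the helper name `to_server_path` is hypothetical and does not appear in the custom step.

```python
def to_server_path(value):
    """Drop a leading 'sasserver:' marker, leaving a plain file-system path."""
    prefix = "sasserver:"
    return value[len(prefix):] if value.startswith(prefix) else value


print(to_server_path("sasserver:/azuredm/code"))  # /azuredm/code
print(to_server_path("/azuredm/code"))            # /azuredm/code
```

The program itself uses `str.replace`, which removes the marker wherever it occurs; this sketch only removes it at the start of the value.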
Step 5: Run the Flow Using the Custom Step
Click Run to execute the custom step. The Azure OpenAI model will analyze the SAS Studio flow and generate detailed documentation based on the provided system message.
The program reads the entire .flw file and includes it as a single long string in the model prompt. This gives the Azure OpenAI model access to the complete flow structure, dependencies, and logic rather than an excerpt, which makes it more likely that the documentation reflects what the flow actually does. The output is still model-generated, so it should be reviewed, as described in the next step.
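Because the whole file travels in one prompt, the size of the flow matters: a very large .flw file may not fit in the model's context window. The sketch below estimates the prompt size with a rough rule of thumb of about four characters per token. Both that ratio and the `context_limit` value are assumptions for illustration, not figures from this post; check the limits of your own deployment. The default of 2500 output tokens mirrors the `max_tokens` value used in the program.

```python
def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate; the 4-characters-per-token ratio is an assumption."""
    return -(-len(text) // chars_per_token)  # ceiling division


def fits_in_context(system_message, flow_text, max_output_tokens=2500, context_limit=128_000):
    """Check whether prompt plus requested output stays under an assumed limit."""
    prompt_tokens = estimate_tokens(system_message) + estimate_tokens(flow_text)
    return prompt_tokens + max_output_tokens <= context_limit


# Hypothetical inputs, for illustration only.
print(estimate_tokens("a" * 10))                     # 3
print(fits_in_context("system text", "x" * 40_000))  # True
```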
Step 6: Review the Output
Locate the output file (e.g., Car_Make_with_SubFlows.txt) in the specified folder. Open the file to review the generated documentation, which includes:
High-Level Summary: A concise overview of the SAS Studio flow, including its purpose, key inputs, outputs, and transformations.
Step-by-Step Explanation: A breakdown of each step in the flow, including its purpose, inputs, outputs, and transformations.
Detailed Column Mapping Table: A table that documents changes to columns during the flow, such as renaming, transformations, or deletions.
Governance and Compliance Notes: Highlights compliance concerns, data lineage, and suggestions for improvement.
The output is Markdown-formatted text saved in a .txt file. It can be opened and previewed in an editor such as Visual Studio Code.
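A quick sanity check on the generated file is to confirm that it mentions each of the four sections requested in the system message. The following sketch assumes the model echoes the section names more or less literally, which is likely but not guaranteed; `missing_sections` and the keyword list are hypothetical.

```python
EXPECTED_SECTIONS = ("summary", "step-by-step", "column mapping", "governance")


def missing_sections(markdown_text):
    """Return expected section keywords that do not appear in the text."""
    lowered = markdown_text.lower()
    return [name for name in EXPECTED_SECTIONS if name not in lowered]


# Hypothetical excerpt of a generated document.
sample = "## Summary\n...\n## Step-by-Step Explanation\n...\n## Governance and Compliance Notes\n"
print(missing_sections(sample))  # ['column mapping']
```

A non-empty result is only a prompt to look more closely; it does not by itself mean the documentation is wrong.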
Custom Step Features
The custom step simplifies the documentation process with the following features:
Automated Analysis: Leverages Azure OpenAI to analyze flow components and generate detailed documentation.
Governance Focus: Highlights compliance concerns, metadata, data lineage, and PII handling recommendations.
Customizable System Message: Modify the system_message.txt file to tailor the documentation style and content.
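As one way to tailor the output, an extra block of instructions can be appended to the system message before it is saved as system_message.txt. This is a sketch only; the helper name `add_section` and the added "Performance Notes" section are hypothetical examples, not part of the custom step.

```python
def add_section(system_message, title, bullets):
    """Append a titled list of instructions to an existing system message."""
    lines = [system_message.rstrip(), "", f"{title}:"]
    lines.extend(f"- {item}" for item in bullets)
    return "\n".join(lines) + "\n"


base = "You are an AI assistant specialized in documenting SAS Studio flows."
custom = add_section(base, "Performance Notes", ["Flag steps that sort or join large tables."])
print(custom)
```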
Conclusion
In summary, this custom step provides a quick and efficient way to document SAS Studio Flows, notebooks, and other workflows. It brings clarity to complex or opaque processes, making it easier to understand transformations, data lineage, and compliance considerations. With a few adjustments, you can customize the output to suit your specific needs.
Custom Step Code
Prompt UI
{
"showPageContentOnly": true,
"pages": [
{
"id": "flowDoc",
"type": "page",
"label": "Document SAS Studio Flows with Azure OpenAI",
"children": [
{
"id": "text1",
"type": "text",
"text": "Inputs: Choose the flow to be documented (.flw file), the connection configuration for your Azure OpenAI model (.env file), the model system message, which must be stored in a 'system_message.txt' file.",
"visible": ""
},
{
"id": "input_file",
"type": "path",
"label": "Select the flow to be documented (.flw file):",
"pathtype": "file",
"placeholder": "Car_Make_with_SubFlows.flw",
"required": false,
"visible": ""
},
{
"id": "env_file_folder",
"type": "path",
"label": "Folder where the .env file is stored",
"pathtype": "folder",
"placeholder": "/azuredm/code",
"required": false,
"visible": ""
},
{
"id": "messages",
"type": "path",
"label": "File where the LLM system message is stored",
"pathtype": "file",
"placeholder": "system_message.txt",
"required": false,
"visible": ""
},
{
"id": "text2",
"type": "text",
"text": "Output: choose the output file where the flow documentation is written (.txt file).",
"visible": "",
"indent": 0
},
{
"id": "output_file",
"type": "path",
"label": "Write the output to (.txt file):",
"pathtype": "file",
"placeholder": "Car_Make_with_SubFlows_.txt",
"required": false,
"visible": ""
}
]
},
{
"id": "help1",
"type": "page",
"label": "How To Use the Custom Step",
"children": [
{
"id": "text3",
"type": "text",
"text": "Custom Step: Document SAS Studio Flow\n\nPurpose:\nThis custom step processes a selected SAS Studio .flw file, sends its content to an Azure OpenAI endpoint, and generates documentation for the flow. The output is saved in a .txt file.\n\nInputs:\n1. Flow File (.flw): The path to the SAS Studio flow file to be documented.\n2. Environment File Folder (.env): The folder containing the .env file for Azure OpenAI configuration.\n3. System Message File (system_message.txt): A .txt file containing the system message for the OpenAI API.\n\nOutput:\n1. The .txt file where the documentation will be saved. The file contains Markdown-formatted text.\n\nHow It Works:\n1. The .flw file content is read and prepared for processing.\n2. A system message is read from the messages file.\n3. The .env file is loaded to retrieve Azure OpenAI credentials and endpoint details.\n4. The .flw file content and system message are sent to the Azure OpenAI API for processing.\n5. The API response, which contains the generated documentation, is saved to the specified .txt output file.\n\nSteps:\n1. Choose the inputs.\n2. Specify the .txt file to save the documentation.\n3. Run the step to generate the documentation.\n\n--- system_message.txt sample file --- \nYou are an AI assistant specialized in documenting SAS Studio flows (flw files) for Governance and Compliance purposes. Your task is to analyze a SAS Studio flow, including its visual representation (image) and the underlying code, to generate detailed and precise documentation. Follow these steps:\n\nSummary: Start with a high-level summary of the SAS Studio flow. Include:\nThe purpose of the flow.\nKey inputs (datasets or files used).\nKey outputs (datasets or files generated).\nA brief description of the transformations or processes applied.\nStep-by-Step Explanation: Break down the flow into individual steps and explain:\nThe purpose of each step.\nInputs and outputs for the step.\nAny transformations, joins, filters, or aggregations applied.\nDetailed Column Mapping Table: Create a table titled \"Detailed Column Mapping for Each Step in the Flow\". For each step:\nList all columns involved.\nSpecify their names before and after the step.\nHighlight any changes applied to the columns (e.g., renaming, transformations, additions, deletions).\nUse the following table format:\nStep\tColumn Name\tChanges (e.g., renamed, transformed, added, deleted)\tDescription of Change (if applicable)\nStep Name/ID\tColumn_Name_1\tRenamed to New_Column_Name_1\tColumn renamed for consistency\nStep Name/ID\tColumn_Name_2\tTransformed\tApplied log transformation\nStep Name/ID\tColumn_Name_3\tDeleted\tColumn removed as it is no longer needed\n\nGovernance and Compliance Notes: Add a section at the end to highlight:\nAny potential compliance concerns (e.g., PII data transformations, data lineage issues).\nSuggestions for improving documentation or flow design for better governance.",
"visible": ""
}
]
}
],
"values": {
"input_file": "",
"env_file_folder": "",
"messages": "",
"output_file": ""
}
}
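The Program below reads each input with `SAS.symget`, using the same names as the control `id` values in this Prompt UI. The sketch that follows, which is not part of the custom step, parses a trimmed-down copy of the definition and lists the ids of the path controls so they can be compared with the names used in the program. The helper name `path_control_ids` is hypothetical.

```python
import json

# Trimmed-down copy of the Prompt UI definition, for illustration only.
PROMPT_UI = """
{
  "pages": [
    {"id": "flowDoc", "type": "page", "children": [
      {"id": "text1", "type": "text", "text": "Inputs"},
      {"id": "input_file", "type": "path", "pathtype": "file"},
      {"id": "env_file_folder", "type": "path", "pathtype": "folder"},
      {"id": "messages", "type": "path", "pathtype": "file"},
      {"id": "output_file", "type": "path", "pathtype": "file"}
    ]}
  ]
}
"""


def path_control_ids(definition):
    """Collect the ids of all 'path' controls across every page."""
    ui = json.loads(definition)
    return [
        child["id"]
        for page in ui.get("pages", [])
        for child in page.get("children", [])
        if child.get("type") == "path"
    ]


print(path_control_ids(PROMPT_UI))
# ['input_file', 'env_file_folder', 'messages', 'output_file']
```

Running `json.loads` on the full definition is also a cheap way to catch a copy-and-paste error before saving the step.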
Program
/* Run the Python code within PROC PYTHON */
proc python;
submit;
# The following contains the Python Code to be written inside a PROC PYTHON.
import os
from dotenv import load_dotenv
import requests
# Get variables from SAS
env_file_folder = SAS.symget('env_file_folder')
input_file = SAS.symget('input_file')
output_file = SAS.symget('output_file')
messages = SAS.symget('messages')
# Strip the 'sasserver:' prefix so the values resolve to file-system paths
env_file_folder = env_file_folder.replace('sasserver:', '')
input_file = input_file.replace('sasserver:', '')
output_file = output_file.replace('sasserver:', '')
messages = messages.replace('sasserver:', '')
print("env_file_folder:", env_file_folder)
print("input_file:", input_file)
print("output_file:", output_file)
print("messages:", messages)
# Folder where .env file is stored
os.chdir(env_file_folder)
def process_file(input_file, output_file):
    try:
        # Read the input file
        with open(input_file, 'r', encoding='utf-8') as f:
            content = f.read()
        # Print the length of the string to verify
        print(f"The length of the SAS Studio flow as a string is: {len(content)}")
        # Read LLM system message
        with open(messages, 'r', encoding="utf8") as file:
            system_message = file.read()
        # Get configuration settings
        load_dotenv()
        azure_oai_endpoint = os.getenv("AZURE_OAI_ENDPOINT")
        azure_oai_key = os.getenv("AZURE_OAI_KEY")
        azure_oai_deployment = os.getenv("AZURE_OAI_DEPLOYMENT")
        azure_oai_model = azure_oai_deployment
        api_version = '2024-05-01-preview' # this might change in the future
        # Request Header
        headers = {
            "Content-Type": "application/json",
            "api-key": azure_oai_key,
        }
        # Payload for the request
        payload = {
            "messages": [
                {
                    "role": "system",
                    "content": [
                        {
                            "type": "text",
                            "text": f"{system_message}\n"
                        }
                    ]
                },
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "text",
                            "text": f"Document the following SAS Studio flow. FLW file content: --- {content} ---"
                        }
                    ]
                },
            ],
            "temperature": 0.5,
            "top_p": 0.9,
            "max_tokens": 2500
        }
        # Note: AZURE_OAI_ENDPOINT must end with a slash for this URL to be valid
        ENDPOINT = f"{azure_oai_endpoint}openai/deployments/{azure_oai_model}/chat/completions?api-version={api_version}"
        # Send the request
        try:
            response = requests.post(ENDPOINT, headers=headers, json=payload)
            response.raise_for_status() # Will raise an HTTPError if the HTTP request returned an unsuccessful status code
        except requests.RequestException as e:
            # Raise a regular exception (not SystemExit) so the handler below reports it to the SAS log
            raise RuntimeError(f"Failed to make the request. Error: {e}")
        # Handle the response as needed (e.g., print or process)
        response_data = response.json()
        # Extract the text content
        # This will vary based on the API's JSON structure
        text_content = response_data['choices'][0]['message']['content']
        print("\n Response: \n" + text_content + "\n")
        # Write the response to a file
        with open(output_file, mode="w", encoding="utf8") as results_file:
            results_file.write(text_content)
        print(f"\nResponse written to {output_file}\n")
    except Exception as e:
        error_message = f"Error: {e}"
        print(error_message)
        # Pass the error message back to SAS log
        # (double quotes are swapped out so they cannot end the SAS string early)
        safe_message = error_message.replace('"', "'")
        SAS.submit(f'data _null_; put "{safe_message}"; run;')
# Run the processing function
try:
    process_file(input_file, output_file)
except Exception as e:
    error_message = f"Error: {e}"
    print(error_message)
    # Pass the error message back to SAS log
    # (double quotes are swapped out so they cannot end the SAS string early)
    safe_message = error_message.replace('"', "'")
    SAS.submit(f'data _null_; put "{safe_message}"; run;')
endsubmit;
run;
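Two details of the program are worth isolating: the request URL is built by string concatenation, which assumes `AZURE_OAI_ENDPOINT` ends with a slash, and the reply text is read from a fixed position in the response JSON. The sketch below shows a slightly more defensive take on both. The helper names are hypothetical, and the sample response is a hand-written stand-in with the shape the program expects, not real API output.

```python
def build_endpoint(base_url, deployment, api_version):
    """Build the chat completions URL, tolerating a missing trailing slash."""
    return (
        f"{base_url.rstrip('/')}/openai/deployments/{deployment}"
        f"/chat/completions?api-version={api_version}"
    )


def extract_text(response_data):
    """Return the first choice's message content, or raise a clear error."""
    try:
        return response_data["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError) as exc:
        raise ValueError("Unexpected response structure") from exc


url = build_endpoint("https://my_endpoint.openai.azure.com", "gpt-4o", "2024-05-01-preview")
print(url)

fake_response = {"choices": [{"message": {"content": "# Flow documentation"}}]}
print(extract_text(fake_response))  # # Flow documentation
```

Raising a named error for an unexpected response makes the message that reaches the SAS log easier to interpret than a bare `KeyError`.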
Thank you for taking the time to read this post. If you liked it, give it a thumbs up! Please comment and tell us what you think about the Visual Studio Code Extension. For more information, please email me.