SAS Code Generation and Execution in SAS Viya with Azure OpenAI

4 Likes

Read the post to understand how you can create your own SAS code Generator-Executor, which uses Generative AI to interact with SAS Viya for data management tasks. The system takes prompts, converts them into SAS code, and executes this code in a user-defined SAS Viya environment. The results are then displayed, and users can ask follow-up questions or receive more information.

The Big Picture

This custom agent integrates multiple technologies, including SAS Viya, Azure OpenAI, and open-source packages like SASPY and LangChain.

Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.

The Generator-Executor (GenEx) approach is innovative in that it leverages a large language model (LLM), GPT-4, from Azure OpenAI, to generate SAS code based on user prompts.

The generated SAS code is sent to an executor tool.

The executor tool uses SASPY to connect to a SAS Viya instance. The executor submits the SAS code generated by the model. The SAS code is executed in a SAS Studio compute context.

The response comes back within seconds, validating the code generated by the model.

I am excited about this approach, because of its simplicity. Only two custom tools are needed, the generator and the executor.

By simplifying code generation and execution, it can save time and resources, and make data management more accessible to non-technical users, reducing the need for manual coding and potentially speeding up data management tasks.

Video

Watch the following video, where we explore the approach:

(view in My Videos)

The approach is an evolved variation of the custom agent discussed in two earlier posts:

The main difference is that in the current approach, we let the LLM generate the SAS code. Whereas in the above posts, we had to create one custom tool for each operation. And that was unscalable. It didn't keep me up at night, but it preoccupied me. I asked myself if there's a better approach.

The spark to try something different was ignited after I read Understanding the Magic: Deconstructing Langchain’s SQL Agent. I was incredulous when Luc explained that the all-powerful SQL Database Toolkit only has four tools. I was expecting thousands of lines of code. Instead, I found four tools with well-crafted prompts and descriptions.

That somehow taught me a valuable lesson. In the era of LLMs, we have to think differently and program accordingly. We have to take advantage of the "plasticity" of the model by adjusting the prompts or the system messages (prompt engineering).

Python Program

The main Python program defines the LangChain custom agent that interacts with SAS Viya:

# LangChain custom SAS Viya agent with memory and only two tools
# Resource: https://python.langchain.com/docs/modules/agents/how_to/custom_agent
 
import os
import json
import urllib3
urllib3.disable_warnings

# Load the Large Language Model - from Azure OpenAI
from langchain_openai import AzureChatOpenAI

## Read the Azure OpenAI environment variables from a separate file
import config
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["AZURE_OPENAI_API_KEY"] = config.api_key
os.environ["AZURE_OPENAI_ENDPOINT"] = config.azure_openai_endpoint
os.environ["OPENAI_API_VERSION"] = config.api_version

## We'll use GPT-4 as deployed model
llm = AzureChatOpenAI(deployment_name="gpt-4", temperature=0, max_tokens=2000)

# Define custom tools
## SASPY - see https://communities.sas.com/t5/SAS-Communities-Library/Submit-workloads-to-SAS-Viya-from-Python/ta-p/807263

from langchain.agents import tool, Tool

@tool

def generate_sas_code(prompt: str) -> int:
    """Returns the generated SAS code, based on the user prompt.
    Useful when people ask to generate SAS code that can be executed in a SAS environment or session.
    The response is returned as pure SAS code, executable in a SAS session.
    That means any explanation is commented out in SAS like fashion /* comment */"""
    import os
    from openai import AzureOpenAI
    ## Setting Azure OpenAI environment variables
    import config
    os.environ["OPENAI_API_TYPE"] = "azure"
    os.environ["AZURE_OPENAI_API_KEY"] = config.api_key
    os.environ["AZURE_OPENAI_ENDPOINT"] = config.azure_openai_endpoint
    os.environ["OPENAI_API_VERSION"] = config.api_version
    # define client
    client = AzureOpenAI(
    azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_KEY"),
    api_version="2023-12-01-preview"
    )
    #print("\nI will ask: ", prompt, '\n')
    response = client.chat.completions.create(
    model="gpt-4", # model = "deployment_name".
    messages=[
        {"role": "system", "content": """You are a SAS assistant that helps people write SAS code.
        The response you return has to be pure SAS code, executable in a SAS session.
        That means any explanation must be commented out in SAS like fashion. For example /* comment */ or * comment; .
        Explain your reasoning."""},
        {"role": "user", "content": "Return detailed table description for a SAS table CLASS in SASHELP library."},
        {"role": "assistant", "content": "PROC CONTENTS DATA=SASHELP.CLASS;run;quit;"},
        {"role": "user", "content": "Describe the SAS Data Set group from the health SAS library."},
        {"role": "assistant", "content": """"proc datasets library=health nolist; contents data=group (read=green) out=grpout;
            title  'The Contents of the GROUP Data Set';run;quit;"""},
        {"role": "user", "content": prompt},
            ]
        )
    sas_code = response.choices[0].message.content + '\n'
    return sas_code

def execute_sas_code(sas_code: str) -> str:
    """Useful when you need to execute generated SAS code.
        Input to this tool is correct SAS code, ready to be executed a SAS environment or session.
        Output comes in two parts:
        1. Results: the SAS code execution results comes at the top. The results are easy to identify. They can be found after the key word 'Results ---' .
        2. Log: The SAS code execution log comes after the results. The results are easy to identify. They can be found after the key word 'Log --- '.
        Identify the results, summarize and provide a short explanation.
        If the execution result is not correct, an error message will be returned by the SAS code execution log.
        Error is signaled by the keyword 'ERROR'.
        If an error is returned, you may need to rewrite the SAS code, using the generate SAS code tool.
        The execute the SAS code.
        Provide a Final Result: Summarize and provide a short explanation of what was done and what the outcome was.
        If you encounter an issue detail what the issue is."""
    # SASPY auth and start session
    import saspy
    sas = saspy.SASsession(cfgfile='sascfg_personal.py', cfgname = 'httpsviya')
    # Submit SAS code and parse results and the log
    sas_result = sas.submit(sas_code, results='TEXT')
    result = json.dumps(sas_result['LST'])
    log = json.dumps(sas_result['LOG'])
    return ('Results --- \n', " ".join(result.split()), '---\n', 'Log --- \n', " ".join(log.split()), '---\n')

# Describe the tools. The description is important for the LangChain chain.
tools = [
    Tool(
        name = "generate_sas_code",
        func = generate_sas_code,
        description = """
        Useful when people ask to generate SAS code that can be executed in a SAS environment
        or session.
        Input to this tool is the user prompt.
        Breakdown the user prompt.
        Explain your reasoning how you are going to construct the SAS code, from the user's prompt.
        Return the generated SAS code, based on the user's prompt or question.
        The response you return has to be pure SAS code, executable in a SAS session.
        That means any explanation must be commented out in SAS like fashion.
        For example /* comment */ or * comment ; .
        """
    ),
    Tool(
        name = "execute_sas_code",
        func = execute_sas_code,
        description = """Useful when you need to execute generated SAS code.
        Input to this tool is correct SAS code, ready to be executed.
        Output comes in two parts:
        1. Results: the SAS code execution results comes at the top. The results are easy to identify. They can be found between Results ---  and --- .
        2. Log: The SAS code execution log comes after the results. The results are easy to identify. They can be found between Log ---  and --- .
        Identify the results, summarize and provide a short explanation.
        Provide a Final Result: Summarize and provide a short explanation of what was done and what the outcome was.
        If you encounter an issue detail what the issue is."""
    )
        ]


# Create Prompt
## Now let us create the prompt. Because OpenAI Function Calling is finetuned for tool usage, we hardly need any instructions on how to reason, or how to output format.
## We will just have two input variables: input and agent_scratchpad. input should be a string containing the user objective.
## agent_scratchpad should be a sequence of messages that contains the previous agent tool invocations and the corresponding tool outputs.

# Add Memory

## Add a place for memory variables to go in the prompt
## Keep track of the chat history
## First, let’s add a place for memory in the prompt. We do this by adding a placeholder for messages with the key "chat_history". Notice that we put this ABOVE the new user input (to follow the conversation flow).

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

MEMORY_KEY = "chat_history"
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are a world class Data Engineer, who can write and execute SAS code.
            Please make sure you complete the objective above with the following rules:
            1/ Your job is to first breakdown the topic into relevant questions for understanding the topic in detail. You should have at max only 3 questions not more than that, in case there are more than 3 questions consider only the first 3 questions and ignore the others.
            2/ If asked to generate SAS code invoke the generate_sas_code tool.
            3/ If asked to run the generated code, invoke the execute_sas_code tool.
            Input to this tool is a detailed and correct SAS code, stripped of any plain text. If the query is not correct, an error message
            will be returned. If an error is returned, read the ERROR from the log. You may need to rewrite the SAS code, using the generate code tools,
            check the SAS code and execute the code again."""
        ),
        MessagesPlaceholder(variable_name=MEMORY_KEY),
        ("user", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
    ]
)

# Bind tools to LLM
## How does the agent know what tools it can use? In this case we’re relying on OpenAI function calling LLMs, which take functions as a separate argument and have been specifically trained to know when to invoke those functions.
## To pass in our tools to the agent, we just need to format them to the OpenAI function format and pass them to our model. (By bind-ing the functions, we’re making sure that they’re passed in each time the model is invoked.)

from langchain_community.tools.convert_to_openai import format_tool_to_openai_function # about to be deprecated

llm_with_tools = llm.bind(functions=[format_tool_to_openai_function(t) for t in tools])

# Create the Agent
## Putting those pieces together, we can now create the agent. We will import two last utility functions: a component for formatting intermediate steps (agent action, tool output pairs) to input messages that can be sent to the model, and a component for converting the output message into an agent action/agent finish.

from langchain.agents.format_scratchpad import format_to_openai_function_messages
from langchain.agents.output_parsers import OpenAIFunctionsAgentOutputParser

## We can then set up a list to track the chat history

from langchain_core.messages import AIMessage, HumanMessage

chat_history = []

agent = (
    {
        "input": lambda x: x["input"],
        "agent_scratchpad": lambda x: format_to_openai_function_messages(
            x["intermediate_steps"]
        ),
        "chat_history": lambda x: x["chat_history"],
    }
    | prompt
    | llm_with_tools
    | OpenAIFunctionsAgentOutputParser()
)

from langchain.agents import AgentExecutor

agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

# ASK QUESTIONS

print ('Start chatting with your data...\n')

# Continuously listens for text input to recognize and send as text to Azure OpenAI
def chat_with_open_ai():
    while True:
        print("SAS and Azure OpenAI are listening. Say 'Stop' or press Ctrl-Z to end the conversation.")
        try:
            q = input("Enter your question: \n")
            if q == "Stop.":
                print("Conversation ended.")
                break
            else:
                print("\nI will ask: ", q)
                result = agent_executor.invoke({"input": q, "chat_history": chat_history})
                chat_history.extend(
                    [
                        HumanMessage(content=q),
                        AIMessage(content=result["output"]),
                    ]
                )


        except EOFError:
            break

# Main

try:
    chat_with_open_ai()
except Exception as err:
    print("Encountered exception. {}".format(err))

I won't go into details and explain fully the Python program. There are a few comments inside. A similar program was explained in sufficient detail in How to Create Your Custom LangChain Agent for SAS Viya.

Conclusions

The SAS code Generator-Executor (GenEx) approach is innovative in that it leverages a large language model, GPT-4, to generate SAS code based on user prompts, reducing the need for manual coding and potentially speeding up data management tasks. The key feature of the approach is that it uses only two tools: a code generator, and a code executor to submit and test the generated SAS code in a SAS Viya environment of your choice.

Additional Resources

Thank you for your time reading this post. If you liked the post, give it a thumbs up! Please comment and tell us what you think about having conversations with your data. If you wish to get more information, please write me an email.

Find more articles from SAS Global Enablement and Learning here.