Watch this Ask the Expert session to learn how Large Language Models (LLMs) such as ChatGPT, Claude, and DeepSeek can be seamlessly integrated into the SAS 9.4 environment.
The questions from the Q&A segment held at the end of the webinar are listed below, and the slides from the webinar are attached.
Q&A
Is it possible to get the prompts in a readable format?
Yes. The prompts will be provided alongside the code.
Can you use %llmMacro with SAS GRID 9.4 on-prem?
Yes, you can use the %llm macro with SAS Grid 9.4 on-premises. It has been tested with SAS Viya as well.
If you're using API integration with ChatGPT, what are the data privacy protections while performing analysis on PII data or proprietary data? How does secure API key help with this?
When using OpenAI's ChatGPT API, data privacy protections have improved significantly since March 2023, when OpenAI changed their policy to not use API data for model training. However, they may still retain data for up to 30 days for safety monitoring purposes before deletion. This represents a key difference from the web interface, where data usage policies are more complex and varied. The API key itself primarily serves as an authentication mechanism and doesn't inherently provide enhanced privacy protections - rather, the privacy benefits come from the API's specific data retention and usage policies, not from the secure key.
It's crucial to understand that whilst these no-training policies offer some protection, sending PII or proprietary data to any third-party API still creates inherent privacy risks. Your sensitive information travels over the internet and gets processed on external servers, regardless of whether it's used for training purposes. Organisations must carefully evaluate whether their internal data governance policies permit transmitting sensitive information to external APIs, even with favourable data handling commitments. The secure API key manages access and usage tracking, but it doesn't eliminate the fundamental privacy exposure that occurs when confidential data leaves your organisation's control and enters external systems.
What is needed prior to using this macro? What are the requirements for interacting with ChatGPT, for example?
To use OpenAI's ChatGPT API, you need an OpenAI account and an API key, which you obtain from OpenAI's developer platform. The API uses pay-as-you-go pricing based on token usage rather than fixed subscriptions. You'll need to add credit to your account to cover API calls, with costs varying depending on which model you use and how many tokens you consume. New accounts may require a minimum payment to begin using the API. For business or enterprise use, OpenAI offers different pricing tiers and support options. The API key serves as your authentication method and allows you to integrate ChatGPT's capabilities for this macro.
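For context, here is a minimal sketch of what such an API call looks like from SAS 9.4 using PROC HTTP. It is illustrative rather than the webinar's exact macro code: the macro variable name &OPENAI_API_KEY and the model name are assumptions you would adapt to your own account.

```sas
/* Minimal sketch: call OpenAI's Chat Completions endpoint from SAS 9.4.
   Assumes your key is stored in &OPENAI_API_KEY (illustrative name)
   and that your site permits outbound HTTPS. */
filename req temp;
filename resp temp;

/* Build the JSON request body */
data _null_;
   file req;
   put '{"model": "gpt-4o-mini",';
   put ' "messages": [{"role": "user",';
   put '   "content": "Explain PROC MEANS in one sentence."}]}';
run;

/* POST the request; the API key authenticates the call */
proc http
   url="https://api.openai.com/v1/chat/completions"
   method="POST"
   in=req
   out=resp
   ct="application/json";
   headers "Authorization"="Bearer &OPENAI_API_KEY.";
run;
```

Token-based billing applies to both the prompt you send and the completion you receive, so shorter prompts and smaller models keep per-call costs down.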
What is the best way to include a data table or reference document in a prompt? Can the macro include uploading a separate data table, PDF, etc.? Thanks!
Currently, the %llm macro doesn't support uploading separate data tables, PDFs, or other documents directly to the LLM APIs, but there are several potential workarounds and future solutions available.
Chris Hemedinger presented an "Ask the Expert" webinar on using SAS with Microsoft 365 (OneDrive, Teams, and SharePoint). You can integrate and have access to your SharePoint, which could contain vital documents like Statistical Analysis Plans. This integration could enhance the context accuracy of the LLM response by providing access to your document repositories within your SAS environment.
There are also model context protocols (MCPs) which are still nascent but have huge potential for this type of functionality. However, currently the %llm macro doesn't have that direct document upload functionality yet.
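In the meantime, one workaround is to flatten a small table into the prompt text yourself. The sketch below is purely illustrative (the dataset, file path, and macro-variable names are hypothetical), and production use would also need to escape quotes and special characters for JSON:

```sas
/* Workaround sketch: serialise a small table to CSV, then fold the
   lines into a single prompt string for the LLM call. */
proc export data=sashelp.class(obs=5)
   outfile="%sysfunc(pathname(work))/class.csv"
   dbms=csv replace;
run;

data _null_;
   length prompt $32767;
   retain prompt 'Summarise this table:';
   infile "%sysfunc(pathname(work))/class.csv" end=eof;
   input;
   prompt = catx(' ', prompt, _infile_);   /* append each CSV line */
   if eof then call symputx('table_prompt', prompt);
run;

/* &table_prompt can now be embedded in the request body */
```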
Can a company host an LLM themselves and thereby ensure that the prompts and answers stay within the company so the prompt can contain private/subject information?
Yes, companies can host LLMs internally to keep all data within their infrastructure, ensuring complete privacy control. However, this requires significant upfront investment in GPU hardware, ongoing maintenance costs, and technical expertise. Self-hosted models may not perform as well as commercial APIs like ChatGPT or Claude, though open-source options are improving. Whilst this approach eliminates external data privacy risks, the substantial costs and complexity mean it's primarily viable for organisations with high usage volumes or strict data governance requirements.
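As an illustration of what self-hosting looks like from the SAS side, the PROC HTTP pattern stays the same and only the endpoint changes. The sketch below assumes an Ollama server running locally on its default port with a model already pulled; it is not something demonstrated in the webinar:

```sas
/* Sketch: call a locally hosted model (assumed Ollama server on
   localhost:11434) so prompts never leave your infrastructure. */
filename req temp;
filename resp temp;

data _null_;
   file req;
   put '{"model": "llama3", "prompt": "Hello from SAS", "stream": false}';
run;

proc http
   url="http://localhost:11434/api/generate"
   method="POST"
   in=req
   out=resp
   ct="application/json";
run;

/* Non-streaming replies are a single JSON object; the generated text
   appears as the RESPONSE column of the ROOT dataset */
libname lresp JSON fileref=resp;
proc print data=lresp.root;
   var response;
run;
```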
Please let us know how to download the %llm macro code. Apologies if I missed this. I remember seeing "Choice" or "Choose"; it might not be a macro, but it was part of the output selection. Sorry for the confusion.
When the macro receives a JSON response from the LLM API, SAS automatically parses it using libname response JSON fileref=resp, which creates multiple datasets based on the JSON structure. For ChatGPT and Perplexity, the response contains nested objects like {"choices": [{"message": {"content": "response text"}}]}, so SAS automatically creates a dataset called response.choices_message containing the message data, from which the macro extracts the content field. Claude uses a different JSON structure with {"content": [{"text": "response text"}]}, so the macro accesses response.content and extracts the text field instead. The macro doesn't need to know these dataset names beforehand - SAS's JSON engine automatically generates dataset names that mirror the API's specific JSON hierarchy, and the macro simply uses the appropriate path for each provider.
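Concretely, the provider-specific parsing described above looks roughly like this (one path or the other applies, depending on which API produced the response held in fileref resp):

```sas
/* The JSON engine maps the response hierarchy to datasets automatically */
libname response JSON fileref=resp;

/* ChatGPT / Perplexity: {"choices":[{"message":{"content":"..."}}]} */
data chatgpt_answer;
   set response.choices_message;
   keep content;
run;

/* Claude: {"content":[{"text":"..."}]} */
data claude_answer;
   set response.content;
   keep text;
run;
```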
With this new approach, what is your view for the potential productivity gains that the organization will experience?
As I mentioned in the presentation, organisations should see significant productivity gains from reduced context switching and faster onboarding of junior programmers. However, the greatest benefits will come to experienced developers who maintain their core coding skills whilst leveraging LLMs as enhancement tools. LLMs excel at providing scaffolding and conceptual frameworks, but human expertise remains essential for implementation and quality assurance. The key is viewing LLMs as productivity amplifiers rather than replacements: developers should continue learning fundamental skills, as this knowledge enables them to better evaluate and refine LLM-generated code.
Will your code be available somewhere? // Is it planned to have the macro code to be shared, also the corresponding documentation?
You can find my code attached to this post.
I think instead of separating each LLM, we could parameterise the URL and use the macro for any LLM, not just these three.
Agreed, this could be developed further. I kept the example LLMs separate for educational purposes during the presentation, to highlight subtle implementation differences.
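For anyone who wants to pursue that idea, a parameterised wrapper might look like the sketch below. The macro name and parameters are hypothetical, and note that the URL is not the only thing that varies: Claude, for instance, authenticates with an x-api-key header rather than a Bearer token, and each provider's request and response JSON differ, which is exactly the kind of subtle difference the separate examples were meant to highlight.

```sas
/* Hypothetical parameterised wrapper: the caller builds the JSON request
   in fileref &reqfile. and parses &respfile. per provider afterwards. */
%macro llm_call(url=, apikey=, reqfile=req, respfile=resp);
   proc http
      url="&url."
      method="POST"
      in=&reqfile.
      out=&respfile.
      ct="application/json";
      /* Bearer auth suits OpenAI-style APIs; Claude would need
         "x-api-key" and "anthropic-version" headers instead */
      headers "Authorization"="Bearer &apikey.";
   run;
%mend llm_call;

/* Example call against the OpenAI endpoint */
%llm_call(url=https://api.openai.com/v1/chat/completions,
          apikey=&OPENAI_API_KEY.);
```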
Yes, so I was simply parsing the response. Got it.
Correct!
When querying diverse models, Python and R responses appear to be more complete than SAS responses. What could be the reason for this?
LLMs are primarily trained on publicly available data from the internet, where open-source languages like Python and R have vastly more code examples, documentation, and community discussions compared to proprietary software like SAS. The open-source nature of Python and R means developers freely share code on platforms like GitHub, Stack Overflow, and academic repositories, creating a rich training dataset. Whilst SAS does have community forums and Stack Overflow questions, these resources are not as prominent or extensive as those for open-source languages.
Where do you get the API key?
Most production LLM systems provide API access through their developer platforms. You obtain API keys by creating an account on the provider's website - for example, OpenAI's platform for ChatGPT, Anthropic's console for Claude, or Google's AI Studio for Gemini. These are typically paid services with usage-based pricing, though some offer free tiers with limited usage as previously mentioned.
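Wherever the key comes from, avoid hard-coding it in your programs. A minimal sketch, assuming the key has been exported as an environment variable named OPENAI_API_KEY before SAS starts:

```sas
/* Read the key from the environment so it never appears in code or logs */
%let openai_api_key = %sysget(OPENAI_API_KEY);
%put NOTE: API key loaded (%length(&openai_api_key) characters);
```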
How can one decide where to draw the line while using these LLMs?
Organisations should establish clear guidelines through standard operating procedures (SOPs) that define appropriate LLM usage boundaries. Critical factors include maintaining human oversight for all outputs, avoiding the processing of sensitive or confidential data through external APIs, and ensuring compliance with industry regulations and internal data governance policies.
Would this work within SAS LSAF?
Yes, this will work with LSAF.
Why didn't you use an LLM to write the actual SAS code in your macro?
You can ask an LLM to write code, but much customer code is proprietary and not publicly available for LLMs to learn from, so their responses for such code are less reliable.
Recommended Resources
Webinar: The Power of GenAI in SAS Programming (On Demand)
AI's Influence on SAS Programming (Cytel)
Getting a Prompt Response Using ChatGPT and SAS
Please see additional resources in the attached slide deck.
Want more tips? Be sure to subscribe to the Ask the Expert board to receive follow-up Q&A, slides, and recordings from other SAS Ask the Expert webinars.
Ready to level-up your skills? Choose your own adventure.
Your Home for Learning SAS
SAS Academic Software
SAS Learning Report Newsletter
SAS Tech Report Newsletter