Retrieval Augmented Generation (RAG) is an architecture that augments the capabilities of a Large Language Model (LLM) like ChatGPT by adding an information retrieval system that provides grounding data. Let’s start with some Large Language Model (LLM) generated poetry:
To RAG or not to RAG, that is the query,
A question of depth, not at all imaginary.
Shall we enhance our LLM with files galore,
Or trust its vast knowledge, needing no more?
To RAG or not to RAG that is the question we would like to answer in this post.
Azure OpenAI facilitates RAG by integrating pretrained models with your unique datasets. You can augment the prompts sent to the model by incorporating a data source of your choice. This can be done by uploading your files directly, utilizing existing blob storage, or connecting to a pre-existing AI Search index. Azure OpenAI supports a range of file types, including .md, .txt, .html, .pdf, as well as .docx and .pptx documents. It's important to note that if these files include graphics or images, the quality of the model's response will hinge on the effectiveness of text extraction from these visuals.
For indexing, we utilize Azure AI Search, which is distinct from Azure OpenAI. This component is designed to construct a search index from your data.
Azure OpenAI leverages Azure AI Search to enhance prompts by appending pertinent data snippets. By default, the system prompts the model to prioritize, but not exclusively rely on your data. However, this preference can be adjusted during setup, allowing the model to balance its pre-trained knowledge with the information from your data sources.
Retrieval Augmented Generation Overview Architecture Diagram
Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.
Watch the following video that explains how to add a data source to your model. The video compares the code generation question answers given by the "base" GPT-4 model versus the GPT-4 model grounded in your data.
As an intermediary conclusion, RAG could be an answer when the code you are trying to generate is "exotic" or proprietary or not covered by many public sources.
Suppose you are prompting GPT-4 ‘I'd like to perform an efficient client-side SAS data set load in memory, in CAS. Let's assume the data set is PRDSALE from SASHELP. I want to load it in CASUSER caslib.’
Without RAG: The code generation process would rely solely on pre-defined scripts and the programmer's knowledge to accomplish the task.
With RAG: The code generation is enhanced by dynamically pulling in relevant information from an index, potentially improving efficiency and accuracy of the generated code snippet for loading the data set into CAS.
Let's explore how RAG streamlines the process of generating CAS code using a single markdown file from a GEL Data Management workshop that outlines efficient client-side in-memory data loading:
Once the file is uploaded to an Azure Storage Account and ingested by Azure AI Search to create an index, Azure OpenAI is ready to tap into this data source.
Here's what happens when you request SAS code to load a data set into CAS:
And that's a brief rundown of RAG in action for CAS code generation.
To RAG or not to RAG, ponder we must,
For each choice we make bears its own kind of trust.
Do we value the breadth, the external clout,
Or the depth of the mind, what it’s all about?
In a next post, we will compare a SWAT custom LangChain agent powered by a GPT-4 model with a GPT-4 model + RAG on documents highly relevant for SWAT code generation. The comparison is based on eighteen prompts, asking the model to perform light, medium and increasingly difficult data management tasks in SAS Viya.
Thank you for your time reading this post. If you liked the post, give it a thumbs up! Please comment and tell us what you think about having conversations with your data. If you wish to get more information, please write me an email.
Find more articles from SAS Global Enablement and Learning here.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.