How Does Retrieval Augmented Generation (RAG) Work?
Retrieval Augmented Generation (RAG) is an architecture that augments the capabilities of a Large Language Model (LLM) like ChatGPT by adding an information retrieval system that provides grounding data. Let’s start with some Large Language Model (LLM) generated poetry:
To RAG or not to RAG, that is the query,
A question of depth, not at all imaginary.
Shall we enhance our LLM with files galore,
Or trust its vast knowledge, needing no more?
To RAG or not to RAG: that is the question we would like to answer in this post.
RAG Data Sources
Azure OpenAI facilitates RAG by integrating pretrained models with your unique datasets. You can augment the prompts sent to the model by incorporating a data source of your choice. This can be done by uploading your files directly, utilizing existing blob storage, or connecting to a pre-existing AI Search index. Azure OpenAI supports a range of file types, including .md, .txt, .html, .pdf, as well as .docx and .pptx documents. It's important to note that if these files include graphics or images, the quality of the model's response will hinge on the effectiveness of text extraction from these visuals.
Index
For indexing, we utilize Azure AI Search, which is distinct from Azure OpenAI. This component is designed to construct a search index from your data.
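Under the hood, a search index maps terms to the documents that contain them. The sketch below is a toy inverted index in plain Python, only to illustrate the concept; it is not the Azure AI Search API, and the file names are hypothetical placeholders:

```python
from collections import defaultdict

def build_index(docs):
    """Build a tiny inverted index: term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = index.get(terms[0], set()).copy()
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

docs = {
    "loading.md": "load a sas data set into cas with proc casutil",
    "intro.md": "azure openai augments prompts with retrieved data",
}
idx = build_index(docs)
print(search(idx, "load cas"))  # {'loading.md'}
```

A production index (such as one built by Azure AI Search) adds ranking, vector search, and linguistic analysis on top of this basic term-to-document mapping.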
Search
Azure OpenAI leverages Azure AI Search to enhance prompts by appending pertinent data snippets. By default, the system prompts the model to prioritize, but not exclusively rely on your data. However, this preference can be adjusted during setup, allowing the model to balance its pre-trained knowledge with the information from your data sources.
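The behavior described above can be sketched as a prompt-assembly function. This is a hypothetical helper, not part of any Azure SDK; the `strict` flag stands in for the setup option that restricts responses to your data:

```python
def augment_prompt(user_question, snippets, strict=False):
    """Assemble a RAG prompt: system message plus retrieved snippets.

    strict=True tells the model to answer only from the snippets;
    strict=False (the default) asks it to prioritize them but allows
    falling back on its pre-trained knowledge.
    """
    if strict:
        policy = "Answer using ONLY the retrieved documents below."
    else:
        policy = ("Prioritize the retrieved documents below, but you may "
                  "fall back on your general knowledge if they are not sufficient.")
    # Number the snippets so the model can cite them in its answer.
    context = "\n\n".join(f"[doc {i + 1}] {s}" for i, s in enumerate(snippets))
    return [
        {"role": "system", "content": f"{policy}\n\n{context}"},
        {"role": "user", "content": user_question},
    ]

messages = augment_prompt(
    "How do I load PRDSALE into the CASUSER caslib?",
    ["Use PROC CASUTIL with the LOAD statement to load data into CAS."],
)
print(messages[0]["content"])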
Retrieval Augmented Generation Overview Architecture Diagram
How to Implement Retrieval Augmented Generation (RAG) with Azure OpenAI
Watch the following video, which explains how to add a data source to your model. The video compares the answers to a code generation question given by the "base" GPT-4 model and by the GPT-4 model grounded in your data.
As an interim conclusion, RAG can be the answer when the code you are trying to generate is "exotic", proprietary, or not covered by many public sources.
A Second RAG Example
Loading a SAS Data Set into CAS – A Comparison
Suppose you prompt GPT-4 with: ‘I'd like to perform an efficient client-side SAS data set load in memory, in CAS. Let's assume the data set is PRDSALE from SASHELP. I want to load it in CASUSER caslib.’
Without RAG: code generation relies solely on the model's pre-trained knowledge of SAS and CAS.
With RAG: code generation is enhanced by dynamically pulling relevant information from an index, potentially improving the efficiency and accuracy of the generated code for loading the data set into CAS.
CAS Code Generation with RAG
Let's explore how RAG streamlines the process of generating CAS code using a single markdown file from a GEL Data Management workshop that outlines efficient client-side in-memory data loading:
Once the file is uploaded to an Azure Storage Account and ingested by Azure AI Search to create an index, Azure OpenAI is ready to tap into this data source.
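Before indexing, long documents are typically split into smaller chunks so the search step can return focused snippets rather than whole files. Azure AI Search handles this during ingestion; the function below is only a conceptual sketch of paragraph-aligned chunking:

```python
def chunk_markdown(text, max_chars=200):
    """Split a markdown document into paragraph-aligned chunks for indexing."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        # Start a new chunk when adding this paragraph would exceed the budget.
        if len(current) + len(para) + 2 > max_chars and current:
            chunks.append(current.strip())
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks

parts = chunk_markdown(
    "## Loading\n\nUse PROC CASUTIL.\n\n## Notes\n\nKeep tables small.",
    max_chars=40,
)
print(len(parts))  # 2
```

Chunking on paragraph boundaries keeps each snippet coherent, which matters because the snippets are pasted into the prompt verbatim.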
Here's what happens when you request SAS code to load a data set into CAS:
- Azure OpenAI receives the user prompt, which in this case is a request to load the PRDSALE data set from SASHELP into the CASUSER caslib.
- It analyzes the prompt to grasp the desired content and intent, identifying the indexed file as highly relevant.
- The model then queries the search index using this intent.
- It builds an enhanced prompt that combines the system message, the retrieved search results, and the user's original request.
- This enhanced prompt is sent to Azure OpenAI for processing.
- Azure OpenAI generates the SAS code, referencing the data where applicable, and returns it to the user.
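The six steps above can be sketched end to end. Everything here is a stand-in: `retrieve` plays the role of the Azure AI Search query and `generate` the role of the Azure OpenAI completion call, and all function names are hypothetical:

```python
def retrieve(docs, query, k=2):
    """Stand-in for the Azure AI Search query (step 3): rank by term overlap."""
    q_terms = set(query.lower().split())
    scored = []
    for doc_id, text in docs.items():
        overlap = len(q_terms & set(text.lower().split()))
        if overlap:
            scored.append((overlap, doc_id, text))
    scored.sort(reverse=True)
    return [text for _, _, text in scored[:k]]

def generate(messages):
    """Stand-in for the Azure OpenAI call (step 6)."""
    n_docs = messages[0]["content"].count("[doc")
    return f"<model answer grounded in {n_docs} document(s)>"

def rag_answer(docs, user_prompt):
    # Steps 1-2: receive the prompt and derive a search intent
    # (here, the prompt itself serves as the search query).
    snippets = retrieve(docs, user_prompt)                # step 3
    context = "\n".join(f"[doc] {s}" for s in snippets)   # step 4
    messages = [
        {"role": "system",
         "content": "Prioritize the retrieved documents.\n" + context},
        {"role": "user", "content": user_prompt},          # step 5
    ]
    return generate(messages)                              # step 6

docs = {"gel.md": "load a sas data set into cas efficiently"}
print(rag_answer(docs, "load prdsale into cas"))
# <model answer grounded in 1 document(s)>
```

The real system replaces both stand-ins with service calls, but the data flow, retrieve, assemble, generate, is the same.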
And that's a brief rundown of RAG in action for CAS code generation.
Instead of a Conclusion
To RAG or not to RAG, ponder we must,
For each choice we make bears its own kind of trust.
Do we value the breadth, the external clout,
Or the depth of the mind, what it’s all about?
Stay Tuned
In an upcoming post, we will compare a custom SWAT LangChain agent powered by a GPT-4 model with a GPT-4 model + RAG on documents highly relevant for SWAT code generation. The comparison is based on eighteen prompts asking the model to perform data management tasks of increasing difficulty in SAS Viya.
Additional Resources
- Retrieval Augmented Generation (RAG) in Azure AI Search.
- SWAT Code Generation and Execution in SAS Viya with Azure OpenAI and LangChain.
- SWAT Code Generation and Execution in SAS Viya with Azure OpenAI and LangChain: Behind the Scenes.
- LangChain Custom Agent.
- GPT-4 Assisted Data Management in SAS Viya: A Custom LangChain Agent Approach.
- How to Create Your Custom LangChain Agent for SAS Viya.
- Conversing with Data: Turning Queries into Conversations with SAS Viya, Azure OpenAI and LangChain.
- Exploring LangChain and Azure OpenAI’s Ability to Write SQL and Join Tables To Answer Questions.
Thank you for your time reading this post. If you liked the post, give it a thumbs up! Please comment and tell us what you think about having conversations with your data. If you wish to get more information, please write me an email.
Find more articles from SAS Global Enablement and Learning here.