
How Does Retrieval Augmented Generation (RAG) Work?


Retrieval Augmented Generation (RAG) is an architecture that augments the capabilities of a Large Language Model (LLM) like ChatGPT by adding an information retrieval system that provides grounding data. Let's start with some LLM-generated poetry:

 

To RAG or not to RAG, that is the query,
A question of depth, not at all imaginary.
Shall we enhance our LLM with files galore,
Or trust its vast knowledge, needing no more?

 

To RAG or not to RAG, that is the question we would like to answer in this post.

 

RAG Data Sources

 

Azure OpenAI facilitates RAG by integrating pretrained models with your unique datasets. You can augment the prompts sent to the model by incorporating a data source of your choice. This can be done by uploading your files directly, utilizing existing blob storage, or connecting to a pre-existing AI Search index. Azure OpenAI supports a range of file types, including .md, .txt, .html, .pdf, as well as .docx and .pptx documents. It's important to note that if these files include graphics or images, the quality of the model's response will hinge on the effectiveness of text extraction from these visuals.
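To make the data source connection concrete, here is a minimal Python sketch of the kind of request body that points an Azure OpenAI chat call at an existing Azure AI Search index. This is an illustration, not the official SDK: the endpoint, index name, and exact field names are assumptions, and the "on your data" schema can differ between API versions.

```python
# Illustrative sketch: the JSON payload an Azure OpenAI "on your data"
# chat request might carry when grounding the model in an existing
# Azure AI Search index. Endpoint and index names are hypothetical.

def build_rag_payload(user_prompt: str,
                      search_endpoint: str,
                      index_name: str) -> dict:
    """Assemble a chat-completions request body that references an
    Azure AI Search index as the grounding data source."""
    return {
        "messages": [
            {"role": "system",
             "content": "Answer using the retrieved documents when relevant."},
            {"role": "user", "content": user_prompt},
        ],
        # Extra field understood by the "on your data" feature; the exact
        # schema may vary by API version.
        "data_sources": [
            {
                "type": "azure_search",
                "parameters": {
                    "endpoint": search_endpoint,
                    "index_name": index_name,
                },
            }
        ],
    }

payload = build_rag_payload(
    "How do I load a SAS data set into CAS?",
    "https://my-search.search.windows.net",  # hypothetical endpoint
    "gel-workshop-index",                    # hypothetical index name
)
```

The point to notice is that the grounding data travels as a declarative reference (endpoint plus index) alongside the messages, rather than being pasted into the prompt by hand.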

 

Index

 

For indexing, we utilize Azure AI Search, which is distinct from Azure OpenAI. This component is designed to construct a search index from your data.

 

Search

 

Azure OpenAI leverages Azure AI Search to enhance prompts by appending pertinent data snippets. By default, the system instructs the model to prioritize, but not exclusively rely on, your data. This preference can be adjusted during setup, allowing the model to balance its pre-trained knowledge with the information from your data sources.
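One way to think about that setting is as a switch on the system instruction sent with each request. The sketch below is illustrative only; the wording is not Azure's actual internal prompt.

```python
# Sketch of the "limit responses to your data" preference as a toggle on
# the system instruction. Wording is hypothetical, not Azure's own prompt.

def system_message(restrict_to_data: bool) -> str:
    if restrict_to_data:
        return ("Answer ONLY from the retrieved documents. "
                "If the answer is not in them, say you don't know.")
    return ("Prefer the retrieved documents, but fall back on your "
            "general pre-trained knowledge when they don't cover "
            "the question.")
```

Restricting answers to your data reduces hallucination risk at the cost of refusing questions your documents don't cover; the default, blended behavior keeps the model's breadth while still favoring your sources.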

 

Retrieval Augmented Generation Overview Architecture Diagram

 


 

How to Implement Retrieval Augmented Generation (RAG) with Azure OpenAI

 

Watch the following video, which explains how to add a data source to your model. The video compares the code generation answers given by the "base" GPT-4 model versus a GPT-4 model grounded in your data.

 

 

As an interim conclusion, RAG can be the answer when the code you are trying to generate is "exotic", proprietary, or not covered by many public sources.

 

A Second RAG Example

Loading a SAS Data Set into CAS – A Comparison

 

Suppose you are prompting GPT-4: "I'd like to perform an efficient client-side SAS data set load in memory, in CAS. Let's assume the data set is PRDSALE from SASHELP. I want to load it in the CASUSER caslib."

 

Without RAG: code generation relies solely on the model's pre-trained knowledge and the programmer's own expertise to accomplish the task.

 

With RAG: code generation is enhanced by dynamically pulling in relevant information from an index, potentially improving the efficiency and accuracy of the generated code snippet for loading the data set into CAS.

 

Code generation without and with RAG

 

CAS Code Generation with RAG

 

Let's explore how RAG streamlines the process of generating CAS code using a single markdown file from a GEL Data Management workshop that outlines efficient client-side in-memory data loading:

 

Extract from the RAG data source document

Once the file is uploaded to an Azure Storage Account and ingested by Azure AI Search to create an index, Azure OpenAI is ready to tap into this data source.

 

RAG code generation response grounded in your data

 

Here's what happens when you request SAS code to load a data set into CAS:

 

  • Azure OpenAI receives the user prompt, in this case a request to load the PRDSALE data set from SASHELP into the CASUSER caslib.
  • It analyzes the prompt to determine the desired content and intent.
  • The model then queries the search index using this intent, and the indexed file is identified as highly relevant.
  • The search results are combined with the system message and the user's original request into an enriched prompt.
  • This enhanced prompt is sent to the GPT-4 model for completion.
  • The model generates the SAS code, referencing the grounding data where applicable, and returns it to the user.
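The steps above can be sketched as a tiny, self-contained pipeline. In a real deployment the retriever is Azure AI Search and the generator is GPT-4; here a naive keyword-overlap retriever and a hypothetical mini-corpus stand in so the data flow is visible end to end.

```python
# Toy sketch of the retrieval-augmentation loop: retrieve relevant
# snippets, then fold them into the prompt sent to the model.
# The corpus and document names are hypothetical.

def retrieve(query: str, documents: dict, top_k: int = 1) -> list:
    """Rank documents by crude keyword overlap with the query."""
    q_terms = set(query.lower().split())
    scored = sorted(
        documents.items(),
        key=lambda kv: len(q_terms & set(kv[1].lower().split())),
        reverse=True,
    )
    return [text for _, text in scored[:top_k]]

def augment_prompt(system: str, user: str, snippets: list) -> str:
    """Combine system message, retrieved context, and user request."""
    context = "\n".join(f"[doc] {s}" for s in snippets)
    return f"{system}\n\nRetrieved context:\n{context}\n\nUser: {user}"

# Stand-in for the indexed workshop file plus an unrelated document.
docs = {
    "cas-load.md": "Use PROC CASUTIL to load a SAS data set into a caslib in CAS.",
    "filters.md": "Use WHERE clauses to filter rows in a data step.",
}

user = "Load the PRDSALE data set from SASHELP into the CASUSER caslib in CAS"
snippets = retrieve(user, docs)
prompt = augment_prompt("Prefer the retrieved documents.", user, snippets)
```

Because the CAS loading document shares far more terms with the request than the filtering document, it is the snippet that ends up in the enhanced prompt, which is exactly the grounding effect RAG relies on.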

 

And that's a brief rundown of RAG in action for CAS code generation.

 

Instead of a Conclusion

 

To RAG or not to RAG, ponder we must,
For each choice we make bears its own kind of trust.
Do we value the breadth, the external clout,
Or the depth of the mind, what it’s all about?

 

Stay Tuned

 

In an upcoming post, we will compare a SWAT custom LangChain agent powered by a GPT-4 model with a GPT-4 model + RAG on documents highly relevant for SWAT code generation. The comparison is based on eighteen prompts, asking the model to perform light, medium, and increasingly difficult data management tasks in SAS Viya.

 

Additional Resources

 

 

 

Thank you for your time reading this post. If you liked the post, give it a thumbs up! Please comment and tell us what you think about having conversations with your data. If you wish to get more information, please write me an email.

 

 

Find more articles from SAS Global Enablement and Learning here.

