Imagine a world where you could converse with your data as if it were a colleague. This might sound like a tech utopia, but with the power of SAS Viya, Azure OpenAI, the Azure AI Speech service, LangChain, and Python programming, it is rapidly becoming a reality.
Have you ever wished you could just ask your data a question in plain English? Well, now you can.
The demos below use the SASHELP.CARS data set as the data source.
In some cases, data types are even converted on the fly from string to float, and unnecessary symbols like commas or dollar signs are removed.
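Behind the scenes, the LangChain agent writes and executes pandas code against the DataFrame. Purely as an illustration (the code the LLM actually generates will vary, and the values here are made up), a question about the average price of a dollar-formatted column could translate into something like this:

import pandas as pd

# Hypothetical example of the pandas code an agent might generate
# when a price column arrives as strings such as "$36,945"
df = pd.DataFrame({"MSRP": ["$36,945", "$23,820", "$26,990"]})
msrp = (
    df["MSRP"]
    .str.replace("$", "", regex=False)
    .str.replace(",", "", regex=False)
    .astype(float)
)
print(msrp.mean())  # roughly 29251.67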
English is not the only language in which you can formulate your queries. With the power of LangChain, you can also interact with your data in whatever language you speak, be it French, Dutch, Mandarin, Romanian, or any other.
Text-to-text interaction, powered by Large Language Models (LLMs), has shown tremendous capabilities in handling unstructured data. When it comes to structured data, LangChain enters the fray, making it a breeze to interact with data.
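Here is a minimal sketch of that LangChain piece, assuming an Azure OpenAI deployment is already configured; the deployment name is a placeholder, and the full working programs appear later in this post. The second query shows that the same kind of question can just as well be asked in French:

import pandas as pd
from langchain.llms import AzureOpenAI
from langchain.agents import create_pandas_dataframe_agent

# Assumes the OPENAI_* environment variables are already set, as in the full programs below
llm = AzureOpenAI(deployment_name="my-deployment", model_name="my-deployment")  # placeholder names
agent = create_pandas_dataframe_agent(llm, pd.read_csv("cars.csv"), verbose=True)

agent.run("Which 5 car models have the most horsepower?")
agent.run("Quel est le nombre moyen de cylindres par origine ?")  # "What is the average number of cylinders by origin?", asked in French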
Speech-to-speech interaction is another step forward. Here, your voice is captured and converted into text, and the generated response is converted back into speech.
When real-time interaction isn't a requirement, you can opt for a straightforward text-to-text approach. Within SAS Studio, you can call upon LangChain and Azure OpenAI with the help of PROC PYTHON.
For text-to-text interaction, you'll need:
- An Azure OpenAI resource with a deployed model, plus its API key, endpoint, API version, and deployment name (kept in a config.py file in these demos).
- Python with the pandas, openai, and langchain packages installed.
When you are calling the Python program from SAS Studio, you must have:
- A SAS Viya environment where PROC PYTHON is configured, that is, a Python installation made available to SAS Viya.
- The Python packages above installed in that Python environment.
For speech-to-speech interaction, additional requirements include:
- An Azure AI Speech service resource, with its key and region (also kept in config.py).
- The azure-cognitiveservices-speech package.
- A local Python installation with access to a microphone and a speaker.
All the Python packages involved can be installed with pip, as sketched below.
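A minimal install sketch, with package names as published on PyPI and no versions pinned (depending on your LangChain release, the pandas DataFrame agent may also require the separate langchain_experimental package):

pip install pandas openai langchain azure-cognitiveservices-speech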
The amalgamation of SAS Viya, Azure OpenAI, LangChain, and Python programming is opening new dimensions in data interaction. It's not just about querying anymore; it's about conversing, transforming the way we engage with our data.
In Valentina's blog, to process a SQL database table, she loaded it into a pandas DataFrame. I applied the same concept to a SAS data set, using PROC PYTHON to load it into a DataFrame.
I am fully aware that this approach might not be suitable for tens of millions of rows. However, you could construct aggregate data sets first, an approach that was quite common in Business Intelligence and Data Warehouse projects, and then load those aggregates into a DataFrame.
In the first two demos, however, I first had to export the SASHELP.CARS data set to a CSV file and load it into a DataFrame on my PC. This was due to the lack of a LangChain SAS agent and toolkit for direct SAS data access, and because my Python installation needed access to a microphone and speaker for the speech-to-text and text-to-speech capabilities.
Despite this, LangChain can be employed in the Python setup associated with SAS Viya for text-to-text conversations. You can see this in action in the SAS Studio demo.
Below you will find the code for both demos: first the text-to-text program for SAS Studio, then the script used in the speech-to-speech approach.
You can run the following program in SAS Studio to interact with Azure OpenAI and LangChain:
/*
This program uses Azure OpenAI to interact with a pandas DataFrame loaded from a SAS table.
It sets up a LangChain agent that can answer queries about the data in the DataFrame.
Functions:
ask(query): Prints a query and the agent's answer to the query.
*/
proc python;
submit;
import pandas as pd
import openai
import os
import sys
from langchain.llms import AzureOpenAI
from langchain.agents import create_pandas_dataframe_agent

# config.py contains the API details: key, endpoint, version, deployment name
sys.path.append('folder_path_where_config.py_is_stored')
import config

# Load the SASHELP.CARS data set into a pandas DataFrame
df = pd.read_sas("/opt/sas/viya/home/SASFoundation/sashelp/cars.sas7bdat")

# Point LangChain at the Azure OpenAI deployment
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_KEY"] = config.api_key
os.environ["OPENAI_API_BASE"] = config.api_base
os.environ["OPENAI_API_VERSION"] = config.api_version
llm = AzureOpenAI(deployment_name=config.api_model, model_name=config.api_model)

# Create an agent that can answer questions about the DataFrame
agent = create_pandas_dataframe_agent(llm, df, verbose=True)
def ask(query):
    print('Q: ', query)
    print('Final Answer: ', agent.run(query))
query = 'list the top 5 car make model pairs with most horsepower.'
ask(query)
endsubmit;
run;
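The code used in the speech-to-speech approach follows. This script runs on a local Python installation, outside SAS Viya, because it needs access to a microphone and a speaker; it works against the cars.csv file exported from SASHELP.CARS and reads the Azure OpenAI and Azure AI Speech details from the same kind of config.py file: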
"""
This script performs speech recognition and synthesis using Azure OpenAI and Azure Speech Service.
It loads a CSV file into a pandas DataFrame and sets up an agent to interact with the data using Azure OpenAI.
The script listens to user voice input, converts it to text, sends the text to Azure OpenAI,
and synthesizes the OpenAI's response back to speech.
Functions:
ask_langchain(query): Sends a query to Azure OpenAI, gets the response, and synthesizes the response to speech.
chat_with_open_ai(): Continuously listens for speech input, recognizes the speech, sends it to Azure OpenAI,
and synthesizes the OpenAI's response back to speech
"""
import pandas as pd
import openai
import os
import config
from langchain.llms import AzureOpenAI
from langchain.agents import create_pandas_dataframe_agent
import azure.cognitiveservices.speech as speechsdk
df = pd.read_csv("cars.csv")
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_KEY"] = config.api_key
os.environ["OPENAI_API_BASE"] = config.api_base
os.environ["OPENAI_API_VERSION"] = config.api_version
os.environ["SPEECH_KEY"] = config.speech_key
os.environ["SPEECH_REGION"] = config.speech_region
llm = AzureOpenAI(deployment_name=config.api_model, model_name=config.api_model)
agent = create_pandas_dataframe_agent(llm, df, verbose=True)
# Configure the Azure AI Speech service: default microphone for input, default speaker for output
speech_config = speechsdk.SpeechConfig(subscription=os.environ.get('SPEECH_KEY'), region=os.environ.get('SPEECH_REGION'))
audio_output_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)
audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
speech_config.speech_recognition_language = "en-US"
speech_config.speech_synthesis_voice_name = 'en-US-JennyMultilingualNeural'
# The recognizer listens on the microphone; the synthesizer speaks through the default speaker
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_output_config)
def ask_langchain(query):
    text = agent.run(query)
    speech_synthesis_result = speech_synthesizer.speak_text_async(text).get()
    if speech_synthesis_result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
        print("Speech synthesized to speaker for text [{}]".format(text))
    elif speech_synthesis_result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = speech_synthesis_result.cancellation_details
        print("Speech synthesis canceled: {}".format(cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            print("Error details: {}".format(cancellation_details.error_details))

def chat_with_open_ai():
    while True:
        print("Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.")
        try:
            speech_recognition_result = speech_recognizer.recognize_once_async().get()
            if speech_recognition_result.reason == speechsdk.ResultReason.RecognizedSpeech:
                if speech_recognition_result.text == "Stop.":
                    print("Conversation ended.")
                    break
                print("Recognized speech: {}".format(speech_recognition_result.text))
                ask_langchain(speech_recognition_result.text)
            elif speech_recognition_result.reason == speechsdk.ResultReason.NoMatch:
                print("No speech could be recognized: {}".format(speech_recognition_result.no_match_details))
                break
            elif speech_recognition_result.reason == speechsdk.ResultReason.Canceled:
                cancellation_details = speech_recognition_result.cancellation_details
                print("Speech Recognition canceled: {}".format(cancellation_details.reason))
                if cancellation_details.reason == speechsdk.CancellationReason.Error:
                    print("Error details: {}".format(cancellation_details.error_details))
                    print("Did you set the speech resource key and region values?")
        except EOFError:
            break

try:
    chat_with_open_ai()
except Exception as err:
    print("Encountered exception. {}".format(err))
Thank you for taking the time to read this post. If you liked it, give it a thumbs up! Please comment and tell us what you think about having conversations with your data. If you wish to get more information, please write me an email.