From Screenshot to Table – SAS Viya with GPT-4 Turbo with Vision

3 Likes

This post introduces the GPT-4 Turbo with vision model from Azure OpenAI and explores its integration with SAS products like SAS Visual Analytics and SAS Studio. Imagine uploading a screenshot of a basic report and simply asking questions to understand its contents. Moreover, you can provide screenshots of your source files with sample data, and the GPT-4 Turbo with vision will generate the necessary SAS code to recreate the table in SAS Viya. Curious about turning a screenshot into a functional in-memory table in SAS Viya? Dive into the post and check out the accompanying video for a comprehensive guide.

Long Live LMM!

"Wait, do you mean LLM?!?"

On December 12th, Microsoft unveiled the public preview of the GPT-4 Turbo with vision model on Azure. This Large Multimodal Model (LMM), developed by OpenAI, accepts both text and image inputs, marking an evolution from the previous Large Language Models (LLMs). Within Azure AI Studio, this model can be seamlessly integrated with Azure AI's vision services, expanding its capabilities to include:

Advanced Optical Character Recognition (OCR): Extracting text, whether printed or handwritten, from images.
Refined object detection: Identifying and tagging objects and providing descriptions.
Improved video analysis: Monitoring video feeds for anomalies and tracking objects across video frames.
Data point generation: Converting information into structured files.

And the list goes on.

Examples

To grasp the functionality of the GPT-4 Turbo with Vision model, it's best to walk through a series of practical examples. Consider this scenario in Azure OpenAI Studio: you upload a screenshot of a SAS Visual Analytics report.

Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.

From there, you can engage with the model in various ways:

Ask for identification: "What is this?"

Request data extraction: "Please extract the report data to a CSV file, comma delimited.”

Seek financial insights: "What are some key takeaways from the report?"

With LLM or LMM models you should always double check your calculations! Don’t trust the models blindly! [says Microsoft...].

Ask strategic questions: "Based on the gross profit margin – which one product would you discontinue and why?"

The response to such a question can be complex and controversial. For instance, if sofas show a low gross profit margin, it might suggest discontinuing them. However, consider the broader impact—what if sofas draw customers into the store who then purchase additional items?

My perspective is that the model's output should be viewed as a reflection in a mirror—it shows only what is directly in front of it. Similarly, the output from a model should serve as a tool to inform decision-making, not replace it.

The final decision rests on human shoulders.

From Screenshots to Table

Ready to take a bold step forward? Can you turn a few screenshots into a fully functional in-memory table in SAS Viya? Watch the video for a demonstration of the process.

GPT-4-Vision-From-Screenshot-to-Table (1).mp4

Video Player is loading.

Current Time 0:00

Duration 0:00

Loaded: 0%

Stream Type LIVE

Remaining Time 0:00

(view in My Videos)

I trust that I've demonstrated the art of the possible—converting images into a target table, it's entirely achievable.

Required Resources

SAS Viya deployment.
Azure subscription with access to Azure OpenAI GPT-4 Turbo with Vision. Currently, the Azure OpenAI resource can be deployed to a list of select locations.
GPT-4 model vision-preview version deployed.
Azure Computer Vision resource (optional, helps with OCR).
Your images and your prompts.

System Message

"You are an AI assistant that helps people build data structures. Users will submit images of their target table, data set or report. You will ask them what the source data looks like. Users must respond by submitting images, describing each source table or paste a few columns with sample data. You will then generate the SAS code needed to realize the data structure. You must breakdown the task into steps, then detail what operations are required: loading files, joins, filters, sorts, summarizations, etc. At the end, the target table must be loaded in-memory, in CAS as a global promoted table."

Prompts

Define your objective: “I need to produce the data used by the following SAS Visual Analytics report:”
Provide the model with source information: The following images describe my source files:

Force the model to extract text from images: “for the load step, can you extract the ‘/path/to' from the images and add it in the code (hint: the path is indicated in the bottom left corner)?"
CAS prompt: "The last step, promote… table in CAS will likely result in an error. The table must be loaded from WORK to the CASUSER caslib. Here’s an example, please refactor the code: <your code here>".

Automation

You can script the whole approach using the OpenAI Python client library. To access the images, you can use a URL or, encode the image using base64 and pass it to the Azure OpenAI API.

Conclusions

Screenshots to target table? Possible with SAS Viya and Azure OpenAI GPT-4 vision. Deploy the resources, upload a few screenshots of your target report, your source files and ask the model to generate the code to create the table underlying the report.

Additional Resources

Getting Started with Multimodality and Getting started with Azure GPT-4-Turbo Vision by Valentina Alto.
GPT-4 and GPT-4 Turbo Preview model availability.
Multimodal Conversational Interfaces with GPT and Vision AI by Scott Holden – Microsoft AI Summit, Melbourne, March 2024.

What to Read Next?

Thank you for your time reading this post. If you liked the post, give it a thumbs up! Please comment and tell us what you think about having conversations with your data. If you wish to get more information, please write me an email.

Find more articles from SAS Global Enablement and Learning here.

SAS Communities Library