Using Generative AI (GAI) tools with your daily SAS tasks
- Safety first: Be aware of the risks AI usage brings to you and your organization.
- Generate demo data to quickly illustrate your point.
- Generate data for different structures.
- Epilogue
I work for SAS, and we have put a lot of thought into multiple aspects of Generative AI usage. This text is based on my own opinion and does not represent official SAS policy on this topic.
1. Safety first: Be aware of the risks AI usage brings to you and your organization.
“If you’re not the customer, you’re the product.” Did you know that some generative AI services are allowed to use anything you type into the chat box, and anything you get back from it, as they please? There are cases where company-confidential information can now be found in retrained models because of careless use of AI chats. Before you use any GAI service, find out who owns the rights to your prompts (inputs) and to the outputs. You might be giving away some copyrights without knowing it.
In my example I use MS Copilot (https://copilot.microsoft.com/) because I think it is the easiest option for companies to purchase and, as far as I understand, you are not giving away your copyrights to the prompts or the output.
In my case I just needed to log in with my work account to see “Protected” in the upper right corner.
Picture 1. Make sure you are logged in with your company account to be protected.
Picture 2. You should also see “Your personal and company data are protected in this chat” if you have successfully logged in with your company account and your company has a Copilot license.
2. Generate demo data to quickly illustrate your point.
I firmly believe that generic demo data is useful in most cases during the “brainstorming - Fail Fast” phase of your development cycle. In some cases you need real data to see how the plan would look with the right terms and texts, but real data might not be easily available. Topic-specific demo data can solve some of these challenges in a fraction of the time compared to real data (which might have security issues too).
Let’s use Copilot to generate some demo data.
A useful prompt could be: “generate me csv data about patient and healthcare. 10 rows”
Picture 3. Copilot output in this session.
Copy this content to a file and import it, for example, into SAS Viya to show your vision to others. Real data of this type is very hard to use in the planning phase because the content is highly sensitive. Generated data helps everyone see what the visualization could look like.
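Generated CSV text sometimes has rows with a missing or extra field, so a quick check before importing can save time. The following is a minimal sketch in Python, used here only for illustration. The helper name check_csv_text and the sample rows are hypothetical placeholders, not the actual Copilot output from Picture 3.

```python
import csv
import io

def check_csv_text(text):
    """Return (header, rows); raise ValueError if a row's width differs from the header."""
    reader = csv.reader(io.StringIO(text.strip()), skipinitialspace=True)
    header = next(reader)
    rows = []
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # ignore blank lines inside the pasted text
        if len(row) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} fields, got {len(row)}"
            )
        rows.append(row)
    return header, rows

# Hypothetical placeholder text standing in for what you paste from the chat.
pasted = """
PatientID, Age, Department
1, 54, Cardiology
2, 37, Neurology
"""

header, rows = check_csv_text(pasted)
print(header, len(rows))
```

If the check passes, the same text can be saved as a .csv file and imported as described above.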
3. Generate data for different structures.
Not all report builders have a clear understanding of different data structures, but this can be critical for delivering correct information when planning reports and interactions in content. I have found that the relational structure, where data is divided into different tables and linked using an ID field, is a common source of confusion when the concept is unfamiliar.
Let’s generate relational data for demo purposes about salary information in 3 different tables:
Prompt: “generate separated 3 csv files that has relations between files. data should be about people, salaries, and job risk index”
Answer:
Sure, here are three sample CSV files that have relations between them:
- People.csv: This file contains information about people, including their name, age, gender, and job title. Each person is assigned a unique ID number that can be used to link to other files. Here is a sample of the first 5 rows:
ID, Name, Age, Gender, Job Title
1, John Smith, 35, Male, Software Engineer
2, Jane Doe, 28, Female, Marketing Manager
3, Bob Johnson, 42, Male, Sales Representative
4, Sarah Lee, 29, Female, Data Analyst
5, Michael Brown, 45, Male, Project Manager
- Salaries.csv: This file contains information about salaries, including the person’s ID number, their salary, and the year the salary was earned. Here is a sample of the first 5 rows:
ID, Salary, Year
1, 100000, 2022
2, 75000, 2022
3, 80000, 2022
4, 60000, 2022
5, 120000, 2022
- JobRiskIndex.csv: This file contains information about job risk index, including the person’s ID number, their job risk index, and the year the job risk index was calculated. Here is a sample of the first 5 rows:
ID, Job Risk Index, Year
1, 0.25, 2022
2, 0.15, 2022
3, 0.35, 2022
4, 0.10, 2022
5, 0.20, 2022
These files can be linked together using the ID number column. For example, to link the information in the People.csv file to the Salaries.csv file, you can use the ID number column as the common link. Similarly, to link the information in the People.csv file to the JobRiskIndex.csv file, you can use the ID number column as the common link.
Please note that this data is generated for illustrative purposes only and does not represent any real-world information.
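To make the linking concrete, here is a minimal Python sketch (an illustration, not part of the Copilot answer) that joins the three samples above on the ID column using only the standard library. It assumes each table has exactly one row per ID, which holds for this sample but would not hold if a table contained several years per person.

```python
import csv
import io

PEOPLE = """ID, Name, Age, Gender, Job Title
1, John Smith, 35, Male, Software Engineer
2, Jane Doe, 28, Female, Marketing Manager
3, Bob Johnson, 42, Male, Sales Representative
4, Sarah Lee, 29, Female, Data Analyst
5, Michael Brown, 45, Male, Project Manager
"""

SALARIES = """ID, Salary, Year
1, 100000, 2022
2, 75000, 2022
3, 80000, 2022
4, 60000, 2022
5, 120000, 2022
"""

RISK = """ID, Job Risk Index, Year
1, 0.25, 2022
2, 0.15, 2022
3, 0.35, 2022
4, 0.10, 2022
5, 0.20, 2022
"""

def read_by_id(text):
    """Parse CSV text into a dict keyed by the ID column (assumes one row per ID)."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return {row["ID"]: row for row in reader}

people = read_by_id(PEOPLE)
salaries = read_by_id(SALARIES)
risks = read_by_id(RISK)

# Build one combined row per person by looking up the same ID in the other tables.
joined = []
for person_id, person in people.items():
    joined.append({
        "ID": person_id,
        "Name": person["Name"],
        "Job Title": person["Job Title"],
        "Salary": int(salaries[person_id]["Salary"]),
        "Job Risk Index": float(risks[person_id]["Job Risk Index"]),
    })

for row in joined:
    print(row)
```

A report tool does the same lookup when you define a join on ID: the people table supplies the descriptive columns and the other two tables supply the measures.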
4. Epilogue
One of the GAI challenges is that, depending on your chat history and active settings, you might get a very different answer. When I tested these same prompts again, I got very different responses. The good thing is that when you get something that looks a bit like what you want, you can ask GAI to tune the answer.
Examples of tuning prompts:
- Generate 3 different csv files 5 rows or more that has relations between files. data should be about people, salaries, and job risk index
- Thanks. Now from jobriskindex I want you to remove person id column. from salaries table I want you to remove risk index column but keep the risk index ID
- Thanks. Add salaries data with year 2022 and change salary amount to be 98% from 2023 data
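For comparison, the last prompt describes a purely mechanical change (copy each 2023 row to 2022 at 98% of the salary), and that kind of change can also be scripted. This is a minimal Python sketch. The function name add_previous_year and the 2023 values are hypothetical, made up for illustration.

```python
# Hypothetical 2023 salary rows; the values are made up for illustration.
salaries_2023 = [
    {"ID": 1, "Salary": 100000, "Year": 2023},
    {"ID": 2, "Salary": 75000, "Year": 2023},
]

def add_previous_year(rows, factor=0.98):
    """Return the previous-year copies (salary scaled by factor) followed by the original rows."""
    previous = [
        {"ID": r["ID"], "Salary": round(r["Salary"] * factor), "Year": r["Year"] - 1}
        for r in rows
    ]
    return previous + rows

all_rows = add_previous_year(salaries_2023)
for row in all_rows:
    print(row)
```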
Should you spend time chatting with GAI, or would it be faster to create the data in Excel? I think both have their purposes: GAI is very good at creating a first draft and suggesting terms for any topic, while Excel is handy for duplicating rows or changing numbers, formats, and so on.
Merry Christmas and happy error hunting in SAS logs using GAI. Safety first!
Picture 4. AI generated picture about: Christmas theme, data creation with generative AI somewhere in the background.