A Human Generated Introduction to Generative AI, Part 1: Synthetic Data Generation
Recent Library Articles
Recently in the SAS Community Library: In the first of two posts on applications of generative AI, SAS' @JThompson reveals the role of generating synthetic data.
AI and/or analytical marketing is the process of using capabilities like data collection, data-driven analysis, natural language processing (NLP) and machine learning (ML) to deliver customer insights and automate critical marketing decisions. Today, AI technologies are being used more widely to generate content, increase team efficiency, improve customer experience and deliver more accurate results.
With the increasing utility of Generative AI (i.e. GenAI), marketing teams use the technology to instantly create hyper-personalized marketing assets, generate insights from customer data and iterate tactical improvements on existing strategies. Given the vast amounts of multi-touchpoint data processed by brands, and the value of leveraging that data, AI adoption is increasingly critical for those that want to remain competitive.
With that said, I want to be crystal clear: there is much more to the AI discipline than GenAI. As everyday people prompt away, hunting for clever new solutions to their use cases, or as brands assess, experiment with and/or deploy agentic strategies to improve their business models, the level of autonomy given to AI should and will vary by task, risk level and business context. It is critical that brands are empowered to adopt and leverage AI for marketing-centric use cases while minimizing friction and disruption.
Image 1: AI Marketing Value Statement
And one more thing...don't forget a Composite AI strategy that augments all the recent excitement of GenAI, agents & agentic orchestration in this new world. The analytics landscape has evolved significantly during the past decade. Many organizations have progressed from basic statistical modeling to machine learning, and some have added deep learning to their toolkits as well. In this context, the emergence of GenAI — with its ability to create humanlike text, generate images, and write code — introduces new possibilities and questions.
How can a brand leader decide which AI solution to use for a given problem?
Let’s assume that the problem has been clearly defined, relevant inputs have been identified, and the desired output has been specified. A logical starting point is the nature of the problem: Is it a prediction problem or a generation problem?
Generation problems are easy to identify. If the desired output is unstructured — such as text, images, videos, or music — it is a generation problem.
Prediction problems come in two varieties: classification and regression. In classification problems, given an input, the user needs to make a choice from a set of predefined outputs. For example, given data about a customer, a marketer may want to predict whether the customer is at high, medium, or low risk for a retention/churn use case. The key here is that the output categories — high, medium, and low risk — are predefined by a human or unsupervised learning, not generated on the fly.
In regression problems, you want to predict a number (or a few numbers). Given data about a customer and engagement details, a marketer may want to predict the customer's churn risk score six months from now. Or, given past sales data for a product, an organization may want to predict its unit sales for the next 24 hours. Note that the distinction between classification and regression can be somewhat fuzzy. For example, regression problems can often be framed as classification problems by grouping the predicted number into predefined ranges. With the nature of the problem identified, we can turn to the decision of which tool or solution to use.
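To make the regression-versus-classification point concrete, here is a minimal Python sketch that turns a regression-style output into a classification label. The 0 to 1 score range, the cut points (0.33 and 0.66) and the function name `risk_band` are assumptions made for illustration; they are not values from any SAS product.

```python
from bisect import bisect_right

# Hypothetical cut points on a 0-1 churn score, chosen for illustration only.
RISK_THRESHOLDS = (0.33, 0.66)
RISK_LABELS = ("low", "medium", "high")


def risk_band(churn_score: float) -> str:
    """Map a predicted churn score (a number) to a predefined risk category."""
    if not 0.0 <= churn_score <= 1.0:
        raise ValueError("churn_score must be between 0 and 1")
    # bisect_right counts how many thresholds are <= the score,
    # which is exactly the index of the matching label.
    return RISK_LABELS[bisect_right(RISK_THRESHOLDS, churn_score)]


scores = [0.10, 0.50, 0.90]
print([risk_band(s) for s in scores])  # ['low', 'medium', 'high']
```

The categories stay predefined, as in the classification description above; only the route to them changes, from picking a label directly to predicting a number and then binning it.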
Let’s start with an easy use case. If you have a generation problem to solve, there’s only one game in town: GenAI. Depending on the sort of output you want to generate, you may need to use multimodal LLMs from services offered by OpenAI, Anthropic or Google Gemini. If you have a prediction problem, however, matters become more complicated.
The most straightforward scenario is when the input data is all tabular. In this situation, you should favor traditional machine learning. While deep learning can also solve these problems, it brings a host of other burdens that may not be worth the effort: It may require more effort to “tune” the model to the problem, the model may not lend itself to managerial interpretability due to its black-box nature, and so on. In contrast, machine learning models are much quicker to build and tune and require less “babysitting,” and interpretable methods are available.
By choosing machine learning over deep learning, you are not necessarily settling for lower accuracy in exchange for ease of development. Certain widely used machine learning methods (like Gradient Boosting) are not only easier to work with than deep learning but also can be more accurate, at times, for tabular data prediction problems.
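The selection logic described so far can be summarized in a few lines of Python. This is only a sketch of the reasoning in this post: the function name `suggest_approach`, its arguments and the returned strings are hypothetical, and the case of prediction with non-tabular inputs is deliberately left open because the text above does not settle it.

```python
UNSTRUCTURED_OUTPUTS = {"text", "image", "video", "music"}
STRUCTURED_OUTPUTS = {"category", "number"}  # classification, regression


def suggest_approach(output_kind: str, inputs_all_tabular: bool = False) -> str:
    """Suggest a family of AI techniques for a clearly defined problem.

    output_kind: "text", "image", "video", "music", "category" or "number".
    inputs_all_tabular: only matters for prediction problems.
    """
    if output_kind in UNSTRUCTURED_OUTPUTS:
        # Unstructured output means a generation problem.
        return "GenAI"
    if output_kind not in STRUCTURED_OUTPUTS:
        raise ValueError(f"unknown output kind: {output_kind!r}")
    # Structured output means a prediction problem.
    if inputs_all_tabular:
        return "traditional machine learning (for example, gradient boosting)"
    return "more complicated: evaluate case by case"


print(suggest_approach("text"))
print(suggest_approach("category", inputs_all_tabular=True))
```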
SAS 360 Marketing AI - Why now?
This article series introduced SAS development efforts to release a solution-oriented software application offering prescriptive experiences (i.e. recipes) to address trending use cases for B2C (and B2B) brands. For readers unfamiliar with the term "recipe"...
The concept of recipes and required ingredients, which lives at the center of SAS 360 Marketing AI's design principles, can be outlined as:
Data – What data do I need?
Preparation – How does it need to be transformed?
Use-case specific – Applicable ML/AI algorithm(s).
Scoring – Segments, recommendations, propensities, etc.
Activation – Using the scoring in journeys and channels.
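One way to picture a recipe is as a small record with one field per ingredient. The Python sketch below is purely illustrative: the `Recipe` class, its field names and the example values are assumptions, not the actual data model of SAS 360 Marketing AI.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Recipe:
    """One use case described by the five ingredients listed above."""
    data: str          # What data do I need?
    preparation: str   # How does it need to be transformed?
    algorithm: str     # Applicable ML/AI algorithm(s)
    scoring: str       # Segments, recommendations, propensities, etc.
    activation: str    # Using the scoring in journeys and channels


# Illustrative values only; not taken from any SAS-provided recipe.
churn_recipe = Recipe(
    data="customer profile and engagement history",
    preparation="aggregate events into one row per customer",
    algorithm="gradient boosting classifier",
    scoring="churn propensity per customer",
    activation="target high-propensity customers in a retention journey",
)

for f in fields(churn_recipe):
    print(f"{f.name.capitalize()}: {getattr(churn_recipe, f.name)}")
```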
Our hope is to improve collaboration between marketers and data scientists while making teams more self-sufficient in running analytics at scale, packaging the best of SAS capabilities in a simple-to-use interface. In other words, SAS is introducing AI and advanced analytic capabilities built specifically FOR marketing use cases. For a moment, reflect on the idea of a software application that:
Is designed for the domain space and themes of martech.
Focuses on use cases while minimizing adoption friction from statistical jargon that is frequently misunderstood outside the data science profession.
Uses the best of both worlds: GenAI blended with best-practice machine learning, predictive and segmentation capabilities in a no-code, rapid-scoring mechanism that integrates seamlessly with the broader SAS CI360 solution or external third-party martech tools.
Transforming marketing teams into analytical factories is a bold vision, and one we challenged ourselves to deliver.
Image 2: Analytical Challenges Facing Brands Today
This trend has given us at SAS a compelling insight: after a deep exploration of the Marketing AI landscape, we realized that there is a different way to approach this emerging paradigm.
...
In a previous post we introduced the concept of SAS Web Server authentication with SAS 9.4. In this post I want to extend that discussion to one authentication option: using Microsoft Entra ID with OpenID Connect for authentication with SAS 9.4.
In that previous post we covered adding the authenticated user as a header and the shared secret. Here we will only address the authentication module configuration for OpenID Connect with Microsoft Entra ID.
...
Whether you’re writing your hundredth DATA step, working with inherited code, or debugging a macro that refuses to cooperate, AI coding assistants have become productivity enhancers for programmers. Two of the most capable platforms right now are ChatGPT (from OpenAI) and Claude (from Anthropic). In this post we’ll break down each tool and discuss protecting security, optimizing prompts, enhancing code workflows, and reducing hallucinations.
...
We have a user interface through DDC in Visual Analytics that captures the user's choices as parameters, processes them, adds logic and sends them via the append method to the SAS code job.
The code executes well, but overall the job takes 15 seconds to complete.
I suspect that most of that time goes into starting the session and assigning the caslibs. I assign all of them, but I only need 2 for the code to run. Once the session is created, the SAS code takes less than a second to run, as what it does is not resource heavy.
Can I use the autoexec file to keep the session alive, @pathew? I ask you directly because you solved a similar issue for me this week. Thanks in advance.
Here is the relevant part of how the called SAS code initiates the session right now:
%put NOTE: Local = %sysfunc(putn(&dt_local, datetime20.));
%put NOTE: UTC= %sysfunc(putn(&dt, datetime20.));
/* %let local_vs_utc=%sysfunc(intck('dthours', &dt_local, &dt)); */
%let user_q = %str(%')&SYSUSERID%str(%');
%let uuid_q = %str(%')&row_id%str(%');
%let castbl=OUTPUT_EQUIPMENT_SRT_DEN;
%let caslib=pims;
%global skip_job lockstate;
%let skip_job=0; /* IMPORTANT: init so open-code %if is stable */
%let lockstate=ACCESSIBLE;
cas mySession sessopts=(caslib=casuser timeout=1800 locale="en_US");
caslib _all_ assign;
cas mySession sessopts=(caslib="&caslib");
libname mycas cas caslib="&caslib";
...
In this post we’ll discuss how we can use Non-Human Identities with SAS Viya. We will cover what we mean by a Non-Human Identity and how we can leverage these types of identities with SAS Viya. To make full use of the features discussed here, you will need to be running a version of SAS Viya after the Stable 2026.02 release.
...