The purpose of this post is to show how easy it is to automatically explain a target variable in SAS Viya in just a couple of clicks. Being able to understand the relationship between a target and its explanatory variables is a key step towards building predictive models. An automated explanation will quickly build a series of easily interpretable visualizations along with automatically generated storylines. Business analysts, data scientists, and even high-level executives can get a head start in answering everyday business problems with an automated explanation.
To get started, I’ll be using a data set of observations taken from account holders at a large financial services firm. The accounts represent consumers of home equity lines of credit, automobile loans, and other short- to medium-term credit instruments. If you’ve been reading my posts, you’ll already be familiar with one of my favorite data tables. Since several of the continuous inputs are skewed and contain missing values, I’ll clean up the data with transformations and imputation before the automated explanation. As a result, the names of these variables all start with the prefix logi_, which marks inputs that have been log transformed and had their missing values imputed.
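If you are curious what that kind of cleanup looks like outside of SAS Viya, here is a minimal Python sketch. It is not the code used to prepare this table: the median imputation, the plain natural log, the order of the two steps, and the variable names are all assumptions made for illustration.

```python
import math
import statistics

def impute_and_log(values):
    """Fill missing values (None) with the median of the observed values,
    then apply a natural log. Assumes every observed value is positive."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [math.log(fill if v is None else v) for v in values]

# Hypothetical skewed input with two missing values (not taken from the real table)
avg_sales = [12.0, None, 15.5, 2400.0, 18.0, None, 9.5]

logi_avg_sales = impute_and_log(avg_sales)
print([round(v, 3) for v in logi_avg_sales])
```

An input that can be zero, such as a purchase count, would need an offset like log(x + 1) instead, because the log of zero is undefined.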
The binary target variable indicates whether an account contracts for at least one product during the campaign season. A straightforward way to think of this is that a target value of 1 indicates a purchaser and a 0 indicates a non-purchaser. The data set contains more than one million rows and, after filtering, 16 columns. We will see more detail on the variables in our exploration, but they cover demographic information, account activity level, customer value level, and various purchase behaviors.
If I want to use the automated explanation feature, I’ll first need to open the table in SAS Visual Analytics and filter out (hide) a few columns.
A quick and easy way to begin exploring my data is to create an automated explanation. The automated explanation reveals the most important underlying factors for a target variable. In this example, I’m trying to understand whether or not an account will make a purchase. Let’s create that explanation in a report: I right-click on the target variable tgt Binary New Product and select Explain > Explain on current page.
The resulting report reveals that my target variable has an 80% chance of being a 0. In other words, the majority of my customers were non-purchasers.
Much of the report is aimed at explaining the most common value of 0 (non-purchasers), but honestly, we are more interested in the behavior of the purchasers (value of 1). Let’s update the chart by selecting 1 in the button bar.
From the resulting report, I begin my data exploration and discover all kinds of interesting information about the target variable of customer purchase. You’ll notice that the summary bar along the top has been updated to show that approximately 20% of the customers made a purchase (value of 1). Then we can see under “What factors are most related to tgt Binary New Product?” that the three most related factors are count purchased over the past 3 years, average sales over the lifetime, and average sales over the past 3 years in response to a direct promotion. Of course, it makes sense that these three factors could have a large effect on whether a customer makes a purchase. Notice that the top bar is already selected.
It would be interesting to understand the relationship between this top factor (count purchased over the past 3 years) and our binary target variable. Fortunately, we already have an automatically generated chart to help us. Let’s examine “What is the relationship between tgt Binary New Product and logi_rfm5 Count Purchased Past 3 years?”
We can see that for purchasers, the average number of products bought over the past 3 years is about 1.6. Keep in mind that this input was log transformed, so on the original scale that corresponds to approximately 5 products on average. To investigate the relationship between the account activity level and our target, select the category 1 factor.
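To see where the figure of roughly 5 comes from, you can reverse the log transformation yourself. The short Python sketch below assumes a plain natural log was used; the post does not spell out the exact transformation, so the log(x + 1) variant is shown only for comparison.

```python
import math

mean_log_count = 1.6  # average for purchasers, read from the chart

# Assumption: the input was transformed with a plain natural log, log(x)
back_plain = math.exp(mean_log_count)

# Alternative assumption: the input was transformed with log(x + 1)
back_offset = math.expm1(mean_log_count)

print(round(back_plain, 2))   # about 4.95, in line with "approximately 5"
print(round(back_offset, 2))  # about 3.95
```

Strictly speaking, exponentiating an average of logged values gives a geometric mean, which is never larger than the ordinary average of the untransformed counts, so treat the back-transformed number as a rough guide.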
It makes sense that the accounts with the highest activity (value of “x”) contain the majority of our purchasers. Before we complete our investigation of the automated explanation, let's turn on an option to show us the most likely and least likely groups of purchasers. In the Options pane, turn on High and low groups.
A new visual appears on the canvas of our automated explanation. By default, we see the top three groups that have the highest predicted probability of making a purchase.
On the High tab we are presented with the top three groups that are most likely to make a purchase. Let's examine the first group, which has an almost 80% predicted probability of making a purchase. If the count purchased over the past 3 years is greater than or equal to 1.6 and it has been less than 2.6 months since the last purchase, then a customer is very likely (79.70%) to make a purchase. In case you are curious, there is a decision tree being created in the background to give us all this wonderful information.
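Since a decision tree is doing the work behind the scenes, it helps to remember that each group is just a rule over the inputs, and in a standard classification tree the predicted probability for a group is the share of purchasers among the rows that satisfy the rule. Here is a minimal Python sketch of that idea. The thresholds come from the group described above and are taken on the scale shown in the report; the function name and the eight accounts are hypothetical, and this is not how SAS Visual Analytics builds its tree.

```python
def in_top_group(count_3yr, months_since_last):
    """Rule for the first group on the High tab (thresholds as displayed)."""
    return count_3yr >= 1.6 and months_since_last < 2.6

# Hypothetical accounts:
# (count purchased past 3 years, months since last purchase, target)
accounts = [
    (1.9, 1.0, 1),
    (1.7, 2.0, 1),
    (2.1, 0.5, 1),
    (1.6, 2.5, 0),
    (1.2, 1.0, 0),
    (1.8, 6.0, 0),
    (0.7, 9.0, 0),
    (1.0, 3.0, 1),
]

# Keep the target values of the accounts that satisfy the rule
group_targets = [t for c, m, t in accounts if in_top_group(c, m)]

# The group's predicted probability is its observed purchase rate
purchase_rate = sum(group_targets) / len(group_targets)
print(f"{len(group_targets)} accounts in group, purchase rate {purchase_rate:.2%}")
```

On the real table, the same calculation for the first group is what produces the 79.70% shown on the High tab.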
We are off to a great start in understanding the relationship between our target variable and the explanatory inputs. A great next step would be to build a supervised model such as a logistic regression. If you would like to keep learning, you might be interested in taking an instructor-led course. SAS® Visual Statistics in SAS® Viya®: Interactive Model Building will get you started. In this course you will learn how to build several different models, tweak them to get better results, and interpret the results. If you would like to learn more about automated explanations, I suggest reading this paper by Rick Styll.