
Dynamic Data Exploration and Model Building in SAS Viya

Started 08-25-2023, modified 08-25-2023

 

The purpose of this blog post is to show how easy it is to dynamically explore data and build predictive models in SAS Viya. SAS Visual Analytics and SAS Visual Statistics are two great places to start your journey into data exploration and predictive model creation. A good data scientist needs a host of skills to survive in today’s data jungle, and the ability to explore data easily and build models efficiently is among the most important of them.

 

To get started, I’ll use financial services data. The accounts in the table represent consumers of home equity lines of credit, automobile loans, and other short- to medium-term credit instruments. Appropriate data cleansing has already been applied, so we can move straight to statistical modeling. The target variable indicates whether an account holder purchased a new product from the bank in the past year. The data set contains more than 1 million rows and 24 columns. We will see the variables in more detail during exploration; they include demographic information, account activity levels, and various purchase behaviors.

 

[Figure: sample of the financial services data (AR_1_SampleData.png)]


 

I want to use a low-code, no-code approach, so I’ll first load the table into SAS Visual Analytics.

 

[Figure: the table loaded in the SAS Visual Analytics Data pane (AR_2_DataPane.png)]

 

A quick and easy way to begin exploring my data is to create an automated explanation. An automated explanation identifies the underlying features that are most strongly related to a target variable. In this example, I’m trying to understand whether an account holder will make a purchase. I can create that explanation in a report with a single menu action: I right-click the binary target variable b_tgt and select Explain > Explain on current page.

 

[Figure: selecting Explain > Explain on current page for b_tgt (AR_3_AutomatSelection1.png)]

 

From the resulting report, I begin my data exploration and discover all kinds of interesting information about the customer purchase target. At the very top, under "What are the characteristics of tgt Binary New Product?", we see that approximately 20% of the customers made a purchase (flagged with a value of 1). The remaining 80% have a value of 0 and therefore did not make a purchase. Both there and under "What factors are most related to tgt Binary New Product?", we can see that the three most related factors are Count Purchased over the Past 3 Years, Average Sales over the Lifetime, and Average Sales over the Past 3 Years in Response to a Direct Promotion. It makes sense that these purchase-history variables would be strongly associated with whether a customer makes a new purchase. Lastly, by examining "What is the relationship between tgt Binary New Product and logi_rfm5 Count Purchased Past 3 Years?", we can see that purchasers bought about 1.6 products on average over the past 3 years.
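The two headline numbers in the report, the share of purchasers and the average purchase count among purchasers, are simple summaries of the raw rows. As a minimal sketch of that arithmetic (not of how SAS Visual Analytics builds the automated explanation internally), the Python below computes an event rate and a per-class mean. The rows, the tuple layout, and the function names `event_rate` and `mean_by_target` are all hypothetical; the rows are not the article's data.

```python
# Hypothetical rows: (b_tgt, count purchased over the past 3 years).
# Illustrative values only, not the article's data set.
rows = [(1, 2), (1, 1), (0, 0), (0, 1), (0, 0),
        (0, 2), (0, 0), (0, 1), (0, 0), (0, 0)]


def event_rate(rows):
    """Share of rows whose binary target equals 1."""
    return sum(target for target, _ in rows) / len(rows)


def mean_by_target(rows):
    """Average predictor value within each target level."""
    groups = {}
    for target, value in rows:
        groups.setdefault(target, []).append(value)
    return {target: sum(vals) / len(vals) for target, vals in groups.items()}


print(event_rate(rows))      # 0.2
print(mean_by_target(rows))  # {1: 1.5, 0: 0.5}
```

On the real table, the same two calculations would produce the roughly 20% purchase rate and the roughly 1.6 average that the report shows for purchasers.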

 

[Figure: the automated explanation report (AR_4_AutomatExplain.png)]

 

Now that we’ve done a bit of data exploration, let’s see how quickly we can build a predictive model. I open the Object menu and select Change Automated explanation to > Logistic Regression.

 

[Figure: changing the automated explanation to a logistic regression (AR_5_ChangeToLog1.png)]

 

Boom! I am still astounded at how easy it is to create models in SAS Viya: with just a couple of clicks, I have built a predictive model. Let’s take a quick look at the logistic regression results.

 

[Figure: logistic regression results (AR_6_LogReg1-1.png)]

 

From the Summary Bar at the top, we can see that the KS statistic is 0.5855. We could use this fit statistic to compare the model with another one; the model with the higher KS value separates purchasers from non-purchasers better. The Fit Summary pane reveals that 17 of the 19 effects are significant at the 0.05 significance level; the two predictors at the bottom of the chart are not. The Residual Plot does not appear to show any outliers. Finally, the Confusion Matrix shows that the model correctly classified 92,780 purchasers and 807,187 non-purchasers.
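For readers who want to see what the KS statistic and the confusion matrix measure, here is a minimal Python sketch. It uses the textbook two-sample definition of KS: the largest gap between the cumulative distributions of predicted scores for events and non-events. I’m assuming SAS Visual Statistics reports an equivalent quantity, though it may bin the scores first. The 0.5 cutoff, the function names, and the scores and labels are hypothetical. In the article’s terms, TP corresponds to the 92,780 correctly classified purchasers and TN to the 807,187 correctly classified non-purchasers.

```python
from typing import Sequence, Tuple


def ks_statistic(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Largest gap between the empirical CDFs of scores for
    events (label 1) and non-events (label 0)."""
    events = sorted(s for s, y in zip(scores, labels) if y == 1)
    non_events = sorted(s for s, y in zip(scores, labels) if y == 0)
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    i = j = 0
    ks = 0.0
    # Evaluate both CDFs at every distinct score value, in increasing order.
    for t in sorted(set(scores)):
        while i < len(events) and events[i] <= t:
            i += 1
        while j < len(non_events) and non_events[j] <= t:
            j += 1
        ks = max(ks, abs(i / len(events) - j / len(non_events)))
    return ks


def confusion_matrix(scores: Sequence[float], labels: Sequence[int],
                     cutoff: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (TP, FN, FP, TN); scores >= cutoff are predicted as events.
    The 0.5 cutoff is an assumption, not necessarily the report's setting."""
    tp = fn = fp = tn = 0
    for s, y in zip(scores, labels):
        predicted = 1 if s >= cutoff else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


# Hypothetical predicted probabilities and observed targets.
scores = [0.92, 0.81, 0.40, 0.35, 0.30, 0.22, 0.15, 0.05]
labels = [1, 1, 1, 0, 0, 1, 0, 0]

print(ks_statistic(scores, labels))      # 0.75
tp, fn, fp, tn = confusion_matrix(scores, labels)
print(tp, fn, fp, tn)                    # 2 2 0 4
print(tp / (tp + fn), tn / (tn + fp))    # sensitivity 0.5, specificity 1.0
```

With real model output, you would pass each account’s predicted purchase probability. A KS of 1.0 means the score distributions do not overlap at all and 0.0 means they are indistinguishable, which is why a higher value is preferred when comparing models.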

 

Could we do better? Would you like to keep learning? You have some great options. First, you might be interested in the instructor-led course SAS® Visual Statistics in SAS® Viya®: Interactive Model Building. In this course, you learn how to build several different models, tune them to get better results, and interpret the results.

 

Second, you might want to attend the SAS Explore conference, September 11-14 in Las Vegas, NV, where I am presenting two Hands-on Workshops on SAS Visual Statistics. Data scientists at SAS Explore can learn new techniques for exploring large data sets and for deriving reliable insights while solving complex problems. Trust me: even though I am presenting at the conference, I will, just like you, be looking for time to expand my own knowledge. Never stop learning!

 

I hope to see you in Las Vegas or in one of our classes!

 

Find more articles from SAS Global Enablement and Learning here.

