Four Ways to Fast: Make Machine Learning Manageable with SAS® Enterprise Miner™


SAS Enterprise Miner is hands down my favorite SAS Java GUI. I have worked for many years as a statistician, modeler, and SAS coder, and Enterprise Miner changed my life. All of a sudden, I realized that what had taken me weeks to code, I could do in an hour. In this blog, we’ll take a peek at how Enterprise Miner can make machine learning easy and manageable.

SAS Enterprise Miner streamlines the data mining process to create highly accurate predictive machine learning models from large amounts of input data. Data mining and machine learning have applications in many areas, such as fraud detection, risk analysis, marketing and customer retention, bankruptcy prediction, and financial portfolio analysis. Before we continue, are you new to machine learning? If so, check out this resource: What is machine learning and why does it matter?

We often think about fast in terms of processing time. But there are many ways to reduce the time from inception to results. Four ways to fast are:

Spend less time coding
Spend less time going down the wrong path
Reduce run times
Set up efficient, effective, enterprise deployment

Enterprise Miner can shorten your time to results in all four ways.

1. Fast can be about saving our development time. User-friendly GUIs (graphical user interfaces) like SAS Enterprise Miner and SAS Forecast Server let you point-and-click and drag-and-drop while they build the SAS code for you. This can save weeks of time by:

Shortening the learning curve for the pertinent SAS code
Keeping you from making syntax errors
Creating complex code much faster than you can humanly type

Enterprise Miner has been making data mining and machine learning easier since 1998. Examples of the machine learning nodes available in Enterprise Miner are listed in the following table.

2. Faster is also about heading in the right direction and not racing blindly down the wrong path. The fastest buffalos may be the ones who survive, but the fastest lemmings…well, they may not fare as well. Enterprise Miner guides you in the right direction by providing data visualizations in the interface, tools for reducing dimensionality, and quick results and model comparisons. Keep in mind that the right direction (correct methods, appropriate variables to use, even the business questions you should be asking) can change based on the situation.


Baltimore Orioles heading in the right direction (south for the winter, north for the summer), a common occurrence.


Baltimore Ravens heading in the right direction (west for the touchdown, east for the touchdown), a less common occurrence.

3. Reducing processing time is another great way to speed up your project. There has been a lot of discussion about this lately with the advent of SAS Viya, which runs on SAS Cloud Analytic Services (CAS) and is amazingly fast. But did you know that even in SAS 9 you can improve processing times by using High-Performance procedures (HP procs), which are available through nodes in Enterprise Miner? HP procs shorten run times by running your work in parallel, either across multiple threads on a single machine or across multiple machines in a distributed environment. The example below (borrowed from Rich Mather at SAS) shows an improved run time AND improved lift when using 1000 iterations with PROC HPNEURAL versus 50 iterations with regular PROC NEURAL.

Run time and lift comparison: PROC HPNEURAL with 1000 iterations versus PROC NEURAL with 50 iterations.

4. Setting up efficient, effective, enterprise-level deployment is another way to improve speed, efficiency, and results. Enterprise Miner generates score code, and you can register models in the SAS Metadata Server so that they integrate easily with other SAS tools such as SAS Model Manager and SAS Enterprise Guide.
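Score code is the trained model packaged as code that you can run against new records. Enterprise Miner writes its score code in SAS. To show the idea only, here is a hypothetical Python analog for a logistic regression model. The coefficients and the `score_record` name are made up for illustration. They are not output from any real model.

```python
import math

# Hypothetical coefficients for illustration only; real score code would
# contain the parameter estimates from your fitted model.
INTERCEPT = -2.0
COEFFICIENTS = {"DELINQ": 0.8, "DEBTINC": 0.05}

def score_record(record, cutoff=0.5):
    """Return (probability that BAD = 1, predicted class) for one input record."""
    eta = INTERCEPT + sum(b * record[name] for name, b in COEFFICIENTS.items())
    prob = 1.0 / (1.0 + math.exp(-eta))  # logistic (inverse logit) link
    return prob, int(prob >= cutoff)

prob, label = score_record({"DELINQ": 0, "DEBTINC": 40.0})
print(f"P(BAD=1) = {prob:.3f}, predicted BAD = {label}")  # P(BAD=1) = 0.500, predicted BAD = 1
```

Because scoring is just a function of the inputs, the same logic can run anywhere new records arrive. That portability is what makes registered models easy to deploy.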

There is also a fifth way to fast, which involves eating nothing but lemon juice and water, but it is not really pertinent here.

Example

Let’s run through a very simple example to get a sense of how Enterprise Miner makes machine learning easy and accessible. I will use the HMEQ data set, available in the SAMPSIO library, for supervised learning. This data set has a number of input variables, as well as a binary target variable, BAD, which indicates whether the home equity loan recipient defaulted (BAD = 1) or not (BAD = 0). (For a step-by-step video, see YouTube link.)

First, I partition my HMEQ data into 70% training data and 30% validation data using the Data Partition node. To keep this demonstration simple, I do not hold out a test data set. However, I could easily have partitioned out test data as well (recommended) with the Data Partition node.
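Outside the GUI, a 70/30 split amounts to shuffling the rows and cutting the list. Below is a minimal Python sketch that uses simple random sampling with an arbitrary seed. It is not the code the Data Partition node generates, and the node also offers other methods, such as stratified sampling.

```python
import random

def partition(rows, train_frac=0.7, seed=12345):
    """Shuffle a copy of `rows` and split it into training and validation lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    return shuffled[:n_train], shuffled[n_train:]

rows = list(range(100))  # stand-in for 100 observations
train, valid = partition(rows)
print(len(train), len(valid))  # 70 30
```

Fixing the seed matters in practice: you can rerun the flow and compare models on exactly the same validation rows.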

Also, I explore my original data a bit using nodes from the Explore tab. Using the Multiplot node, I see that my LOAN (loan amount) and MORTDUE (mortgage amount due) variables are right-skewed with long tails, so I decide to apply a log transformation to them, as shown below.
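A log transformation compresses the long right tail. One quick way to see this is to compare sample skewness before and after taking logs. This Python sketch uses made-up loan amounts rather than the actual HMEQ values. It assumes every value is strictly positive, since the log is undefined at zero.

```python
import math

def skewness(values):
    """Biased sample skewness: mean cubed deviation divided by sd cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Made-up, right-skewed loan amounts (illustration only).
loan = [1100, 5000, 10000, 15000, 20000, 80000]
log_loan = [math.log(v) for v in loan]  # natural log; requires v > 0

print(f"skewness before: {skewness(loan):.2f}")
print(f"skewness after log: {skewness(log_loan):.2f}")
```

The one extreme value dominates the raw skewness. On the log scale, it sits much closer to the rest of the data.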

I look at how many missing values I have using the DMDB node; the result is shown below.
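Counting missing values per variable is a simple tally. This Python sketch represents a missing value as `None` in a list of dictionaries. The variable names come from HMEQ, but the three records are hypothetical.

```python
def count_missing(rows, columns):
    """Return {column: number of rows where that column is missing (None)}."""
    counts = {c: 0 for c in columns}
    for row in rows:
        for c in columns:
            if row.get(c) is None:  # an absent key also counts as missing
                counts[c] += 1
    return counts

# Hypothetical records (illustration only).
rows = [
    {"DEBTINC": 35.0, "VALUE": 40000.0, "YOJ": 10.0},
    {"DEBTINC": None, "VALUE": 70000.0, "YOJ": 7.0},
    {"DEBTINC": None, "VALUE": None, "YOJ": 4.0},
]
print(count_missing(rows, ["DEBTINC", "VALUE", "YOJ"]))
# {'DEBTINC': 2, 'VALUE': 1, 'YOJ': 0}
```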

I see that I have a lot of missing values. Regression and neural network models are sensitive to missing values (observations with a missing input are typically dropped from training). So I add an Impute node and set it to replace missing values of interval variables with the median.
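Median imputation replaces each missing interval value with the median of the observed values. In this Python sketch, the medians are computed on the training rows only and then reused for any partition. This is a common practice that keeps validation data out of the fit. The `fit_medians` and `impute` names and the data are hypothetical.

```python
import statistics

def fit_medians(train_rows, columns):
    """Median of the non-missing training values for each interval column."""
    return {
        c: statistics.median(r[c] for r in train_rows if r[c] is not None)
        for c in columns
    }

def impute(rows, medians):
    """Return copies of `rows` with missing (None) values replaced by the medians."""
    return [
        {k: medians[k] if v is None and k in medians else v for k, v in row.items()}
        for row in rows
    ]

# Hypothetical training and validation rows (illustration only).
train = [{"DEBTINC": 30.0}, {"DEBTINC": None}, {"DEBTINC": 40.0}, {"DEBTINC": 35.0}]
valid = [{"DEBTINC": None}, {"DEBTINC": 28.0}]

medians = fit_medians(train, ["DEBTINC"])
print(medians)                 # {'DEBTINC': 35.0}
print(impute(valid, medians))  # [{'DEBTINC': 35.0}, {'DEBTINC': 28.0}]
```

The median is used instead of the mean because it is not pulled toward the long tails seen earlier.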

I use a Metadata node to see which variables will be used as model inputs; the node also lets me adjust variable roles.
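Conceptually, variable metadata is a mapping from each variable to a role. This hypothetical Python sketch stores roles in a dictionary, using HMEQ variable names, and picks out the model inputs. The LOG_* names stand for the transformed variables. The role choices, including rejecting the untransformed LOAN and MORTDUE, are illustrative rather than what the node sets by default.

```python
# Hypothetical role assignments (illustrative choices only).
roles = {
    "BAD": "Target",
    "LOAN": "Rejected",
    "LOG_LOAN": "Input",
    "MORTDUE": "Rejected",
    "LOG_MORTDUE": "Input",
    "DEBTINC": "Input",
    "DELINQ": "Input",
}

def variables_with_role(roles, role):
    """List the variables assigned a given role, in insertion order."""
    return [name for name, r in roles.items() if r == role]

# Adjusting a role, for example to try a model without DELINQ.
roles["DELINQ"] = "Rejected"
print(variables_with_role(roles, "Input"))  # ['LOG_LOAN', 'LOG_MORTDUE', 'DEBTINC']
```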

Now I simply drag in the HP Neural node to run a neural network model, the HP Forest node to run a random forest model, and the HP Regression node to run a regression model. Voilà! I just created three high-performance models with three quick swipes. I can start with the default settings and quickly find out whether I am on the right path. If I wish, I also have great flexibility to modify the settings within those nodes.

Finally, I pull in a Model Comparison node so that I can see which model performs best, and I run the entire process flow, shown below.

Comparing the models’ validation results, I see that the random forest model performed best, with the lowest misclassification rate (0.131844).
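The misclassification rate is the share of validation cases whose predicted class differs from the actual class. The comparison then picks the model with the lowest rate. This Python sketch uses made-up predictions for eight cases. It shows only the calculation and does not reproduce the results above.

```python
def misclassification_rate(actual, predicted):
    """Fraction of cases where the predicted class differs from the actual class."""
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)

# Made-up validation targets and class predictions (illustration only).
actual = [1, 0, 0, 1, 0, 0, 1, 0]
predictions = {
    "Neural":     [1, 0, 1, 0, 0, 0, 1, 0],
    "Forest":     [1, 0, 0, 1, 0, 0, 0, 0],
    "Regression": [0, 0, 1, 1, 0, 1, 1, 0],
}

rates = {name: misclassification_rate(actual, p) for name, p in predictions.items()}
best = min(rates, key=rates.get)
print(rates)  # {'Neural': 0.25, 'Forest': 0.125, 'Regression': 0.375}
print(best)   # Forest
```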

I see that my ROC curve and my cumulative lift curve also show my random forest model to be the best performer.
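Both curves come from ranking cases by predicted probability. This Python sketch computes two summary numbers from made-up scores. The first is cumulative lift at a given depth: the event rate among the top-ranked cases divided by the overall event rate. The second is the area under the ROC curve, computed here as the share of event/non-event pairs the model ranks correctly, with ties counting half.

```python
def cumulative_lift(actual, scores, depth):
    """Event rate in the top `depth` fraction of cases (ranked by score),
    divided by the overall event rate."""
    ranked = sorted(zip(scores, actual), key=lambda t: t[0], reverse=True)
    n_top = max(1, round(len(ranked) * depth))
    top_rate = sum(a for _, a in ranked[:n_top]) / n_top
    base_rate = sum(actual) / len(actual)
    return top_rate / base_rate

def roc_auc(actual, scores):
    """Area under the ROC curve via pairwise comparison of events and non-events."""
    pos = [s for s, a in zip(scores, actual) if a == 1]
    neg = [s for s, a in zip(scores, actual) if a == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up validation data: BAD outcomes and one model's predicted P(BAD = 1).
actual = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.1, 0.8, 0.3, 0.05, 0.4, 0.15, 0.35, 0.25]

print(f"lift at top 20%: {cumulative_lift(actual, scores, 0.2):.2f}")
print(f"ROC AUC: {roc_auc(actual, scores):.3f}")
```

The model whose curve sits higher on both charts ranks defaulters ahead of non-defaulters more reliably. That is why the two charts usually agree with each other.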

How easy was that? Now I can go to happy hour, watch my kid’s soccer game, or get back to reading The Signal and the Noise by Nate Silver. I have good preliminary results quickly. There are many ways to improve my models, but I have achieved a great starting point with very little time invested. I have already found a model that works well, so I can focus my attention on further testing, possibly tweaking that model, and moving it into deployment.

In summary, SAS Enterprise Miner with High Performance (HP) procedures shortens your time from inception to results because it:

Has an easy-to-use GUI, which means fewer person-hours are needed for coding
Helps guide you quickly down the most productive path
Includes High Performance (HP) nodes, which take advantage of parallel processing to reduce run time
Generates score code and lets you register models in SAS Metadata to facilitate effective and efficient enterprise deployment

REFERENCES and ADDITIONAL RESOURCES:

Videos

• Wendy Czika. Learn by Example (Importing xml file templates from GitHub to run in Enterprise Miner).

Papers

• Patrick Hall, Jared Dean, Ilknur Kaynar Kabul, Jorge Silva. 2014. An Overview of Machine Learning wi...

• Jonathan Wexler and Philip Easterling. White Paper.

Enterprise Miner Documentation and Marketing

• Enterprise Miner User’s Guide

• SAS Enterprise Miner page