Data mining in SAS Enterprise Miner? It's easy with the RPM task in SAS Studio

by SAS Employee BethEbersole on 02-29-2016 02:44 PM (edited on 02-29-2016 02:47 PM by Community Manager)

The SAS Rapid Predictive Modeler (RPM) task in SAS Studio not only lets you quickly and easily build predictive models using smart defaults, but it also creates an Enterprise Miner process flow and SAS code behind the scenes. Once the task in SAS Studio runs, you can open the Enterprise Miner process flow or the code and make any desired changes. The SAS Studio RPM task presents results in clear business terms, such as scorecards, lift charts, and variable importance. It automatically handles outliers, missing values, rare target events, skewed data, variable selection and model selection. Machine learning techniques such as neural networks and other data mining methods are used behind the scenes, and the best model is selected automatically.

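The results include lift charts. To show roughly what a lift chart summarizes, here is the general textbook definition, not RPM's internal code: cumulative lift at a given depth is the event rate among the top-scored fraction of cases divided by the overall event rate. The function name and the toy data are hypothetical, and this is only a minimal sketch.

```python
def cumulative_lift(labels, scores, depth):
    """Cumulative lift at `depth` (fraction of cases, 0 < depth <= 1).

    labels: 1 = event (for example, part failure), 0 = non-event
    scores: model-predicted probability of the event
    """
    if not 0 < depth <= 1:
        raise ValueError("depth must be in (0, 1]")
    if not labels or len(labels) != len(scores):
        raise ValueError("labels and scores must be non-empty and equal length")
    overall_rate = sum(labels) / len(labels)
    if overall_rate == 0:
        raise ValueError("no events in labels; lift is undefined")
    # Rank cases from highest to lowest predicted probability.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    # Number of cases in the top `depth` fraction (at least one).
    n_top = max(1, round(depth * len(ranked)))
    top_rate = sum(label for _, label in ranked[:n_top]) / n_top
    return top_rate / overall_rate
```

Suppose ten hypothetical parts include three failures, and the two highest-scored parts both failed. The lift at 20% depth is 1.0 / 0.3, or about 3.33. In other words, the top-scored fifth of the parts contains failures at about 3.3 times the overall rate.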
You can further customize and tweak models using the Enterprise Miner GUI to edit the EM process flow that the RPM task creates, or by editing the code created behind the scenes. Models are registered in metadata to automate the execution of score code and make deployment to other systems easy. The SAS Studio Rapid Predictive Modeler task is useful for:  

Business Analysts, who simply want a fast and accurate answer to their business question. A Business Analyst will generally accept the RPM results as is.

Statisticians, who may want to open the Enterprise Miner flow to look under the covers, adjust some of the defaults, and add or remove nodes as they see fit to incrementally improve model accuracy and results.

Data Scientists and Coders, who may use the Rapid Predictive Modeler to generate a coding template that they can use as a starting point to edit and amend.

Auto safety example

Imagine you are interested in preventing auto accidents by issuing recalls on automobile parts that are likely to fail. In my example below, I start with a historical (notional) data set on auto parts that includes a binary target (dependent) variable, TargetPartFailure, which indicates whether the part failed: 1 = failure and 0 = no failure. Other variables include a unique ID variable (PartNumber) and input (independent) variables such as PartType, PartAge, and NumIssuesReported.

RPM finds the best model based on the historical data. That model can then be applied to a completely new data set that has no information on part failure but has the same inputs (independent variables) as the historical data set. This allows the analyst or manager to prioritize which auto parts to investigate further for potential recalls.

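RPM builds and selects the model for you. The sketch below only illustrates the final step described above: applying a fitted model to new parts and ranking them for investigation. PartNumber, PartType, PartAge, and NumIssuesReported come from the article. The scoring function is a hypothetical stand-in, a logistic formula with made-up coefficients. In practice you would use the score code that RPM generates. The part records are also invented for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Part:
    part_number: str          # unique ID (PartNumber)
    part_type: str            # PartType
    part_age: float           # PartAge (units assumed, e.g. years)
    num_issues_reported: int  # NumIssuesReported


def failure_probability(part: Part) -> float:
    """HYPOTHETICAL stand-in for RPM-generated score code.

    A logistic model with made-up coefficients; it is not the model
    RPM would fit to the article's data.
    """
    type_effect = {"brake": 0.8, "airbag": 0.5}.get(part.part_type, 0.0)
    logit = -4.0 + 0.3 * part.part_age + 0.6 * part.num_issues_reported + type_effect
    return 1.0 / (1.0 + math.exp(-logit))


def prioritize(parts, top_n):
    """Return (probability, part_number) for the top_n riskiest parts."""
    scored = [(failure_probability(p), p.part_number) for p in parts]
    scored.sort(reverse=True)  # highest predicted failure probability first
    return scored[:top_n]


if __name__ == "__main__":
    new_parts = [
        Part("P-001", "brake", 6, 4),
        Part("P-002", "hose", 2, 0),
        Part("P-003", "airbag", 9, 1),
    ]
    for prob, part_number in prioritize(new_parts, top_n=2):
        print(f"{part_number}: {prob:.3f}")
```

With these made-up coefficients, P-001 scores about 0.731 and P-003 about 0.450. Those two parts would be flagged first, and P-002 would rank last.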
Let's import the data

The first step is to upload the data so that they are available in SAS Studio. Right-click the folder where you want the data, and select Upload Files.

Navigate to the location where your data sets are stored, and select the files you want to upload.

Drag and drop your data from the navigation pane on the left into the work area on the right. The data should automatically load into a _TEMP library.

Next, expand the available Tasks, and then expand the Data Mining subcategory. Double-click the Rapid Predictive Modeler task.

Assign the target variable TargetPartFailure the role of Dependent Variable. Unlike the Enterprise Miner interface, RPM does not automatically assign the Dependent Variable role to a variable whose name starts with “target”; you must assign this role yourself.

What are my options?

On the Options tab, under Model, you can select Basic, Intermediate, or Advanced. For this example, I select Advanced.

The Basic, Intermediate, and Advanced model options are described in the SAS Studio 3.4 User’s Guide. The Advanced option evaluates the most models and then chooses the best-performing one.

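The task selects the champion model automatically. Conceptually, champion selection compares candidate models on a fit statistic computed on validation data rather than training data, and keeps the best one. The minimal sketch below leaves the criterion open, because this article does not say which statistic RPM uses. Some statistics, such as K-S, are better when higher; error rates such as misclassification are better when lower. The model names and statistic values are hypothetical.

```python
def select_champion(candidates, higher_is_better=True):
    """Pick the candidate model with the best validation statistic.

    candidates: mapping of model name -> validation statistic
    higher_is_better: True for statistics such as K-S,
                      False for error rates such as misclassification rate
    """
    if not candidates:
        raise ValueError("no candidate models")
    chooser = max if higher_is_better else min
    return chooser(candidates, key=candidates.get)


if __name__ == "__main__":
    # Hypothetical validation results for three candidate models.
    validation_ks = {"regression": 0.68, "decision_tree": 0.65, "neural_network": 0.72}
    print(select_champion(validation_ks))  # neural_network
```

Comparing candidates on validation data matters. Comparing on training data would tend to favor the most flexible model even when it generalizes worse.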
Under Reports, you can choose Standard reports or Standard & additional reports. Select the reports you want to see.

And outputs?

On the Output tab, select each check box and specify the names and folders you want for your output data sets and save locations. To keep this information handy and avoid typos in a later step, you can copy (Ctrl+C) the project data name (e.g., RPMAutoSafety20160209) and the folder (e.g., C:\Users\sasdemo\EMProjects) and paste them (Ctrl+V) into Notepad.

The SAS Studio RPM task will automatically create the output you requested. You will recognize this output, because it is Enterprise Miner output! For example, you will see an ROC plot with the K-S statistic.

The better the model, the higher and farther to the left the ROC curve will be, maximizing sensitivity while minimizing 1 - specificity (that is, maximizing true positives while minimizing false positives). In my example, we have a pretty good K-S statistic (higher, closer to one, is better) of 0.72388 for the validation data and 0.73372 for the training data. It is a good sign that the K-S statistic is similar for the training and validation data, because this suggests that the model did not overfit the training data.

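For a binary target, the K-S statistic is usually defined as the maximum gap between the true positive rate (sensitivity) and the false positive rate (1 - specificity) over all cutoffs. That is the largest vertical distance between the ROC curve and the diagonal. The minimal sketch below assumes that definition. It also includes the train versus validation comparison used above as an overfitting check. The 0.05 tolerance is an arbitrary, hypothetical threshold, not a SAS rule.

```python
def ks_statistic(labels, scores):
    """Max over cutoffs of (sensitivity - (1 - specificity)) for a binary target."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both events and non-events")
    # Sort cases by score, highest first.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(ranked):
        cutoff = ranked[i][0]
        # Classify every case scoring >= cutoff as an event; tied scores
        # are consumed together so a cutoff never splits them.
        while i < len(ranked) and ranked[i][0] == cutoff:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        best = max(best, tp / n_pos - fp / n_neg)
    return best


def overfitting_gap(ks_train, ks_valid, tolerance=0.05):
    """True if validation K-S drops below training K-S by more than tolerance (hypothetical)."""
    return (ks_train - ks_valid) > tolerance


if __name__ == "__main__":
    # The article's reported values: training 0.73372, validation 0.72388.
    print(overfitting_gap(0.73372, 0.72388))  # False: the gap is about 0.01
```

A perfectly separating model, where every failure scores above every non-failure, reaches a K-S value of 1.0.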
How to open the SAS Enterprise Miner process flow

The SAS Studio RPM task created a SAS Enterprise Miner process flow behind the scenes. You can open that process flow in SAS Enterprise Miner to make any changes or additions to the flow that you want. Start by logging on to Enterprise Miner.

Create a new project. This may seem counterintuitive, because the project already exists, but you definitely want to choose New Project.

Give the new project exactly the same name that you gave the project data in RPM, and browse to the same server directory that you specified in RPM. This is why it helps to have copied the project name and server directory path into Notepad: it avoids typos in this step.

When you click Next, you will see a Project Exist dialog box that reads “The selected project exists on the filesystem. It may have been created by another user. Do you want to continue?” Click Yes. The existing project is the one you created with the RPM task in SAS Studio. Then click Next and Finish.

Open the Diagram and voilà! You see the Enterprise Miner process flow that you created with RPM in SAS Studio.

You can now use Enterprise Miner to make any changes or additions that you want.

ADDITIONAL RESOURCES:

SAS Studio Tutorials

RPM VLE course

SAS Studio 3.4 User's Guide

If you'd like the sample data set used in this article, feel free to private message me via the community and I'll send it your way. 

Comments
by Contributor DataScientist on 07-08-2016 04:49 AM

Hi Beth and Anna,

I don't see the data mining option in SAS Studio 3.5. Is this because I am using the University Edition of the software?

Cheers,

Sandesh.

by Community Manager on 07-08-2016 06:52 AM

Hi @DataScientist, that is correct. University Edition does not include Enterprise Miner. If you're a university professor or student, SAS OnDemand for Academics is an option and it includes Enterprise Miner.

by Contributor DataScientist on 07-09-2016 02:12 AM

Thank you for your response @BeverlyBrown

Makes sense now.

Do you think going forward SAS might have university editions of EG and EM given that these are used a fair bit in the industry and not everyone that aspires to work with these applications will have access to them?

Cheers,

Sandesh.

by Community Manager on 07-18-2016 11:40 AM

Hi @DataScientist, I checked with University Edition's product manager. She said: "We would have loved to have included SAS Enterprise Guide in the University Edition since it is a great tool for folks learning SAS. Unfortunately, as a Windows-only client, it didn’t fit since University Edition needs to run on the Mac as well. The good news is that SAS Studio is continuing to add EG-like functionality. SAS Enterprise Miner is a bit of a different story – it’s really designed for a different audience. It is available for professors & their students via SAS OnDemand for Academics but there are no plans to include it in the University Edition right now."

by Contributor DataScientist on 07-19-2016 08:47 AM

Thanks for the information, Beverly. The university edition for SAS Studio is a great product as it stands. I look forward to additional functionality being added to it. For now, I think I need to focus on the statistical functionality that SAS Studio University Edition offers at the moment and get a handle on SAS/STAT, which is a bit technical but I'm sure is a great asset for people in the industry who are comfortable using and writing statistical procedures.

Cheers,

Sandesh.