NOTE: Updated to include questions, slides and links from the June 23, 2017 Ask the Expert session
Did you miss the Ask the Expert session on Model Selection in SAS Enterprise Guide and SAS Enterprise Miner? Not to worry: you can watch it on demand at your leisure.
The session covers various Model Selection options. You learn how to:
For ease of reference, here are some highlighted questions from the Q&A segment held at the end of the session.
Q: Are all the criteria good for every modeling technique?
A: Each modeling algorithm has its own specific criteria. Some criteria are available across algorithms and some are not.
Q: Which model selection technique should I use?
A: It depends on several things. First, it depends on which modeling algorithm you use, because not all techniques work with all algorithms. It also depends on your goals: descriptive/exploratory analysis, prediction, or data mining. If you, your department, or your company do not have a standard, take some time to research the methods presented in this session to see which will best help you reach your goals. I have often seen certain industries and disciplines use a specific methodology for their applications.
One of the great values of SAS Enterprise Miner is the ability to apply many of these methods quickly and see whether different criteria agree or disagree on which model should be selected as the best model.
Q: Where can I download the data that’s used for these illustrations?
A: Download the zip file under the Getting Started with SAS Enterprise Miner documentation (Example Data for Getting Started with SAS Enterprise Miner 14.1) and use the donor_raw and donor_score data sets. The documentation is available at http://support.sas.com/documentation/onlinedoc/miner/
Q: Are SAS HP procedures such as PROC HPIMPUTE and PROC HPLOGISTIC part of Base SAS 9.4, or are they licensed separately?
A: SAS HP procedures are available as part of their respective products. A common set of procedures, including HPIMPUTE, is released with Base SAS. HPLOGISTIC and HPREG are part of SAS/STAT. See the chart below for the procedures available depending on which products you have licensed.
To run these procedures in a massively parallel processing (MPP) environment, you need to license the SAS High Performance product.
Q: Just to confirm, most of the methods are good for use with numerical data, correct?
A: Yes, that is correct.
Q: If I am a beginner in predictive modeling, what is the best measure to start with?
A: Start with misclassification. Then do some research on your industry and the type of problem you are trying to solve with your model to fine-tune which criteria you should use.
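Misclassification rate is simply the share of observations whose predicted class differs from the actual class. The sketch below is written in Python purely for illustration, not in SAS. It assumes a binary 0/1 target, a 0.5 probability cutoff, and two hypothetical models scored on the same made-up validation data. Under this criterion, the model with the lower rate is preferred.

```python
def misclassification_rate(actual, predicted_prob, cutoff=0.5):
    """Fraction of observations whose predicted class differs from the actual class.

    Assumes a binary 0/1 target; an observation is classified as an event (1)
    when its predicted probability is at or above the cutoff.
    """
    if len(actual) != len(predicted_prob) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    predicted = [1 if p >= cutoff else 0 for p in predicted_prob]
    errors = sum(a != y for a, y in zip(actual, predicted))
    return errors / len(actual)


# Hypothetical validation data: actual outcomes and two models' predicted probabilities.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
model_a = [0.9, 0.2, 0.6, 0.4, 0.3, 0.1, 0.8, 0.7]
model_b = [0.7, 0.4, 0.3, 0.6, 0.4, 0.2, 0.9, 0.1]

print(misclassification_rate(actual, model_a))  # 0.25
print(misclassification_rate(actual, model_b))  # 0.125
```

Here model_b would be selected, because it misclassifies one observation out of eight versus two for model_a.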
Q: If you develop a model for ranking prediction, which model selection method would you recommend?
A: ROC and Gini are really good for ranking prediction, because both measure how well the model orders events ahead of non-events.
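Both measures can be computed from the area under the ROC curve. The ROC area is the probability that a randomly chosen event is scored higher than a randomly chosen non-event, and the Gini coefficient is commonly derived from it as 2 × AUC − 1. This Python sketch is an illustration only. It uses a simple pairwise comparison (ties count as half) rather than any SAS implementation, and the data are hypothetical.

```python
from itertools import product


def roc_auc(actual, scores):
    """Area under the ROC curve: the share of (event, non-event) pairs in which
    the event receives the higher score. Tied scores count as half a win."""
    events = [s for a, s in zip(actual, scores) if a == 1]
    non_events = [s for a, s in zip(actual, scores) if a == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for e, n in product(events, non_events):
        if e > n:
            wins += 1.0
        elif e == n:
            wins += 0.5
    return wins / (len(events) * len(non_events))


def gini(actual, scores):
    """Gini coefficient derived from the ROC area: 2 * AUC - 1."""
    return 2 * roc_auc(actual, scores) - 1


# Hypothetical validation data: actual outcomes and one model's scores.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.3, 0.1, 0.8, 0.7]

print(roc_auc(actual, scores))  # 0.875
print(gini(actual, scores))     # 0.75
```

Because only the ordering of scores matters, these measures reward a model that ranks well even if its probabilities are poorly calibrated. That property is why they suit ranking problems.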
Q: Can you use the SAS Code node for testing as you showed us in SAS Enterprise Guide?
A: Yes, the code I showed in the SAS Enterprise Guide section can be run in SAS Enterprise Miner by using the SAS Code Node.
Q: Are there plans to have these capabilities in SAS Visual Statistics?
A: Visual Statistics receives regular feature enhancements, but I am not aware of the exact roadmap for that product. For the latest features of Visual Statistics, see: http://support.sas.com/documentation/onlinedoc/vs/
Q: My question is about exporting code to C and Java in Enterprise Miner. What kind of license is required to generate this code?
A: You need to have Enterprise Miner licensed on the Server to export code to C or Java.
Q: How can you enhance a model (e.g., logistic regression) to take into account asymmetry of misclassification rate (e.g., false negative is 5X more costly than a false positive)?
A: Add a profit/loss matrix through Decision Processing on the Decision Weights tab that, for example, gives higher weight (profit) to true positives than to false positives. Alternatively, use ‘Defaults with Inverse Prior Weights’ on the Decisions tab. Then tune your regression, decision tree, neural network, and gradient boosting models on profit (or on decision, for tree-based methods).
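To see why a profit/loss matrix changes a model's decisions, consider the usual decision rule. For each observation, pick the decision with the highest expected profit, given the model's posterior probability of the event. The Python sketch below illustrates that rule only; it is not Enterprise Miner's implementation. The matrix values are hypothetical and mirror the question: a false negative costs 5 times as much as a false positive, with costs entered as negative profits.

```python
# Hypothetical decision matrix: keys are (actual outcome, decision); values are profits.
# Costs are entered as negative profits; a false negative costs 5x a false positive.
PROFIT = {
    ("event", "event"): 0.0,          # true positive
    ("event", "non-event"): -5.0,     # false negative
    ("non-event", "event"): -1.0,     # false positive
    ("non-event", "non-event"): 0.0,  # true negative
}
DECISIONS = ("event", "non-event")


def expected_profit(p_event, decision):
    """Expected profit of a decision, given the posterior probability of the event."""
    return (p_event * PROFIT[("event", decision)]
            + (1 - p_event) * PROFIT[("non-event", decision)])


def best_decision(p_event):
    """Choose the decision with the highest expected profit."""
    return max(DECISIONS, key=lambda d: expected_profit(p_event, d))


for p in (0.1, 0.3, 0.9):
    print(p, best_decision(p))
```

With these values, deciding "event" is optimal whenever 5p > 1 − p, that is, p > 1/6. An observation with a posterior of 0.3 is therefore flagged as an event, although a plain 0.5 misclassification cutoff would label it a non-event.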
You may want to use PROC DECIDE (documentation available through SAS Technical Support). The DECIDE procedure creates optimal decisions based on a user-supplied decision matrix, prior probabilities, and output from a modeling procedure. This output can be either posterior probabilities for a categorical target variable or predicted values for an interval target variable. The DECIDE procedure can also adjust the posterior probabilities for changes in the prior probabilities.
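Adjusting posteriors for new priors is commonly done with Bayes' rule. Multiply each level's posterior by the ratio of its new prior to its old (training-sample) prior, then renormalize so the adjusted values sum to 1. The Python sketch below shows that standard formula. It is not a description of PROC DECIDE's internals, and the function name and numbers are hypothetical. The example assumes a model trained on a 50/50 oversampled sample and scored for a population with a 5% event rate.

```python
def adjust_posteriors(posteriors, old_priors, new_priors):
    """Rescale posterior probabilities from the training-sample priors (old_priors)
    to the priors expected in the scoring population (new_priors), then renormalize.

    Each list holds one value per target level, in the same order.
    """
    if not (len(posteriors) == len(old_priors) == len(new_priors)):
        raise ValueError("all inputs must have one entry per target level")
    weighted = [p * (new / old) for p, old, new in zip(posteriors, old_priors, new_priors)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical example, levels ordered (event, non-event):
# trained on a 50/50 sample, scored for a population with a 5% event rate.
adjusted = adjust_posteriors([0.6, 0.4], old_priors=[0.5, 0.5], new_priors=[0.05, 0.95])
print(round(adjusted[0], 4))  # 0.0732
```

A posterior of 0.6 from the balanced training sample drops to about 0.07 once the true 5% event rate is taken into account. Skipping this adjustment would overstate event probabilities when scoring the real population.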
Want more tips? Be sure to subscribe to the Data Mining Library to receive follow-up Q&A, slides, and other related resources from the webinar. From the Data Mining Library, just click Subscribe in the orange bar underneath the list of recent articles.