Did you miss the Ask the Expert session on Model Selection in SAS Enterprise Guide and SAS Enterprise Miner? Not to worry, you can catch it on-demand at your leisure.
The session covers a range of model selection options.
Here are some highlighted questions from the Q&A segment held at the end of the session for ease of reference.
Are all the criteria good for every modeling technique?
Each modeling algorithm has its own set of applicable criteria. Some criteria are available across algorithms and some are not.
Which model selection technique should I use?
It depends on several things. First, it depends on which modeling algorithm you use, because not all techniques work for all algorithms. It also depends on what your goals are: descriptive/exploratory analysis, prediction, or data mining. If you, your department, or your company do not have a standard, take some time to research the methods presented in this session to see which will best serve your goals. I have often seen specific industries and disciplines adopt specific methodologies for their applications.
One of the great values of SAS Enterprise Miner is the ability to apply many of these methods quickly and see whether different criteria agree or disagree on which model is selected as the best model.
Where can I download the data that’s used for these illustrations?
Download the zip file under the Getting Started with SAS Enterprise Miner Documentation (Example Data for Getting Started with SAS Enterprise Miner 14.1) and use the donor_raw and donor_score data sets.
Are SAS HP procedures such as PROC HPIMPUTE and PROC HPLOGISTIC part of Base SAS 9.4, or are they licensed separately?
SAS HP procedures are available as part of their respective products. A common set of procedures, which includes HPIMPUTE, is released with Base SAS. HPLOGISTIC and HPREG are part of SAS/STAT. See the chart below for the procedures available depending on which products you have licensed.
To run these procedures in an MPP environment, you would need to license the SAS High Performance product.
Just to confirm, most of the methods are good for use with numerical data, correct?
Yes, that is correct.
If I am a beginner in predictive modeling what is the best measure to start with?
Start with misclassification. Then research your industry and the type of problem you are trying to solve with your model to fine-tune which criteria you should use.
What more can SAS Enterprise Miner do compared to SAS/STAT?
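As a rough illustration of what the misclassification rate measures, here is a minimal Python sketch. It is not SAS Enterprise Miner output; the function name and the eight example labels are hypothetical and chosen only to show the arithmetic.

```python
def misclassification_rate(actual, predicted):
    """Fraction of cases where the predicted class differs from the actual class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one observation")
    errors = sum(1 for a, p in zip(actual, predicted) if a != p)
    return errors / len(actual)


# Hypothetical labels for eight validation cases (1 = event, 0 = non-event).
actual = [1, 0, 0, 1, 0, 1, 0, 0]
predicted = [1, 0, 1, 1, 0, 0, 0, 0]

# Two of the eight cases are predicted incorrectly.
rate = misclassification_rate(actual, predicted)
print(rate)
```

Misclassification treats every error as equally costly, which is part of why it is a simple starting point rather than the final criterion for every problem.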
SAS Enterprise Miner was created to help with Data Mining and Machine Learning projects. SAS/STAT covers a whole variety of classic statistical methods. Here are a couple of links with information on specific types of analysis each can do.
When splitting off the validation dataset, why do you not stratify by predictors?
Typically we stratify by the target (or Y) variable in order to maintain similar population percentages for both the Training and Validation data sets.
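The idea of stratifying on the target can be sketched in plain Python. This is an illustration of the concept only, not how the SAS Enterprise Miner Data Partition node is implemented. The function name, the seed, and the simulated 5% event rate are assumptions; the 60/30/10 fractions echo the partition mentioned elsewhere in this Q&A.

```python
import random
from collections import defaultdict


def stratified_partition(rows, target_of, fractions, seed=12345):
    """Split rows into len(fractions) partitions, preserving target proportions.

    rows: list of records; target_of: function returning a row's target value;
    fractions: for example (0.6, 0.3, 0.1) for train/validation/test.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = random.Random(seed)

    # Group the rows by target level so each level is split separately.
    by_level = defaultdict(list)
    for row in rows:
        by_level[target_of(row)].append(row)

    partitions = [[] for _ in fractions]
    for level_rows in by_level.values():
        rng.shuffle(level_rows)
        n = len(level_rows)
        start = 0
        cumulative = 0.0
        for i, frac in enumerate(fractions):
            cumulative += frac
            # The last partition takes whatever remains, so no row is dropped.
            end = n if i == len(fractions) - 1 else round(cumulative * n)
            partitions[i].extend(level_rows[start:end])
            start = end
    return partitions


# Hypothetical data: 1,000 rows with a 5% event rate.
rows = [{"id": i, "target": 1 if i < 50 else 0} for i in range(1000)]
train, valid, test = stratified_partition(rows, lambda r: r["target"], (0.6, 0.3, 0.1))

for name, part in (("train", train), ("valid", valid), ("test", test)):
    event_rate = sum(r["target"] for r in part) / len(part)
    print(name, len(part), event_rate)
```

Because each target level is split with the same fractions, every partition ends up with roughly the same event rate as the full data.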
What is the purpose of the test dataset?
The test dataset is a total holdout dataset. In EM, validation data is used in Decision Tree and Neural Network models to help prevent overfitting.
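To make the role of validation data concrete, here is a small Python sketch of choosing among candidate models by validation error rather than training error. The candidate names and error rates are hypothetical numbers invented for illustration; they are not results from the session.

```python
def pick_by_validation(candidates):
    """Return the candidate with the lowest validation error.

    candidates: list of (name, training_error, validation_error) tuples.
    """
    return min(candidates, key=lambda c: c[2])


# Hypothetical decision trees of increasing size. Training error keeps
# falling as the tree grows, but validation error rises once it overfits.
candidates = [
    ("tree_3_leaves", 0.20, 0.22),
    ("tree_8_leaves", 0.12, 0.15),
    ("tree_25_leaves", 0.03, 0.21),
]

best = pick_by_validation(candidates)
print(best[0])
```

The largest tree looks best on training data alone, which is the overfitting that validation data is meant to expose. The test data stays untouched until the end, so it can give an honest estimate for the model that was finally chosen.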
Can you compare Neural Network models in SAS Enterprise Miner to those in JMP PRO?
Yes. Enterprise Miner allows you to compare models from many different sources including JMP Pro and Open Source. You can use the model import node to bring these external models into SAS Enterprise Miner for comparison.
If missing data are imputed using median, do we use same median for training and validation data sets?
Typically yes. SAS Enterprise Miner does this for you automatically.
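The principle is that the median is computed once from the training data and then reused everywhere else. A minimal Python sketch of that principle follows. It does not show how SAS Enterprise Miner implements imputation, and the function names and income values are hypothetical.

```python
from statistics import median


def fit_median(train_values):
    """Compute the median of the non-missing training values."""
    observed = [v for v in train_values if v is not None]
    if not observed:
        raise ValueError("no non-missing training values")
    return median(observed)


def impute(values, fill_value):
    """Replace missing values (None) with the supplied fill value."""
    return [fill_value if v is None else v for v in values]


# Hypothetical income values; None marks a missing value.
train_income = [40, None, 55, 70, None, 45]
valid_income = [None, 60, 1000]

train_median = fit_median(train_income)

# The same training median fills the gaps in both partitions.
train_imputed = impute(train_income, train_median)
valid_imputed = impute(valid_income, train_median)
print(train_median, valid_imputed)
```

Reusing the training median keeps information from the validation data out of the model-building process, and it is the same value you would apply later when scoring new data.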
When your data is unbalanced (rare events), are there any special algorithms in SAS Enterprise Miner to handle such data?
Yes. Enterprise Miner has methods for handling rare events that you can apply when you start your EM project.
If you develop a model for ranking prediction, which model selection method would you recommend?
ROC and GINI are really good for ranking prediction.
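Both measures describe how well a model ranks events above non-events. The sketch below, in plain Python, computes the area under the ROC curve as the share of event/non-event pairs that the model orders correctly, and derives the Gini coefficient using the standard relationship Gini = 2 × AUC − 1. The function names and the six example scores are hypothetical.

```python
def roc_auc(actual, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the share of (event, non-event) pairs in which the event has the
    higher score. Tied scores count as half a correctly ordered pair.
    """
    events = [s for a, s in zip(actual, scores) if a == 1]
    nonevents = [s for a, s in zip(actual, scores) if a == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    ordered = 0.0
    for e in events:
        for n in nonevents:
            if e > n:
                ordered += 1.0
            elif e == n:
                ordered += 0.5
    return ordered / (len(events) * len(nonevents))


def gini(actual, scores):
    """Gini coefficient derived from the AUC."""
    return 2.0 * roc_auc(actual, scores) - 1.0


# Hypothetical targets and model scores for six cases.
actual = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]

auc = roc_auc(actual, scores)
g = gini(actual, scores)
print(auc, g)
```

Neither measure depends on a classification cutoff, only on the order of the scores, which is why they suit ranking problems. The pairwise loop is easy to read but slow on large data; it is meant to show the definition, not to be an efficient implementation.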
Can you use the SAS code node for testing as you showed us in SAS Enterprise Guide?
Yes, the code I showed in the SAS Enterprise Guide section can be run in SAS Enterprise Miner by using the SAS Code Node.
Are there plans to have these capabilities in SAS Visual Statistics?
Visual Statistics is regularly experiencing feature enhancements but I am not aware of the exact roadmap for that product. To see the latest features of Visual Statistics see: http://support.sas.com/documentation/onlinedoc/vs/
My question is about exporting code to C and Java in Enterprise Miner. What kind of license is required to generate this code?
You need to have Enterprise Miner licensed on the Server to export code to C or Java.
How can you enhance a model (e.g., logistic regression) to take into account asymmetry of misclassification rate (e.g., false negative is 5X more costly than a false positive)?
Add a profit/loss matrix through Decision Processing, in the Decision Weights tab, that for example gives higher weight (profit) to true positives than to false positives. Alternatively, you could use ‘Defaults with Inverse Prior Weights’ in the Decisions tab. Then tune your regressions, decision trees, neural nets, and gradient boosting models on profit (or decision for tree-based methods).
You may also want to use PROC DECIDE (documentation available via SAS Technical Support). The DECIDE procedure creates optimal decisions based on a user-supplied decision matrix, prior probabilities, and output from a modeling procedure. This output can be either posterior probabilities for a categorical target variable or predicted values for an interval target variable. The DECIDE procedure can also adjust the posterior probabilities for changes in the prior probabilities.
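The reasoning behind a loss matrix and a prior adjustment can be sketched in a few lines of Python. This is a conceptual illustration under stated assumptions, not the DECIDE procedure or SAS Enterprise Miner's implementation. The function names are hypothetical, the 5:1 cost ratio comes from the question, and the 50% training event rate and 5% population event rate are invented for the example.

```python
def adjust_for_priors(p_event, sample_prior, true_prior):
    """Rescale a posterior from the training event rate to the population rate."""
    event_part = p_event * true_prior / sample_prior
    nonevent_part = (1.0 - p_event) * (1.0 - true_prior) / (1.0 - sample_prior)
    return event_part / (event_part + nonevent_part)


def min_loss_decision(p_event, cost_fn=5.0, cost_fp=1.0):
    """Return 1 (treat as event) or 0, whichever has the lower expected loss."""
    loss_if_predict_event = (1.0 - p_event) * cost_fp   # risk of a false positive
    loss_if_predict_nonevent = p_event * cost_fn        # risk of a false negative
    return 1 if loss_if_predict_event < loss_if_predict_nonevent else 0


# With a false negative 5 times as costly as a false positive, the
# break-even probability falls from 0.5 to 1 / (1 + 5).
cutoff = 1.0 / (1.0 + 5.0)

posteriors = [0.10, 0.20, 0.50]
decisions = [min_loss_decision(p) for p in posteriors]

# A posterior of 0.5 from a model trained on a 50/50 oversampled data set
# corresponds to 0.05 when the population event rate is 5%.
adjusted = adjust_for_priors(0.5, sample_prior=0.5, true_prior=0.05)
print(cutoff, decisions, adjusted)
```

The case with a posterior of 0.20 is classified as an event even though an event is unlikely, because missing a true event is the more expensive mistake. When the training data was oversampled, the posteriors would normally be adjusted for priors before a rule like this is applied.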
Is SAS Enterprise Miner different from SAS Enterprise Guide?
Yes. SAS Enterprise Miner is designed for data mining and machine learning projects. SAS Enterprise Guide is an all-purpose tool for creating projects both through point and click and through programming.
When do you recommend using survival analysis? And how can I compare the performance of survival with logistic or other regressions?
You'll use Survival Analysis when you want to determine not only if an event will occur but also the timing of the event. Survival Analysis uses a time component that logistic and other regressions typically do not, so you would not be comparing these different types of regressions because they are not predicting the same thing.
Is there a clear distinction between model selection and variable selection and the criteria used for each?
Selecting a model from several competing models and selecting which variables should go into a model involve very different techniques. I recommend viewing our Ask the Expert session on variable selection to become more familiar with those methods.
In Enterprise Miner when using the model comparison, how can you tell the variables actually used in each of those models?
If you look at the SAS output in the results for a given model, you will find the list of variables selected for that model.
What's the best variable selection technique for Discrete Time Logistic (survival) models?
It depends on what procedure you are using. Both PHREG and SURVEYPHREG have selection methods like stepwise, backward, forward and score. These are covered in the Ask the Expert session for Variable Selection. Check the documentation for the procedure you are using for the options the procedure supports.
Must we use the same partition 60-30-10 on all 14 models on SAS Enterprise Miner?
Yes, in SAS Enterprise Miner you set up the partitioning first and then create all models from the same partitioned data.
Can you show me how to select an input dataset for SAS Enterprise Miner, e.g., from the WORK library?
This explains how to point EM to wherever your SAS datasets live. The WORK library you see in SAS Enterprise Guide won't be available in SAS Enterprise Miner.
Recommended Resources
Want more tips? Be sure to subscribe to the Ask the Expert Community Library to receive follow up Q/A, slides and recordings from other SAS Ask the Expert webinars. From the Ask the Expert Library, just click Subscribe from the orange bar underneath the list of the recent articles.
NOTE: For best results when opening the attached slides, click on the “download” icon.
FYI - small typo:
Can you sue the SAS code node for testing as you showed us in SAS Enterprise Guide?
Does it give away money? 😉
Thanks Reeza. I fixed it! It might not actually give away money but if you do model selection well you can certainly make more money or save money with your predictive models 🙂
Well detailed article
Could bring more details of the implementation of the algorithms
thks