Ensemble Models and Partitioning Algorithms in SAS® Enterprise Miner - Ask the Expert Q&A

by SAS Employee MelodieRush on 06-07-2017 03:35 PM - edited Tuesday by Community Manager

NOTE: Updated to include questions, slides and links from the September 8, 2017 Ask the Expert session

 

Did you miss the Ask the Expert session on Ensemble Models and Partitioning Algorithms in SAS® Enterprise Miner? Not to worry: you can catch it on-demand at your leisure.
 
Watch the webinar
 
The session covers:

 

  • An introduction to ensemble models and why they can be a valuable tool for predictive modeling
  • A review of decision trees that reveals a feature that makes partitioning algorithms such effective candidates for ensemble techniques
  • Definitions of bagging and boosting
  • A discussion of the advantages and disadvantages of the following ensemble methods available in SAS Enterprise Miner
                      ○ Gradient Boosting
                      ○ Random Forests
                      ○ Stacked Ensembles

 

Here are some highlighted questions from the Q&A segment held at the end of the session for ease of reference.

Q: Can I use all model nodes with the Ensemble Node?

A: In SAS Enterprise Miner 14.2, the Ensemble node supports only the modeling nodes that generate score code in DATA step format. This excludes Memory-Based Reasoning, HP Forest and HP Text Miner.

Q: What if I have an interval target variable, can I use the Ensemble Node with it?

A: Yes, the Ensemble node works with either an interval target or a categorical target variable.

Q: Is there a maximum number of models that can be ensembled?

A: No, there is no maximum. You must have one or more model nodes preceding the Ensemble node.

Q: How does the voting combination method work for an interval target?

A: The voting method is available only for categorical target variables, so it does not apply to an interval target. When you use voting to compute the posterior probabilities, two methods are available: Average and Proportion.
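
As a rough illustration of the difference between the two voting methods, here is a minimal Python sketch. It is not SAS Enterprise Miner code, and it rests on an assumption about how the two options behave: "Average" is taken to average the posteriors of only the models that voted for the winning class, and "Proportion" is taken to use the share of models voting for each class. The function name `vote` and the sample probabilities are made up for the example.

```python
from collections import Counter

def vote(posteriors, method="average"):
    """Combine per-model posterior probabilities for one observation.

    posteriors: list of dicts, one per model, mapping class label -> probability.
    method: "average" or "proportion" (assumed behavior, see the text above).
    """
    # Each model votes for its most probable class.
    votes = [max(p, key=p.get) for p in posteriors]
    counts = Counter(votes)
    classes = sorted(posteriors[0])
    if method == "proportion":
        # Posterior = share of models that voted for each class.
        return {c: counts[c] / len(posteriors) for c in classes}
    if method == "average":
        # Average the posteriors of only the models that voted for the winner.
        # On a tie, the class that reached that vote count first wins.
        winner = counts.most_common(1)[0][0]
        agreeing = [p for p, v in zip(posteriors, votes) if v == winner]
        return {c: sum(p[c] for p in agreeing) / len(agreeing) for c in classes}
    raise ValueError(f"unknown method: {method}")

# Three hypothetical models scoring the same observation.
models = [
    {"yes": 0.9, "no": 0.1},
    {"yes": 0.6, "no": 0.4},
    {"yes": 0.3, "no": 0.7},
]
print(vote(models, "average"))
print(vote(models, "proportion"))
```

In this example two of the three models vote "yes", so the Average variant uses only those two models, while the Proportion variant ignores the individual probabilities and reports the vote shares.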

Q: When you reach the End Groups node, are the bootstrap samples already combined and averaged?

A: Yes. The End Groups node will function as a model node and present the final aggregated model.
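
To make "combined and averaged" concrete, here is a minimal bagging sketch in Python. It is only an illustration of the general idea, not what the Start Groups and End Groups nodes run internally: the one-split regression stump, the function names and the toy data are all assumptions chosen to keep the example small.

```python
import random
from statistics import mean

def fit_stump(rows):
    """Fit a one-split regression stump on (x, y) pairs; return a predict function."""
    best = None
    xs = sorted({x for x, _ in rows})
    for lo, hi in zip(xs, xs[1:]):
        cut = (lo + hi) / 2
        left = [y for x, y in rows if x <= cut]
        right = [y for x, y in rows if x > cut]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, cut, ml, mr)
    if best is None:
        # Only one distinct x in this sample: fall back to predicting the mean.
        m = mean(y for _, y in rows)
        return lambda x: m
    _, cut, ml, mr = best
    return lambda x: ml if x <= cut else mr

def bagged_predictor(rows, n_models=25, seed=0):
    """Train one stump per bootstrap sample and average their predictions."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: draw len(rows) observations with replacement.
        sample = [rng.choice(rows) for _ in rows]
        models.append(fit_stump(sample))
    # The aggregated model is the average of the individual predictions.
    return lambda x: mean(m(x) for m in models)

# Toy data: y jumps from 0 to 1 at x = 5.
data = [(x, 1.0 if x >= 5 else 0.0) for x in range(10)]
predict = bagged_predictor(data)
print(predict(1), predict(8))
```

The function returned by `bagged_predictor` plays the role of the final aggregated model: callers see a single predictor even though it wraps many bootstrap models.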

Q: For Stacked Ensembles, do you first run all 4 models independently to pick the best model from each then merge?

A: Yes. You then merge the predictions from the 4 models and fit a new model that uses those predictions as inputs.
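
The second stage of stacking can be sketched in a few lines of Python. This is a deliberately simple stand-in, assuming just two base models instead of four and a second-stage model that is only a single blending weight found by grid search; in practice the second-stage model could be any model that takes the base predictions as inputs. All names and numbers below are hypothetical.

```python
def sse(pred, y):
    """Sum of squared errors between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, y))

def fit_blend_weight(p1, p2, y, steps=100):
    """Second-stage model: find w in [0, 1] minimizing SSE of w*p1 + (1-w)*p2."""
    best_w, best_err = 0.0, float("inf")
    for i in range(steps + 1):
        w = i / steps
        blended = [w * a + (1 - w) * b for a, b in zip(p1, p2)]
        err = sse(blended, y)
        if err < best_err:
            best_w, best_err = w, err
    return best_w

# Hypothetical holdout targets and the merged predictions of two base models.
y_valid = [1.0, 2.0, 3.0, 4.0]
pred_a = [1.2, 2.2, 3.2, 4.2]  # base model A overshoots
pred_b = [0.8, 1.8, 2.8, 3.8]  # base model B undershoots

w = fit_blend_weight(pred_a, pred_b, y_valid)
stacked = [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]
print(w, sse(pred_a, y_valid), sse(pred_b, y_valid), sse(stacked, y_valid))
```

Fitting the second stage on predictions for data the base models were not trained on helps keep the stacked model from simply rewarding whichever base model overfit the most.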

Q: How do we know which ensemble approach (average/stacking/cluster-based) we should use in a given situation?

A: The great news is that with SAS Enterprise Miner you can try all of them and see which one works best for your data in your situation.

Q: What are your suggestions about avoiding overfitting?

A: The best way to avoid overfitting is to use a holdout sample to validate the model on data that was not used for training.
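
A small Python sketch shows why the holdout sample matters. It assumes an extreme, purely hypothetical "model" that memorizes its training rows, so its training error is zero while its error on held-out rows is not. The function names, the 30% validation fraction and the toy data are illustrative choices, not settings taken from the webinar.

```python
import random

def holdout_split(rows, valid_fraction=0.3, seed=42):
    """Shuffle the rows and set aside a fraction for validation."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_valid = round(len(shuffled) * valid_fraction)
    return shuffled[n_valid:], shuffled[:n_valid]

def fit_memorizer(train):
    """An overfit 'model': exact lookup for seen x, training mean otherwise."""
    lookup = dict(train)
    fallback = sum(y for _, y in train) / len(train)
    return lambda x: lookup.get(x, fallback)

def mse(model, rows):
    """Mean squared error of a model on (x, y) rows."""
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)

data = [(x, 2 * x + 1) for x in range(20)]
train, valid = holdout_split(data)
model = fit_memorizer(train)
# Training error looks perfect; only the holdout error reveals the problem.
print(mse(model, train), mse(model, valid))
```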

Q: I realize this webinar is about Enterprise Miner, but can we do similar things in Enterprise Guide, and which one has greater market presence?

A: With Enterprise Guide you could write programs to accomplish some of the same ensemble techniques, but it would be fairly complex. Gradient Boosting, Random Forest and Neural Networks are not available in SAS/STAT, so they would not be available in Enterprise Guide unless you have licensed Enterprise Miner (or Viya products that include these algorithms) and use the procedures available in EM (or Viya).

 

Q: Can you produce the 3D Scatter Plot in SAS?

 

A: Yes, there are several different procedures for creating these types of graphs. You can use PROC G3D, together with PROC G3GRID if the data first needs to be interpolated onto a grid: https://support.sas.com/sassamples/graphgallery/PROC_G3D.html


Q: How about "rulefit", that is, feeding Random Forest rules into a LASSO model for trimming?

 

A: Yes, you can use the LEAFINDICATOR option with the SCORE statement in Random Forest and HP4SCORE to output a 0-1 variable for every leaf in the forest, indicating whether the observation is in the leaf. This data can then be the input data for a LASSO routine. You would want to ensure you don't have too many leaves, since each leaf becomes one input variable.
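
The shape of that leaf-indicator data can be illustrated with a short Python sketch. It does not call SAS or reproduce the LEAFINDICATOR output format; it only assumes that, for each observation, you know which leaf it reached in each tree, and it builds one 0-1 column per (tree, leaf) pair. The function name, column naming scheme and leaf IDs are hypothetical.

```python
def leaf_indicators(leaf_ids):
    """Build 0-1 leaf indicator columns.

    leaf_ids: one list per observation, giving the leaf reached in each tree.
    Returns (column_names, matrix) with one column per (tree, leaf) pair.
    """
    n_trees = len(leaf_ids[0])
    # Every distinct (tree, leaf) pair seen in the data becomes a column.
    columns = sorted({(t, row[t]) for row in leaf_ids for t in range(n_trees)})
    names = [f"tree{t}_leaf{leaf}" for t, leaf in columns]
    matrix = [[1 if row[t] == leaf else 0 for t, leaf in columns]
              for row in leaf_ids]
    return names, matrix

# Three observations scored by a hypothetical two-tree forest.
leaf_ids = [[1, 2], [1, 3], [2, 3]]
names, matrix = leaf_indicators(leaf_ids)
print(names)
for row in matrix:
    print(row)
```

Each row contains exactly one 1 per tree, so the number of columns grows with the total number of leaves in the forest. That is why a forest with many leaves can produce an unwieldy input for the LASSO step.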

Q: Does SAS EM display the lift for Ensemble modeling?

 

A: Yes, EM does provide lift for Ensemble modeling in the Model Comparison Results.

Q: Can the Ensemble node be used for variable selection only?

 

A: The data flowing from the Ensemble node does reflect the variables that are used in the models. A better way would be to use the Metadata node after your models and decide how you want to include variables (i.e., reject a variable if all models reject it, if the majority of models reject it, or if any model rejects it).
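
The three rules mentioned above (all, majority, any) can be written out as a small Python sketch. This is only an illustration of the decision logic, not something the Metadata node exposes as code; the function name, the role labels and the sample variables are assumptions.

```python
def combine_rejections(rejected_by_model, variables, rule="majority"):
    """Decide each variable's role from the sets of variables each model rejected.

    rule: "all"      -> reject only if every model rejected the variable
          "majority" -> reject if more than half of the models rejected it
          "any"      -> reject if at least one model rejected it
    """
    n_models = len(rejected_by_model)
    roles = {}
    for var in variables:
        n_rejections = sum(var in rejected for rejected in rejected_by_model)
        if rule == "all":
            reject = n_rejections == n_models
        elif rule == "majority":
            reject = n_rejections > n_models / 2
        elif rule == "any":
            reject = n_rejections >= 1
        else:
            raise ValueError(f"unknown rule: {rule}")
        roles[var] = "REJECTED" if reject else "INPUT"
    return roles

variables = ["age", "income", "zip"]
# Hypothetical rejection sets from three upstream models.
rejected_by_model = [{"zip", "age"}, {"zip", "income"}, {"zip", "income"}]
for rule in ("all", "majority", "any"):
    print(rule, combine_rejections(rejected_by_model, variables, rule))
```

Here "all" rejects only zip, "majority" also rejects income, and "any" rejects all three variables, which shows how much stricter each rule is than the one before.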

Q: Can we get the detailed settings for the stacked ensemble model?

 

A: Property settings are included in the slides. All nodes use their default settings except for the Metadata node. There, you need to change the probabilities (or predicted values) so that they are used as inputs in the model.

Want more tips? Be sure to subscribe to the Data Mining Library to receive follow-up Q&A, slides and other related resources from the webinar. From the Data Mining Library, just click Subscribe in the orange bar underneath the list of recent articles.
