# Ensemble Models and Partitioning Algorithms in SAS® Enterprise Miner

Posted 06-07-2017 03:35 PM, edited 01-31-2018 04:22 PM


Did you miss the Ask the Expert session on Ensemble Models and Partitioning Algorithms in SAS® Enterprise Miner? Not to worry, you can catch it on-demand at your leisure.

One strategy for increasing model accuracy involves the use of ensemble models.

In this session, various ensemble models will be presented based on partitioning algorithms in SAS Enterprise Miner, including:

• Decision trees.
• Bagging, boosting and gradient boosting.
• Random forests and ensemble trees.

Here are some highlighted questions from the Q&A segment held at the end of the session for ease of reference.

Can I use all model nodes with the Ensemble Node?

In SAS Enterprise Miner 14.2, the Ensemble node supports only the modeling nodes that generate score code in DATA step format. It therefore does not support Memory Based Reasoning, HP Forest, or HP Text Miner.

What if I have an interval target variable, can I use the Ensemble Node with it?

Yes, the Ensemble node works with either an interval or a categorical target variable.

Is there a maximum number of models that can be combined in an ensemble?

No, there is no maximum. You must have one or more model nodes preceding the Ensemble node.

How does the voting combination method work for an interval target?

The voting method is available only for categorical target variables, so it does not apply to an interval target. When you use voting to compute the posterior probabilities, you can choose between two methods: Average and Proportion.
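
As a rough illustration only, here is a minimal Python sketch of the two voting options. It is not Enterprise Miner score code. It assumes that Average takes the mean of the posterior probabilities from the models that voted for the most popular class, and that Proportion uses the share of models voting for each class. The function name `vote` and the input layout are hypothetical.

```python
from collections import Counter

def vote(posteriors, method="average"):
    """Combine per-model posteriors for one observation by voting.

    posteriors: list of dicts (one per model) mapping class -> probability.
    Assumed semantics, not taken from the session:
      "proportion": share of models that voted for each class.
      "average":    mean posterior over the models that voted for the
                    most popular class; other models are ignored.
    """
    # Each model votes for the class it rates most probable.
    votes = [max(p, key=p.get) for p in posteriors]
    counts = Counter(votes)
    winner = counts.most_common(1)[0][0]
    classes = list(posteriors[0])

    if method == "proportion":
        return {c: counts[c] / len(posteriors) for c in classes}
    if method == "average":
        majority = [p for p, v in zip(posteriors, votes) if v == winner]
        return {c: sum(p[c] for p in majority) / len(majority) for c in classes}
    raise ValueError(f"unknown method: {method}")

models = [
    {"yes": 0.9, "no": 0.1},
    {"yes": 0.6, "no": 0.4},
    {"yes": 0.3, "no": 0.7},
]
print(vote(models, "proportion"))
print(vote(models, "average"))
```

With these three models, two vote "yes", so Proportion gives "yes" about 0.67, while Average gives "yes" about 0.75 because only the two majority models are averaged.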

When you reach the End Groups node, are the bootstrap samples already combined and averaged?

Yes. The End Groups node will function as a model node and present the final aggregated model.
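
To make "combined and averaged" concrete, the sketch below shows the general bagging idea in plain Python: draw bootstrap samples, fit one model per sample, and average their predictions. It is an illustration under simplifying assumptions (a one-split regression stump as the base model, a single numeric input) and does not reproduce what the Start Groups and End Groups nodes do internally. All function names are hypothetical.

```python
import random
from statistics import mean

def fit_stump(xs, ys):
    """Fit a one-split regression tree by minimizing squared error."""
    best = None
    # Candidate thresholds: every distinct x except the largest,
    # so that neither side of the split is empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all x values identical: fall back to the mean
        m = mean(ys)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def bagged_predictor(xs, ys, n_models=25, seed=0):
    """Train one stump per bootstrap sample and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(m(x) for m in models)

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 10, 10, 10, 10]
predict = bagged_predictor(xs, ys)
print(predict(1), predict(8))
```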

For stacked ensembles, do you first run all four models independently to pick the best model from each, and then merge?

Yes. You then merge the predictions from the four models and fit a new model that uses those predictions as inputs.
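
The stacking idea can be sketched in a few lines of Python. This is a deliberately small illustration, not the process flow from the session: it uses two base models instead of four, and the second-stage model is a single least-squares blend weight rather than a full model node. The function names are hypothetical.

```python
def fit_blend_weight(p1, p2, y):
    """Second-stage model: least-squares weight w for w*p1 + (1 - w)*p2.

    p1, p2: predictions from two base models on the same observations
            (ideally observations not used to train the base models).
    y:      the actual target values for those observations.
    """
    num = sum((a - b) * (t - b) for a, b, t in zip(p1, p2, y))
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    # If both base models always agree, any weight gives the same result.
    return num / den if den else 0.5

def stacked_predict(w, a, b):
    """Combine two base predictions with the fitted weight."""
    return w * a + (1 - w) * b

y = [1, 2, 3, 4]
model_a = [2, 3, 4, 5]   # base model that over-predicts by 1
model_b = [0, 1, 2, 3]   # base model that under-predicts by 1
w = fit_blend_weight(model_a, model_b, y)
print(w, [stacked_predict(w, a, b) for a, b in zip(model_a, model_b)])
```

In this toy case the fitted weight is 0.5 and the stacked predictions match the targets, because the two base models err in opposite directions.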

How do we know which ensemble approach (average, stacking, or cluster-based) we should use in a given situation?

The great news is that with SAS Enterprise Miner you can try all of them and see which one works best for your data and your situation.

The best way to avoid overfitting is to use a holdout sample to validate the model on data that was not used for training.
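
As a minimal sketch of the holdout idea in plain Python (in Enterprise Miner you would normally partition the data within the process flow instead), the function below sets aside a fraction of the rows for validation. The function name, the 30% default, and the seed are assumptions chosen for illustration.

```python
import random

def holdout_split(rows, holdout_fraction=0.3, seed=42):
    """Shuffle the rows and set aside a fraction for validation."""
    rng = random.Random(seed)
    shuffled = rows[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_holdout = int(round(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]

train, valid = holdout_split(list(range(10)))
print(len(train), len(valid))
```

A model that scores much better on `train` than on `valid` is likely overfitting.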

I realize this webinar is about Enterprise Miner, but can we do similar things in Enterprise Guide, and which one has greater market presence?

With Enterprise Guide, you could write programs to accomplish some of the same ensemble techniques, but it would be fairly complex. Gradient boosting, random forests, and neural networks are not available in SAS/STAT, so they would not be available in Enterprise Guide unless you have licensed Enterprise Miner (or Viya products that include these algorithms) and use the procedures available in EM (or Viya).

Can you produce the 3D Scatter Plot in SAS?

Yes. There are several different procedures for creating these types of graphs. You can use PROC G3D or PROC G3GRID: https://support.sas.com/sassamples/graphgallery/PROC_G3D.html

How about "RuleFit", that is, feeding random forest rules into a LASSO model for trimming?

Yes, you can use the LEAFINDICATOR option with the SCORE statement in Random Forest and HP4SCORE to output a 0-1 variable for every leaf in the forest, indicating whether the observation is in that leaf. This data can then be the input to a LASSO routine. You would want to ensure that you don't have too many leaves.
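
To show what such 0-1 leaf variables look like, here is a small Python sketch. It does not call any SAS procedure. It assumes you already know, for each observation, which leaf it falls into in each tree. The input layout and the function name `leaf_indicators` are hypothetical.

```python
def leaf_indicators(leaf_ids):
    """Turn per-tree leaf assignments into 0-1 indicator columns.

    leaf_ids: list of rows; row[i] is the id of the leaf that the
              observation lands in for tree i.
    Returns (columns, matrix), where columns is a sorted list of
    (tree, leaf) pairs and matrix holds one 0/1 row per observation.
    """
    columns = sorted({(tree, leaf) for row in leaf_ids for tree, leaf in enumerate(row)})
    matrix = [[1 if row[tree] == leaf else 0 for tree, leaf in columns]
              for row in leaf_ids]
    return columns, matrix

# Three observations scored by a forest of two trees.
columns, matrix = leaf_indicators([[0, 2], [1, 2], [0, 3]])
print(columns)
for row in matrix:
    print(row)
```

Each row contains exactly one 1 per tree, so the number of columns grows with the total number of leaves in the forest. That is why a forest with many leaves can produce an unwieldy input for the LASSO step.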

Does SAS EM display the lift for Ensemble modeling?

Yes, EM does provide lift for Ensemble modeling in the Model Comparison Results.
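
For readers who want to see what a lift value measures, the Python sketch below computes cumulative lift at a given depth using the usual definition: the event rate among the top-scored observations divided by the overall event rate. It is an illustration, not the calculation Enterprise Miner performs internally, and the function name `lift_at` is hypothetical. It assumes a 0/1 target with at least one event.

```python
def lift_at(scores, targets, depth=0.1):
    """Cumulative lift in the top `depth` fraction, ranked by score."""
    ranked = sorted(zip(scores, targets), key=lambda st: st[0], reverse=True)
    n_top = max(1, int(len(ranked) * depth))
    top_rate = sum(t for _, t in ranked[:n_top]) / n_top
    base_rate = sum(targets) / len(targets)  # assumes at least one event
    return top_rate / base_rate

scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
targets = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(lift_at(scores, targets, depth=0.2))
```

Here the top 20% of observations are all events while the overall event rate is 30%, so the lift at that depth is about 3.33.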

Can the Ensemble node be used for variable selection only?

The data flowing from the Ensemble node does reflect the variables that are used in the models. A better way would be to use the Metadata node after your models and decide how you want to include variables. For example, you could reject a variable only if all models reject it, reject it if the majority of models reject it, or reject it if any model rejects it.
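
The three rejection rules can be written down as a short Python sketch. It only illustrates the decision logic; in Enterprise Miner you would apply the outcome through the Metadata node. The function name, variable names, and input layout are hypothetical.

```python
def select_variables(variables, rejected_by_model, rule="majority"):
    """Keep the variables that are not rejected under the chosen rule.

    variables:         list of candidate variable names.
    rejected_by_model: list of sets, one per model, holding the
                       variables that model rejected.
    rule: "all"      -> reject only if every model rejects the variable
          "majority" -> reject if more than half of the models reject it
          "any"      -> reject if at least one model rejects it
    """
    n = len(rejected_by_model)
    needed = {"all": n, "majority": n // 2 + 1, "any": 1}[rule]
    kept = []
    for v in variables:
        rejections = sum(v in rejected for rejected in rejected_by_model)
        if rejections < needed:
            kept.append(v)
    return kept

variables = ["age", "income", "region", "tenure"]
rejected = [{"region", "tenure"}, {"region", "tenure"}, {"region", "income"}]
for rule in ("all", "majority", "any"):
    print(rule, select_variables(variables, rejected, rule))
```

In this example "all" keeps age, income, and tenure, "majority" keeps age and income, and "any" keeps only age.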

Can we get detail setting for the stacked ensemble model?

Property settings are included in the slides. All nodes use their default settings except for the Metadata node, where you need to change the probabilities (or predicted values) so that they are used as inputs in the model.

How do you move from the training data file which is used to develop a prediction model to a new data file where you can apply the model?

This can be accomplished in several different ways within SAS Enterprise Miner. The two simplest are:

• Register your model to the metadata and then apply the model in SAS Enterprise Guide using the Model Scoring Task under Data Mining.
• Define your new data as a Data Source available in SAS Enterprise Miner and specify it as a Test data set. This data set can then be fed into the Score node to be scored within the SAS Enterprise Miner flow. See example process flow below.

[Image: example process flow]

Please let me know the second book that was shown.

The two recommended books are:

• Decision Trees for Analytics Using SAS® Enterprise Miner™ by Barry de Ville and Padraic Neville (available on Amazon)
• Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions by Giovanni Seni and John Elder (available on Amazon)

Want more tips? Be sure to subscribe to the Ask the Expert board to receive follow-up Q&A, slides, and recordings from other SAS Ask the Expert webinars. To subscribe, select Subscribe from the Options drop-down button above the articles.
