Building models with SAS Enterprise Miner, SAS Factory Miner, SAS Visual Data Mining and Machine Learning or just with programming

SAS DataMiner- Ensemble node

New Contributor
Posts: 3

Greetings everybody,

I'm working on a small academic project with SAS DataMiner, and I noticed that when I use the Ensemble node to merge the results of 4 different classifiers, I get "weaker" results than from the separate classifiers.

My diagram looks like this:

Ensemble Node.png

The problem is visible in the ROC chart:

Weaker results.png

Any ideas? I read that

"It is important to note that the ensemble model that is created from either approach can be more accurate than the individual models only if the individual models differ."

Is it connected with my problem? And why?

Super Contributor
Posts: 336

Re: SAS DataMiner- Ensemble node

Hi Vardens,

Unfortunately, you can't know whether your Ensemble is going to be better than your individual models until you try it.

From your plot, it looks like your Ensemble is overfitted. A quick suggestion: connect all the models and the Ensemble to the Model Comparison node. Try to identify whether one model could be throwing off the Ensemble, and re-run the Ensemble node without that model.

Another alternative: build ensembles of 4, 3, and 2 models. I would not try every single combination, only models that have good fit statistics and might be discordant. I'm not sure whether there is a statistical way to test discordance; I usually use trial and error.
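As a rough way to see which models are discordant before trying combinations, the pairwise disagreement rate (the share of observations on which two models predict different levels) can be computed outside the tool. The sketch below is only an illustration: the model names and predictions are made up, and `disagreement_rate` is a hypothetical helper, not an Enterprise Miner feature.

```python
from itertools import combinations


def disagreement_rate(preds_a, preds_b):
    """Share of observations on which two models predict different levels."""
    if len(preds_a) != len(preds_b):
        raise ValueError("Both models must score the same observations")
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


# Hypothetical predicted levels for six observations (illustration only).
predictions = {
    "Tree":       ["Yes", "No", "Yes", "Yes", "No", "No"],
    "Regression": ["Yes", "No", "Yes", "Yes", "No", "No"],
    "Neural":     ["Yes", "Yes", "No", "Yes", "No", "Yes"],
}

# Disagreement rate for every pair of models.
rates = {
    (m1, m2): disagreement_rate(predictions[m1], predictions[m2])
    for m1, m2 in combinations(predictions, 2)
}

for pair, rate in rates.items():
    print(pair, rate)
```

In this made-up data, Tree and Regression never disagree, so adding both to an ensemble contributes little beyond one of them, while Neural disagrees with each of them on half of the observations.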

Let us know how it went!

Thanks,

Miguel

New Contributor
Posts: 3

Re: SAS DataMiner- Ensemble node

Is the Ensemble also learning?

I thought it just counts the votes:

Let's say I have 4 models: three vote 'yes' and the last one votes 'no'. Shouldn't it answer 'yes'?

And what happens if I have two 'yes' votes and two 'no' votes?

Super Contributor
Posts: 336

Re: SAS DataMiner- Ensemble node

By default the Ensemble node averages the predicted probabilities of your models.

If you have a class target, you can specify Posterior Probability as Voting. Voting can be done in two ways:

-average. The posterior probabilities of an event are averaged, and the event with higher average predicted probabilities is selected.

-proportion. The proportion of predicted events is selected. Priority is given to the descending level.

To answer your specific question, consider this example. For a given observation these are the posterior probabilities of four models for the levels Yes and No:

Model      Prob of Yes   Prob of No
Model 1    0.6           0.4
Model 2    0.7           0.3
Model 3    0.1           0.9
Model 4    0.15          0.85

Ensemble Voting by Average

The average posterior probabilities are 0.3875 for Yes and 0.6125 for No. For this example, the predicted level would be No.
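The arithmetic can be checked with a few lines of Python. This is only a sketch of the averaging as described in this post, using the four models from the table; `average_vote` is a hypothetical helper name, not something Enterprise Miner exposes.

```python
# Posterior probabilities per model, taken from the example table.
posteriors = {
    "Model 1": {"Yes": 0.60, "No": 0.40},
    "Model 2": {"Yes": 0.70, "No": 0.30},
    "Model 3": {"Yes": 0.10, "No": 0.90},
    "Model 4": {"Yes": 0.15, "No": 0.85},
}


def average_vote(posteriors):
    """Average each level's posterior over all models and pick the largest."""
    levels = list(next(iter(posteriors.values())))
    n_models = len(posteriors)
    averages = {
        level: sum(p[level] for p in posteriors.values()) / n_models
        for level in levels
    }
    predicted = max(averages, key=averages.get)
    return averages, predicted


averages, predicted = average_vote(posteriors)
print(averages, predicted)
```

Models 3 and 4 are much more confident in No than Models 1 and 2 are in Yes, which is why the average favors No even though the models split two against two.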

Ensemble Voting by Proportion

This ensemble assigns a predicted probability of 0.5 since two out of four models predict each level. In this tie case, the event Yes is given priority because targets are formatted with descending order by default in Enterprise Miner.
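The proportion rule and its tie-break can be sketched the same way. This is an illustration of the behavior described in this post, not the node's actual implementation; `proportion_vote` is a hypothetical helper, and the priority order is assumed to be the target levels sorted in descending order.

```python
# Posterior probabilities per model, taken from the example table.
posteriors = {
    "Model 1": {"Yes": 0.60, "No": 0.40},
    "Model 2": {"Yes": 0.70, "No": 0.30},
    "Model 3": {"Yes": 0.10, "No": 0.90},
    "Model 4": {"Yes": 0.15, "No": 0.85},
}


def proportion_vote(posteriors, levels_by_priority):
    """Each model votes for its most probable level; ties follow priority."""
    votes = [max(p, key=p.get) for p in posteriors.values()]
    proportions = {
        level: votes.count(level) / len(votes) for level in levels_by_priority
    }
    # max() returns the first maximal item, so on a tie the level that
    # comes earlier in levels_by_priority wins.
    predicted = max(levels_by_priority, key=lambda level: proportions[level])
    return proportions, predicted


# Assumption: priority follows descending order of the level names.
levels = sorted(["Yes", "No"], reverse=True)  # ["Yes", "No"]

proportions, predicted = proportion_vote(posteriors, levels)
print(proportions, predicted)
```

With the priority list reversed to ["No", "Yes"], the same tie would resolve to No, which mirrors the effect of changing the level order.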

I hope this helps!

Miguel

New Contributor
Posts: 3

Re: SAS DataMiner- Ensemble node

Thank you. It helped me a lot.

Contributor
Posts: 42

Re: SAS DataMiner- Ensemble node

Dear Miguel,

I don't understand the second option (proportion). How will the ensemble select the Yes class?

Thanks

Super Contributor
Posts: 336

Re: SAS DataMiner- Ensemble node

For that specific example, 2 out of 4 models select Yes, and the other 2 select No. In this tie case, where both levels have a 50% voted probability, Yes is selected because descending levels (alphabetically descending) are given priority by default. You can change this order in the metadata definition.

Thanks,
