Vardens
Calcite | Level 5

Greetings everybody,

I'm working on a small academic project with SAS Enterprise Miner, and I noticed that when I use the Ensemble node to merge the results of 4 different classifiers, I get "weaker" results than from the separate classifiers.

My diagram looks like this:

Ensemble Node.png

The problem is visible in the ROC chart:

Weaker results.png

Any ideas? I read that

"It is important to note that the ensemble model that is created from either approach can be more accurate than the individual models only if the individual models differ."

Is this related to my problem? If so, why?

M_Maldonado
Barite | Level 11

Hi Vardens,

Unfortunately, you can't know whether your Ensemble is going to be better than your individual models until you try it.

From your plot, it looks like your Ensemble is overfitted. A quick suggestion: connect all the models and the Ensemble to the Model Comparison node. Try to identify whether there is a model that could be throwing off the Ensemble, and re-run the Ensemble node without that model.

Another alternative is to build ensembles of 4, 3, and 2 models. I would not try every single combination, only models that have good fit statistics and might be discordant. I'm not sure if there is a statistical way to test discordance; I usually use trial and error.
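To get a feel for how many candidate ensembles of 4, 3, and 2 models there are, here is a minimal Python sketch that only enumerates the subsets. The model names are placeholders, not nodes from the diagram in the question, and the sketch does not fit or score anything.

```python
from itertools import combinations

# Placeholder names for the four classifiers (hypothetical, for illustration only).
models = ["Model 1", "Model 2", "Model 3", "Model 4"]

# All candidate ensembles of size 4, 3, and 2.
candidates = [
    combo
    for size in (4, 3, 2)
    for combo in combinations(models, size)
]

for combo in candidates:
    print(len(combo), "models:", ", ".join(combo))

print("Total candidate ensembles:", len(candidates))
```

With four models this gives 1 + 4 + 6 = 11 candidates, so screening by fit statistics first keeps the number of ensembles you actually run small.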

Let us know how it went!

Thanks,

Miguel

Vardens
Calcite | Level 5

Is the Ensemble also learning?

I thought it just counts the votes:

Let's say I have 4 models: three vote 'yes' and the last one votes 'no'. Shouldn't it answer 'yes'?

And what happens if I have two 'yes' votes and two 'no' votes?

M_Maldonado
Barite | Level 11

By default the Ensemble node averages the predicted probabilities of your models.

If you have a class target, you can specify Posterior Probability as Voting. Voting can be done in two ways:

- Average. The posterior probabilities of each level are averaged, and the level with the higher average predicted probability is selected.

- Proportion. Each model votes for its predicted level, and the level predicted by the largest proportion of models is selected. In a tie, priority is given to the descending level.

To answer your specific question, consider this example. For a given observation these are the posterior probabilities of four models for the levels Yes and No:

Model      Prob of Yes   Prob of No
Model 1    0.6           0.4
Model 2    0.7           0.3
Model 3    0.1           0.9
Model 4    0.15          0.85

Ensemble Voting by Average

The average posterior probabilities are 0.3875 for Yes and 0.6125 for No. For this example, the predicted level would be No.
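The arithmetic behind voting by average can be checked with a short Python sketch. This is only an illustration of the calculation described here, not the Ensemble node's actual implementation; the function name `average_vote` is made up for this example, and exact ties are not covered.

```python
# Posterior probabilities from the example: (P(Yes), P(No)) for each model.
posteriors = {
    "Model 1": (0.6, 0.4),
    "Model 2": (0.7, 0.3),
    "Model 3": (0.1, 0.9),
    "Model 4": (0.15, 0.85),
}


def average_vote(posteriors):
    """Average each level's posterior across models and pick the larger one."""
    n = len(posteriors)
    avg_yes = sum(p_yes for p_yes, _ in posteriors.values()) / n
    avg_no = sum(p_no for _, p_no in posteriors.values()) / n
    # Assumption: exact ties are not handled in this sketch.
    predicted = "Yes" if avg_yes > avg_no else "No"
    return avg_yes, avg_no, predicted


avg_yes, avg_no, predicted = average_vote(posteriors)
print(f"Average P(Yes) = {avg_yes:.4f}, average P(No) = {avg_no:.4f}")
print("Predicted level:", predicted)
```

Note that Models 3 and 4 are much more confident in No than Models 1 and 2 are in Yes, which is why the average lands on No even though the models are split two against two.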

Ensemble Voting by Proportion

This ensemble assigns a predicted probability of 0.5 to each level, since two out of four models predict Yes and two predict No. In this tie case, the event Yes is given priority because target levels are sorted in descending order by default in Enterprise Miner.
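Voting by proportion, including the tie-break, can be sketched in Python in the same way. Again, this is an illustration under assumptions rather than the node's real code: the function name `proportion_vote` is hypothetical, and the tie-break simply prefers the level that comes first in descending alphabetical order, as described for this example.

```python
# Posterior probabilities from the example, one dict per model.
posteriors = [
    {"Yes": 0.6, "No": 0.4},    # Model 1
    {"Yes": 0.7, "No": 0.3},    # Model 2
    {"Yes": 0.1, "No": 0.9},    # Model 3
    {"Yes": 0.15, "No": 0.85},  # Model 4
]


def proportion_vote(posteriors):
    """Each model votes for its most probable level; the largest share wins."""
    # Levels in descending alphabetical order: ['Yes', 'No'].
    levels = sorted(posteriors[0], reverse=True)
    votes = {level: 0 for level in levels}
    for probs in posteriors:
        votes[max(levels, key=probs.get)] += 1
    proportions = {level: votes[level] / len(posteriors) for level in levels}
    # max() returns the first maximal item, so the earlier (descending) level wins ties.
    predicted = max(levels, key=proportions.get)
    return proportions, predicted


proportions, predicted = proportion_vote(posteriors)
print("Vote proportions:", proportions)
print("Predicted level:", predicted)
```

Here both levels get a proportion of 0.5, and the descending-order tie-break selects Yes, the opposite of the result from voting by average on the same four models.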

I hope this helps!

Miguel

Vardens
Calcite | Level 5

Thank you. It helped me a lot.

husseinmazaar
Quartz | Level 8

Dear Miguel,

I don't understand the second option (proportion). How does the ensemble select the Yes class?

Thanks

M_Maldonado
Barite | Level 11

For that specific example, 2 out of 4 models select Yes, and the other 2 select No. In this tie case, where both levels have a voted proportion of 50%, Yes is selected because descending levels (alphabetically descending) are given priority by default. You can change this order in the metadata definition.

Thanks,
