MikeTurner
Calcite | Level 5

when the success rate is small for logistic regression?

Thanks.

Accepted Solution
adjgiulio
Obsidian | Level 7

Mike,

There are several ways to implement oversampling in EM. The first step is to decide which flavor you are after: oversampling, undersampling, weighting of observations, or duplication of rare events. That choice depends on many factors, including the proportion of rare events (is it 10%, 1%, 0.1%...?) and how many observations you have. The ultimate goal is to have enough examples of your rare class for the model to identify meaningful patterns.

In a typical scenario your target has a rare class, say 10%. If you have enough observations, you can afford to oversample the rare class to 50%. You can do that using a Sample node with the following properties: Size/Type=Percentage, Size/Percentage=100, Stratified/Criterion=Equal. This results in a 50-50 sample where all of your rare events are used and only a sample of the 0's is chosen.
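To make the 50-50 idea concrete outside of EM, here is a minimal Python sketch of the same kind of sample: keep every rare event and draw an equally sized random sample of 0's. The function name, the `target` key, and the toy data are made up for illustration, and this is not a description of what the Sample node does internally.

```python
import random

def balance_by_undersampling(rows, target_key="target", seed=0):
    """Keep every rare event (target == 1) plus an equally sized random sample of 0's."""
    events = [r for r in rows if r[target_key] == 1]
    non_events = [r for r in rows if r[target_key] == 0]
    rng = random.Random(seed)
    # Never ask for more 0's than exist.
    sampled_non_events = rng.sample(non_events, k=min(len(events), len(non_events)))
    balanced = events + sampled_non_events
    rng.shuffle(balanced)
    return balanced

# Hypothetical toy data: 1,000 rows with a 10% event rate.
rows = [{"id": i, "target": 1 if i < 100 else 0} for i in range(1000)]
balanced = balance_by_undersampling(rows)
event_rate = sum(r["target"] for r in balanced) / len(balanced)
print(len(balanced), event_rate)
```

With the toy data, all 100 events are kept and 100 of the 900 non-events are sampled, so the balanced set has 200 rows and a 50% event rate.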

At this point you can already start running models; however, all of your posterior probabilities and many performance metrics will not reflect the true priors. The results are still good for model comparison and performance evaluation, as well as for ranking observations.

If you want your priors to be adjusted, add a Decision node (after Data Partition, for example). In the Custom Editor, enter the real priors. This prompts EM to adjust all of your posterior probabilities.
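For intuition, the usual prior-correction formula rescales each class's posterior by the ratio of its true prior to its proportion in the sample, then renormalizes. The sketch below assumes a 50-50 training sample and a true event rate of 10%; the function and parameter names are hypothetical, and whether EM computes the adjustment in exactly this form is an assumption on my part.

```python
def adjust_posterior(p_sample, true_prior, sample_prior=0.5):
    """Rescale an event posterior from the oversampled data to the true event prior."""
    # Weight each class by (true prior / proportion in the training sample).
    event = p_sample * true_prior / sample_prior
    non_event = (1.0 - p_sample) * (1.0 - true_prior) / (1.0 - sample_prior)
    # Renormalize so the two adjusted posteriors sum to 1.
    return event / (event + non_event)

# Assumed example: 50-50 training sample, true event rate of 10%.
for p in (0.2, 0.5, 0.9):
    print(p, round(adjust_posterior(p, true_prior=0.10), 4))
```

Note that a score of 0.5 on the balanced sample maps back to roughly 0.10, the true prior. The ranking of observations is unchanged; only the scale moves.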

However, and this is something to be careful with, the Decision node alone will NOT prompt EM to use the real priors as the cutoff value when choosing whether an observation is a 0 or a 1. In our example, even after using the Decision node, EM would still use 0.5 as the cutoff value.

To get the cutoff right, go back to the Decision node, open the Decisions tab and select Yes, then click Default to Inverse Prior Weights.

On the Decision Weights tab, copy the value in the lower-right corner to the lower-left corner with a minus sign in front of it, then replace the lower-right value with 0. Just keep in mind that, even after all of this work, some metrics (Misclassification in particular) will not reflect the actual priors. But the posteriors will be right and the 0/1 decision will be right.
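To see why that edit moves the cutoff, here is a small Python sketch of the resulting decision rule. It assumes the matrix rows are the target levels (1 on top, 0 below) and the columns are the decisions (1 on the left, 0 on the right), so that after the edit the weights are 1/prior for a correctly flagged event, minus 1/(1 - prior) for a wrongly flagged non-event, and 0 everywhere in the decision-0 column. That layout and the function name are assumptions for illustration, not something taken from the EM documentation.

```python
def decide(p_adjusted, true_prior):
    """Return 1 when the expected weight of decision 1 beats that of decision 0."""
    w_event = 1.0 / true_prior                 # target 1, decision 1 (inverse prior)
    w_non_event = -1.0 / (1.0 - true_prior)    # target 0, decision 1 (negated value)
    expected_decision_1 = p_adjusted * w_event + (1.0 - p_adjusted) * w_non_event
    expected_decision_0 = 0.0                  # both decision-0 weights are 0
    return 1 if expected_decision_1 > expected_decision_0 else 0

# Assumed example: true event rate of 10%, prior-adjusted posteriors as input.
true_prior = 0.10
for p in (0.05, 0.15, 0.40):
    print(p, decide(p, true_prior))
```

Working through the inequality, decision 1 wins exactly when the adjusted posterior exceeds the true prior, so with a 10% prior an observation scored at 0.15 or 0.40 is classified as a 1 even though both are well below 0.5.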

G


MikeTurner
Calcite | Level 5

Great! Thanks.
