matovua
Calcite | Level 5

Dear All,

I have built a model that performs well when I oversample the rare event to 25%, but it gives too many false positives. When I remove the oversampling, the misclassification rate is good, but the number of observations predicted to churn that actually churned is too small to be useful.

I need your help on what I can do.

I am using SAS Enterprise Miner.

Thanks

ACCEPTED SOLUTION
DougWielenga
SAS Employee

The notions of lift and misclassification rate are both problematic in rare event scenarios.  

Consider Lift...

The maximum possible lift for a ...
     ... 50% overall response rate is   100% / 50% =   2
     ... 25% overall response rate is   100% / 25% =   4
     ... 10% overall response rate is   100% / 10% =  10
     ...  5% overall response rate is   100% /  5% =  20
     ...  2% overall response rate is   100% /  2% =  50

which demonstrates that you only get dramatic-sounding values for lift with rare events.  However, a lift of 5 (identifying a group with a rate five times the overall response rate) represents a 5% probability if the overall rate is 1% (a gain of 4 percentage points), but a lift of 2 represents a 20% probability if the overall response rate is 10% (a gain of 10 percentage points).  Therefore, you must be careful not to compare lift across models fit to populations with different overall response rates.
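The arithmetic above can be sketched in a few lines; this is a minimal illustration of the relationships Doug describes, not anything produced by Enterprise Miner, and the function names are made up for this example.

```python
# Sketch: why lift is only comparable at the same overall response rate.

def max_lift(overall_rate):
    """Maximum possible lift: a perfectly targeted group is 100% events."""
    return 1.0 / overall_rate

def gain(lift, overall_rate):
    """Absolute gain in event probability over the baseline rate."""
    return lift * overall_rate - overall_rate

print(max_lift(0.02))   # 2% base rate: lift can reach 50.0
print(max_lift(0.50))   # 50% base rate: lift is capped at 2.0
print(gain(5, 0.01))    # lift 5 at a 1% base rate: gain of 0.04
print(gain(2, 0.10))    # lift 2 at a 10% base rate: gain of 0.10
```

Note how the smaller lift (2) yields the larger absolute gain, which is exactly why lift values from populations with different base rates should not be compared directly.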

Similarly consider misclassification rates...

In a group of 1000 observations with a ...
     ... 50% overall response rate, you have 500 events or approximately 50/100 in each decile on average
     ... 25% overall response rate, you have 250 events or approximately 25/100 in each decile on average 
     ... 10% overall response rate, you have 100 events or approximately 10/100 in each decile on average 
     ...  5%  overall response rate, you have 50 events or approximately  5/100 in each decile on average 
     ...  2%  overall response rate, you have  20 events or approximately   2/100 in each decile on average 

Even an excellent-fitting model with a lift of 5 would still have only 5 × 2/100, or 10/100, events per decile if the response rate is 2%.  If you choose to predict that all of those observations have the event, you will end up misclassifying 90% of them.  Compare this to the model that classifies nobody as having the event -- it correctly classifies 98% of all observations.  As a result, modeling rare events almost always means your misclassification rate will be worse than that of the null (intercept-only) model except in the most extreme cases (or never, in my experience).
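The comparison above can be made concrete with a quick calculation; the counts below simply restate the 2% scenario from the text (1000 observations, lift of 5 in the top decile) and are not from any real model.

```python
# Sketch: two naive decision rules at a 2% event rate.

N = 1000
rate = 0.02
events = int(N * rate)                 # 20 events overall
top_decile = N // 10                   # 100 observations
lift = 5
events_in_decile = int(lift * rate * top_decile)   # 10 events captured

# Rule 1: flag everyone in the top decile as an event.
misclass_in_decile = (top_decile - events_in_decile) / top_decile
print(misclass_in_decile)              # 0.9 -> 90% of flagged cases are wrong

# Rule 2: flag nobody (the null model).
misclass_null = events / N
print(misclass_null)                   # 0.02 -> 98% overall accuracy
```

This is the heart of the problem: by overall misclassification rate, the do-nothing model "wins" even though it is useless for finding churners.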

 

Your choice of cutoff should be made by combining the probability scores with your business needs.  In one case, I had a customer that didn't want to reject claims, so they were only considering the top couple of percentiles for rejection, where there was still an extremely high chance of the claims being fraudulent.  In another case, the customer had limited exposure and didn't mind going much deeper into the list even though there was a low chance of response, since they only needed a 2% response to be profitable.  In the end, overall confusion matrices are not helpful for rare events.  It is often more important to focus on performance in the group on which you wish to take action (e.g. the top 3% or whatever) than on overall statistics, which are so wildly impacted by your oversampling rate.
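One way to evaluate "performance in the group on which you wish to take action" is precision within the top few percent of scores. The sketch below uses simulated labels and scores (a hypothetical 2% event rate, plus noise), not output from any real model, just to show the shape of the calculation.

```python
# Sketch: event rate within the top fraction of observations by score.
import random

random.seed(42)
N = 10_000
# Simulated labels at a ~2% event rate; scores loosely correlated with them.
labels = [1 if random.random() < 0.02 else 0 for _ in range(N)]
scores = [0.5 * y + random.random() for y in labels]

def precision_at_top(scores, labels, fraction):
    """Event rate among the top `fraction` of observations by score."""
    k = int(len(scores) * fraction)
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])[:k]
    return sum(y for _, y in ranked) / k

# Event rate in the top 3% -- far above the ~2% base rate,
# even though the overall misclassification rate would look poor.
print(precision_at_top(scores, labels, 0.03))
```

The business decision then becomes how deep into the ranked list to act, given what each contact or rejection costs, rather than where a 50% probability cutoff happens to fall.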


I hope this helps!

Doug 


