Kanyange
Fluorite | Level 6

Hi All,

I have a quick question. I have built several models, and I use AUC and MAPE to assess them; MAPE is calculated as shown below. My question: the AUC is not good, but the MAPE looks OK. How is that possible? My thresholds and the AUC and MAPE results are below (table at the bottom). What decision should I make? Should I look at AUC only, or at MAPE? Your help will be much appreciated. Thank you

Thresholds:

  • AUC

0.6-0.7: Acceptable

0.71-0.8: Good

0.81-0.9: Excellent

  • MAPE

0.2-0.3: Acceptable

0.1-0.2: Good

0.001-0.1: Excellent

MAPE = (1/N) * Sum over all N cases of (|Actual - Predicted| / |Actual|) * 100
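A minimal Python sketch of the MAPE formula above, assuming `actual` and `predicted` are equal-length lists of numbers. The function name `mape` is illustrative, not from any library. It keeps the final `* 100`, so it returns a percentage; dropping that factor gives the fractional form used in the thresholds and the results table.

```python
def mape(actual, predicted):
    """Mean absolute percentage error:
    (1/N) * sum(|actual - predicted| / |actual|) * 100.

    Raises ValueError when an actual value is 0, because that term
    of the sum is undefined (division by zero).
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        if a == 0:
            raise ValueError("MAPE is undefined when an actual value is 0")
        total += abs(a - p) / abs(a)
    return total / len(actual) * 100


# Interval target: each prediction is off by 10%, so MAPE is about 10.
print(mape([100.0, 200.0], [110.0, 180.0]))

# A 0/1 target hits the division-by-zero problem on every 0 label.
try:
    mape([0, 1, 1], [0.2, 0.7, 0.9])
except ValueError as err:
    print("error:", err)
```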

Model     Gini  AUC   MAPE
Model 1   0.07  0.53  0
Model 2   0.14  0.57  0.0828
Model 3   0.37  0.69  0.0556
Model 4   0.08  0.55  0.0673
Model 5   0.01  0.51  0.0249
Model 6   0.09  0.55  0.1552
Model 7   0.18  0.59  0.2327
Model 8   0.16  0.58  0.0654
Model 9   0.13  0.57  0.0842
Model 10  0.14  0.57  0.0261
Model 11  0.35  0.68  0.1336
Model 12  0.16  0.58  0.1704
Model 13  0.07  0.54  0.0504
Model 14  0.11  0.56  0.096
Model 15  0.09  0.55  0.1478
Model 16  0.19  0.6   0.045
Model 17  0.18  0.59  0.0505
Model 18  0.16  0.59  0.1472
Model 19  0.17  0.59  0.1556

PatrickHall
Obsidian | Level 7

MAPE is usually used for models with interval targets (regression, time series, etc.). It is not appropriate when the actual values can be 0, because |Actual| is in the denominator of the MAPE calculation and those cases would cause a division by 0.

Mean absolute percentage error (Wikipedia)

AUC is typically for binary classifiers like logistic regression.

Receiver operating characteristic (Wikipedia)
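To make AUC and its relation to Gini concrete, here is a small Python sketch (function names are illustrative) that computes AUC as the probability that a randomly chosen event (label 1) scores higher than a randomly chosen non-event (label 0), counting ties as one half, and then Gini = 2 * AUC - 1. The Gini column in the question's table follows this relation up to rounding (for Model 3, 2 * 0.69 - 1 = 0.38 versus the reported 0.37), and an AUC near 0.5 means the model ranks cases little better than chance. The pairwise loop is O(events * non-events), so it is only meant for small examples.

```python
def auc(labels, scores):
    """AUC as P(score of a random event > score of a random non-event),
    with ties counted as 1/2 (the Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini(labels, scores):
    """Gini coefficient derived from AUC: 2 * AUC - 1."""
    return 2 * auc(labels, scores) - 1


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(labels, scores), gini(labels, scores))  # 0.75 0.5
```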

Do you have an interval or binary target?

If you have a binary target, what is the event occurrence rate for your target? The situation you describe is common for rare target event occurrences.

To increase your c-statistic/AUC for rare targets:

- Disproportionately over-sample the rare events

- Add a weight to the rare events

- Use an inverse prior distribution
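A minimal Python sketch of two of the remedies listed above, under stated assumptions: over-sampling is shown as simply repeating each event row `factor` times, and inverse-prior weighting is shown as weighting each case by 1 / (proportion of its class), so both classes carry equal total weight. These are generic versions of the techniques, the function names are hypothetical, and the sampling and prior options in SAS tools may behave differently.

```python
import random


def oversample_events(labels, factor=3, seed=0):
    """Return row indices where each event (label 1) appears `factor` times
    and each non-event (label 0) appears once, in shuffled order."""
    indices = []
    for i, y in enumerate(labels):
        indices.extend([i] * (factor if y == 1 else 1))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    return indices


def inverse_prior_weights(labels):
    """Weight each case by 1 / (share of its class in the data)."""
    n = len(labels)
    p_event = sum(labels) / n
    if p_event in (0.0, 1.0):
        raise ValueError("need both events and non-events")
    return [1 / p_event if y == 1 else 1 / (1 - p_event) for y in labels]


labels = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # 10% event rate
print(len(oversample_events(labels)))  # 9 non-events + 3 copies of the event
weights = inverse_prior_weights(labels)
print(weights[0], weights[1])  # about 10 for the event, about 1.11 otherwise
```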

Kanyange
Fluorite | Level 6

Hi Patrick,

Thank you for your response. My target is binary and I used logistic regression to build the models. The response rate varies: some models have 50%, others 20% or 10%, and the lowest has around

So the response rate is not that rare. Are you saying that for a binary target I shouldn't use MAPE? What should I use instead to compare actual and predicted values?

Many Thanks

PatrickHall (accepted solution)
Obsidian | Level 7

I would use misclassification rate instead. It depends on your data, but I would be OK with a misclassification rate of 0.3 or less. Gini, AUC, c-statistic and logarithmic loss are other common measures of binary classification accuracy.

If you have a traditional binary target whose values are 0 and 1, then you should not use MAPE, because you may be dividing by 0. Even if your binary target has values other than 0 and 1, MAPE and other measures like ASE and RMSE are meant for interval targets. These measures tell you the average distance between your numeric regression predictions and your numeric observed values. In logistic regression you are doing classification, not numeric prediction: you are labeling cases as belonging to one group or another. The distance between these groups may be arbitrary or hard to interpret, which is why we look at the misclassification rate.
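As a sketch of the measures recommended here, the Python below computes the misclassification rate at a probability cutoff and the logarithmic loss from predicted event probabilities, assuming a 0/1 target. The 0.5 cutoff, the clipping constant `eps`, and the function names are assumptions for illustration.

```python
import math


def misclassification_rate(labels, probs, cutoff=0.5):
    """Share of cases whose predicted class (prob >= cutoff -> 1) differs
    from the observed 0/1 label."""
    wrong = sum(1 for y, p in zip(labels, probs) if int(p >= cutoff) != y)
    return wrong / len(labels)


def log_loss(labels, probs, eps=1e-15):
    """Average negative log-likelihood of the observed 0/1 labels."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # keep log() away from 0
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)


labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.4, 0.6]
print(misclassification_rate(labels, probs))  # 2 of 4 cases wrong -> 0.5
print(log_loss(labels, probs))
```

Unlike MAPE, neither measure divides by the observed value, so 0 labels cause no problem.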

If your misclassification rate is between 0.3 and 0.5, then there are many steps you can take to find a more accurate model, with feature selection being the foremost. Have you tried forward, backward or stepwise variable selection? Another common problem with logistic regression is quasi-complete separation. Are any of your parameters greater than 15 or 20?

Also, your data may just be noisy and difficult to model.
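The forward selection mentioned above can be sketched generically in Python: start with no variables and, at each step, add the candidate that most improves a model score, stopping when nothing helps. The `score` callable and the toy gains are hypothetical stand-ins; in practice the score would come from refitting the logistic regression on the chosen variables (for example, validation AUC or a penalized likelihood). In SAS, PROC LOGISTIC offers forward, backward and stepwise selection through the SELECTION= option of its MODEL statement.

```python
def forward_selection(candidates, score, min_gain=0.0, max_vars=None):
    """Greedy forward selection. `score(selected)` must return a number
    where higher is better (e.g. validation AUC)."""
    selected = []
    best = score(selected)
    remaining = list(candidates)
    while remaining and (max_vars is None or len(selected) < max_vars):
        # Score every one-variable extension of the current model.
        trial_score, choice = max(
            ((score(selected + [c]), c) for c in remaining),
            key=lambda pair: pair[0],
        )
        if trial_score - best <= min_gain:
            break  # no candidate improves the score enough
        selected.append(choice)
        remaining.remove(choice)
        best = trial_score
    return selected, best


# Hypothetical toy score: base AUC 0.5 plus made-up per-variable gains,
# minus a small penalty per variable to mimic overfitting.
GAINS = {"age": 0.05, "income": 0.08, "region": 0.0}


def toy_score(variables):
    return 0.5 + sum(GAINS[v] for v in variables) - 0.01 * len(variables)


print(forward_selection(GAINS, toy_score))  # selects income, then age
```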

Kanyange
Fluorite | Level 6

Hi Patrick,

Many thanks for coming back to me. When you ask "Are any of your parameters greater than 15 or 20?", what exactly do you mean? Do you mean the number of predictors?

Thank You

PatrickHall
Obsidian | Level 7

I mean: Are your actual estimated parameters very large?

I should have said a "magnitude" of 15 or 20. In the logistic function 1/(1+e^-(B0+B1*x1 + ... + Bk*xk)), estimates that large can make the exponential values very large, even close to machine infinity, and that can cause problems with your model. Very large estimates usually signal (quasi-)complete separation, which is one of the most common problems with logistic regression. For more information:

Separation (statistics) (Wikipedia)
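To illustrate the numerical point, the Python sketch below evaluates the logistic function 1/(1+e^-eta), with eta = B0 + B1*x1 + ... + Bk*xk, in a form that never calls exp() on a large positive number (Python's math.exp raises OverflowError above roughly 710). It then uses a tiny, made-up, perfectly separated data set to show why estimates drift to large magnitudes under separation: the log-likelihood keeps increasing as the slope grows, so there is no finite maximum-likelihood estimate. The function names are illustrative.

```python
import math


def sigmoid(eta):
    """Logistic function 1 / (1 + exp(-eta)), computed without overflow."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)  # eta < 0, so z is in (0, 1)
    return z / (1.0 + z)


def log_sigmoid(eta):
    """log(sigmoid(eta)) without overflow or log(0)."""
    if eta >= 0:
        return -math.log1p(math.exp(-eta))
    return eta - math.log1p(math.exp(eta))


def log_likelihood(b0, b1, xs, ys):
    """Log-likelihood of a one-predictor logistic model on 0/1 labels."""
    total = 0.0
    for x, y in zip(xs, ys):
        eta = b0 + b1 * x
        # log P(y=1) = log_sigmoid(eta); log P(y=0) = log_sigmoid(-eta)
        total += log_sigmoid(eta) if y == 1 else log_sigmoid(-eta)
    return total


# Made-up data that are perfectly separated at x = 0.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
for b1 in (1, 5, 20, 100):
    print(b1, log_likelihood(0.0, b1, xs, ys))  # rises toward 0, never peaks

print(sigmoid(-1000.0), sigmoid(1000.0))  # 0.0 1.0, no OverflowError
```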
