Hi All,
I have a quick question. I have created several models, and I use AUC and MAPE to assess them; MAPE is calculated as shown below. My question: the AUC is not good, but the MAPE looks OK. How is that possible? My thresholds and my AUC and MAPE results are below (table at the bottom). So what decision should I make? Should I look at AUC only, or at MAPE? Your help will be much appreciated. Thank you
Thresholds
AUC:
0.6-0.7: Acceptable
0.71-0.8: Good
0.81-0.9: Excellent
MAPE:
0.2-0.3: Acceptable
0.1-0.2: Good
0.001-0.1: Excellent
MAPE calculation: MAPE = (1/N) * Sum(|Actual - Predicted| / |Actual|) * 100
Model | Gini | AUC | MAPE |
Model 1 | 0.07 | 0.53 | 0 |
Model 2 | 0.14 | 0.57 | 0.0828 |
Model 3 | 0.37 | 0.69 | 0.0556 |
Model 4 | 0.08 | 0.55 | 0.0673 |
Model 5 | 0.01 | 0.51 | 0.0249 |
Model 6 | 0.09 | 0.55 | 0.1552 |
Model 7 | 0.18 | 0.59 | 0.2327 |
Model 8 | 0.16 | 0.58 | 0.0654 |
Model 9 | 0.13 | 0.57 | 0.0842 |
Model 10 | 0.14 | 0.57 | 0.0261 |
Model 11 | 0.35 | 0.68 | 0.1336 |
Model 12 | 0.16 | 0.58 | 0.1704 |
Model 13 | 0.07 | 0.54 | 0.0504 |
Model 14 | 0.11 | 0.56 | 0.096 |
Model 15 | 0.09 | 0.55 | 0.1478 |
Model 16 | 0.19 | 0.6 | 0.045 |
Model 17 | 0.18 | 0.59 | 0.0505 |
Model 18 | 0.16 | 0.59 | 0.1472 |
Model 19 | 0.17 | 0.59 | 0.1556 |
MAPE is usually for models with interval targets (regression, time series, etc.) and not appropriate for scenarios where the actual values can be 0, as this could cause a division by 0 during the MAPE calculation.
See: "Mean absolute percentage error" (Wikipedia)
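To make the division-by-zero point concrete, here is a minimal Python sketch of the MAPE formula from the question (the function name `mape` is just for illustration). It works for an interval target, but fails as soon as an actual value is 0, which is every non-event row of a 0/1 target scored with predicted probabilities.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent:
    (1/N) * sum(|actual - predicted| / |actual|) * 100."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        # Raises ZeroDivisionError when a == 0, which happens on
        # every non-event row of a 0/1 target.
        total += abs(a - p) / abs(a)
    return total / len(actual) * 100


# Interval target: MAPE is well defined.
print(mape([100.0, 200.0], [110.0, 180.0]))  # 10.0

# Binary 0/1 target scored with predicted probabilities:
# rows with actual == 0 make MAPE undefined.
try:
    mape([0, 1, 0, 1], [0.2, 0.7, 0.4, 0.9])
except ZeroDivisionError:
    print("MAPE is undefined: some actual values are 0")
```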
AUC is typically for binary classifiers like logistic regression.
See: "Receiver operating characteristic" (Wikipedia)
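For reference, a small Python sketch of what AUC measures: the probability that a randomly chosen event receives a higher predicted score than a randomly chosen non-event, with ties counting as one half. The Gini column in the question's table appears (up to rounding) to be the usual Gini = 2 * AUC - 1, so it ranks the models the same way AUC does. The helpers `auc` and `gini` are illustrative names, and the pairwise loop is O(events * non-events), which is fine for a sketch but not for large data.

```python
def auc(labels, scores):
    """P(score of a random event > score of a random non-event); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini(labels, scores):
    # Gini (Somers' D) as usually reported next to AUC: 2 * AUC - 1
    return 2.0 * auc(labels, scores) - 1.0


y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(auc(y, p))   # 0.75
print(gini(y, p))  # 0.5
```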
Do you have an interval or binary target?
If you have a binary target, what is the event occurrence rate for your target? The situation you describe is common for rare target event occurrences.
To increase your c-statistic/AUC for rare targets:
- Disproportionately over-sample the rare events
- Add a weight to the rare events
- Use an inverse prior distribution
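A minimal Python sketch of the first two ideas, assuming a 0/1 target: replicating event rows (one simple way to over-sample; keeping all events and sampling only part of the non-events is another), and case weights that give events and non-events equal total weight. Both helper names are hypothetical, and the weighting is a generic inverse-frequency scheme, not a reproduction of any specific SAS "inverse prior" setting.

```python
from collections import Counter


def balanced_weights(labels):
    """Hypothetical helper: weight each row by N / (n_classes * n_rows_in_its_class),
    so every class carries the same total weight (inverse-frequency weighting)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [n / (k * counts[y]) for y in labels]


def oversample_events(rows, labels, factor):
    """Hypothetical helper: keep `factor` copies of every event row (label 1)
    and one copy of every non-event row."""
    out_rows, out_labels = [], []
    for r, y in zip(rows, labels):
        copies = factor if y == 1 else 1
        out_rows.extend([r] * copies)
        out_labels.extend([y] * copies)
    return out_rows, out_labels


labels = [1] + [0] * 9                # 10% event rate
w = balanced_weights(labels)
print(w[0], w[1])                     # 5.0 0.5555555555555556
rows, ys = oversample_events(list(range(10)), labels, factor=9)
print(ys.count(1), ys.count(0))       # 9 9
```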
Hi Patrick,
Thank you for your response. My target is binary and I have used logistic regression to build the models. The response rate varies: some models have 50%, others 20% or 10%, and the lowest has around
So the response rate is not that rare. So are you saying that for a binary target I shouldn't use MAPE? What should I use instead to compare actual and predicted values?
Many thanks
I would use misclassification rate instead. It depends on your data, but I would be ok with a misclassification rate of 0.3 or less. GINI, AUC, c-statistic and logarithmic loss are other common measures for binary classification accuracy.
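A short Python sketch of two of these measures for a 0/1 target and predicted probabilities. The 0.5 cutoff for the misclassification rate is an assumption (in practice it is often tuned, especially when the event rate is far from 50%), and the function names are illustrative.

```python
import math


def misclassification_rate(labels, probs, cutoff=0.5):
    """Share of rows whose predicted class (1 when prob >= cutoff)
    differs from the actual 0/1 label."""
    errors = sum((p >= cutoff) != (y == 1) for y, p in zip(labels, probs))
    return errors / len(labels)


def log_loss(labels, probs, eps=1e-15):
    """Average negative log-likelihood of the 0/1 labels; probabilities are
    clipped to [eps, 1 - eps] so log(0) never occurs."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += -math.log(p) if y == 1 else -math.log(1.0 - p)
    return total / len(labels)


y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(misclassification_rate(y, p))  # 0.25
print(round(log_loss(y, p), 4))      # 0.4723
```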
If you have a traditional binary target whose values are 0 and 1, then you should not use MAPE, because you may be dividing by 0. Even if your binary target has values other than 0 and 1, MAPE and other measures such as ASE (average squared error) and RMSE are meant for interval targets. These measures help you understand the average distance between your numeric regression predictions and your numeric observed values. In logistic regression, you are doing a classification, not a prediction: you are labeling cases as belonging to one group or another. The distance between these groups might be arbitrary or hard to interpret, and that is why we look at the misclassification rate.
If your misclassification rate is between 0.3 and 0.5, then there are many steps you can take to find a more accurate model, with feature selection being the foremost. Have you tried forward, backward or stepwise variable selection? Another common problem with logistic regression is quasi-complete separation. Are any of your parameters greater than 15 or 20?
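To illustrate why separation inflates the estimates, here is a toy Python sketch (hypothetical data and helper) that fits a one-predictor logistic regression by plain gradient ascent on completely separated data, the extreme case of quasi-complete separation. Because no finite slope maximizes the likelihood, the estimate keeps growing the longer the optimizer runs; real software usually stops at an iteration limit or convergence criterion, often with a separation warning, leaving a very large coefficient.

```python
import math


def fit_logistic_1d(xs, ys, lr=0.1, iters=1000):
    """Plain gradient ascent on the log-likelihood of
    P(y=1 | x) = 1 / (1 + exp(-(b0 + b1 * x))), with no regularization."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p          # d loglik / d b0
            g1 += (y - p) * x    # d loglik / d b1
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1


# Toy data perfectly separated at x = 0: every x < 0 is a non-event,
# every x > 0 is an event.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
for n in (100, 1000, 10000):
    print(n, round(fit_logistic_1d(xs, ys, iters=n)[1], 2))
```

Each longer run prints a larger slope instead of settling on one value.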
Also, your data may just be noisy and difficult to model.
Hi Patrick,
Many thanks for coming back to me. When you ask "Are any of your parameters greater than 15 or 20?", what exactly do you mean? Do you mean the number of predictors?
Thank you
I mean: are your actual estimated parameters very large?
I should have said a "magnitude" of 15 or 20, because in the logistic function 1/(1+e^-(B0 + B1*x1 + ... + Bk*xk)), large coefficients can make the exponential term very large, close to machine infinity, and that can cause problems with your model. It is one of the most common problems with logistic regression.
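A sketch of the numerical side in Python. A coefficient of 15 or 20 alone does not overflow (e^20 is about 4.9e8), but multiplied by an unscaled input the linear predictor can exceed roughly 709, where `math.exp` overflows in double precision. The input value of 50 below is a hypothetical example. The usual fix in code is to evaluate the logistic function in a form that only ever exponentiates a non-positive number; this only avoids the overflow, it does not fix separation or an unstable model.

```python
import math


def naive_sigmoid(z):
    # Direct translation of 1 / (1 + e^-z); math.exp overflows for -z > ~709.
    return 1.0 / (1.0 + math.exp(-z))


def stable_sigmoid(z):
    # Only ever exponentiate a non-positive number, so exp() cannot overflow.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical linear predictor: a coefficient of magnitude 20 applied
# to an unscaled input of 50 gives z = -1000.
z = -20.0 * 50.0
try:
    naive_sigmoid(z)
except OverflowError:
    print("naive form overflows")   # math.exp(1000) is out of range
print(stable_sigmoid(z))            # 0.0
```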