Solved: Re: AUC vs MAPE, please help! Thank you

Kanyange · Posted 04-25-2014 08:13 AM

Hi All,

I have a quick question. I have created several models, and I use AUC and MAPE to assess them; MAPE is calculated as shown below. My question: the AUC is not good, but the MAPE looks OK. How is that possible? My thresholds and the AUC and MAPE results are below (table at the bottom). What decision should I make? Should I look at AUC only, or at MAPE? Your help will be much appreciated. Thank you

Thresholds:

AUC
- 0.6-0.7: Acceptable
- 0.71-0.8: Good
- 0.81-0.9: Excellent

MAPE
- 0.2-0.3: Acceptable
- 0.1-0.2: Good
- 0.001-0.1: Excellent

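MAPE calculation: $\mathrm{MAPE} = \frac{100}{N} \sum_{i=1}^{N} \frac{|A_i - P_i|}{|A_i|}$, where $A_i$ is the actual value and $P_i$ the predicted value for case $i$.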
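For reference, here is a minimal Python sketch of the MAPE formula above. The function name `mape` and the sample values are hypothetical and not from the thread; the sketch raises an error instead of silently dividing by zero when an actual value is 0.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (the formula multiplies by 100)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        # Each term divides by |actual|, so an actual value of 0 makes MAPE undefined.
        if a == 0:
            raise ZeroDivisionError("MAPE is undefined when an actual value is 0")
        total += abs(a - p) / abs(a)
    return total / len(actual) * 100


# Hypothetical interval-valued data (not from the thread).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
print(mape(actual, predicted))
```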

Model	Gini	AUC	MAPE
Model 1	0.07	0.53	0
Model 2	0.14	0.57	0.0828
Model 3	0.37	0.69	0.0556
Model 4	0.08	0.55	0.0673
Model 5	0.01	0.51	0.0249
Model 6	0.09	0.55	0.1552
Model 7	0.18	0.59	0.2327
Model 8	0.16	0.58	0.0654
Model 9	0.13	0.57	0.0842
Model 10	0.14	0.57	0.0261
Model 11	0.35	0.68	0.1336
Model 12	0.16	0.58	0.1704
Model 13	0.07	0.54	0.0504
Model 14	0.11	0.56	0.096
Model 15	0.09	0.55	0.1478
Model 16	0.19	0.6	0.045
Model 17	0.18	0.59	0.0505
Model 18	0.16	0.59	0.1472
Model 19	0.17	0.59	0.1556

PatrickHall · Posted 04-27-2014 10:41 AM

MAPE is usually used for models with interval targets (regression, time series, etc.). It is not appropriate when the actual values can be 0, because that causes a division by 0 in the MAPE calculation.

Mean absolute percentage error - Wikipedia, the free encyclopedia

AUC is typically for binary classifiers like logistic regression.

Receiver operating characteristic - Wikipedia, the free encyclopedia
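A minimal Python sketch of AUC computed as the c-statistic (the share of event/non-event pairs in which the event gets the higher score, with ties counted as one half), plus the Gini coefficient using the common relation Gini = 2 x AUC - 1. The function names and data are hypothetical; the pairwise loop is O(n^2) and only illustrates the definition.

```python
def auc(labels, scores):
    """c-statistic: probability that a random event (label 1) outscores a random non-event (label 0)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # event ranked above non-event
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


def gini(labels, scores):
    # Gini as commonly defined for binary classifiers: 2 * AUC - 1.
    return 2.0 * auc(labels, scores) - 1.0


labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
print(auc(labels, scores), gini(labels, scores))
```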

Do you have an interval or binary target?

If you have a binary target, what is the event occurrence rate for your target? The situation you describe is common when the target event is rare.

To increase your c-statistic/AUC for rare targets:

- Disproportionately over-sample the rare events

- Add a weight to the rare events

- Use an inverse prior distribution
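As an illustration of the weighting bullet, one common scheme (an assumption here, not necessarily what any particular SAS procedure does) gives each class the same total weight by assigning N / (2 x n_class) to every case. The helper `balancing_weights` is hypothetical; the resulting weights would be passed to a fitting routine that accepts case weights.

```python
from collections import Counter


def balancing_weights(labels):
    """Hypothetical helper: weight each case so events and non-events carry equal total weight."""
    counts = Counter(labels)
    if set(counts) != {0, 1}:
        raise ValueError("expected both classes 0 and 1")
    n = len(labels)
    # Each class's weights sum to n / 2.
    return [n / (2 * counts[y]) for y in labels]


labels = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # hypothetical 10% event rate
w = balancing_weights(labels)
print(w[0], w[1])
```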

Kanyange · Posted 04-28-2014 06:22 AM

Hi Patrick,

Thank you for your response. My target is binary, and I used logistic regression to build the models. The response rate varies: some models have 50%, others 20% or 10%, and the lowest has around

So the response rate is not that rare. Are you saying that for a binary target I shouldn't use MAPE? What should I use instead to compare actual and predicted values?

Many Thanks

PatrickHall · Posted 04-28-2014 09:57 AM

I would use misclassification rate instead. It depends on your data, but I would be ok with a misclassification rate of 0.3 or less. GINI, AUC, c-statistic and logarithmic loss are other common measures for binary classification accuracy.
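A minimal sketch of the misclassification rate and logarithmic loss for a 0/1 target and predicted probabilities. The 0.5 cutoff is an assumption; with a 10% event rate, a 0.5 cutoff may classify almost every case as a non-event, so the cutoff choice matters. Function names and data are hypothetical.

```python
import math


def misclassification_rate(labels, probs, cutoff=0.5):
    """Share of cases whose predicted class (1 if prob >= cutoff, else 0) differs from the label."""
    wrong = sum(1 for y, p in zip(labels, probs) if (1 if p >= cutoff else 0) != y)
    return wrong / len(labels)


def log_loss(labels, probs, eps=1e-15):
    """Average negative log-likelihood of 0/1 labels under the predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


labels = [0, 0, 1, 1, 0, 1]
probs = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
print(misclassification_rate(labels, probs), log_loss(labels, probs))
```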

If you have a traditional binary target whose values are 0 and 1, then you should not use MAPE, because you may be dividing by 0. Even if your binary target takes values other than 0 and 1, MAPE and other measures such as ASE (average squared error) and RMSE (root mean squared error) are meant for interval targets. These measures tell you the average distance between your numeric regression predictions and your numeric observed values. In logistic regression, you are doing a classification, not a prediction: you are labeling cases as belonging to one group or another. The distance between these groups may be arbitrary or hard to interpret, which is why we look at the misclassification rate.
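For contrast, here is a sketch of ASE and RMSE on a hypothetical interval target, where the average distance between predictions and observed values is meaningful. This version divides by N; some procedures divide by the error degrees of freedom instead when reporting RMSE. Function names and data are hypothetical.

```python
import math


def ase(actual, predicted):
    """Average squared error: mean of (actual - predicted)^2, dividing by N."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: square root of ASE, in the target's own units."""
    return math.sqrt(ase(actual, predicted))


actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
print(ase(actual, predicted), rmse(actual, predicted))
```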

If your misclassification rate is between 0.3 and 0.5, then there are many steps you can take to find a more accurate model, with feature selection being the foremost. Have you tried forward, backward or stepwise variable selection? Another common problem with logistic regression is quasi-complete separation. Are any of your parameters greater than 15 or 20?
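A rough single-predictor check for the quasi-complete separation problem mentioned above; `separation_status` is a hypothetical helper, not a SAS feature. It only looks at one numeric predictor at a time, so separation caused by a combination of predictors will not be detected this way.

```python
def separation_status(x, labels):
    """Classify one numeric predictor vs a 0/1 target as complete, quasi-complete, or overlapping."""
    ev = [xi for xi, y in zip(x, labels) if y == 1]
    ne = [xi for xi, y in zip(x, labels) if y == 0]
    if not ev or not ne:
        raise ValueError("need at least one event and one non-event")
    # Complete: every event lies strictly above (or strictly below) every non-event.
    if min(ev) > max(ne) or max(ev) < min(ne):
        return "complete"
    # Quasi-complete: the same split holds except for ties at the boundary value.
    if min(ev) >= max(ne) or max(ev) <= min(ne):
        return "quasi-complete"
    return "overlap"


print(separation_status([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1]))
print(separation_status([0, 0, 0, 1, 1], [0, 1, 0, 1, 1]))
print(separation_status([1, 2, 3, 4], [0, 1, 0, 1]))
```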

Also, your data may just be noisy and difficult to model.

Kanyange · Posted 04-30-2014 10:25 AM

Hi Patrick,

Many thanks for coming back to me. When you ask "Are any of your parameters greater than 15 or 20?", what exactly do you mean? Do you mean the number of predictors?

Thank You

PatrickHall · Posted 04-30-2014 01:19 PM

I mean: Are your actual estimated parameters very large?

I should have said "magnitude" of 15 or 20. In logistic regression the predicted probability is $p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}$, so coefficients that large can push the exponential term toward 0 or toward machine infinity, and that can cause problems with your model. Very large estimates are a symptom of one of the most common problems with logistic regression, separation. For more information:

Separation (statistics) - Wikipedia, the free encyclopedia
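To see the numerical issue, here is a sketch with a hypothetical linear predictor of 20 x (-40) = -800. Evaluating the formula directly overflows Python's `math.exp`, and even an algebraically equivalent stable form returns probabilities of exactly 0.0 or 1.0 in double precision, where log-likelihood terms such as log(p) or log(1 - p) are undefined. The function names are hypothetical.

```python
import math


def logistic_naive(eta):
    # Direct translation of p = 1 / (1 + e^(-eta)); math.exp(-eta) overflows for very negative eta.
    return 1.0 / (1.0 + math.exp(-eta))


def logistic_stable(eta):
    # Algebraically identical form that never calls exp on a large positive argument.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)  # underflows quietly to 0.0 for very negative eta
    return z / (1.0 + z)


eta = 20 * -40  # hypothetical: coefficient 20 times a predictor value of -40
try:
    logistic_naive(eta)
except OverflowError:
    print("naive form overflowed")
print(logistic_stable(eta))          # probability collapses to exactly 0.0
print(logistic_stable(40.0) == 1.0)  # and to exactly 1.0 on the other side
```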
