Cut-off of misclassification error of logistic prediction models

Season — Sun, 16 Apr 2023 11:46:35 GMT

I am currently building a logistic regression model as a prediction model. I need to perform internal validation to assess how well the model performs.

During the process, I got stuck on the problem of misclassification error. In the SCORE statement of PROC LOGISTIC, one can request the misclassification error by adding the FITSTAT option.

I took a closer look at how the misclassification error is computed. SAS Help gives a formula for the misclassification rate.

Simply speaking, according to this formula, the misclassification rate is the proportion of observations that were misclassified. SAS Help states that an observation is classified into the level with the largest probability. For a binary dependent variable, this means that SAS uses 0.5 as the default cut-off: if the posterior probability of "success" for a given observation is larger than 0.5, the posterior probability of "failure" is necessarily less than 0.5, so the observation is classified as "success".
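For reference, the verbal description above corresponds to the following unweighted form. This is a sketch in my own notation (n observations, observed level y_i, predicted level \hat{y}_i, posterior probabilities \hat{p}_{ik}), not the formula as printed in SAS Help:

```latex
\text{misclassification rate}
  = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{\hat{y}_i \neq y_i\},
\qquad
\hat{y}_i = \arg\max_{k} \hat{p}_{ik}
```

With only two levels, \hat{y}_i is "success" exactly when the posterior probability of "success" exceeds 0.5.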

Clearly, 0.5 is not always the "best" cut-off in the sense of giving the largest Youden index. Nevertheless, in the few papers I have read on validating logistic regression prediction models, a posterior probability of 0.5 has indeed been used as the cut-off for the misclassification error in internal validation. Gong's work can serve as an example: the article compares how well the bootstrap, the jackknife and cross-validation correct bias, with 0.5 set as the misclassification cut-off.
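For reference, the Youden index at a cut-off c is usually written as follows (the symbols J, Se, Sp and c* are my own notation, not taken from SAS Help or from Gong's article):

```latex
J(c) = \mathrm{Se}(c) + \mathrm{Sp}(c) - 1,
\qquad
c^{*} = \arg\max_{c \in (0,1)} J(c)
```

Here Se(c) and Sp(c) are the sensitivity and specificity obtained when observations with a posterior probability of "success" above c are classified as "success".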

So here is my question: in logistic prediction model validation, where multiple models (usually more than 100) are trained and tested via the bootstrap, the jackknife or cross-validation, is a posterior probability of 0.5 an accepted, universal cut-off for the misclassification error? Or should the cut-off vary from model to model, with each model using the posterior probability that gives the largest Youden index?

Many thanks!

Re: Cut-off of misclassification error of logistic prediction models

sbxkoenk — Mon, 17 Apr 2023 15:26:27 GMT

@Season wrote:

So here is my question: in logistic prediction model validation, where multiple models (usually more than 100) are trained and tested via the bootstrap, the jackknife or cross-validation, is a posterior probability of 0.5 an accepted, universal cut-off for the misclassification error? Or should the cut-off vary from model to model, with each model using the posterior probability that gives the largest Youden index?

I would think the latter. But I will give it a second thought.

Anyway, you can deviate from the 0.5 cut-off by using the PPROB= option.

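/* Fit on the training data. CTABLE with PPROB= prints classification
   tables for the training data at cut-offs 0.3, 0.5, 0.6, 0.7 and 0.8. */
proc logistic data=train;
 model target = w h a / ctable
                        pprob = (0.3, 0.5 to 0.8 by 0.1);
 /* Score the validation data: F_target is the observed level and
    I_target the predicted level. PPROB= applies to the CTABLE output;
    I_target is the level with the largest posterior probability. */
 score data=valid out=score;
run;

/* Cross-tabulate observed (F_) against predicted (I_) levels */
proc tabulate data=score;
 class f_target i_target;
 table f_target,i_target;
run;
/* end of program */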
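If the cut-off should instead be the one with the largest Youden index, one possible approach is to read it off the OUTROC= data set. The following is only a minimal sketch: the toy data, the coefficients, the 0/1 coding of target and the data set names roc and youden are all assumptions made for illustration.

```sas
/* Hypothetical toy data: binary target (0/1) and three predictors */
data train;
  call streaminit(2023);
  do id = 1 to 500;
    w = rand('normal');
    h = rand('normal');
    a = rand('normal');
    /* assumed true model, used only to simulate the outcome */
    p = 1 / (1 + exp(-(-1 + 0.8*w - 0.5*h + 0.3*a)));
    target = rand('bernoulli', p);
    output;
  end;
  drop p;
run;

/* OUTROC= stores one row per candidate cut-off (_PROB_) with the
   sensitivity (_SENSIT_) and 1 - specificity (_1MSPEC_) */
proc logistic data=train;
  model target(event='1') = w h a / outroc=roc;
run;

/* Youden index = sensitivity + specificity - 1
                = _SENSIT_ - _1MSPEC_ */
data youden;
  set roc;
  youden = _sensit_ - _1mspec_;
run;

proc sort data=youden;
  by descending youden;
run;

/* After sorting, the first row holds the cut-off with the largest Youden index */
proc print data=youden(obs=1);
  var _prob_ _sensit_ _1mspec_ youden;
run;
```

In a resampling setting, the same steps could presumably be repeated within each bootstrap or cross-validation sample, which would give one cut-off per fitted model.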

Cheers,

Koen

Re: Cut-off of misclassification error of logistic prediction models

Season — Sun, 15 Oct 2023 14:38:42 GMT

Thank you, Koen, for your reply! It seems that this problem is ubiquitous in resampling, where multiple samples are created. However, I have not yet found any research addressing it. I previously consulted a statistician at my institution, who responded that the misclassification error rates obtained in both ways can be reported together.
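A minimal sketch of how both rates could be reported side by side from the SCORE output is given below. It assumes a 0/1 target, so that the posterior probability of the event is stored in P_1. The toy data, the 700/300 split and the alternative cut-off of 0.3 are arbitrary choices for illustration only.

```sas
/* Hypothetical toy data, split into training and validation parts */
data all;
  call streaminit(2023);
  do id = 1 to 1000;
    w = rand('normal');
    h = rand('normal');
    a = rand('normal');
    target = rand('bernoulli', 1 / (1 + exp(-(-1 + 0.8*w - 0.5*h + 0.3*a))));
    output;
  end;
run;

data train valid;
  set all;
  if id <= 700 then output train;
  else output valid;
run;

/* Fit on the training part and score the validation part */
proc logistic data=train;
  model target(event='1') = w h a;
  score data=valid out=score;
run;

%let cut = 0.3;  /* hypothetical alternative cut-off, e.g. a Youden-based one */

/* Flag misclassified observations under each cut-off */
data misc;
  set score;
  miss_050 = ((p_1 > 0.5)  ne target);  /* default cut-off of 0.5 */
  miss_alt = ((p_1 > &cut) ne target);  /* alternative cut-off */
run;

/* The mean of each 0/1 flag is the misclassification rate */
proc means data=misc mean;
  var miss_050 miss_alt;
run;
```

If the alternative cut-off is data-driven, it would seem safer to derive it from the training sample only and then apply it to the validation sample, so that the reported rate is not optimistic.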

Re: Cut-off of misclassification error of logistic prediction models

sbxkoenk — Sun, 15 Oct 2023 14:53:53 GMT

SAS® Enterprise Miner™ 15.2: Reference Help, "Cutoff Node"
https://go.documentation.sas.com/doc/en/emref/15.2/n1qmjdusj37md5n1as50qvl0tram.htm

SAS Communities Library article, "Tip: Use the Cutoff Node in SAS® Enterprise Miner™ to Consume the Posterior Probabilities of Your Models Efficiently" (started 05-14-2014, modified 01-06-2016)
https://communities.sas.com/t5/SAS-Communities-Library/Tip-Use-the-Cutoff-Node-in-SAS-Enterprise-Miner-to-Consume-the/ta-p/221196

SAS Communities Library article, "Tip: How to build a scorecard using Credit Scoring for SAS® Enterprise Miner™" (started 05-26-2015, modified 01-06-2016)
https://communities.sas.com/t5/SAS-Communities-Library/Tip-How-to-build-a-scorecard-using-Credit-Scoring-for-SAS/ta-p/223882

SAS Global Forum 2012, Data Mining and Text Analytics, Paper 127-2012: "Use of Cutoff and SAS Code Nodes in SAS® Enterprise Miner™ to Determine Appropriate Probability Cutoff Point for Decision Making with Binary Target Models", Yogen Shah, Oklahoma State University, Stillwater, OK
https://support.sas.com/resources/papers/proceedings12/127-2012.pdf

BR,

Koen

Re: Cut-off of misclassification error of logistic prediction models

Season — Sun, 15 Oct 2023 15:23:06 GMT

Wow! 😀 Thank you so much, Koen, for your wonderful reply! I never expected to receive a solution to that problem! I will study the literature you referenced in depth.

Thank you again for keeping my question in mind for such a long time!