<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Cut-off of misclassification error of logistic prediction models in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Cut-off-of-misclassification-error-of-logistic-prediction-models/m-p/870192#M43065</link>
    <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/437457"&gt;@Season&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;&lt;SPAN&gt;So here is my question:&amp;nbsp;&lt;STRONG&gt;in validating a logistic prediction model, where many (usually more than 100) models are trained and tested via the bootstrap, the jackknife or cross-validation, is a posterior probability of 0.5 an accepted, universal cut-off for computing misclassification error? Or should the cut-off vary from model to model, with each model using the posterior probability that gives the largest Youden index?&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;I would think the latter. But I will give it a second thought.&lt;/P&gt;
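&lt;P&gt;Anyway, you can deviate from the 0.5 cut-off by using the &lt;EM&gt;PPROB=&lt;/EM&gt; option of the MODEL statement, which sets the cut-offs for the classification table requested with CTABLE.&lt;/P&gt;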
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
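&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* PPROB= sets the cut-offs used by the CTABLE classification table */
proc logistic data=train;
 model target = w h a / ctable
                        pprob = (0.3, 0.5 to 0.8 by 0.1);
 /* SCORE still assigns I_target by the largest posterior probability */
 score data=valid out=score;
run;

/* Cross-tabulate observed (F_) against predicted (I_) levels */
proc tabulate data=score;
 class f_target i_target;
 table f_target,i_target;
run;&lt;/CODE&gt;&lt;/PRE&gt;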
&lt;P&gt;Cheers,&lt;/P&gt;
&lt;P&gt;Koen&lt;/P&gt;
&lt;DIV id="ConnectiveDocSignExtentionInstalled" data-extension-version="1.0.4"&gt;&amp;nbsp;&lt;/DIV&gt;</description>
    <pubDate>Mon, 17 Apr 2023 15:26:27 GMT</pubDate>
    <dc:creator>sbxkoenk</dc:creator>
    <dc:date>2023-04-17T15:26:27Z</dc:date>
    <item>
      <title>Cut-off of misclassification error of logistic prediction models</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Cut-off-of-misclassification-error-of-logistic-prediction-models/m-p/870032#M43055</link>
<description>&lt;P&gt;I am currently building a logistic regression model as a prediction model. I need to perform internal validation to test whether the model works well.&lt;/P&gt;
&lt;P&gt;During the process, I got stuck on the misclassification error. In PROC LOGISTIC, one can request the misclassification error by adding the FITSTAT option to the SCORE statement.&lt;/P&gt;
&lt;P&gt;I took a closer look at how the misclassification error is computed. As SAS Help shows, the formula for the misclassification rate is&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Season_0-1681641202041.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/82743i632CB8EA354058B4/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Season_0-1681641202041.png" alt="Season_0-1681641202041.png" /&gt;&lt;/span&gt;&lt;SPAN&gt;. Simply put, the misclassification rate is the proportion of observations that are misclassified. SAS Help also states that an observation is classified into the level with the largest probability. &lt;STRONG&gt;For a binary response, this means that SAS uses 0.5 as the default cut-off:&lt;/STRONG&gt; if the posterior probability of "success" for an observation is larger than 0.5, the posterior probability of "failure" is necessarily smaller than 0.5, so the observation is classified as "success".&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Clearly, 0.5 is not always the "best" cut-off in the sense of yielding the largest Youden index (sensitivity + specificity - 1). However, in the few papers I have read on validating logistic regression prediction models, &lt;STRONG&gt;a posterior probability of 0.5 has indeed been used as the cut-off for computing the misclassification error in internal validation.&amp;nbsp;&lt;/STRONG&gt;&lt;A title="Gong's work" href="https://onlinelibrary.wiley.com/doi/pdf/10.1002/0471463760.app2" target="_blank" rel="noopener"&gt;Gong's work&lt;/A&gt;&amp;nbsp;is one example: it compares how well the bootstrap, the jackknife and cross-validation correct bias, with 0.5 set as the classification cut-off.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;So here is my question:&amp;nbsp;&lt;STRONG&gt;in validating a logistic prediction model, where many (usually more than 100) models are trained and tested via the bootstrap, the jackknife or cross-validation, is a posterior probability of 0.5 an accepted, universal cut-off for computing misclassification error? Or should the cut-off vary from model to model, with each model using the posterior probability that gives the largest Youden index?&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Many thanks!&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 16 Apr 2023 11:46:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Cut-off-of-misclassification-error-of-logistic-prediction-models/m-p/870032#M43055</guid>
      <dc:creator>Season</dc:creator>
      <dc:date>2023-04-16T11:46:35Z</dc:date>
    </item>
    <item>
      <title>Re: Cut-off of misclassification error of logistic prediction models</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Cut-off-of-misclassification-error-of-logistic-prediction-models/m-p/870192#M43065</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/437457"&gt;@Season&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;&lt;SPAN&gt;So here is my question:&amp;nbsp;&lt;STRONG&gt;in validating a logistic prediction model, where many (usually more than 100) models are trained and tested via the bootstrap, the jackknife or cross-validation, is a posterior probability of 0.5 an accepted, universal cut-off for computing misclassification error? Or should the cut-off vary from model to model, with each model using the posterior probability that gives the largest Youden index?&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;I would think the latter. But I will give it a second thought.&lt;/P&gt;
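&lt;P&gt;Anyway, you can deviate from the 0.5 cut-off by using the &lt;EM&gt;PPROB=&lt;/EM&gt; option of the MODEL statement, which sets the cut-offs for the classification table requested with CTABLE.&lt;/P&gt;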
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
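&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* PPROB= sets the cut-offs used by the CTABLE classification table */
proc logistic data=train;
 model target = w h a / ctable
                        pprob = (0.3, 0.5 to 0.8 by 0.1);
 /* SCORE still assigns I_target by the largest posterior probability */
 score data=valid out=score;
run;

/* Cross-tabulate observed (F_) against predicted (I_) levels */
proc tabulate data=score;
 class f_target i_target;
 table f_target,i_target;
run;&lt;/CODE&gt;&lt;/PRE&gt;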
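&lt;P&gt;For the Youden-based alternative raised in the question, a rough sketch could look like the following. It assumes that target is coded 0/1 with 1 as the event, so that the scored probability is named P_1. The data set names roc, youden, best and score_j and the variables j, pred_j and miss_j are made up for illustration. The cut-off is taken from the OUTROC= data set of the training fit and then applied to the scored validation data.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sketch only: assumes target is coded 0/1 with 1 as the event */
proc logistic data=train;
 model target(event='1') = w h a / outroc=roc;
 score data=valid out=score;
run;

/* Youden index J = sensitivity + specificity - 1 at each candidate cut-off */
data youden;
 set roc;
 j = _sensit_ - _1mspec_;
run;

/* Keep the cut-off (_prob_) with the largest J */
proc sort data=youden;
 by descending j;
run;

data best;
 set youden(obs=1 keep=_prob_ j);
run;

/* Reclassify the validation data at that cut-off */
data score_j;
 if _n_ = 1 then set best;
 set score;
 pred_j = (p_1 ge _prob_);
 miss_j = (pred_j ne target);
run;

/* The mean of miss_j is the misclassification rate at the Youden cut-off */
proc means data=score_j mean;
 var miss_j;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Since the cut-off here is estimated from the training data, in a bootstrap or cross-validation setting these steps would presumably be repeated within each resample, which gives a cut-off that varies from model to model.&lt;/P&gt;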
&lt;P&gt;Cheers,&lt;/P&gt;
&lt;P&gt;Koen&lt;/P&gt;
&lt;DIV id="ConnectiveDocSignExtentionInstalled" data-extension-version="1.0.4"&gt;&amp;nbsp;&lt;/DIV&gt;</description>
      <pubDate>Mon, 17 Apr 2023 15:26:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Cut-off-of-misclassification-error-of-logistic-prediction-models/m-p/870192#M43065</guid>
      <dc:creator>sbxkoenk</dc:creator>
      <dc:date>2023-04-17T15:26:27Z</dc:date>
    </item>
    <item>
      <title>Re: Cut-off of misclassification error of logistic prediction models</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Cut-off-of-misclassification-error-of-logistic-prediction-models/m-p/898657#M44514</link>
      <description>&lt;P&gt;Thank you, Koen, for your reply! This problem seems to arise in any resampling scheme where multiple samples are created. However, I have not yet found any research that addresses it. I previously consulted a statistician at my institution, who said that the misclassification error rates obtained with both approaches can be reported side by side.&lt;/P&gt;</description>
      <pubDate>Sun, 15 Oct 2023 14:38:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Cut-off-of-misclassification-error-of-logistic-prediction-models/m-p/898657#M44514</guid>
      <dc:creator>Season</dc:creator>
      <dc:date>2023-10-15T14:38:42Z</dc:date>
    </item>
    <item>
      <title>Re: Cut-off of misclassification error of logistic prediction models</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Cut-off-of-misclassification-error-of-logistic-prediction-models/m-p/898658#M44515</link>
      <description>&lt;P&gt;SAS® Enterprise Miner: Cutoff Node&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;SAS® Enterprise Miner™ 15.2: Reference Help&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Cutoff Node&lt;/P&gt;
&lt;P&gt;&lt;A href="https://go.documentation.sas.com/doc/en/emref/15.2/n1qmjdusj37md5n1as50qvl0tram.htm" target="_blank"&gt;https://go.documentation.sas.com/doc/en/emref/15.2/n1qmjdusj37md5n1as50qvl0tram.htm&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;SAS Communities Library Article&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Tip: Use the Cutoff Node in SAS® Enterprise Miner™ to Consume the Posterior Probabilities of Your Models Efficiently&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Started ‎05-14-2014 | Modified ‎01-06-2016&lt;/P&gt;
&lt;P&gt;&lt;A href="https://communities.sas.com/t5/SAS-Communities-Library/Tip-Use-the-Cutoff-Node-in-SAS-Enterprise-Miner-to-Consume-the/ta-p/221196" target="_blank"&gt;https://communities.sas.com/t5/SAS-Communities-Library/Tip-Use-the-Cutoff-Node-in-SAS-Enterprise-Miner-to-Consume-the/ta-p/221196&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;SAS Communities Library Article&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Tip: How to build a scorecard using Credit Scoring for SAS® Enterprise Miner™&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Started ‎05-26-2015 | Modified ‎01-06-2016&lt;/P&gt;
&lt;P&gt;&lt;A href="https://communities.sas.com/t5/SAS-Communities-Library/Tip-How-to-build-a-scorecard-using-Credit-Scoring-for-SAS/ta-p/223882" target="_blank"&gt;https://communities.sas.com/t5/SAS-Communities-Library/Tip-How-to-build-a-scorecard-using-Credit-Scoring-for-SAS/ta-p/223882&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;SAS Global Forum 2012 -- Data Mining and Text Analytics&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Paper 127-2012 &lt;BR /&gt;Use of Cutoff and SAS Code Nodes in SAS® Enterprise Miner™ to Determine Appropriate Probability Cutoff Point for Decision Making with Binary Target Models&lt;/P&gt;
&lt;P&gt;Yogen Shah, Oklahoma State University, Stillwater, OK&lt;/P&gt;
&lt;P&gt;&lt;A href="https://support.sas.com/resources/papers/proceedings12/127-2012.pdf" target="_blank"&gt;https://support.sas.com/resources/papers/proceedings12/127-2012.pdf&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;BR,&lt;/P&gt;
&lt;P&gt;Koen&lt;/P&gt;</description>
      <pubDate>Sun, 15 Oct 2023 14:53:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Cut-off-of-misclassification-error-of-logistic-prediction-models/m-p/898658#M44515</guid>
      <dc:creator>sbxkoenk</dc:creator>
      <dc:date>2023-10-15T14:53:53Z</dc:date>
    </item>
    <item>
      <title>Re: Cut-off of misclassification error of logistic prediction models</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Cut-off-of-misclassification-error-of-logistic-prediction-models/m-p/898661#M44517</link>
      <description>&lt;P&gt;Wow! &lt;span class="lia-unicode-emoji" title=":grinning_face:"&gt;😀&lt;/span&gt;Thank you so much, Koen, for your wonderful reply! I never expected to receive a solution to that problem! I will study the references you provided in depth.&lt;/P&gt;
&lt;P&gt;Thank you again for keeping my question in mind for such a long time!&lt;/P&gt;</description>
      <pubDate>Sun, 15 Oct 2023 15:23:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Cut-off-of-misclassification-error-of-logistic-prediction-models/m-p/898661#M44517</guid>
      <dc:creator>Season</dc:creator>
      <dc:date>2023-10-15T15:23:06Z</dc:date>
    </item>
  </channel>
</rss>

