<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Choosing a proper cut-off for scorecards in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Choosing-a-proper-cut-off-for-scorecards/m-p/248633#M13082</link>
    <description>&lt;P&gt;Hi guys,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I recently built a scorecard model using SAS E-miner's credit scoring node. The scorecard proved to be very good, with excellent Gini values, and the rank ordering of the scores in terms of events/non-events was more than satisfactory. However, the event modelled is extremely rare. In the sample, the events (1's) account for roughly 30%, whereas the true population proportion is 0.15%. The person who will be using the model is completely focussed on wanting a cut-off below which all cases will be classified as 1's when predicting. The problem I face now is that, because of the rare event, the model DOES accurately capture a large percentage of events below certain cut-offs when running it on out-of-time data, but it has an extremely large False Positive Rate as well, due to the large number of non-events in the population. I have adjusted the regression intercept for oversampling, but this does not seem to do much in terms of cut-offs. Are there perhaps any techniques that you would recommend?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks!!&lt;/P&gt;</description>
    <pubDate>Mon, 08 Feb 2016 08:40:58 GMT</pubDate>
    <dc:creator>JakesVenter</dc:creator>
    <dc:date>2016-02-08T08:40:58Z</dc:date>
    <item>
      <title>Choosing a proper cut-off for scorecards</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Choosing-a-proper-cut-off-for-scorecards/m-p/248633#M13082</link>
      <description>&lt;P&gt;Hi guys,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I recently built a scorecard model using SAS E-miner's credit scoring node. The scorecard proved to be very good, with excellent Gini values, and the rank ordering of the scores in terms of events/non-events was more than satisfactory. However, the event modelled is extremely rare. In the sample, the events (1's) account for roughly 30%, whereas the true population proportion is 0.15%. The person who will be using the model is completely focussed on wanting a cut-off below which all cases will be classified as 1's when predicting. The problem I face now is that, because of the rare event, the model DOES accurately capture a large percentage of events below certain cut-offs when running it on out-of-time data, but it has an extremely large False Positive Rate as well, due to the large number of non-events in the population. I have adjusted the regression intercept for oversampling, but this does not seem to do much in terms of cut-offs. Are there perhaps any techniques that you would recommend?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks!!&lt;/P&gt;</description>
      <pubDate>Mon, 08 Feb 2016 08:40:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Choosing-a-proper-cut-off-for-scorecards/m-p/248633#M13082</guid>
      <dc:creator>JakesVenter</dc:creator>
      <dc:date>2016-02-08T08:40:58Z</dc:date>
    </item>
    <item>
      <title>Re: Choosing a proper cut-off for scorecards</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Choosing-a-proper-cut-off-for-scorecards/m-p/248782#M13087</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;There will always be a trade-off of course between:&lt;/P&gt;
&lt;P&gt;- sensitivity = true positive rate (TPR) = hit rate = recall on the one hand AND&lt;/P&gt;
&lt;P&gt;- precision = positive predictive value (PPV) on the other hand&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Changing the cut-off (decision threshold) in a particular direction will typically improve one of these two but worsen the other.&lt;/P&gt;
&lt;P&gt;Also: The higher the false positive rate, the lower the precision.&lt;/P&gt;
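&lt;P&gt;To see how strongly a rare event depresses precision, here is a small worked sketch using Bayes' rule. The 30% and 0.15% event rates come from the question; the operating point (TPR 0.80, FPR 0.05) and the function name are assumptions chosen only for illustration, not figures from the actual model.&lt;/P&gt;

```python
def precision_from_rates(tpr, fpr, event_rate):
    """Positive predictive value implied by TPR, FPR and the event rate (Bayes' rule)."""
    true_pos = tpr * event_rate            # share of all cases that are flagged events
    false_pos = fpr * (1.0 - event_rate)   # share of all cases that are flagged non-events
    return true_pos / (true_pos + false_pos)


# Illustrative operating point (assumed, not taken from the thread's model).
tpr, fpr = 0.80, 0.05

ppv_sample = precision_from_rates(tpr, fpr, 0.30)        # oversampled development sample
ppv_population = precision_from_rates(tpr, fpr, 0.0015)  # true population event rate

print(f"precision at 30% events:   {ppv_sample:.3f}")
print(f"precision at 0.15% events: {ppv_population:.4f}")
```

&lt;P&gt;With these assumed rates, precision is roughly 87% in the oversampled sample but only about 2% in the population, even though TPR and FPR are identical in both cases.&lt;/P&gt;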
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Do you already use the Gains table and the Trade-off Plots in the Scorecard node?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The trade-off plots display the approval rate and bad rate against cutoff scores. In credit scoring, trade-off plots are used to show how the approval rate and the bad rate among the accepted applicants depend on the cutoff score. A good scorecard enables the choice of a cutoff score that corresponds to a relatively high approval rate with a relatively low bad rate.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The gains table shows you "Average Marginal Profit" and "Average Total Profit" per score bucket using "Revenue Accepted Good" and "Cost Accepted Bad" (specified by you in the properties). I think the online doc (accessible from within EMiner) provides you with all the formulas.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you don't want to rely on the Scorecard node for choosing your cut-off, you can always consider using the Cutoff node. It will choose the "best" cut-off probability according to the criterion of your choice (you can easily derive which score is mapped to it), for example the Kolmogorov–Smirnov statistic.&lt;/P&gt;
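&lt;P&gt;As a rough illustration of what a Kolmogorov–Smirnov criterion looks for, the sketch below picks the cut-off with the largest gap between the true positive rate and the false positive rate. It works directly on scores rather than probabilities, the data are made up, and it assumes low scores indicate events; it is not a description of how the Enterprise Miner node is implemented.&lt;/P&gt;

```python
from bisect import bisect_right

# Hypothetical scored cases: (score, event flag); low scores are assumed to indicate events.
scored = [(410, 1), (430, 1), (455, 0), (470, 1), (500, 0),
          (520, 0), (540, 1), (560, 0), (600, 0), (640, 0)]

event_scores = sorted(s for s, y in scored if y == 1)
nonevent_scores = sorted(s for s, y in scored if y == 0)


def ks_gap(cutoff):
    """TPR minus FPR when every case scoring at or below the cut-off is flagged as an event."""
    tpr = bisect_right(event_scores, cutoff) / len(event_scores)
    fpr = bisect_right(nonevent_scores, cutoff) / len(nonevent_scores)
    return tpr - fpr


# Try every observed score as a candidate cut-off and keep the one with the widest gap.
candidates = sorted({s for s, _ in scored})
best_cutoff = max(candidates, key=ks_gap)
print(best_cutoff, round(ks_gap(best_cutoff), 3))
```

&lt;P&gt;Note that this criterion weighs TPR and FPR equally and ignores the event rate, so with a very rare event the cut-off it selects can still flag far more non-events than events.&lt;/P&gt;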
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also important: Enterprise Miner supports decision processing.&lt;/P&gt;
&lt;P&gt;See&lt;/P&gt;
&lt;P&gt;SAS® Enterprise Miner™ 14.1 Extension Nodes: Developer's Guide.&lt;/P&gt;
&lt;P&gt;&lt;A href="https://support.sas.com/documentation/cdl/en/emxndg/67980/PDF/default/emxndg.pdf" target="_blank"&gt;https://support.sas.com/documentation/cdl/en/emxndg/67980/PDF/default/emxndg.pdf&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Appendix 3&lt;BR /&gt;Predictive Modeling&lt;BR /&gt;Decision Thresholds and Profit Charts (p. 178)&lt;/P&gt;
&lt;P&gt;The final classification of a new applicant in the class of good or the class of bad risks will be based on profit considerations.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Some people choose to optimize the F1-score as the best balance between sensitivity and precision.&lt;/P&gt;
&lt;P&gt;The F1-score is the harmonic mean of precision and sensitivity.&lt;/P&gt;
&lt;P&gt;See &lt;A href="https://en.wikipedia.org/wiki/Precision_and_recall" target="_blank"&gt;https://en.wikipedia.org/wiki/Precision_and_recall&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;If you want to maximize the F1-score, you can write an optimization to find the best cut-off, or simply run a simulation: let the cut-off vary between a start and a stop value by a fixed increment and calculate the quality metrics that go with each particular cut-off. Then make a choice.&lt;/P&gt;
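&lt;P&gt;A minimal sketch of such a simulation is shown below. The scored cases, the start/stop/increment values and the helper name are all hypothetical, and low scores are assumed to indicate events; in practice the sweep should be run on data that reflects the true population event rate, such as the out-of-time sample.&lt;/P&gt;

```python
from bisect import bisect_right

# Hypothetical scored cases: (score, event flag); low scores are assumed to indicate events.
scored = [(410, 1), (430, 1), (455, 0), (470, 1), (500, 0),
          (520, 0), (540, 1), (560, 0), (600, 0), (640, 0)]

event_scores = sorted(s for s, y in scored if y == 1)
nonevent_scores = sorted(s for s, y in scored if y == 0)


def metrics_at(cutoff):
    """Precision, recall and F1 when every case scoring at or below the cut-off is flagged."""
    tp = bisect_right(event_scores, cutoff)      # events flagged
    fp = bisect_right(nonevent_scores, cutoff)   # non-events flagged
    flagged = tp + fp
    precision = tp / flagged if flagged else 0.0
    recall = tp / len(event_scores)
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Sweep the cut-off from a start value to a stop value by a fixed increment.
start, stop, step = 400, 650, 10
results = [(c, *metrics_at(c)) for c in range(start, stop + 1, step)]

for cutoff, precision, recall, f1 in results:
    print(f"{cutoff}  precision={precision:.3f}  recall={recall:.3f}  F1={f1:.3f}")

# Cut-off with the highest F1 (ties resolve to the lowest cut-off).
best = max(results, key=lambda row: row[3])
print("best cut-off by F1:", best[0])
```

&lt;P&gt;Printing the full table rather than only the maximum makes it easier to discuss the trade-off with the person who will use the model, since neighbouring cut-offs often have similar F1 but quite different precision and recall.&lt;/P&gt;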
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Good luck,&lt;/P&gt;
&lt;P&gt;Koen&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 09 Feb 2016 00:11:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Choosing-a-proper-cut-off-for-scorecards/m-p/248782#M13087</guid>
      <dc:creator>sbxkoenk</dc:creator>
      <dc:date>2016-02-09T00:11:54Z</dc:date>
    </item>
    <item>
      <title>Re: Choosing a proper cut-off for scorecards</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Choosing-a-proper-cut-off-for-scorecards/m-p/248813#M13090</link>
      <description>&lt;P&gt;Thanks for the help Koen! I will have a look at the different techniques you mentioned and see which one shows the best performance/classification.&lt;/P&gt;</description>
      <pubDate>Tue, 09 Feb 2016 05:37:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Choosing-a-proper-cut-off-for-scorecards/m-p/248813#M13090</guid>
      <dc:creator>JakesVenter</dc:creator>
      <dc:date>2016-02-09T05:37:39Z</dc:date>
    </item>
  </channel>
</rss>

