<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Calibration of logit model with large sample size in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Calibration-of-logit-model-with-large-sample-size/m-p/922790#M45857</link>
    <description>&lt;P&gt;Hi all,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm trying to assess whether the logit model I'm using is a good one for estimating the probability of an event. The overall rate of the event is around 4%. The sample size is nearly four million. Below is the decile calibration plot of predicted versus observed probability. Does it indicate a poor model? For such a large sample, should I split it into subsamples to improve model estimation/prediction? Thank you!&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="lichee_0-1712164470397.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/95156i465D64F716BE55DD/image-size/medium?v=v2&amp;amp;px=400" role="button" title="lichee_0-1712164470397.png" alt="lichee_0-1712164470397.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 03 Apr 2024 17:14:39 GMT</pubDate>
    <dc:creator>lichee</dc:creator>
    <dc:date>2024-04-03T17:14:39Z</dc:date>
    <item>
      <title>Calibration of logit model with large sample size</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Calibration-of-logit-model-with-large-sample-size/m-p/922790#M45857</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm trying to assess whether the logit model I'm using is a good one for estimating the probability of an event. The overall rate of the event is around 4%. The sample size is nearly four million. Below is the decile calibration plot of predicted versus observed probability. Does it indicate a poor model? For such a large sample, should I split it into subsamples to improve model estimation/prediction? Thank you!&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="lichee_0-1712164470397.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/95156i465D64F716BE55DD/image-size/medium?v=v2&amp;amp;px=400" role="button" title="lichee_0-1712164470397.png" alt="lichee_0-1712164470397.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 03 Apr 2024 17:14:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Calibration-of-logit-model-with-large-sample-size/m-p/922790#M45857</guid>
      <dc:creator>lichee</dc:creator>
      <dc:date>2024-04-03T17:14:39Z</dc:date>
    </item>
    <item>
      <title>Re: Calibration of logit model with large sample size</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Calibration-of-logit-model-with-large-sample-size/m-p/922856#M45864</link>
      <description>&lt;P&gt;I suggest posting the LOG from running your code, including the code itself and all of the notes and messages.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Without the code it is extremely hard to guess which options you may have used that might affect the output.&lt;/P&gt;
&lt;P&gt;The notes in the log also report the number of observations actually used. The data set may have 4 million observations, but it is possible that fewer were used: by default, observations with a missing value for any variable on the MODEL statement are typically excluded.&lt;/P&gt;
&lt;P&gt;The log might also contain other diagnostic hints.&lt;/P&gt;</description>
      <pubDate>Wed, 03 Apr 2024 21:11:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Calibration-of-logit-model-with-large-sample-size/m-p/922856#M45864</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2024-04-03T21:11:31Z</dc:date>
    </item>
    <item>
      <title>Re: Calibration of logit model with large sample size</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Calibration-of-logit-model-with-large-sample-size/m-p/922883#M45865</link>
      <description>&lt;P&gt;Thank you! I'm attaching the PROC LOGISTIC log and the calibration plot.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I believe all 3.96 million observations were used in the regression. Any insight is appreciated!&lt;/P&gt;</description>
      <pubDate>Thu, 04 Apr 2024 02:11:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Calibration-of-logit-model-with-large-sample-size/m-p/922883#M45865</guid>
      <dc:creator>lichee</dc:creator>
      <dc:date>2024-04-04T02:11:47Z</dc:date>
    </item>
    <item>
      <title>Re: Calibration of logit model with large sample size</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Calibration-of-logit-model-with-large-sample-size/m-p/922885#M45866</link>
      <description>Your sample size is so large that a goodness-of-fit test is essentially meaningless.&lt;BR /&gt;See &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13684"&gt;@Rick_SAS&lt;/a&gt;'s blog posts:&lt;BR /&gt;&lt;A href="https://blogs.sas.com/content/iml/2019/02/20/easier-calibration-plot-sas.html" target="_blank"&gt;https://blogs.sas.com/content/iml/2019/02/20/easier-calibration-plot-sas.html&lt;/A&gt;&lt;BR /&gt;&lt;A href="https://blogs.sas.com/content/iml/2018/05/16/decile-calibration-plots-sas.html" target="_blank"&gt;https://blogs.sas.com/content/iml/2018/05/16/decile-calibration-plots-sas.html&lt;/A&gt;&lt;BR /&gt;&lt;A href="https://blogs.sas.com/content/iml/2018/05/14/calibration-plots-in-sas.html" target="_blank"&gt;https://blogs.sas.com/content/iml/2018/05/14/calibration-plots-in-sas.html&lt;/A&gt;&lt;BR /&gt;&lt;A href="https://blogs.sas.com/content/iml/2020/11/23/decile-plots-in-sas.html" target="_blank"&gt;https://blogs.sas.com/content/iml/2020/11/23/decile-plots-in-sas.html&lt;/A&gt;</description>
      <pubDate>Thu, 04 Apr 2024 02:17:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Calibration-of-logit-model-with-large-sample-size/m-p/922885#M45866</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2024-04-04T02:17:14Z</dc:date>
    </item>
    <item>
      <title>Re: Calibration of logit model with large sample size</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Calibration-of-logit-model-with-large-sample-size/m-p/922886#M45867</link>
      <description>I followed &lt;A href="https://blogs.sas.com/content/iml/2018/05/16/decile-calibration-plots-sas.html" target="_blank"&gt;https://blogs.sas.com/content/iml/2018/05/16/decile-calibration-plots-sas.html&lt;/A&gt; to create the calibration plot, which compares the observed and estimated probabilities against the 45-degree diagonal line. With such a large sample size, would splitting the sample into a few smaller random samples make a goodness-of-fit test meaningful? Or should I stratify the sample into a few meaningful subsamples and estimate the probability within each one?</description>
      <pubDate>Thu, 04 Apr 2024 03:00:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Calibration-of-logit-model-with-large-sample-size/m-p/922886#M45867</guid>
      <dc:creator>lichee</dc:creator>
      <dc:date>2024-04-04T03:00:52Z</dc:date>
    </item>
    <item>
      <title>Re: Calibration of logit model with large sample size</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Calibration-of-logit-model-with-large-sample-size/m-p/922887#M45868</link>
      <description>I think the approach in &lt;BR /&gt; &lt;A href="https://blogs.sas.com/content/iml/2018/05/16/decile-calibration-plots-sas.html" target="_blank"&gt;https://blogs.sas.com/content/iml/2018/05/16/decile-calibration-plots-sas.html&lt;/A&gt; &lt;BR /&gt;is good enough. There is no need to split your data into many small subsets.&lt;BR /&gt;Anyway, &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13684"&gt;@Rick_SAS&lt;/a&gt; and &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13633"&gt;@StatDave&lt;/a&gt; might have more insight.</description>
      <pubDate>Thu, 04 Apr 2024 03:15:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Calibration-of-logit-model-with-large-sample-size/m-p/922887#M45868</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2024-04-04T03:15:30Z</dc:date>
    </item>
    <item>
      <title>Re: Calibration of logit model with large sample size</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Calibration-of-logit-model-with-large-sample-size/m-p/922897#M45869</link>
      <description>Since you have a large table, I suggest looking at PROC HPLOGISTIC.&lt;BR /&gt;Check its PARTITION statement and the Hosmer-Lemeshow test.</description>
      <pubDate>Thu, 04 Apr 2024 06:27:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Calibration-of-logit-model-with-large-sample-size/m-p/922897#M45869</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2024-04-04T06:27:24Z</dc:date>
    </item>
  </channel>
</rss>

