<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Appropriate to use firth method in proc logistic for rare events? in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Appropriate-to-use-firth-method-in-proc-logistic-for-rare-events/m-p/102940#M25059</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I am trying to fit a logistic regression, but the event is rare (~0.07%) in a total sample of 200,000+ observations. I understand that one approach is stratified sampling, but I have also read that the Firth method can be used (&lt;A href="http://www.statisticalhorizons.com/logistic-regression-for-rare-events" title="http://www.statisticalhorizons.com/logistic-regression-for-rare-events"&gt;Logistic Regression for Rare Events | Statistical Horizons&lt;/A&gt;).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Is the Firth method appropriate for rare events?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Fri, 08 Feb 2013 04:26:51 GMT</pubDate>
    <dc:creator>lavernal</dc:creator>
    <dc:date>2013-02-08T04:26:51Z</dc:date>
    <item>
      <title>Appropriate to use firth method in proc logistic for rare events?</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Appropriate-to-use-firth-method-in-proc-logistic-for-rare-events/m-p/102940#M25059</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I am trying to fit a logistic regression, but the event is rare (~0.07%) in a total sample of 200,000+ observations. I understand that one approach is stratified sampling, but I have also read that the Firth method can be used (&lt;A href="http://www.statisticalhorizons.com/logistic-regression-for-rare-events" title="http://www.statisticalhorizons.com/logistic-regression-for-rare-events"&gt;Logistic Regression for Rare Events | Statistical Horizons&lt;/A&gt;).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Is the Firth method appropriate for rare events?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 08 Feb 2013 04:26:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Appropriate-to-use-firth-method-in-proc-logistic-for-rare-events/m-p/102940#M25059</guid>
      <dc:creator>lavernal</dc:creator>
      <dc:date>2013-02-08T04:26:51Z</dc:date>
    </item>
    <item>
      <title>Re: Appropriate to use firth method in proc logistic for rare events?</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Appropriate-to-use-firth-method-in-proc-logistic-for-rare-events/m-p/102941#M25060</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;You might want to check out the paper by King and Zeng, "Logistic Regression in Rare Events Data", which addresses the rare events problem and also cites Firth's paper. I would be interested to know how your modeling of the rare events data has progressed, as I have a similar, extremely rare events dataset to process.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt; &lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 08 Apr 2014 13:58:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Appropriate-to-use-firth-method-in-proc-logistic-for-rare-events/m-p/102941#M25060</guid>
      <dc:creator>ranjan_mitre_org</dc:creator>
      <dc:date>2014-04-08T13:58:18Z</dc:date>
    </item>
    <item>
      <title>Re: Appropriate to use firth method in proc logistic for rare events?</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Appropriate-to-use-firth-method-in-proc-logistic-for-rare-events/m-p/481716#M25061</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Let me explain my situation:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1) I have a dataset with a response rate of 0.6% (374 events in a total of 61279 records), and I need to build a logistic regression model on it.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;2) Option-1: I can use PROC LOGISTIC (conventional maximum likelihood). The rule of thumb "that you should have at least 10 events for each parameter estimated" should hold, because I start my model-building iterations with no more than 35 variables and finish with fewer than 10.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If I start the model-building process with more than 35 predictors, is it still recommended to use PROC LOGISTIC (conventional ML), given that I may have to collapse some categorical levels to rule out quasi-complete or complete separation, and given the rule of thumb "that you should have at least 10 events for each parameter estimated"?&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;3) Option-2: I can use PROC LOGISTIC with Firth's method (penalized likelihood). The Firth method could help reduce any small-sample bias in the estimators.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;If I start the model-building process with more than 35 predictors, is it recommended to use PROC LOGISTIC with Firth's method (penalized likelihood), on the understanding that I do NOT have to collapse any categorical levels to rule out quasi-complete or complete separation?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;4) Option-3: If neither of the above two options is recommended, the last option is to oversample the rare events. 
As the total number of events (374) and the total number of records (61279) are both too small to pose any challenge for computing time or hardware, I would use an oversampling rate of only 5% (number of records to be modeled = 6487), because I want to keep as many non-event records as possible; with an oversampling rate above 5%, the total number of records that can be modeled is less than 6487.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;My thoughts on Option-1, Option-2 and Option-3 are given below:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;-- With a 5.77% oversampling rate, the number of events is 374 and the number of non-events is 6113, a total of 6487 records. With a 70:30 split between TRAIN and VALIDATION, I can build my model on 4541 records and perform in-time validation on 1946 records.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;-- By comparison, under Option-1 and Option-2, with a 70:30 split between TRAIN and VALIDATION, I can build my model on 42896 records and perform in-time validation on 18383 records.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Which option is recommended in my case: Option-1, Option-2 or Option-3? If Option-3, is it recommended to use an oversampling rate of 2% or 3% in order to increase the number of records to be modeled to something above 6487?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Thanks&lt;BR /&gt;Surajit&lt;/P&gt;</description>
      <pubDate>Thu, 26 Jul 2018 23:04:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Appropriate-to-use-firth-method-in-proc-logistic-for-rare-events/m-p/481716#M25061</guid>
      <dc:creator>SAS_VA_Learner</dc:creator>
      <dc:date>2018-07-26T23:04:04Z</dc:date>
    </item>
  </channel>
</rss>

