<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: 500 Responders only? Is it sufficient to build a Logistic Regression model? Thank you in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/500-Responders-only-Is-it-sufficient-to-build-a-Logistic/m-p/199712#M2634</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;&lt;A href="http://gking.harvard.edu/files/gking/files/0s.pdf" title="http://gking.harvard.edu/files/gking/files/0s.pdf"&gt;http://gking.harvard.edu/files/gking/files/0s.pdf&lt;/A&gt;&amp;nbsp; This seems to be the key paper referred to by Paul Allison.&lt;/P&gt;&lt;P&gt;He was right: it is subject to misinterpretation, as are all academic stats papers when used to answer sampling bias questions.&lt;/P&gt;&lt;P&gt;Graphic visualizations can certainly help, as can ad-hoc help systems in software.&lt;/P&gt;&lt;P&gt;How can JMP visualize the bias issues and also define the sampling complexity scenario in simpler terms, perhaps with case examples?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Thu, 09 Jul 2015 18:31:48 GMT</pubDate>
    <dc:creator>Mclayton200</dc:creator>
    <dc:date>2015-07-09T18:31:48Z</dc:date>
    <item>
      <title>500 Responders only? Is it sufficient to build a Logistic Regression model? Thank you</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/500-Responders-only-Is-it-sufficient-to-build-a-Logistic/m-p/199708#M2630</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;&lt;SPAN style="color: #0000ff;"&gt;Hi, &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="color: #0000ff;"&gt;I would like to build a logistic regression model, but I only have 500 responders and 3 million non-responders! I have always been told that we need at least 1000 responders to have a decent model. Is there any solution for this? Would it work to generate a number of random samples from the 500 responders and add them to the modelling dataset to get to 1000 responders (similar to bootstrap sampling)?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="color: #0000ff;"&gt; •&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Say I create random 20% samples (i.e. select 20% of the universe): with a conversion rate of 0.02%, I will need about 10 random samples to get to a response count of 1000&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="color: #0000ff;"&gt; •&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; For each sample I will need to create different dummy address ids for the responders – e.g. actual address_id||1 (for the 1st sample), and then address_id||2 (for the 2nd sample) &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="color: #0000ff;"&gt;•&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; That way I will get a pool of about 1000 responders &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="color: #0000ff;"&gt;•&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; And then randomly select 1000 non-responders.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="color: #0000ff;"&gt;Your help would be much appreciated.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="color: #0000ff;"&gt;Many thanks&lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 09 Jul 2015 14:59:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/500-Responders-only-Is-it-sufficient-to-build-a-Logistic/m-p/199708#M2630</guid>
      <dc:creator>Kanyange</dc:creator>
      <dc:date>2015-07-09T14:59:53Z</dc:date>
    </item>
    <item>
      <title>Re: 500 Responders only? Is it sufficient to build a Logistic Regression model? Thank you</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/500-Responders-only-Is-it-sufficient-to-build-a-Logistic/m-p/199709#M2631</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;See some of the Suggested Answers in the MORE LIKE THIS section on the right-hand side of your question. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Options include a rule of thumb for the number of responders per variable, Bayesian estimates, resampling, and simulation methods.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;PS. I've never heard of the rule of 1000 responders in 10 years of stats, so perhaps it is specific to your field?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 09 Jul 2015 16:11:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/500-Responders-only-Is-it-sufficient-to-build-a-Logistic/m-p/199709#M2631</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2015-07-09T16:11:56Z</dc:date>
    </item>
    <item>
      <title>Re: 500 Responders only? Is it sufficient to build a Logistic Regression model? Thank you</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/500-Responders-only-Is-it-sufficient-to-build-a-Logistic/m-p/199710#M2632</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I've always performed logistic regression based on the response rate and not the total number of responses.&amp;nbsp; For instance, if you have a data set of 100,000 observations and 10,000 of them have a response (i.e. a 10% response rate), you're just fine.&amp;nbsp; However, if you have a data set of 100,000 observations and only 100 have a response (i.e. a 0.1% response rate), then you have a problem using regular logistic regression, because maximum likelihood estimation suffers from a degree of bias. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This is a good article from Paul Allison, whose work I really like (I own a couple of his books on logistic regression), that discusses techniques to use in this case.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="http://statisticalhorizons.com/logistic-regression-for-rare-events"&gt;http://statisticalhorizons.com/logistic-regression-for-rare-events&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Good luck!&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 09 Jul 2015 16:34:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/500-Responders-only-Is-it-sufficient-to-build-a-Logistic/m-p/199710#M2632</guid>
      <dc:creator>dcruik</dc:creator>
      <dc:date>2015-07-09T16:34:33Z</dc:date>
    </item>
    <item>
      <title>Re: 500 Responders only? Is it sufficient to build a Logistic Regression model? Thank you</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/500-Responders-only-Is-it-sufficient-to-build-a-Logistic/m-p/199711#M2633</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Paul Allison - good reference for stats and SAS &lt;img id="smileyhappy" class="emoticon emoticon-smileyhappy" src="https://communities.sas.com/i/smilies/16x16_smiley-happy.png" alt="Smiley Happy" title="Smiley Happy" /&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 09 Jul 2015 16:45:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/500-Responders-only-Is-it-sufficient-to-build-a-Logistic/m-p/199711#M2633</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2015-07-09T16:45:59Z</dc:date>
    </item>
    <item>
      <title>Re: 500 Responders only? Is it sufficient to build a Logistic Regression model? Thank you</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/500-Responders-only-Is-it-sufficient-to-build-a-Logistic/m-p/199712#M2634</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;&lt;A href="http://gking.harvard.edu/files/gking/files/0s.pdf" title="http://gking.harvard.edu/files/gking/files/0s.pdf"&gt;http://gking.harvard.edu/files/gking/files/0s.pdf&lt;/A&gt;&amp;nbsp; This seems to be the key paper referred to by Paul Allison.&lt;/P&gt;&lt;P&gt;He was right: it is subject to misinterpretation, as are all academic stats papers when used to answer sampling bias questions.&lt;/P&gt;&lt;P&gt;Graphic visualizations can certainly help, as can ad-hoc help systems in software.&lt;/P&gt;&lt;P&gt;How can JMP visualize the bias issues and also define the sampling complexity scenario in simpler terms, perhaps with case examples?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 09 Jul 2015 18:31:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/500-Responders-only-Is-it-sufficient-to-build-a-Logistic/m-p/199712#M2634</guid>
      <dc:creator>Mclayton200</dc:creator>
      <dc:date>2015-07-09T18:31:48Z</dc:date>
    </item>
    <item>
      <title>Re: 500 Responders only? Is it sufficient to build a Logistic Regression model? Thank you</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/500-Responders-only-Is-it-sufficient-to-build-a-Logistic/m-p/199713#M2635</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I would suggest oversampling (select all the responders and a portion of the non-responders) and then correcting for the bias introduced by oversampling.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 10 Jul 2015 09:03:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/500-Responders-only-Is-it-sufficient-to-build-a-Logistic/m-p/199713#M2635</guid>
      <dc:creator>Vikesh</dc:creator>
      <dc:date>2015-07-10T09:03:51Z</dc:date>
    </item>
    <item>
      <title>Re: 500 Responders only? Is it sufficient to build a Logistic Regression model? Thank you</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/500-Responders-only-Is-it-sufficient-to-build-a-Logistic/m-p/199714#M2636</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;OU. That is a good reason to use Poisson regression.&lt;/P&gt;&lt;P&gt;Take a look at the logistic link function:&lt;/P&gt;&lt;P&gt;log(p/(1-p)). If p ~ 0, then 1-p ~ 1, so it ==&amp;gt; log(p), which is exactly the Poisson regression link function.&lt;/P&gt;&lt;P&gt;Or use the negative binomial distribution.&lt;/P&gt;&lt;P&gt;Check proc genmod; you can use both of these distributions.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;BTW, there is an EXACT statement in proc logistic; you can use it for small-sample data.&lt;/P&gt;&lt;P&gt;Also consider using the Monte Carlo method, which is also available in proc logistic.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Xia Keshan&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 10 Jul 2015 12:05:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/500-Responders-only-Is-it-sufficient-to-build-a-Logistic/m-p/199714#M2636</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2015-07-10T12:05:34Z</dc:date>
    </item>
  </channel>
</rss>

