<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Please Help.What's the minimum number of responses required to build a model? Thank You in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/Please-Help-What-s-the-minimum-number-of-responses-required-to/m-p/388535#M5853</link>
    <description>&lt;P&gt;The short answer is that "your mileage may vary" depending on your analytical needs and business objectives. &amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The minimum number of responses needed to build a model depends on the modeling approach. For instance, a common rule of thumb, though rarely discussed in textbooks, is that for ordinary least squares (OLS) regression models you want at least 5 observations for each model parameter. You usually estimate the intercept (1 parameter), one parameter for each interval input variable (say, J parameters), and k-1 parameters for each categorical input variable, where k is the number of levels of that variable; interactions and higher-order terms add further parameters. For neural network models, you might be better off with at least 15-20 observations per parameter, and a neural network has far more parameters than the corresponding regression model. Decision trees do not have 'parameters' in this sense, so this kind of rule does not really apply.&lt;/P&gt;
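&lt;P&gt;The parameter counting described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not a SAS procedure: the function names, the example inputs, and the single-hidden-layer network are assumptions, and the 5 and 15-20 observations-per-parameter figures are only the rules of thumb quoted above.&lt;/P&gt;
&lt;PRE&gt;
```python
# Hypothetical helpers illustrating the rules of thumb quoted above.
# Assumptions: OLS with (k - 1) dummy coding for categorical inputs, and a
# fully connected neural network with one hidden layer and bias terms.


def ols_parameter_count(n_interval, category_levels, n_extra_terms=0):
    """Intercept + one per interval input + (k - 1) per categorical input,
    plus any interaction or higher-order terms you choose to add."""
    dummy_columns = sum(k - 1 for k in category_levels)
    return 1 + n_interval + dummy_columns + n_extra_terms


def mlp_parameter_count(n_inputs, n_hidden, n_outputs=1):
    """Weights plus biases for the input-to-hidden and hidden-to-output layers."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs


def min_observations(n_parameters, obs_per_parameter):
    """Observations implied by an observations-per-parameter heuristic."""
    return n_parameters * obs_per_parameter


if __name__ == "__main__":
    # Hypothetical inputs: 6 interval variables, two categorical variables
    # with 4 and 3 levels, and 2 interaction terms.
    p_ols = ols_parameter_count(6, [4, 3], n_extra_terms=2)
    print("OLS parameters:", p_ols)  # 14
    print("OLS, 5 per parameter:", min_observations(p_ols, 5))  # 70

    # Same inputs fed to a network with 3 hidden units:
    # 6 interval columns + 5 dummy columns = 11 input columns.
    p_nn = mlp_parameter_count(n_inputs=11, n_hidden=3)
    print("NN parameters:", p_nn)  # 40
    print("NN, 15-20 per parameter:",
          min_observations(p_nn, 15), "to", min_observations(p_nn, 20))  # 600 to 800
```
&lt;/PRE&gt;
&lt;P&gt;In this made-up case the regression needs roughly 70 observations by the 5-per-parameter rule, while even a small network on the same inputs calls for several hundred, which is why neural networks are so much hungrier for data.&lt;/P&gt;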
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In the end, you can consider the following:&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;* data mining problems typically have a large number of observations&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;* when you have a relatively small number of observations, you have to consider simpler models&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;* a model fit to few observations will likely have less predictive capability than one fit to a larger sample from the same population&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;* data is expensive, and obtaining more data (let alone a great deal more) is often not feasible or practical&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;* data-based decisions are generally better than purely perception-based decisions, since the data improves your understanding of what is happening&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;* different modeling methods have different requirements&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;* modeling methods typically return errors or clearly problematic results when there are too few observations&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;* this often happens when there are only a few events of interest for a categorical target&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;* your confidence in your conclusions should be lower when you have relatively few observations&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;* consider both the accuracy of the predictions and the stability of the modeled relationship when assessing the strength of your conclusions&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In many cases, people are modeling rare-event scenarios. You will likely learn from experience how strong your conclusions can be for a given sample of data. I spoke with a direct marketing company that only needed a 2% response rate and didn't have much confidence in their models unless they had at least 5,000 respondents. You can't use this number directly: your model requirements, and therefore your model complexity, probably differ, and you are almost certainly working on a different analysis problem. Even if the problem is similar and in the same general area, you are likely analyzing data for a different company.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I hope this helps!&lt;/P&gt;
&lt;P&gt;Doug&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 16 Aug 2017 17:11:47 GMT</pubDate>
    <dc:creator>DougWielenga</dc:creator>
    <dc:date>2017-08-16T17:11:47Z</dc:date>
    <item>
      <title>Please Help.What's the minimum number of responses required to build a model? Thank You</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Please-Help-What-s-the-minimum-number-of-responses-required-to/m-p/121838#M1030</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;&lt;SPAN style="color: #0000ff;"&gt;Hi,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="color: #0000ff;"&gt;What's the minimum number of responses required to build a decent model, in terms of volume? For example, I contacted 4,000 people and 700 responded (Yes).&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="color: #0000ff;"&gt;Is that enough to build a model?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="color: #0000ff;"&gt;Many thanks,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="color: #0000ff;"&gt;Alice&lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 26 Apr 2013 14:31:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Please-Help-What-s-the-minimum-number-of-responses-required-to/m-p/121838#M1030</guid>
      <dc:creator>Question</dc:creator>
      <dc:date>2013-04-26T14:31:01Z</dc:date>
    </item>
    <item>
      <title>Re: Please Help.What's the minimum number of responses required to build a model? Thank You</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Please-Help-What-s-the-minimum-number-of-responses-required-to/m-p/388535#M5853</link>
      <description>&lt;P&gt;The short answer is that "your mileage may vary" depending on your analytical needs and business objectives. &amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The minimum number of responses needed to build a model depends on the modeling approach. For instance, a common rule of thumb, though rarely discussed in textbooks, is that for ordinary least squares (OLS) regression models you want at least 5 observations for each model parameter. You usually estimate the intercept (1 parameter), one parameter for each interval input variable (say, J parameters), and k-1 parameters for each categorical input variable, where k is the number of levels of that variable; interactions and higher-order terms add further parameters. For neural network models, you might be better off with at least 15-20 observations per parameter, and a neural network has far more parameters than the corresponding regression model. Decision trees do not have 'parameters' in this sense, so this kind of rule does not really apply.&lt;/P&gt;
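&lt;P&gt;The parameter counting described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not a SAS procedure: the function names, the example inputs, and the single-hidden-layer network are assumptions, and the 5 and 15-20 observations-per-parameter figures are only the rules of thumb quoted above.&lt;/P&gt;
&lt;PRE&gt;
```python
# Hypothetical helpers illustrating the rules of thumb quoted above.
# Assumptions: OLS with (k - 1) dummy coding for categorical inputs, and a
# fully connected neural network with one hidden layer and bias terms.


def ols_parameter_count(n_interval, category_levels, n_extra_terms=0):
    """Intercept + one per interval input + (k - 1) per categorical input,
    plus any interaction or higher-order terms you choose to add."""
    dummy_columns = sum(k - 1 for k in category_levels)
    return 1 + n_interval + dummy_columns + n_extra_terms


def mlp_parameter_count(n_inputs, n_hidden, n_outputs=1):
    """Weights plus biases for the input-to-hidden and hidden-to-output layers."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs


def min_observations(n_parameters, obs_per_parameter):
    """Observations implied by an observations-per-parameter heuristic."""
    return n_parameters * obs_per_parameter


if __name__ == "__main__":
    # Hypothetical inputs: 6 interval variables, two categorical variables
    # with 4 and 3 levels, and 2 interaction terms.
    p_ols = ols_parameter_count(6, [4, 3], n_extra_terms=2)
    print("OLS parameters:", p_ols)  # 14
    print("OLS, 5 per parameter:", min_observations(p_ols, 5))  # 70

    # Same inputs fed to a network with 3 hidden units:
    # 6 interval columns + 5 dummy columns = 11 input columns.
    p_nn = mlp_parameter_count(n_inputs=11, n_hidden=3)
    print("NN parameters:", p_nn)  # 40
    print("NN, 15-20 per parameter:",
          min_observations(p_nn, 15), "to", min_observations(p_nn, 20))  # 600 to 800
```
&lt;/PRE&gt;
&lt;P&gt;In this made-up case the regression needs roughly 70 observations by the 5-per-parameter rule, while even a small network on the same inputs calls for several hundred, which is why neural networks are so much hungrier for data.&lt;/P&gt;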
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In the end, you can consider the following:&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;* data mining problems typically have a large number of observations&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;* when you have a relatively small number of observations, you have to consider simpler models&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;* a model fit to few observations will likely have less predictive capability than one fit to a larger sample from the same population&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;* data is expensive, and obtaining more data (let alone a great deal more) is often not feasible or practical&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;* data-based decisions are generally better than purely perception-based decisions, since the data improves your understanding of what is happening&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;* different modeling methods have different requirements&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;* modeling methods typically return errors or clearly problematic results when there are too few observations&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;* this often happens when there are only a few events of interest for a categorical target&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;* your confidence in your conclusions should be lower when you have relatively few observations&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;* consider both the accuracy of the predictions and the stability of the modeled relationship when assessing the strength of your conclusions&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In many cases, people are modeling rare-event scenarios. You will likely learn from experience how strong your conclusions can be for a given sample of data. I spoke with a direct marketing company that only needed a 2% response rate and didn't have much confidence in their models unless they had at least 5,000 respondents. You can't use this number directly: your model requirements, and therefore your model complexity, probably differ, and you are almost certainly working on a different analysis problem. Even if the problem is similar and in the same general area, you are likely analyzing data for a different company.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I hope this helps!&lt;/P&gt;
&lt;P&gt;Doug&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 16 Aug 2017 17:11:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Please-Help-What-s-the-minimum-number-of-responses-required-to/m-p/388535#M5853</guid>
      <dc:creator>DougWielenga</dc:creator>
      <dc:date>2017-08-16T17:11:47Z</dc:date>
    </item>
  </channel>
</rss>

