<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Random Forests: Difference between OOB and Validation? in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/Random-Forests-Difference-between-OOB-and-Validation/m-p/373367#M5545</link>
    <description>&lt;P&gt;Re: &amp;nbsp;more value from my data by having one larger training dataset than two separate training and validation datasets.&lt;/P&gt;
&lt;P&gt;Yes, in the common situation where more training data is useful. &amp;nbsp; Leo Breiman agreed. &amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Reasons to use validation data despite this:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;1. In practice, OOB error rates are often biased on the conservative side. &amp;nbsp; Error rates decrease with the number of trees, at least initially. &amp;nbsp; OOB error rates are based on about 1/3 of the trees for a specific data observation. &amp;nbsp;Consequently, if the forest has 100 trees then the OOB error rate is closer to the test data error rate on 33 trees than on 100 trees.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;2. OOB estimates are not directly comparable to other algorithms that use validation estimates. &amp;nbsp;So how would one confirm that a forest is better than a neural network unless the forest is applied to the same validation data? &amp;nbsp;Using the test set to select a model is risky if the data is easily overfit.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Hope this helps,&lt;/P&gt;
&lt;P&gt;-Padraic&lt;/P&gt;</description>
    <pubDate>Wed, 05 Jul 2017 17:27:10 GMT</pubDate>
    <dc:creator>PadraicGNeville</dc:creator>
    <dc:date>2017-07-05T17:27:10Z</dc:date>
    <item>
      <title>Random Forests: Difference between OOB and Validation?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Random-Forests-Difference-between-OOB-and-Validation/m-p/372288#M5535</link>
      <description>&lt;P&gt;Hello- RF is kind of different&amp;nbsp;from many other ML algorithms in that OOB is really a type of validation.&amp;nbsp;So then the question is, exactly what value does the validation data have? My thinking is that I would get more value from my data by having one&amp;nbsp;larger training dataset than two separate training and validation datasets. Of course, regardless, I would still have my test dataset. Thoughts?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks, -Ted&lt;/P&gt;</description>
      <pubDate>Fri, 30 Jun 2017 19:42:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Random-Forests-Difference-between-OOB-and-Validation/m-p/372288#M5535</guid>
      <dc:creator>zzzzz</dc:creator>
      <dc:date>2017-06-30T19:42:39Z</dc:date>
    </item>
    <item>
      <title>Re: Random Forests: Difference between OOB and Validation?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Random-Forests-Difference-between-OOB-and-Validation/m-p/373367#M5545</link>
      <description>&lt;P&gt;Re: &amp;nbsp;more value from my data by having one larger training dataset than two separate training and validation datasets.&lt;/P&gt;
&lt;P&gt;Yes, in the common situation where more training data is useful. &amp;nbsp; Leo Breiman agreed. &amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Reasons to use validation data despite this:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;1. In practice, OOB error rates are often biased on the conservative side. &amp;nbsp; Error rates decrease with the number of trees, at least initially. &amp;nbsp; OOB error rates are based on about 1/3 of the trees for a specific data observation. &amp;nbsp;Consequently, if the forest has 100 trees then the OOB error rate is closer to the test data error rate on 33 trees than on 100 trees.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;2. OOB estimates are not directly comparable to other algorithms that use validation estimates. &amp;nbsp;So how would one confirm that a forest is better than a neural network unless the forest is applied to the same validation data? &amp;nbsp;Using the test set to select a model is risky if the data is easily overfit.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
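&lt;P&gt;A minimal sketch of both points (assuming Python with scikit-learn rather than SAS, on made-up synthetic data): fit a forest with OOB scoring enabled and compare the OOB accuracy against accuracy on a held-out validation set. The same validation score can then be compared directly against other model families, which the OOB score cannot.&lt;/P&gt;

```python
# Sketch: OOB estimate vs. held-out validation estimate for a random forest.
# Assumes scikit-learn; the dataset is synthetic and purely illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# oob_score=True makes the forest record each observation's prediction
# from only the trees that did NOT see it in their bootstrap sample
# (roughly a third of the 100 trees per observation).
rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
rf.fit(X_tr, y_tr)

print("OOB accuracy (100 trees):", rf.oob_score_)
print("Validation accuracy (100 trees):", rf.score(X_val, y_val))
```

&lt;P&gt;Because the OOB prediction for each observation averages only the trees that left it out, it behaves like the error of a smaller ensemble, which is the conservative bias described in point 1.&lt;/P&gt;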
&lt;P&gt;Hope this helps,&lt;/P&gt;
&lt;P&gt;-Padraic&lt;/P&gt;</description>
      <pubDate>Wed, 05 Jul 2017 17:27:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Random-Forests-Difference-between-OOB-and-Validation/m-p/373367#M5545</guid>
      <dc:creator>PadraicGNeville</dc:creator>
      <dc:date>2017-07-05T17:27:10Z</dc:date>
    </item>
  </channel>
</rss>

