<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How does SAS decides leaves in procHPSPLIT &amp; does it not lead to overfitting if we let SAS d in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/How-does-SAS-decides-leaves-in-procHPSPLIT-amp-does-it-not-lead/m-p/583260#M75714</link>
    <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/267862"&gt;@vikrantarora25&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I would be grateful if an expert on the forum could help me understand how to decide the optimum number of leaves in a decision tree analysis.&lt;/P&gt;&lt;P&gt;I am using SAS, and if I supply leaves=6 in my model, the misclassification rates for the validation and training data sets are 18.6% and 18.8% respectively. SAS lists 5 variables as significant.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If I don't supply a leaf count in the code and let SAS decide it, then after pruning SAS settles on 10 leaves, and the misclassification rates for the validation and training data sets are 17.5% and 16.9% respectively. SAS lists 6 variables as significant.&lt;/P&gt;&lt;P&gt;Now that the misclassification rates have dropped and the number of leaves after pruning has increased from 4 to 10, is that a good thing or does it indicate overfitting?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Looking forward to the opinions of experts in this group.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks &amp;amp; Regards&lt;/P&gt;&lt;P&gt;Vikrant&lt;/P&gt;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;There is some subjectivity in model building. You need to consider the following questions:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1) Are the variables in a given model likely to be related to the outcome? If you are doing exploratory modeling, you may not have a good idea about this.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;2) What counts as a large misclassification rate? It depends on what you are trying to do. What does being wrong about one-fifth of the time mean for your use of the model? Is that an acceptable misclassification rate?&amp;nbsp;No one can answer this for you. 
Some models can tolerate only very small misclassification rates, while for others larger rates are acceptable.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Thu, 22 Aug 2019 16:45:17 GMT</pubDate>
    <dc:creator>DWilson</dc:creator>
    <dc:date>2019-08-22T16:45:17Z</dc:date>
    <item>
      <title>How does SAS decides leaves in procHPSPLIT &amp; does it not lead to overfitting if we let SAS decide it</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-does-SAS-decides-leaves-in-procHPSPLIT-amp-does-it-not-lead/m-p/545808#M74326</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I would be grateful if an expert on the forum could help me understand how to decide the optimum number of leaves in a decision tree analysis.&lt;/P&gt;&lt;P&gt;I am using SAS, and if I supply leaves=6 in my model, the misclassification rates for the validation and training data sets are 18.6% and 18.8% respectively. SAS lists 5 variables as significant.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If I don't supply a leaf count in the code and let SAS decide it, then after pruning SAS settles on 10 leaves, and the misclassification rates for the validation and training data sets are 17.5% and 16.9% respectively. SAS lists 6 variables as significant.&lt;/P&gt;&lt;P&gt;Now that the misclassification rates have dropped and the number of leaves after pruning has increased from 4 to 10, is that a good thing or does it indicate overfitting?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Looking forward to the opinions of experts in this group.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks &amp;amp; Regards&lt;/P&gt;&lt;P&gt;Vikrant&lt;/P&gt;</description>
      <pubDate>Mon, 25 Mar 2019 13:45:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-does-SAS-decides-leaves-in-procHPSPLIT-amp-does-it-not-lead/m-p/545808#M74326</guid>
      <dc:creator>vikrantarora25</dc:creator>
      <dc:date>2019-03-25T13:45:29Z</dc:date>
    </item>
    <item>
      <title>Re: How does SAS decides leaves in procHPSPLIT &amp; does it not lead to overfitting if we let SAS d</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-does-SAS-decides-leaves-in-procHPSPLIT-amp-does-it-not-lead/m-p/583260#M75714</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/267862"&gt;@vikrantarora25&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I would be grateful if an expert on the forum could help me understand how to decide the optimum number of leaves in a decision tree analysis.&lt;/P&gt;&lt;P&gt;I am using SAS, and if I supply leaves=6 in my model, the misclassification rates for the validation and training data sets are 18.6% and 18.8% respectively. SAS lists 5 variables as significant.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If I don't supply a leaf count in the code and let SAS decide it, then after pruning SAS settles on 10 leaves, and the misclassification rates for the validation and training data sets are 17.5% and 16.9% respectively. SAS lists 6 variables as significant.&lt;/P&gt;&lt;P&gt;Now that the misclassification rates have dropped and the number of leaves after pruning has increased from 4 to 10, is that a good thing or does it indicate overfitting?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Looking forward to the opinions of experts in this group.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks &amp;amp; Regards&lt;/P&gt;&lt;P&gt;Vikrant&lt;/P&gt;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;There is some subjectivity in model building. You need to consider the following questions:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1) Are the variables in a given model likely to be related to the outcome? If you are doing exploratory modeling, you may not have a good idea about this.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;2) What counts as a large misclassification rate? It depends on what you are trying to do. What does being wrong about one-fifth of the time mean for your use of the model? Is that an acceptable misclassification rate?&amp;nbsp;No one can answer this for you. 
Some models can tolerate only very small misclassification rates, while for others larger rates are acceptable.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 22 Aug 2019 16:45:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-does-SAS-decides-leaves-in-procHPSPLIT-amp-does-it-not-lead/m-p/583260#M75714</guid>
      <dc:creator>DWilson</dc:creator>
      <dc:date>2019-08-22T16:45:17Z</dc:date>
    </item>
  </channel>
</rss>

