<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to approximate C4.5 algorithm in SAS EM 6.2 with the Decision Tree node? in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/How-to-approximate-C4-5-algorithm-in-SAS-EM-6-2-with-the/m-p/388174#M5822</link>
    <description>&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;In short, it was a design decision to avoid having a specific setting for C4.5, for several reasons.&amp;nbsp;&amp;nbsp;Tree performance can only be hindered by limiting splitting on interval inputs to two-way splits.&amp;nbsp;&amp;nbsp;A paper in the early 1990s compared C4.5 using an interval input with C4.5 using that same input discretized into 10 or so values.&amp;nbsp;&amp;nbsp;Trees with the discretized variable were better because of the bias towards categorical (hence multi-way) splits.&amp;nbsp;&amp;nbsp; We also opted to exclude the main C4.5 splitting criterion, Gain Ratio, which is the ENTROPY reduction divided by another factor in an attempt to avoid generating too many branches from a categorical input.&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Regarding settings in Enterprise Miner:&amp;nbsp;&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;+ Split search:&amp;nbsp;&amp;nbsp;EXHAUSTIVE only kicks in when the input x and the target y both have more than 2 nominal categories.&amp;nbsp;&amp;nbsp;In that case, C4.5 makes a multiway split, initially with one branch for each x value, and then merges the branches using Gain Ratio.&amp;nbsp;&amp;nbsp; Reducing the number of branches works differently with the C4.5 Gain Ratio than with the CHAID approach.&amp;nbsp;&amp;nbsp;I don't think setting EXHAUSTIVE to anything special will help the comparison.&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;+ Node Sample:&amp;nbsp;&amp;nbsp;PERFORMANCE NODESAMPLE=ALL;&amp;nbsp;&amp;nbsp; We are strongly considering getting rid of the NODESAMPLE option at some point in the future.&amp;nbsp;&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;+ Subtree:&amp;nbsp;&amp;nbsp;Use the best assessed subtree for ASE or Misclassification.&amp;nbsp;&amp;nbsp;The C4.5 author calls this 'Error based pruning'.&amp;nbsp;&amp;nbsp;I believe the C4.5 default is 'pessimistic pruning', which SAS does not offer.&amp;nbsp;&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;+ P-values adjustment:&amp;nbsp;&amp;nbsp;C4.5 does not use a criterion with p-values.&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;In short, there are several areas of concern with the C4.5 approach, which is why it is not fully represented in SAS Enterprise Miner.&amp;nbsp;&amp;nbsp;I hope this is helpful.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Cordially,&lt;BR /&gt;Doug&lt;/P&gt;</description>
    <pubDate>Tue, 15 Aug 2017 14:50:57 GMT</pubDate>
    <dc:creator>DougWielenga</dc:creator>
    <dc:date>2017-08-15T14:50:57Z</dc:date>
    <item>
      <title>How to approximate C4.5 algorithm in SAS EM 6.2 with the Decision Tree node?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/How-to-approximate-C4-5-algorithm-in-SAS-EM-6-2-with-the/m-p/89181#M622</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In SAS Enterprise Miner Help I've found how to approximate CHAID and CART methods using Decision Tree node, but there is nothing about C4.5 algorithm.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;How can I simulate C4.5 algorithm using Decision Tree node?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I would be grateful for any help.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 09 Aug 2012 19:30:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/How-to-approximate-C4-5-algorithm-in-SAS-EM-6-2-with-the/m-p/89181#M622</guid>
      <dc:creator>PiratDrogowy</dc:creator>
      <dc:date>2012-08-09T19:30:19Z</dc:date>
    </item>
    <item>
      <title>Re: How to approximate C4.5 algorithm in SAS EM 6.2 with the Decision Tree node?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/How-to-approximate-C4-5-algorithm-in-SAS-EM-6-2-with-the/m-p/388174#M5822</link>
      <description>&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;In short, it was a design decision to avoid having a specific setting for C4.5, for several reasons.&amp;nbsp;&amp;nbsp;Tree performance can only be hindered by limiting splitting on interval inputs to two-way splits.&amp;nbsp;&amp;nbsp;A paper in the early 1990s compared C4.5 using an interval input with C4.5 using that same input discretized into 10 or so values.&amp;nbsp;&amp;nbsp;Trees with the discretized variable were better because of the bias towards categorical (hence multi-way) splits.&amp;nbsp;&amp;nbsp; We also opted to exclude the main C4.5 splitting criterion, Gain Ratio, which is the ENTROPY reduction divided by another factor in an attempt to avoid generating too many branches from a categorical input.&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Regarding settings in Enterprise Miner:&amp;nbsp;&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;+ Split search:&amp;nbsp;&amp;nbsp;EXHAUSTIVE only kicks in when the input x and the target y both have more than 2 nominal categories.&amp;nbsp;&amp;nbsp;In that case, C4.5 makes a multiway split, initially with one branch for each x value, and then merges the branches using Gain Ratio.&amp;nbsp;&amp;nbsp; Reducing the number of branches works differently with the C4.5 Gain Ratio than with the CHAID approach.&amp;nbsp;&amp;nbsp;I don't think setting EXHAUSTIVE to anything special will help the comparison.&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;+ Node Sample:&amp;nbsp;&amp;nbsp;PERFORMANCE NODESAMPLE=ALL;&amp;nbsp;&amp;nbsp; We are strongly considering getting rid of the NODESAMPLE option at some point in the future.&amp;nbsp;&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;+ Subtree:&amp;nbsp;&amp;nbsp;Use the best assessed subtree for ASE or Misclassification.&amp;nbsp;&amp;nbsp;The C4.5 author calls this 'Error based pruning'.&amp;nbsp;&amp;nbsp;I believe the C4.5 default is 'pessimistic pruning', which SAS does not offer.&amp;nbsp;&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;+ P-values adjustment:&amp;nbsp;&amp;nbsp;C4.5 does not use a criterion with p-values.&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;In short, there are several areas of concern with the C4.5 approach, which is why it is not fully represented in SAS Enterprise Miner.&amp;nbsp;&amp;nbsp;I hope this is helpful.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Cordially,&lt;BR /&gt;Doug&lt;/P&gt;</description>
      <pubDate>Tue, 15 Aug 2017 14:50:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/How-to-approximate-C4-5-algorithm-in-SAS-EM-6-2-with-the/m-p/388174#M5822</guid>
      <dc:creator>DougWielenga</dc:creator>
      <dc:date>2017-08-15T14:50:57Z</dc:date>
    </item>
  </channel>
</rss>

