<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Missings Showing Up in the Strangest of Places - Decision Trees, Consolidation in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/Missings-Showing-Up-in-the-Strangest-of-Places-Decision-Trees/m-p/189934#M2364</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;We are building a very basic Decision Tree - love EM too! We have the following four nodes:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Input Data,&lt;/P&gt;&lt;P&gt;Data Partition,&lt;/P&gt;&lt;P&gt;Consolidation Tree, then&lt;/P&gt;&lt;P&gt;Decision Tree.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The Consolidation Tree is actually a variation of a Decision Tree where we are taking some variables with many nominal categories (sometimes in the hundreds) and seeing whether we can consolidate them into simplified groupings related to our main dependent/target variable. Below is a snapshot of part of our Consolidation Tree:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;IMG alt="Capture.PNG" class="jive-image-thumbnail jive-image" src="https://communities.sas.com/legacyfs/online/7649_Capture.PNG" width="450" /&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The origin node looks great and we split our data 80/20 between training/validation. The first significant level is called NAC_CODE and it grouped the variable into two nice nodes. But the next level down for one of the nodes splits GOVERNING_CLASS into two nodes again - the problem is that one of them is a node for Missing_Values_Only. I normally would not be too concerned, as many of the variables within our dataset have missing values. But GOVERNING_CLASS has none. I fully understand that, for scoring purposes, EM automatically groups missing values with other values in various nodes even when the present dataset has none, but a node containing only missing values does not make sense at all.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Please help. 
I have some other questions coming after this one is resolved as well.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you very much.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG style="mso-no-proof: yes; color: #1f497d; font-size: 10pt; mso-fareast-theme-font: minor-fareast; mso-fareast-font-family: 'Times New Roman';"&gt;Zach Feinstein, Statistical Data Modeler&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt; &lt;STRONG style="mso-no-proof: yes; color: #1f497d; font-size: 10pt; mso-fareast-theme-font: minor-fareast; mso-fareast-font-family: 'Times New Roman';"&gt;P&lt;/STRONG&gt;&lt;SPAN style="font-size: 10pt; mso-fareast-font-family: 'Times New Roman'; mso-fareast-theme-font: minor-fareast; mso-no-proof: yes;"&gt; (952) 838-4289 &lt;STRONG&gt; &lt;SPAN style="color: #1f497d;"&gt;C &lt;/SPAN&gt;&lt;/STRONG&gt;(612) 590-4813&amp;nbsp; &lt;STRONG style="color: #1f497d;"&gt;F&lt;/STRONG&gt; (952) 838-2010&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG style="mso-no-proof: yes; color: #1f497d; font-size: 8pt; mso-fareast-theme-font: minor-fareast; mso-fareast-font-family: 'Times New Roman';"&gt;SFM Mutual Insurance Company&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 8pt; mso-fareast-font-family: 'Times New Roman'; mso-fareast-theme-font: minor-fareast; mso-no-proof: yes;"&gt;3500 American Blvd. W,&lt;BR /&gt;Suite 700, Bloomington, MN 55431&lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Fri, 17 Oct 2014 18:15:22 GMT</pubDate>
    <dc:creator>Zachary</dc:creator>
    <dc:date>2014-10-17T18:15:22Z</dc:date>
    <item>
      <title>Missings Showing Up in the Strangest of Places - Decision Trees, Consolidation</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Missings-Showing-Up-in-the-Strangest-of-Places-Decision-Trees/m-p/189934#M2364</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;We are building a very basic Decision Tree - love EM too! We have the following four nodes:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Input Data,&lt;/P&gt;&lt;P&gt;Data Partition,&lt;/P&gt;&lt;P&gt;Consolidation Tree, then&lt;/P&gt;&lt;P&gt;Decision Tree.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The Consolidation Tree is actually a variation of a Decision Tree where we are taking some variables with many nominal categories (sometimes in the hundreds) and seeing whether we can consolidate them into simplified groupings related to our main dependent/target variable. Below is a snapshot of part of our Consolidation Tree:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;IMG alt="Capture.PNG" class="jive-image-thumbnail jive-image" src="https://communities.sas.com/legacyfs/online/7649_Capture.PNG" width="450" /&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The origin node looks great and we split our data 80/20 between training/validation. The first significant level is called NAC_CODE and it grouped the variable into two nice nodes. But the next level down for one of the nodes splits GOVERNING_CLASS into two nodes again - the problem is that one of them is a node for Missing_Values_Only. I normally would not be too concerned, as many of the variables within our dataset have missing values. But GOVERNING_CLASS has none. I fully understand that, for scoring purposes, EM automatically groups missing values with other values in various nodes even when the present dataset has none, but a node containing only missing values does not make sense at all.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Please help. 
I have some other questions coming after this one is resolved as well.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you very much.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG style="mso-no-proof: yes; color: #1f497d; font-size: 10pt; mso-fareast-theme-font: minor-fareast; mso-fareast-font-family: 'Times New Roman';"&gt;Zach Feinstein, Statistical Data Modeler&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt; &lt;STRONG style="mso-no-proof: yes; color: #1f497d; font-size: 10pt; mso-fareast-theme-font: minor-fareast; mso-fareast-font-family: 'Times New Roman';"&gt;P&lt;/STRONG&gt;&lt;SPAN style="font-size: 10pt; mso-fareast-font-family: 'Times New Roman'; mso-fareast-theme-font: minor-fareast; mso-no-proof: yes;"&gt; (952) 838-4289 &lt;STRONG&gt; &lt;SPAN style="color: #1f497d;"&gt;C &lt;/SPAN&gt;&lt;/STRONG&gt;(612) 590-4813&amp;nbsp; &lt;STRONG style="color: #1f497d;"&gt;F&lt;/STRONG&gt; (952) 838-2010&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG style="mso-no-proof: yes; color: #1f497d; font-size: 8pt; mso-fareast-theme-font: minor-fareast; mso-fareast-font-family: 'Times New Roman';"&gt;SFM Mutual Insurance Company&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 8pt; mso-fareast-font-family: 'Times New Roman'; mso-fareast-theme-font: minor-fareast; mso-no-proof: yes;"&gt;3500 American Blvd. W,&lt;BR /&gt;Suite 700, Bloomington, MN 55431&lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 17 Oct 2014 18:15:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Missings-Showing-Up-in-the-Strangest-of-Places-Decision-Trees/m-p/189934#M2364</guid>
      <dc:creator>Zachary</dc:creator>
      <dc:date>2014-10-17T18:15:22Z</dc:date>
    </item>
    <item>
      <title>Re: Missings Showing Up in the Strangest of Places - Decision Trees, Consolidation</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Missings-Showing-Up-in-the-Strangest-of-Places-Decision-Trees/m-p/189935#M2365</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I think what is happening here is that categories with fewer observations than the value specified for the Decision Tree node property &lt;STRONG&gt;Minimum Categorical Size&lt;/STRONG&gt; are treated as missing, which is why you are seeing that branch for GOVERNING_CLASS even though it has no missing values. So one option is to lower that value so that categories with very few observations are not treated as missing. The second thing you can change is the &lt;STRONG&gt;Missing Values&lt;/STRONG&gt; property, to something other than &lt;STRONG&gt;Use in search&lt;/STRONG&gt;. This will prevent a branch from ever having only missing values (true missings and those defined by Minimum Categorical Size). Hope that helps!&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 17 Oct 2014 18:33:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Missings-Showing-Up-in-the-Strangest-of-Places-Decision-Trees/m-p/189935#M2365</guid>
      <dc:creator>WendyCzika</dc:creator>
      <dc:date>2014-10-17T18:33:23Z</dc:date>
    </item>
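The behavior described in the reply above can be illustrated with a small Python sketch. This is not Enterprise Miner code and not how EM implements the property internally; it only mimics the described effect, where categories with too few observations are treated as missing. The function name `fold_rare_categories`, the sample GOVERNING_CLASS values, and the threshold of 5 are all assumptions made for illustration.

```python
from collections import Counter

MISSING = None  # stand-in for a missing value in this sketch


def fold_rare_categories(values, min_categorical_size):
    """Replace categories seen fewer times than the threshold with MISSING."""
    counts = Counter(v for v in values if v is not MISSING)
    return [
        v if v is not MISSING and counts[v] >= min_categorical_size else MISSING
        for v in values
    ]


# Hypothetical GOVERNING_CLASS values: no true missings, two rare levels.
governing_class = ["A"] * 6 + ["B"] * 5 + ["C"] * 2 + ["D"]

# With a threshold of 5, the rare levels C and D are treated as missing,
# so a "missing" group appears even though the raw data has none.
folded = fold_rare_categories(governing_class, min_categorical_size=5)
print(Counter(folded))

# Lowering the threshold to 1 keeps every level, so no missing group appears.
print(Counter(fold_rare_categories(governing_class, min_categorical_size=1)))
```

In this toy data the missing group comes entirely from the rare levels, which matches the first suggestion in the reply: lowering the threshold removes the group.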
    <item>
      <title>Re: Missings Showing Up in the Strangest of Places - Decision Trees, Consolidation</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Missings-Showing-Up-in-the-Strangest-of-Places-Decision-Trees/m-p/189936#M2366</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Welcome to the community, Zach! I hope you find some good advice in this forum. Keep the questions coming!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Anna&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 17 Oct 2014 19:21:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Missings-Showing-Up-in-the-Strangest-of-Places-Decision-Trees/m-p/189936#M2366</guid>
      <dc:creator>AnnaBrown</dc:creator>
      <dc:date>2014-10-17T19:21:44Z</dc:date>
    </item>
  </channel>
</rss>

