04-02-2014 03:50 PM
I am using interactive decision trees (due to business requirements). I noticed that the SAS code for Node Rules is omitting some of my data records in the if-then clauses. What is the reason, and how should I assign/classify the omitted records? Thank you.
04-02-2014 05:14 PM
It seems to me that you have a very recent version of Enterprise Miner, as we used to call these rules the "English rules". Now we use the more accurate term "Node rules".
If you open the results and go to View->Model->Node Rules, you will only see the rules for the terminal leaves of the tree, that is, the nodes that have no further splits.
In the example below, nodes 4, 5, and 6 are the terminal leaves of the tree. For this example, the node rules file only has the rules for these three nodes. Even though this file does not have the rules for node 3, you are not omitting any records, because the counts for the leaves sum to the total number of observations. In this example, 547+1155+4258 add up to the 5960 observations in the root node.
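If you want to run the same sanity check on your own tree, here is a minimal sketch in Python. It uses the three leaf counts and the root count quoted above; the variable names are only illustrative, and nothing here comes from Enterprise Miner itself.

```python
# Leaf counts and root count quoted in the example above.
# The names below are illustrative, not Enterprise Miner output.
leaf_counts = [547, 1155, 4258]
root_count = 5960

covered = sum(leaf_counts)          # observations that land in a terminal leaf
uncovered = root_count - covered    # observations not assigned to any leaf

print(f"covered={covered}, uncovered={uncovered}")
```

If `uncovered` is zero, every record ends up in some leaf, even when the rules for intermediate nodes are not listed.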
You can still see the node rules for any node (including those that are not terminal leaves). In interactive mode, or from the tree plot in the results, right-click and select Tools->Display node rule (or English rule).
Finally, note that Node Rules are pseudo-code, hence the old name "English rules". They help you understand the rules for the nodes of the tree, but the Node Rules file is not a piece of SAS code you can run... but close!
I hope it helps,
04-02-2014 06:06 PM
Thank you for your answer. In my case, in order to understand what is happening, I am using only one level of depth, and I need to split around 10000 records into 10 groups by a single categorical (nominal) variable. The frequencies add up, but not all of the categorical levels are included in the if-then clauses in the SAS Node Rules. I have tried this with several different categorical variables, and I always have a few missing levels. Since the variables are neither interval nor ordinal, the levels have to be grouped by target predicting probabilities. I am using the Sample=None option, and I tried with and without partitioning, but I always have missing levels. I think there may be a setting somewhere for the number of levels that I need to change so that all of the levels are used. I have only 70 levels in this example. When creating the data source, I did the customization and set Class Levels Count Threshold to 100, but it did not help.
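To pin down exactly which levels are absent from the if-then clauses, one option is to compare the levels in the data with the levels listed in each rule. The following is only a rough sketch in Python, outside Enterprise Miner: the level values and node labels are made up, `unlisted_levels` is a hypothetical helper, and it assumes the comparison should ignore case and surrounding whitespace.

```python
def normalize(level):
    # Assumption: levels should match regardless of case or padding.
    return str(level).strip().upper()


def unlisted_levels(data_levels, rule_levels_by_node):
    """Return the data levels that appear in no node's rule."""
    listed = set()
    for levels in rule_levels_by_node.values():
        listed.update(normalize(v) for v in levels)
    present = {normalize(v) for v in data_levels}
    return sorted(present - listed)


# Hypothetical example: five levels in the data, three listed in the rules.
data_levels = ["A", "B", "C", "D", "E"]
rule_levels_by_node = {
    "node 2": ["A", "b "],
    "node 3": ["C"],
}

missing = unlisted_levels(data_levels, rule_levels_by_node)
print(missing)
```

The resulting list gives the concrete levels to look up in the tree plot, instead of checking the 70 levels against the rule text by eye.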