SAS EM Node Rules code is omitting data records

Occasional Contributor
Posts: 5

I am using interactive decision trees (due to business requirements). I noticed that the SAS code for Node Rules is omitting some of my data records from the if-then clauses. What is the reason, and how should I assign or classify the omitted records? Thank you.


All Replies
Super Contributor
Posts: 337

Re: SAS EM Node Rules code is omitting data records


Hi Mariana,

It seems to me that you have a very recent version of Enterprise Miner, as we used to call these rules the "English rules". Now we use the more accurate term "Node rules".

If you open the results and go to View->Model->Node Rules, you will only see the rules for the terminal leaves of the tree, that is, the nodes that have no further splits.
In the example below, nodes 4, 5, and 6 are the terminal leaves of the tree, so the node rules file only has the rules for these three nodes. Even though the file does not have a rule for node 3, no record is omitted, because the counts for the leaves add up to the total number of observations. In this example, 547 + 1155 + 4258 = 5960, the number of observations in the root node.

[Attached image: forMariana.jpg]
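
The count check described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not something Enterprise Miner produces: the node numbers and counts are the ones quoted in this reply, and the function name `leaves_cover_root` is made up for the example.

```python
# Leaf counts quoted in the reply above (terminal nodes 4, 5, and 6).
leaf_counts = {4: 547, 5: 1155, 6: 4258}
# Number of observations in the root node, also from the reply.
root_count = 5960


def leaves_cover_root(leaf_counts, root_count):
    """Return True when the terminal-leaf counts add up to the root count."""
    return sum(leaf_counts.values()) == root_count


covered = leaves_cover_root(leaf_counts, root_count)
print(covered)
```

If the sum were smaller than the root count, some observations would not be reaching any terminal leaf, which is not the case here.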

You can still see the node rule for any node, including nodes that are not terminal leaves. In interactive mode, or from the tree plot in the results, right-click and choose Tools->Display node rule (or English rule).

Finally, note that Node Rules are pseudo-code, hence the old name "English rules". They help you understand the rules for the nodes of the tree, but the Node Rules file is not a piece of SAS code you can run... but close!

I hope it helps,

-Miguel

Occasional Contributor
Posts: 5

Re: SAS EM Node Rules code is omitting data records

Posted in reply to M_Maldonado

Hi Miguel,

Thank you for your answer. To understand what is happening, I am using only one level of depth, and I need to split around 10000 records into 10 groups by a single categorical (nominal) variable. The frequencies add up, but not all of the categorical levels are included in the if-then clauses in the SAS Node Rules. I have tried this with several different categorical variables, and a few levels are always missing. Since the variables are neither interval nor ordinal, the levels have to be grouped by their predicted target probabilities.

I am using the Sample=None option, and I tried with and without partitioning, but I always have missing levels. I suspect there is a setting somewhere that controls how many levels are used, and that I need to change it so that all levels are included. I have only 70 levels in this example. When creating the data source, I customized the settings and set Class Levels Count Threshold to 100, but it did not help.
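
One way to pin down exactly which levels are missing is to compare the levels in the data against the levels named in the if-then clauses. The sketch below assumes the levels have been copied by hand from the Node Rules into a dictionary keyed by node number; the level names, node numbers, and the function `missing_levels` are all hypothetical.

```python
def missing_levels(data_levels, rule_levels_by_node):
    """Return the levels present in the data but named in no node rule."""
    covered = set()
    for levels in rule_levels_by_node.values():
        covered.update(levels)
    return sorted(set(data_levels) - covered)


# Hypothetical levels of one nominal input variable.
data_levels = ["A", "B", "C", "D", "E"]
# Hypothetical levels listed in the if-then clause of each terminal node.
rule_levels_by_node = {2: ["A", "B"], 3: ["D"]}

missing = missing_levels(data_levels, rule_levels_by_node)
print(missing)
```

With these made-up inputs the levels "C" and "E" are reported, which is the kind of gap described in this post.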

Solution
‎04-02-2014 07:15 PM
Occasional Contributor
Posts: 5

Re: SAS EM Node Rules code is omitting data records

OK, I finally figured it out. In addition to all of the settings above, I set Minimum Categorical Size = 1, and now all of the levels appear in the rules.
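
A rough way to see why this setting matters: if Minimum Categorical Size acts as a minimum number of training observations a level needs before it takes part in the split search (an assumption about how the property behaves, not something confirmed in this thread), then rare levels drop out of the rules until the threshold is lowered. The Python sketch below uses made-up data, and the threshold of 5 is only an illustrative value.

```python
from collections import Counter


def levels_below_minimum(values, min_categorical_size):
    """Return the levels whose frequency is below the given minimum size."""
    counts = Counter(values)
    return sorted(level for level, n in counts.items() if n < min_categorical_size)


# Hypothetical nominal variable: A occurs 6 times, B 5 times, C twice, D once.
values = ["A"] * 6 + ["B"] * 5 + ["C"] * 2 + ["D"]

rare_at_5 = levels_below_minimum(values, 5)  # illustrative threshold
rare_at_1 = levels_below_minimum(values, 1)  # the value used in this post
print(rare_at_5, rare_at_1)
```

With a threshold of 1 no level falls below the minimum, which matches the observation that all levels showed up after the change.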
