MarianaL
Calcite | Level 5


I am using interactive decision trees (due to business requirements). I noticed that the SAS code for Node Rules is omitting some of my data records in the if-then clauses. What is the reason, and how should I assign/classify the omitted records? Thank you.


3 REPLIES
M_Maldonado
Barite | Level 11


Hi Mariana,

It seems to me that you have a very recent version of Enterprise Miner, as we used to call these rules the "English rules". Now we use the more accurate term "Node rules".

If you open the results and go to View->Model->Node Rules, you will only see the rules for the terminal leaves of the tree, that is, the nodes that have no further splits.
In the example below, nodes 4, 5, and 6 are the terminal leaves of the tree, so the node rules file only has the rules for these three nodes. Even though the file does not have the rules for node 3, no records are omitted, because the counts for the leaves sum to the total number of observations: 547 + 1155 + 4258 = 5960, the number of observations in the root node.

[Image: forMariana.jpg]
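The arithmetic in this reply can be checked mechanically. The following is a minimal sketch in Python (not SAS, and not something Enterprise Miner produces); the function name `leaves_cover_root` is hypothetical, and the counts are the ones quoted in the reply.

```python
def leaves_cover_root(leaf_counts, root_count):
    """Return True when the terminal-leaf counts add up to the root count."""
    return sum(leaf_counts) == root_count

# Counts quoted in the reply: three terminal leaves and the root node.
leaf_counts = [547, 1155, 4258]
root_count = 5960

print(sum(leaf_counts))                            # 5960
print(leaves_cover_root(leaf_counts, root_count))  # True
```

If the check returns False, some records are not reaching any terminal leaf, which is a different problem from rules being shown only for leaves.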

You can still see the node rules for any node (including those that are not terminal leaves). In interactive mode, or from the tree plot in the results, right-click on Tools->Display node rule (or English rule).

Finally, note that the Node Rules output is pseudocode, hence the old name "English" rules. It helps you understand the rules for the nodes of the tree, but the Node Rules file is not a piece of SAS code you can run... but close!

I hope it helps,

-Miguel


MarianaL
Calcite | Level 5

Hi Miguel,

Thank you for your answer. In my case, in order to understand what is happening, I am using only one level of depth, and I need to split around 10000 records into 10 groups by a single categorical (nominal) variable. The frequencies add up, but not all of the categorical levels are included in the if-then clauses in the SAS Node Rules. I have tried this with several different categorical variables, and a few levels are always missing. Since the variables are neither interval nor ordinal, the levels have to be grouped by their predicted target probabilities. I am using the Sample=None option, and I tried with and without partitioning, but I always have missing levels. I think there may be a setting somewhere for the number of levels that I need to change so that all levels are used. I have only 70 levels in this example. When creating the data, I customized the settings and set Class Levels Count Threshold to 100, but it did not help.
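One way to pin down exactly which levels are absent from the if-then clauses is to compare the levels in the data with the levels listed in the rules. The sketch below is in Python rather than SAS and uses made-up toy values; the function name `uncovered_levels` and the rule dictionary are hypothetical, and the levels per node would have to be copied by hand from the Node Rules output.

```python
def uncovered_levels(data_levels, rules):
    """Return the levels that occur in the data but in no rule's level list."""
    covered = set()
    for levels in rules.values():
        covered.update(levels)
    return sorted(set(data_levels) - covered)

# Hypothetical toy example: five levels in the data, two leaf rules.
data_levels = ["A", "B", "C", "D", "E"]
rules = {
    "node 2": ["A", "B"],  # levels listed in the rule for one leaf
    "node 3": ["C"],       # levels listed in the rule for the other leaf
}

print(uncovered_levels(data_levels, rules))  # ['D', 'E']
```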

MarianaL
Calcite | Level 5

OK, I finally figured it out. In addition to all of the above settings, I set Minimum Categorical Size = 1, and now I have all values.
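The fix suggests that the missing levels were the ones with very few records. The sketch below, in Python rather than SAS, lists the levels whose frequency falls below a given minimum size. It assumes the setting acts as a minimum number of observations a level must have to take part in a split, which matches the outcome described in this thread but is not confirmed here. The function name `levels_below_min_size` and the toy data are hypothetical.

```python
from collections import Counter

def levels_below_min_size(values, min_size):
    """Return the levels that occur fewer than min_size times in values."""
    counts = Counter(values)
    return sorted(level for level, n in counts.items() if n < min_size)

# Hypothetical toy column: A has 6 rows, B has 5, C has 2, D has 1.
values = ["A"] * 6 + ["B"] * 5 + ["C"] * 2 + ["D"]

print(levels_below_min_size(values, 5))  # ['C', 'D']
print(levels_below_min_size(values, 1))  # []
```

With a minimum size of 1, no level that appears in the data falls below the threshold, which is consistent with all values showing up after the change.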
