andreas_zaras
Pyrite | Level 9

Hello,

I have a maximal tree with 29 nodes. In the results window I am looking at the leaf statistics chart, and I am trying to work out how to interpret it and where it would be useful. The chart shows 15 pairs of bars (blue and red), and beneath them the leaf index reads: 1, 2, 4, 5, 7, 8, 9, 14, 15, 19, 21, 23, 25, 28, 29. As far as I can tell, the bar pairs are sorted in descending order of the blue bar. From what I have read, the blue bar shows the percentage of the modelled outcome in the training data set and the red bar shows the percentage of the modelled outcome in the validation data set. My understanding is that each pair therefore shows the predicted and the actual probability estimate of the modelled outcome when the model is validated against the corresponding data set.

First of all, why are these specific leaf index values assigned to the corresponding pairs? What is the logic behind it? At first I thought it was simply an increasing number, but values like 3, 6, 10, 11, 12, 13 etc. are missing. Why do the rest of the nodes not appear in the graph? And what would be the interpretation and the usefulness of this leaf chart?

Thanks in advance,

Andreas  

DougWielenga
SAS Employee

A decision tree is made up of many nodes, but not all of them are 'terminal' nodes (leaves), meaning nodes where no further splitting is done.  From a predictive model implementation standpoint, the key thing to identify is which terminal node each observation falls into.  The predicted values in nodes which were split further are of lesser interest at the time of scoring.  As a result, the Decision Tree assigns a Node ID to every node (whether it is a terminal node or not) and assigns a separate set of values to the terminal nodes only.  In your case, you were likely looking at a graph plotted using the numbering that covers both terminal and non-terminal nodes, which is why some values appear to be missing: the skipped numbers belong to non-terminal nodes.  In SAS Enterprise Miner 14.1, the plot is generated from the Leaf variable, which numbers only the terminal nodes.
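To illustrate the difference between the two numberings, here is a minimal Python sketch (not SAS code, and not how Enterprise Miner is implemented). It assumes a hypothetical 7-node tree and assumes that Node IDs are handed out breadth-first; the actual ordering Enterprise Miner uses may differ. The `Node` class and `number_tree` function are made up for this example.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node; a node with no children is a terminal node (leaf)."""
    children: List["Node"] = field(default_factory=list)
    node_id: Optional[int] = None      # numbering over ALL nodes
    leaf_index: Optional[int] = None   # numbering over terminal nodes only


def number_tree(root: Node) -> List[Node]:
    """Assign node_id to every node and leaf_index to leaves only.

    Assumption: IDs are assigned in breadth-first order.
    """
    next_id, next_leaf = 1, 1
    visited = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        node.node_id = next_id
        next_id += 1
        if not node.children:          # terminal node: also gets a leaf index
            node.leaf_index = next_leaf
            next_leaf += 1
        visited.append(node)
        queue.extend(node.children)
    return visited


# Hypothetical tree: the root splits into a leaf and an internal node,
# which in turn splits into a leaf and another internal node with two leaves.
tree = Node([Node(), Node([Node(), Node([Node(), Node()])])])
nodes = number_tree(tree)

leaf_node_ids = [n.node_id for n in nodes if n.leaf_index is not None]
leaf_indices = [n.leaf_index for n in nodes if n.leaf_index is not None]

print("Node IDs of the leaves:", leaf_node_ids)   # gaps where internal nodes sit
print("Leaf indices:          ", leaf_indices)    # consecutive, no gaps
```

A chart labelled with the first list shows gaps (here 1, 3 and 5 are internal nodes), just like the gaps at 3, 6, 10, ... in the question, while a chart labelled with the second list is numbered consecutively.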

 

The graphs generated in SAS Enterprise Miner node results are provided to help visualize the results.  They are not stored as graphic files but are created on the fly when the node results are opened, so you can see the underlying detail each graph was created from by clicking on the desired graph and then clicking on View --> Table in the node results window.  This will show all of the detail available.  You can also right-click on the graph, choose Data Options..., and modify which information from the table is plotted.

 

I hope this helps!

Doug
