Question about: Tree Node - Results Window - Leaf ...


10-11-2012 03:28 PM

Hello,

I have a maximal tree with 29 nodes. I am looking at the results window, specifically the leaf statistics chart, and I am trying to figure out how to interpret the chart and where it would be useful. The chart shows 15 pairs of bars (blue and red), and beneath them the leaf index reads: 1, 2, 4, 5, 7, 8, 9, 14, 15, 19, 21, 23, 25, 28, 29. As far as I can tell, the bar pairs are sorted in descending order of the blue bar. From what I have read, the blue bar shows the percentage of the modelled outcome in the training data set and the red bar the percentage of the modelled outcome in the validation data set. As I understand it, each pair therefore shows the predicted and the actual probability estimate of the modelled outcome when the model is validated against the corresponding data set. First of all, why are these specific leaf index values assigned to the corresponding pairs? What is the logic behind it? At first I thought it was simply an increasing number, but then values like 3, 6, 10, 11, 12, 13 etc. are missing. Why do the rest of the nodes not appear in the graph? What would be the interpretation and the usefulness of this leaf chart?

Thanks in advance,

Andreas


Tuesday

A decision tree is composed of many nodes, but not all of them are 'terminal' nodes, that is, nodes where no further splitting is done. From a predictive model implementation standpoint, the key thing to identify is which terminal node each observation falls into. It is of lesser interest (at time of scoring) what the predicted values were in the nodes that were split further. As a result, the Decision Tree assigns a Node ID to every node (whether it is a terminal node or not) and assigns a separate set of index values to just the terminal nodes. In your case, you were likely looking at a graph plotted using the numbering that covers both terminal and non-terminal nodes, which is why some index values appear to be missing: those belong to non-terminal nodes. In SAS Enterprise Miner 14.1, the plot is generated based on the **Leaf** variable, which numbers only the terminal nodes.
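To make the difference between the two numberings concrete, here is a minimal Python sketch. It is not SAS Enterprise Miner code and does not reproduce how the software assigns IDs. The toy tree, its node IDs, and the helper names `terminal_node_ids` and `leaf_numbering` are all made up for illustration. It only shows that listing terminal nodes by their Node ID leaves gaps, while a separate leaf numbering is consecutive.

```python
# Hypothetical toy tree: node ID -> list of child node IDs.
# An empty list marks a terminal node (leaf). The IDs and shape are
# invented for illustration; this is not the tree from the question.
children = {
    1: [2, 3],
    2: [4, 5],
    3: [6, 7],
    4: [],
    5: [8, 9],
    6: [],
    7: [],
    8: [],
    9: [],
}


def terminal_node_ids(children):
    """Return the IDs of nodes that are not split further, in ascending order."""
    return sorted(node for node, kids in children.items() if not kids)


def leaf_numbering(children):
    """Map each terminal node ID to a consecutive leaf number starting at 1."""
    return {
        node: leaf
        for leaf, node in enumerate(terminal_node_ids(children), start=1)
    }


# Node IDs of the leaves: IDs 1, 2, 3 and 5 are absent because those nodes were split.
print(terminal_node_ids(children))

# A separate leaf numbering over the same terminal nodes has no gaps.
print(leaf_numbering(children))
```

If every split in your tree is binary (an assumption, since the question does not say), a tree with 29 nodes has exactly 15 terminal nodes, which would match the 15 bar pairs you see.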

The graphs generated in SAS Enterprise Miner node results are provided to help visualize the results. You can see the underlying detail which the graph was created from -- they are not stored as graphic files but created on the fly when the node results are opened -- by clicking on the desired graph and then clicking on **View** --> **Table** in the node results window. This will show all of the detail available. You can also right-click on the graph and choose **Data Options...** and modify which information from the table is plotted.

I hope this helps!

Doug