Mike90
Quartz | Level 8

How can I count the number of bottom level nodes that can result in a particular target level? (Or is a particular target level.)

 

I have a categorical target with 6 levels. The second most common level is classified correctly only 30% of the time, while the other 3 most common levels are classified correctly between 70% and 85% of the time. The full classification chart shows that the second most common level is also rarely chosen when the other levels are misclassified. I have concluded that there must be few decision rules that lead to this target level. I'd like to produce a graph that shows how many bottom-level nodes can result in each target level. (This is a binary tree, so either the bottom-level splitting nodes or the result nodes could be counted.)

ACCEPTED SOLUTION

PadraicGNeville
SAS Employee

Hi, Mike90.

 

Both PROC ARBOR (for decision trees) and PROC HPFOREST (for forests) accept the statement,

 

SAVE STATSBYNODE=StatsDataSet;

 

The StatsDataSet contains statistics on every node in the model.  There is a variable that is positive only for leaf nodes.

In ARBOR the name is Leaf.  The value is missing for internal nodes.

In HPFOREST there are two variables, TreeLeaf and ModelLeaf.  These are both 0 for internal nodes.

 

Another variable, Statistic, has categorical values, one of which is 'PREDICTION'.

 

So, assuming HPFOREST, something like:

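data leafCategories;
   set StatsDataSet;
   /* keep only leaf rows that carry the predicted category */
   if ModelLeaf > 0 and strip(statistic) = 'PREDICTION';
   keep category;
run;

/* count how many leaves predict each category */
proc freq data=leafCategories;
   tables category;
run;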

 

gives the frequencies over the leaves of the predicted categories.
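
Since the question asks for a graph, the same leaf counts could be drawn as a bar chart. This is only a sketch and has not been run. It assumes the leafCategories data set built above, and that the predicted level really is stored in a variable named category, as that code supposes.

```sas
/* Sketch only, untested.                                          */
/* Assumes leafCategories exists and has one row per leaf, with    */
/* the predicted target level in a variable named category.        */
proc sgplot data=leafCategories;
   /* one bar per target level; bar height = number of leaves */
   vbar category / datalabel;
   xaxis label="Predicted target level";
   yaxis label="Number of leaf nodes";
run;
```

A target level with a very short bar has few leaves predicting it, which is the situation suspected in the question.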

StatsDataSet contains other stats for the leaves that might be useful.

 

I say 'something like' because I have not run it and might have the IF statement wrong.
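
For PROC ARBOR the filter would need to test Leaf instead, since that variable is missing rather than 0 for internal nodes. A rough equivalent might look like the following. It is equally untested, the data set name leafCategoriesArbor is made up for illustration, and it assumes the ARBOR output also has the Statistic and category variables used above.

```sas
/* Sketch only, untested: ARBOR variant of the HPFOREST step.      */
/* leafCategoriesArbor is a hypothetical name; Statistic and       */
/* category are assumed to exist in the ARBOR STATSBYNODE output.  */
data leafCategoriesArbor;
   set StatsDataSet;
   /* Leaf is missing for internal nodes, so this keeps leaves only */
   if not missing(Leaf) and strip(Statistic) = 'PREDICTION';
   keep category;
run;

proc freq data=leafCategoriesArbor;
   tables category;
run;
```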

 

Note that STATSBYNODE= is unsupported in HPFOREST: tech support cannot answer questions about it, and it works only in SMP mode.

 

Let me know if this does not resolve your question.

-Padraic

 


2 REPLIES
RalphAbbey
SAS Employee

If you're using HPSPLIT for your decision tree, you can use the NODES option. This generates an ODS table named NODETABLE, which you can save and use for further analysis.

proc hpsplit data=X NODES;
   input...
   target...
   ods output nodetable=MyName;
run;

 

This table (MyName in the example above) contains an ID string for each terminal (leaf) node. It also includes the path from the root node to the leaf node. In addition it includes the proportion of the events at each node in the path.
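
The column names in that table are not listed here, so a reasonable first step is to inspect the saved data set before counting leaves per target level. A minimal sketch, assuming only that the table was saved as MyName as in the example above:

```sas
/* Sketch only: look at what the saved node table contains. */
/* MyName is the data set name used in the example above.   */
proc contents data=MyName;
run;

/* print the first few rows to see the leaf IDs and paths */
proc print data=MyName(obs=10);
run;
```

Once the variable holding the leaf's predicted level is identified, a PROC FREQ on that variable would give the leaf count per level.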
