I think it would be great if they added an option to the HPFOREST node to automatically generate and visualize a 'representative tree' from the many trees that are assembled under the hood. This would take away some of the black-box nature of this technique, and there is already research on how best to show the most representative tree:

http://onlinelibrary.wiley.com/doi/10.1002/sim.4492/abstract

http://www-stat.wharton.upenn.edu/~edgeorge/Research_papers/forestCART.pdf

1 Comment
PadraicGNeville
SAS Employee

I would expect a central tree to be the tree built in the traditional manner, without randomly selecting out-of-bag data or candidate inputs with which to split a node. The randomness in the Forest algorithm serves to perturb each tree away from the non-randomly trained tree. Using any reasonable metric of distance between trees, I would expect the traditional non-random tree to be in the center of a forest of randomly perturbed trees. Neither paper mentions this. Please let me know if I am missing something.

-Padraic
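
One way to check the "central tree" idea is to score every tree by its average distance to all the others and see which one comes out lowest. The sketch below is only an illustration under stated assumptions: it is plain Python, not HPFOREST code; the distance metric (fraction of observations on which two trees predict different classes) is just one possible choice; the function names are hypothetical; and the prediction vectors are made-up toy data, with tree 0 standing in for the traditionally built tree.

```python
from typing import List, Sequence


def disagreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of observations on which two trees predict different classes."""
    if len(a) != len(b):
        raise ValueError("prediction vectors must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_distances(predictions: List[Sequence[int]]) -> List[float]:
    """Average distance from each tree to every other tree."""
    n = len(predictions)
    if n < 2:
        raise ValueError("need at least two trees to compare")
    return [
        sum(disagreement(predictions[i], predictions[j])
            for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def most_representative(predictions: List[Sequence[int]]) -> int:
    """Index of the tree with the smallest average distance to the rest."""
    scores = mean_distances(predictions)
    return min(range(len(scores)), key=scores.__getitem__)


# Toy data (invented for illustration): each row holds one tree's
# predicted class for the same six observations.
toy = [
    [0, 1, 1, 0, 1, 0],  # tree 0: stand-in for the non-random tree
    [0, 1, 1, 0, 0, 0],  # tree 1: differs from tree 0 on one observation
    [1, 1, 1, 0, 1, 0],  # tree 2: differs from tree 0 on one observation
    [0, 1, 0, 0, 1, 0],  # tree 3: differs from tree 0 on one observation
]

central = most_representative(toy)
```

In this toy setup each perturbed tree differs from tree 0 on a single, different observation, so tree 0 ends up with the smallest average distance. Whether the same holds for a real forest would need to be checked by scoring the actual trees on a common data set.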