
Can SAS Enterprise Miner find the most Optimal Node Split?

Contributor
Posts: 31

I am using the "Maximum Branch" option to find the best number of branches per split for my tree. It seems that SAS Enterprise Miner will split as many ways as you specify for Maximum Branch. Is there any way for SAS Enterprise Miner to find the optimal number for me? Thanks!


Accepted Solutions
Solution
‎01-15-2016 03:55 PM
Super Contributor
Posts: 336

Re: Can SAS Enterprise Miner find the most Optimal Node Split?

Hi Wave!

The answer to this question depends a little on whether you are training a model or you are using trees to bin your inputs. Some general comments below.

 

Your comment just reminded me that I was supposed to get back to someone on the community about theoretical proofs for maxbranch. I don't have the reference handy, but I promise I will look for it. I am pretty sure I saw it in the PROC HPSPLIT documentation... For now, just trust me on this one:

Someone worked out the math to show that trees with 2 branches are expected to have less bias than trees with 3+ branches. That gives you solid ground to prefer 2 branches, although always feel free to compare.

 

There are specific cases when you want multiple branches, for example depth-1 trees used for optimal binning. There are popular threads about it here and here.

 

Until someone posts that piece of literature: Maximum Branch is not the same as the optimal number of branches. It assumes that when you are splitting your data or growing a tree you have a preference for N branches, but if the logworth of an N-branch split does not meet the minimum logworth, you are OK with the tree trying a split with N-1 branches.
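To make the fallback idea concrete, here is a rough Python sketch. It is not Enterprise Miner's actual search, just the behavior described above under some assumptions: logworth is taken as -log10 of the p-value of a plain Pearson chi-square test (no Bonferroni or other adjustments), the candidate split tables are made-up counts, and the names `chi2_sf`, `logworth`, and `choose_branches` are hypothetical helpers. The threshold of -log10(0.2) is only an illustrative choice.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)             # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))  # closed form for df = 1
    # Recurrence: Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    while 2 * a < df:
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return q

def logworth(table):
    """-log10(p) of a Pearson chi-square test on a branches-by-classes count table.

    Assumes every row and column total is positive.
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_tot) - 1)
    p = max(chi2_sf(stat, df), 1e-300)  # guard against log10(0)
    return -math.log10(p)

def choose_branches(candidates, max_branch, min_logworth):
    """Try max_branch first, then fall back to fewer branches.

    candidates maps a branch count to the count table of the best split
    found with that many branches (hypothetical input).
    Returns the first branch count that clears min_logworth, or None.
    """
    for k in range(max_branch, 1, -1):
        if k in candidates and logworth(candidates[k]) >= min_logworth:
            return k
    return None

# Made-up counts: rows are branches, columns are the two target classes.
weak_three_way = [[20, 20], [21, 19], [19, 21]]
strong_two_way = [[30, 10], [30, 50]]
threshold = -math.log10(0.2)  # illustrative minimum logworth

print(choose_branches({3: weak_three_way, 2: strong_two_way}, 3, threshold))
```

With these made-up counts the 3-branch split does not clear the threshold, so the sketch falls back to the 2-branch split, which is the "N, else N-1" behavior in miniature.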

A cool workaround if you want to try different maxbranch values: connect a bunch of Decision Tree nodes with similar parameters and different maxbranch sizes, and compare them using a Model Comparison node.
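The comparison itself boils down to picking the candidate with the best validation statistic. A tiny Python sketch of that idea, assuming you have already collected one validation misclassification rate per maxbranch setting (the numbers and the `pick_champion` helper are hypothetical, and the Model Comparison node may use a different selection statistic in your project):

```python
def pick_champion(validation_error):
    """Return the maxbranch value with the lowest validation error.

    Ties go to the smaller number of branches.
    """
    return min(validation_error, key=lambda k: (validation_error[k], k))

# Hypothetical validation misclassification rates keyed by maxbranch.
rates = {2: 0.18, 3: 0.17, 4: 0.17, 5: 0.19}
print(pick_champion(rates))
```

Breaking ties toward fewer branches is a design choice in this sketch, in line with the preference for simpler splits mentioned earlier.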

 

Does that help?

 

Thanks!

M


All Replies
Contributor
Posts: 31

Re: Can SAS Enterprise Miner find the most Optimal Node Split?

Yes, sounds like model compare is what I need to do.

Super Contributor
Posts: 336

Re: Can SAS Enterprise Miner find the most Optimal Node Split?

Neat trick here:

If you are going to do this comparison a lot, create a template that you can reuse later. Once your diagram looks good to you, right-click its name (left panel) and choose Save as XML. Put the file in a special location and rename it if necessary.

Next time you need it, you can right-click on the diagram and choose "Import from XML".

 

Let us know if you find a maxbranch value that works really well for some data!

Thanks!
