wave43
Obsidian | Level 7

I am using the "Maximum Branch" option to work out the best number of splits for my tree. It seems that SAS Enterprise Miner will split as many times as you specify for Maximum Branch. Is there any way for SAS Enterprise Miner to find the optimal number for me? Thanks!

Accepted Solutions
M_Maldonado
Barite | Level 11

Hi Wave!

The answer to this question depends a little on whether you are training a model or you are using trees to bin your inputs. Some general comments below.

 

Your comment just reminded me that I was supposed to get back to someone on the community about theoretical proofs for maxbranch. I don't have the reference handy, but I promise I will look for it. I am pretty sure I saw it in the PROC HPSPLIT documentation... For now, just trust me on this one:

 

Someone worked out the math to show that trees with 2 branches are expected to have less bias than trees with 3 or more branches. That gives you solid ground for preferring 2 branches, although always feel free to compare.

 

There are specific cases when you want multiple branches, for example depth-1 trees for optimal binning. There are popular threads about it on this community.

 

Until someone posts that piece of literature, keep in mind that maximum branch is not the same as optimal branch. The setting means that when you split your data or grow a tree you prefer N branches, but if the logworth of an N-branch split does not meet the minimum logworth, you are OK with the tree trying a split with N-1 branches.
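
To make the fallback concrete, here is a minimal Python sketch of the behavior described above. It is an illustration only, not SAS Enterprise Miner's actual algorithm: the function names, the dictionary of p-values, and the threshold are all hypothetical. The only fact it relies on is that logworth is -log10 of a split's p-value.

```python
import math


def logworth(p_value: float) -> float:
    """Logworth of a split: -log10 of the p-value of its significance test."""
    if p_value <= 0:
        return math.inf  # a p-value of zero means an arbitrarily strong split
    return -math.log10(p_value)


def choose_branch_count(p_value_by_branches, max_branch, min_logworth):
    """Hypothetical helper: prefer max_branch branches, fall back to fewer.

    p_value_by_branches maps a branch count k to the p-value of the best
    k-way split found for the node (assumed to be computed elsewhere).
    Returns the largest k <= max_branch whose logworth meets the minimum,
    or None if no split qualifies and the node should not be split.
    """
    for k in range(max_branch, 1, -1):  # try N, then N-1, ... down to 2
        p = p_value_by_branches.get(k)
        if p is not None and logworth(p) >= min_logworth:
            return k
    return None


# Made-up p-values for one node: the 4-way split is too weak, the 3-way is not.
example = {2: 1e-6, 3: 1e-4, 4: 0.05}
chosen = choose_branch_count(example, max_branch=4, min_logworth=3.0)
```

With these made-up numbers the 4-branch split is rejected and the node gets 3 branches, which is why a tree grown with maxbranch set to 4 can still contain narrower splits.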

A cool workaround if you want to try different maxbranch values: connect several Decision Tree nodes with otherwise similar parameters and different maxbranch values, and compare them using a Model Comparison node.
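
The selection step of that comparison can be sketched in a few lines of Python. This is only a rough stand-in for what the Model Comparison node does for you: the function name is hypothetical, the validation error rates are placeholders rather than real results, and breaking ties toward fewer branches is my own assumption, in line with the preference for 2 branches mentioned earlier.

```python
def pick_max_branch(validation_error_by_max_branch):
    """Hypothetical helper: return the maxbranch value with the lowest
    validation error, preferring fewer branches when errors are tied."""
    best = min(
        validation_error_by_max_branch.items(),
        key=lambda item: (item[1], item[0]),  # sort by error, then by branch count
    )
    return best[0]


# Placeholder validation misclassification rates, one per tree (not real results).
placeholder_errors = {2: 0.21, 3: 0.19, 4: 0.19}
best_max_branch = pick_max_branch(placeholder_errors)
```

Here the trees with 3 and 4 branches tie, so the sketch picks 3. Use a validation or test partition for the comparison, since training error tends to favor the bushier tree.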

 

Does that help?

 

Thanks!

M

wave43
Obsidian | Level 7

Yes, sounds like model compare is what I need to do.

M_Maldonado
Barite | Level 11

Neat trick here:

If you are going to do this comparison a lot, create a template that you can reuse later. Once your diagram looks good to you, right-click on its name (left panel) and choose "Save as XML". Put the file in a special location and rename it if necessary.

Next time you need it, you can right-click on the diagram and choose "Import from XML".

 

Let us know if you find a maxbranch value that works especially well for some data!

Thanks!
