Can SAS Enterprise Miner find the most Optimal Node...

01-14-2016 01:48 PM

I am using the "Maximum Branch" option to find the best number of splits for my tree. It seems that SAS Enterprise Miner will split as many times as you specify for Maximum Branch. Is there any way SAS Enterprise Miner can find the optimal value for me? Thanks!

Accepted Solutions

Solution

01-15-2016
03:55 PM

01-15-2016 03:36 PM

Hi Wave!

The answer to this question depends a little on whether you are training a model or you are using trees to bin your inputs. Some general comments below.

Your comment just reminded me that I was supposed to get back to someone on the community about theoretical proofs for maxbranch. I don't have the reference handy, but I promise I will look for it. I am pretty sure I saw it in the PROC HPSPLIT documentation... For now, just trust me on this one:

Someone worked out the math to prove that trees with 2 branches are expected to have less bias than trees with 3+ branches. That gives you solid ground for preferring 2 branches, although always feel free to compare.

There are specific cases where you want more than two branches, for example depth-1 trees for optimal binning. Popular threads about it here and here.

Until someone posts that piece of literature: Maximum Branch is not the same as optimal branch. It says that when you are splitting your data or growing a tree you have a preference for N branches, but if the logworth of that split does not meet the minimum logworth, you are OK with the tree trying a split with N-1 branches.
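
To make that fallback concrete, here is a toy Python sketch. It is not Enterprise Miner code and not how the node actually searches for splits. It only mirrors the description above, assuming logworth is -log10 of a split's p-value and that you already know the best p-value for each branch count. The names `choose_branches`, `p_values_by_branches`, and `min_logworth` are made up for illustration.

```python
import math


def logworth(p_value):
    """Logworth of a split: -log10 of its p-value (bigger is better)."""
    return -math.log10(p_value)


def choose_branches(p_values_by_branches, max_branch, min_logworth):
    """Prefer max_branch branches, fall back to fewer if logworth is too low.

    p_values_by_branches is a hypothetical dict mapping a branch count to the
    p-value of the best split found with that many branches.
    Returns (branches, logworth), or None if no split qualifies.
    """
    # Start at the preferred branch count and walk down to a binary split.
    for branches in range(max_branch, 1, -1):
        p_value = p_values_by_branches.get(branches)
        if p_value is None:
            continue  # no candidate split with this many branches
        worth = logworth(p_value)
        if worth >= min_logworth:
            return branches, worth
    return None  # nothing met the threshold, so the node is not split
```

For example, with p-values of 0.2, 0.05, and 0.001 for 4, 3, and 2 branches and a minimum logworth of 2, the sketch skips the 4-way and 3-way splits and settles on the binary one.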

A cool workaround if you want to look at several maxbranch values: connect a bunch of trees with similar parameters and different maxbranch sizes, and compare them using a Model Comparison.
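
Outside of Enterprise Miner, the same idea can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: one numeric input, depth-1 trees, exactly k branches per tree (not "up to k"), and validation misclassification rate as the comparison statistic. All function names here are hypothetical; the Decision Tree and Model Comparison nodes do much more than this.

```python
from bisect import bisect_right
from collections import Counter
from itertools import combinations


def majority(labels):
    """Most frequent label; ties go to the smallest label."""
    counts = Counter(labels)
    return max(sorted(counts), key=lambda label: counts[label])


def fit_stump(xs, ys, branches):
    """Depth-1 tree with exactly `branches` bins, chosen by brute force."""
    values = sorted(set(xs))
    # Candidate cut points are midpoints between distinct training values,
    # so every bin contains at least one training row.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for cuts in combinations(candidates, branches - 1):
        bins = [[] for _ in range(branches)]
        for x, y in zip(xs, ys):
            bins[bisect_right(cuts, x)].append(y)
        labels = [majority(b) for b in bins]
        errors = sum(labels[bisect_right(cuts, x)] != y for x, y in zip(xs, ys))
        if best is None or errors < best[0]:
            best = (errors, cuts, labels)
    if best is None:
        raise ValueError("not enough distinct values for that many branches")
    return best[1], best[2]


def misclassification(xs, ys, cuts, labels):
    """Share of rows whose bin label differs from the actual target."""
    wrong = sum(labels[bisect_right(cuts, x)] != y for x, y in zip(xs, ys))
    return wrong / len(xs)


def compare_branch_counts(train, valid, branch_counts=(2, 3, 4)):
    """Fit one stump per branch count and score each on validation data."""
    results = {}
    for k in branch_counts:
        cuts, labels = fit_stump(train[0], train[1], k)
        results[k] = misclassification(valid[0], valid[1], cuts, labels)
    return results


def pick_best(results):
    """Lowest validation error wins; ties go to fewer branches."""
    return min(results, key=lambda k: (results[k], k))
```

Scoring on held-out data rather than training data is the point of the comparison: more branches can only fit the training rows better, so training error alone would always favor the largest maxbranch.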

Does that help?

Thanks!

M

All Replies

01-15-2016 03:55 PM

Yes, sounds like model compare is what I need to do.

01-15-2016 04:02 PM

Neat trick here:

If you are gonna do this comparison a lot, create a template that you can reuse later. Once your diagram looks good to you, right-click on its name (left panel) and choose Save as XML. Put the file in a special location and rename it if necessary.

Next time you need it, you can right-click on the diagram and choose "Import from XML".

Let us know if you find an ultimate nice value for maxbranch for some data!

Thanks!