Specifying splits in EM 7.1 Decision Trees

07-09-2012 01:36 PM

Hello All,

Is there a way to specify the number of bins that EM uses when calculating the optimal splitting value for an interval-level input? I am building a decision tree model and would like to control the number of divisions (bins, or comparison points) that EM evaluates when searching for the best split.

This may be best explained with an example. Say var1 ranges from 1 to 100 and I want to determine the optimal binary split for this variable. My understanding is that EM bins var1 into a number of buckets and then checks the boundary between each pair of adjacent buckets as a potential split point. So if there were 100 bins, EM would check bucket 1 vs. buckets 2-100, then buckets 1-2 vs. buckets 3-100, and so on, and select the split with the best logworth. My question is: how can I specify the number of bins used in this procedure?
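To make the procedure described above concrete, here is a minimal Python sketch of a binned split search. It is only an illustration of the idea as I understand it, not EM's actual implementation: the function names `logworth_2x2` and `best_binned_split` are made up, the bins are assumed to be equal-width, the target is assumed to be binary, and logworth is taken to be -log10 of the p-value from a plain Pearson chi-square test with no adjustments.

```python
import math


def logworth_2x2(a, b, c, d):
    """Return -log10(p) for the Pearson chi-square test (1 df) on [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0  # degenerate table: one side or one class is empty
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the chi-square upper tail equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return float("inf") if p == 0.0 else -math.log10(p)


def best_binned_split(values, targets, n_bins):
    """Try each interior edge of n_bins equal-width bins; return (threshold, logworth)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins  # assumption: equal-width bins
    best = (None, 0.0)
    for k in range(1, n_bins):
        threshold = lo + k * width
        a = b = c = d = 0  # counts: left/event, left/non-event, right/event, right/non-event
        for v, t in zip(values, targets):
            if v < threshold:
                if t:
                    a += 1
                else:
                    b += 1
            elif t:
                c += 1
            else:
                d += 1
        lw = logworth_2x2(a, b, c, d)
        if lw > best[1]:
            best = (threshold, lw)
    return best


# var1 runs from 1 to 100; the (hypothetical) target flips to 1 above 60.
var1 = list(range(1, 101))
target = [1 if v > 60 else 0 for v in var1]

fine_threshold, fine_logworth = best_binned_split(var1, target, n_bins=100)
coarse_threshold, coarse_logworth = best_binned_split(var1, target, n_bins=4)
```

With 100 bins the search can land between 60 and 61, where the split is perfect. With only 4 bins the candidate thresholds are 25.75, 50.5 and 75.25, so the best available split is 50.5. That difference is why the number of bins matters to me.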

Thanks for your time.

Chad Atkinson

07-20-2012 09:13 AM

It may be poor form to answer my own post, but perhaps it will assist someone else.

If you use PROC ARBOR, the PROC ARBOR statement has an option (EXHAUSTIVE=) that controls the number of bins used when determining the optimal split point during decision tree construction. The default is 5000.