ChadAtkinson
Calcite | Level 5

Hello All,

Is there a way to specify the number of bins that EM uses when calculating the optimal splitting value for an interval-level input? I am building a decision tree model and would like to control the number of divisions (bins, or comparison points) that EM evaluates when it searches for the best split value.

This may be best explained with an example. Say var1 ranges from 1 to 100 and I want to determine the optimal binary split for this variable. My understanding is that EM would bin var1 into a number of buckets and then check the boundary between each pair of adjacent bins as a potential split point. So if there were 100 bins, EM would check bucket 1 vs. buckets 2-100, then buckets 1-2 vs. buckets 3-100, and so on, and select the split with the best logworth. My question is: how can I specify the number of bins that are used in this procedure?
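To make the procedure I have in mind concrete, here is a minimal Python sketch of it. It is only an illustration of my understanding, not EM's actual implementation. The function names (`logworth`, `best_binned_split`), the equal-width binning, the binary 0/1 target, and the use of an unadjusted chi-square test with 1 degree of freedom for the logworth are all assumptions on my part; EM may bin differently and may adjust the p-value.

```python
import math

def logworth(left_pos, left_n, right_pos, right_n):
    """-log10 of the chi-square (1 df) p-value for a 2x2 branch-by-target table.
    Assumption: no Bonferroni or depth adjustment is applied."""
    n = left_n + right_n
    pos = left_pos + right_pos
    neg = n - pos
    if left_n == 0 or right_n == 0 or pos == 0 or neg == 0:
        return 0.0  # degenerate table: the split carries no information
    chi2 = 0.0
    for obs_pos, size in ((left_pos, left_n), (right_pos, right_n)):
        exp_pos = size * pos / n
        exp_neg = size * neg / n
        chi2 += (obs_pos - exp_pos) ** 2 / exp_pos
        chi2 += ((size - obs_pos) - exp_neg) ** 2 / exp_neg
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return -math.log10(max(p_value, 1e-300))  # floor avoids log10(0)

def best_binned_split(x, y, n_bins):
    """Bin x into n_bins equal-width bins and test every boundary between
    adjacent bins as a binary split. Returns (cut_point, logworth);
    observations with x < cut_point go to the left branch."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins
    if width == 0:
        return None, 0.0  # constant input: nothing to split on
    counts = [0] * n_bins
    positives = [0] * n_bins
    for xi, yi in zip(x, y):
        b = min(int((xi - lo) / width), n_bins - 1)  # max value joins last bin
        counts[b] += 1
        positives[b] += yi
    total_n, total_pos = len(x), sum(positives)
    best_cut, best_lw = None, 0.0
    left_n = left_pos = 0
    for b in range(n_bins - 1):  # boundary after bin b
        left_n += counts[b]
        left_pos += positives[b]
        lw = logworth(left_pos, left_n, total_pos - left_pos, total_n - left_n)
        if lw > best_lw:
            best_cut, best_lw = lo + (b + 1) * width, lw
    return best_cut, best_lw

# Hypothetical data: var1 = 1..100, target is 1 whenever var1 > 60.
var1 = list(range(1, 101))
target = [1 if v > 60 else 0 for v in var1]
for bins in (4, 10, 99):
    print(bins, best_binned_split(var1, target, bins))
```

The point of the example is that the number of bins limits which cut points can be found at all: with only a few bins, the true break at 60 falls inside a bin and the search has to settle for the nearest boundary.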

Thanks for your time.

Chad Atkinson

1 REPLY 1
ChadAtkinson
Calcite | Level 5

It may be poor form to answer my own post, but perhaps it will assist someone else.

If you use PROC ARBOR, there is an option (EXHAUSTIVE=) that controls the number of bins that are used when determining the optimal split point in decision tree construction. The default is 5000.
