ChadAtkinson

Hello All,

Is there a way to specify the number of bins that EM uses when calculating the optimal splitting value for an interval-level input? I am building a decision tree model and would like to specify the number of divisions/bins/comparison points that EM uses to calculate the optimal split value.

This may be best explained with an example. Say var1 ranges from 1 to 100 and I want to determine the optimal binary split for this variable. My understanding is that EM would bin var1 into a number of buckets and then check the boundary between each pair of adjacent bins as a potential split point. So if there were 100 bins, EM would check bucket 1 vs. buckets 2-100, then buckets 1-2 vs. buckets 3-100, and so on, and select the split with the best logworth. My question is: how can I specify the number of bins that are used in this procedure?
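To make the procedure I have in mind concrete, here is a minimal Python sketch of a binned split search. It is only an illustration of my understanding, not EM's actual algorithm: the function names (`chi2_2x2`, `logworth`, `best_binned_split`), the equal-width binning, the binary target, and the unadjusted chi-square logworth are all my own assumptions. EM may bin differently and may apply adjustments to the p-value.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0  # degenerate table: no evidence of association
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)

def logworth(chi2):
    """-log10(p-value) for a chi-square statistic with 1 degree of freedom."""
    p = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square survival function, df = 1
    return -math.log10(p) if p > 0 else float("inf")

def best_binned_split(x, y, n_bins):
    """Bin x into n_bins equal-width bins (an assumption) and evaluate
    each interior bin edge as a candidate binary split on a 0/1 target y.
    Returns (cut point, logworth) for the best candidate."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins
    best_cut, best_lw = None, -1.0
    for k in range(1, n_bins):
        cut = lo + k * width
        # Counts of target=1 / target=0 on each side of the candidate cut.
        a = sum(1 for xi, yi in zip(x, y) if xi < cut and yi == 1)
        b = sum(1 for xi, yi in zip(x, y) if xi < cut and yi == 0)
        c = sum(1 for xi, yi in zip(x, y) if xi >= cut and yi == 1)
        d = sum(1 for xi, yi in zip(x, y) if xi >= cut and yi == 0)
        lw = logworth(chi2_2x2(a, b, c, d))
        if lw > best_lw:
            best_cut, best_lw = cut, lw
    return best_cut, best_lw
```

With var1 running from 1 to 100 and a made-up target that flips above 60, a fine grid (`n_bins=99`) can place the cut exactly at the flip, while a coarse grid (`n_bins=4`) can only choose among three candidate edges. That difference is why I would like to control the number of bins.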

Thanks for your time.

Chad Atkinson

1 REPLY
ChadAtkinson

It may be poor form to answer my own post, but perhaps it will assist someone else.

If you use PROC ARBOR, there is an option (EXHAUSTIVE=) that controls the number of bins that are used when determining the optimal split point in decision tree construction. The default is 5000.
