Hello All,
Is there a way to specify the number of bins that EM (SAS Enterprise Miner) uses when calculating the optimal splitting value for an interval-level input? I am building a decision tree model and would like to specify the number of divisions/bins/comparison points that EM uses to find the optimal split value.
This is probably best explained with an example. Say var1 ranges from 1 to 100 and I want to determine the optimal binary split for this variable. My understanding is that EM bins var1 into a number of buckets and then checks the boundary between each pair of adjacent bins as a potential split point. So with 100 bins, EM would check bucket 1 vs. buckets 2-100, then buckets 1-2 vs. buckets 3-100, and so on, and select the split with the best logworth. My question is: how can I specify the number of bins used in this procedure?
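To make the procedure described above concrete, here is a minimal sketch of a binned split search. It is not EM's implementation. It assumes equal-width bins, a binary 0/1 target, and logworth computed as -log10 of the p-value from a Pearson chi-square test on the 2x2 table of branch vs. target (EM may apply adjustments this sketch omits). The function names `logworth_2x2` and `best_binned_split` are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def logworth_2x2(a: int, b: int, c: int, d: int) -> float:
    """Logworth (-log10 p) of a Pearson chi-square test on a 2x2 table.

    left branch:  a = count of target 0, b = count of target 1
    right branch: c = count of target 0, d = count of target 1
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table (empty branch or constant target)
    chi2 = n * (a * d - b * c) ** 2 / denom
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return float("inf") if p == 0.0 else -math.log10(p)


def best_binned_split(
    x: List[float], y: List[int], n_bins: int
) -> Optional[Tuple[float, float]]:
    """Return (threshold, logworth) for the best split "x < threshold".

    Assumes y holds 0/1 values. Only the n_bins - 1 boundaries between
    adjacent equal-width bins are considered as candidate split points.
    """
    lo, hi = min(x), max(x)
    if lo == hi:
        return None
    width = (hi - lo) / n_bins
    # Count target 0 / target 1 per bin; the maximum goes into the last bin.
    counts = [[0, 0] for _ in range(n_bins)]
    for xi, yi in zip(x, y):
        k = min(int((xi - lo) / width), n_bins - 1)
        counts[k][yi] += 1
    total0 = sum(c[0] for c in counts)
    total1 = sum(c[1] for c in counts)

    best: Optional[Tuple[float, float]] = None
    left0 = left1 = 0
    # Bins 0..k form the left branch, bins k+1..n_bins-1 the right branch.
    for k in range(n_bins - 1):
        left0 += counts[k][0]
        left1 += counts[k][1]
        lw = logworth_2x2(left0, left1, total0 - left0, total1 - left1)
        threshold = lo + (k + 1) * width
        if best is None or lw > best[1]:
            best = (threshold, lw)
    return best


# Usage: var1 = 1..100, with the target switching from 0 to 1 above 60.
var1 = list(range(1, 101))
target = [1 if v > 60 else 0 for v in var1]
print(best_binned_split(var1, target, 100))
print(best_binned_split(var1, target, 4))
```

With 100 bins, the boundary between 60 and 61 is a candidate, so the search finds the perfect split (threshold about 60.4). With only 4 bins, there are just three candidate boundaries, and the best available one is 50.5, which illustrates why the number of bins matters.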
Thanks for your time.
Chad Atkinson
It may be poor form to answer my own post, but perhaps it will assist someone else.
If you use PROC ARBOR, the PROC ARBOR statement has an option, EXHAUSTIVE=, that controls the number of bins used when determining the optimal split point during decision tree construction. The default is 5000.