Hello All,
Is there a way to specify the number of bins that EM uses when calculating the optimal splitting value for an interval level input? I am building a decision tree model, and would like to specify the number of divisions/bins/comparison points that EM uses to calculate the optimal value for splitting.
This may be best explained with an example: say var1 ranges from 1 to 100 and I want to determine the optimal binary split for this variable. My understanding is that EM would bin var1 into a number of buckets and then check the boundary between each pair of adjacent bins as a potential split point. So if there were 100 bins, EM would check bucket 1 vs. buckets 2-100, then buckets 1-2 vs. buckets 3-100, and so on, and select the split with the best logworth. My question is: how can I specify the number of bins that are used in this procedure?
Thanks for your time.
Chad Atkinson
It may be poor form to answer my own post, but perhaps it will assist someone else.
If you use PROC ARBOR, there is an option (exhaustive=) that controls the number of bins that are used when determining the optimal split point in decision tree construction. The default is 5000.
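For anyone who wants to see the procedure from my question in concrete terms, here is a rough Python sketch of a binned split search. It is only an illustration of the idea, not EM's or PROC ARBOR's actual implementation: the equal-width bins, the binary 0/1 target, the chi-square-based logworth, and the function names `logworth_2x2` and `best_binned_split` are all my own assumptions.

```python
import math


def logworth_2x2(a, b, c, d):
    """Return -log10(p) of the 1-df chi-square test on the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        return 0.0  # degenerate table: one side or one class is empty
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function for 1 df
    return -math.log10(max(p, 1e-300))  # guard against underflow to zero


def best_binned_split(x, y, n_bins):
    """Try each boundary between equal-width bins; return (cut, logworth).

    Assumption: y is a binary 0/1 target and bins are equal-width over
    [min(x), max(x)]. A real tool may bin differently.
    """
    lo, hi = min(x), max(x)
    if hi == lo or n_bins < 2:
        return None, 0.0  # nothing to split on
    width = (hi - lo) / n_bins
    best_cut, best_lw = None, 0.0
    for k in range(1, n_bins):  # n_bins - 1 interior boundaries
        cut = lo + k * width
        a = sum(1 for xi, yi in zip(x, y) if xi < cut and yi == 1)
        b = sum(1 for xi, yi in zip(x, y) if xi < cut and yi == 0)
        c = sum(1 for xi, yi in zip(x, y) if xi >= cut and yi == 1)
        d = sum(1 for xi, yi in zip(x, y) if xi >= cut and yi == 0)
        lw = logworth_2x2(a, b, c, d)
        if lw > best_lw:
            best_cut, best_lw = cut, lw
    return best_cut, best_lw


# var1 ranges from 1 to 100; the made-up target flips to 1 above 60.
var1 = list(range(1, 101))
target = [1 if v > 60 else 0 for v in var1]
cut, lw = best_binned_split(var1, target, n_bins=100)
print(cut, lw)
```

With more bins there are more candidate cut points, so the search can land closer to the true change point at the cost of more comparisons; with very few bins the best available boundary may be well away from it.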