07-09-2012 01:36 PM
Is there a way to specify the number of bins that EM uses when calculating the optimal splitting value for an interval level input? I am building a decision tree model, and would like to specify the number of divisions/bins/comparison points that EM uses to calculate the optimal value for splitting.
This may be best explained with an example: say var1 ranges from 1 to 100 and I want to determine the optimal binary split for this variable. My understanding is that EM would bin var1 into a number of buckets and then check the boundary between each pair of adjacent buckets as a potential split point. So if there were 100 bins, EM would check bucket 1 vs. buckets 2-100, then buckets 1-2 vs. buckets 3-100, and so on, and select the split with the best logworth. My question is: how can I specify the number of bins that are used in this procedure?
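To make the procedure I have in mind concrete, here is a rough Python sketch. It is only an illustration of my understanding, not EM's actual implementation: the function name `best_binary_split`, the equal-width binning, and the unadjusted chi-square logworth for a binary target are all assumptions on my part, and EM may apply adjustments to the logworth that this sketch ignores.

```python
import math

def best_binary_split(x, y, n_bins=100):
    """Return (cut, logworth) for the best binary split of x on a 0/1 target y.

    Hypothetical helper: equal-width bins, one candidate cut per interior
    bin boundary, logworth = -log10(p) from a 1-df Pearson chi-square test.
    """
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins
    n = len(x)
    total_events = sum(y)
    best_cut, best_logworth = None, 0.0

    # n_bins buckets give n_bins - 1 interior boundaries to test.
    for k in range(1, n_bins):
        cut = lo + k * width
        left = [t for v, t in zip(x, y) if v < cut]
        n_left = len(left)
        n_right = n - n_left
        if n_left == 0 or n_right == 0:
            continue  # an empty branch is not a split

        # 2x2 table: rows = left/right branch, columns = event/non-event.
        a = sum(left)
        b = n_left - a
        c = total_events - a
        d = n_right - c
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        if denom == 0:
            continue  # target is constant, nothing to test

        chi2 = n * (a * d - b * c) ** 2 / denom
        # Upper tail of a chi-square with 1 degree of freedom.
        p_value = math.erfc(math.sqrt(chi2 / 2))
        logworth = -math.log10(p_value) if p_value > 0 else float("inf")

        if logworth > best_logworth:
            best_cut, best_logworth = cut, logworth

    return best_cut, best_logworth


# Toy data: var1 = 1..100, event occurs whenever var1 > 60.
var1 = list(range(1, 101))
target = [1 if v > 60 else 0 for v in var1]
fine_cut, fine_lw = best_binary_split(var1, target, n_bins=99)
coarse_cut, coarse_lw = best_binary_split(var1, target, n_bins=4)
```

With 99 bins the candidate cuts fall on every integer boundary, so the search can land exactly on the true break; with only 4 bins it has to settle for the nearest boundary, which is why the number of bins matters to me.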
Thanks for your time.
07-20-2012 09:13 AM
It may be poor form to answer my own post, but perhaps it will assist someone else.
If you call PROC ARBOR directly, the PROC ARBOR statement has an option (exhaustive=) that controls the number of bins that are used when determining the optimal split point in decision tree construction. The default is 5000.