Building models with SAS Enterprise Miner, SAS Factory Miner, SAS Visual Data Mining and Machine Learning or just with programming

Specifying splits in EM 7.1 Decision Trees

Hello All,

Is there a way to specify the number of bins that Enterprise Miner (EM) uses when calculating the optimal splitting value for an interval-level input? I am building a decision tree model and would like to control the number of divisions (bins, or comparison points) that EM evaluates when searching for the best split.

This may be best explained with an example. Say var1 ranges from 1 to 100 and I want to determine the optimal binary split for this variable. My understanding is that EM bins var1 into a number of buckets and then checks the boundary between each pair of adjacent buckets as a potential split point. So with 100 bins, EM would compare bucket 1 vs. buckets 2-100, then buckets 1-2 vs. buckets 3-100, and so on, and select the split with the best logworth. My question is: how can I specify the number of bins used in this procedure?
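To make the procedure I have in mind concrete, here is a minimal Python sketch. It is only an illustration of my understanding, not EM's actual algorithm: it assumes a binary target, equal-width bins, and a plain Pearson chi-square logworth with no adjustments. The function names `logworth_2x2` and `best_binned_split` are made up for this example.

```python
import math


def logworth_2x2(a, b, c, d):
    """Return -log10(p) for the Pearson chi-square test on the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0  # degenerate table: the split carries no information
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the chi-square upper tail is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return -math.log10(p) if p > 0 else float("inf")


def best_binned_split(x, y, n_bins):
    """Bin x into n_bins equal-width bins and test only the bin boundaries.

    x is a list of numbers and y a list of 0/1 target values.
    Returns (cut, logworth); observations with x < cut go to the left branch.
    """
    lo, hi = min(x), max(x)
    if lo == hi or n_bins < 2:
        return None, 0.0
    width = (hi - lo) / n_bins

    totals = [0] * n_bins  # observations per bin
    events = [0] * n_bins  # target == 1 per bin
    for xi, yi in zip(x, y):
        k = min(int((xi - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        totals[k] += 1
        events[k] += yi

    total_n, total_e = len(x), sum(y)
    best_cut, best_lw = None, 0.0
    left_n = left_e = 0
    for k in range(n_bins - 1):  # boundary after bin k
        left_n += totals[k]
        left_e += events[k]
        right_n, right_e = total_n - left_n, total_e - left_e
        lw = logworth_2x2(left_e, left_n - left_e, right_e, right_n - right_e)
        if lw > best_lw:
            best_cut, best_lw = lo + (k + 1) * width, lw
    return best_cut, best_lw


# Toy data: var1 = 1..100, and the target is 1 whenever var1 > 60.
var1 = list(range(1, 101))
target = [1 if v > 60 else 0 for v in var1]

fine_cut, fine_lw = best_binned_split(var1, target, 99)    # cut at 61.0
coarse_cut, coarse_lw = best_binned_split(var1, target, 4)  # cut at 50.5
```

On this toy data the fine binning recovers the true boundary, while four bins can only offer cuts at 25.75, 50.5, and 75.25, so the chosen split lands at 50.5. That sensitivity to the number of bins is why I would like to control it.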

Thanks for your time.

Chad Atkinson

Re: Specifying splits in EM 7.1 Decision Trees

It may be poor form to answer my own post, but perhaps it will assist someone else.

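If you call PROC ARBOR directly, the PROC ARBOR statement has an option, EXHAUSTIVE=, that controls how many candidate split points are evaluated when searching for the optimal split during decision tree construction. As I understand it, the value is the maximum number of candidate splits examined in an exhaustive search; if more candidates exist, a heuristic search is used instead. The default is 5000.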
