ChadAtkinson
Calcite | Level 5

Hello All,

Is there a way to specify the number of bins that EM uses when calculating the optimal splitting value for an interval level input?  I am building a decision tree model, and would like to specify the number of divisions/bins/comparison points that EM uses to calculate the optimal value for splitting.

This may be best explained with an example: say var1 ranges from 1 to 100 and I want to determine the optimal binary split for this variable. My understanding is that EM would bin var1 into a number of buckets and then check the boundary between each pair of adjacent bins as a potential split point. So if there were 100 bins, EM would check bucket 1 vs. buckets 2-100, then buckets 1-2 vs. buckets 3-100, and so on, and select the split with the best logworth. My question is: how can I specify the number of bins that are used in this procedure?
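To make the procedure described above concrete, here is a minimal Python sketch of a binned split search. It is an illustration of the idea only, not EM's actual implementation: the function name `best_binned_split`, the equal-width binning, and the unadjusted chi-square logworth (no Bonferroni or depth adjustment) are all assumptions made for this example.

```python
import math

def best_binned_split(x, y, n_bins):
    """Search equal-width bin boundaries for the best binary split.

    Hypothetical helper for illustration: x is an interval input,
    y is a 0/1 target. Returns (cut, logworth), where logworth is
    -log10 of the unadjusted chi-square p-value (1 degree of freedom).
    """
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins
    best_cut, best_logworth = None, 0.0
    for k in range(1, n_bins):
        cut = lo + k * width  # boundary between bin k and bin k+1
        # 2x2 table: left/right branch versus target 1/0
        a = sum(1 for xi, yi in zip(x, y) if xi < cut and yi == 1)
        b = sum(1 for xi, yi in zip(x, y) if xi < cut and yi == 0)
        c = sum(1 for xi, yi in zip(x, y) if xi >= cut and yi == 1)
        d = sum(1 for xi, yi in zip(x, y) if xi >= cut and yi == 0)
        n = a + b + c + d
        margins = (a + b) * (c + d) * (a + c) * (b + d)
        if margins == 0:
            continue  # an empty branch or a constant target: no valid split
        chi2 = n * (a * d - b * c) ** 2 / margins
        # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
        p = math.erfc(math.sqrt(chi2 / 2))
        logworth = -math.log10(p) if p > 0 else float("inf")
        if logworth > best_logworth:
            best_cut, best_logworth = cut, logworth
    return best_cut, best_logworth

# Toy data: var1 runs from 1 to 100 and the target flips above 60.
var1 = list(range(1, 101))
target = [1 if v > 60 else 0 for v in var1]
fine = best_binned_split(var1, target, n_bins=99)   # candidate cuts at 2, 3, ..., 99
coarse = best_binned_split(var1, target, n_bins=4)  # only three candidate cuts
```

On this toy data the fine grid can place a cut exactly at the boundary between 60 and 61, while the coarse grid can only choose among three candidate cuts, which is why the number of bins matters.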

Thanks for your time.

Chad Atkinson

1 REPLY
ChadAtkinson
Calcite | Level 5

It may be poor form to answer my own post, but perhaps it will assist someone else.

If you use PROC ARBOR, there is an option (EXHAUSTIVE=) that controls the number of bins that are used when determining the optimal split point during decision tree construction. The default is 5000.
