I'm doing a decision tree assignment for class. The data set has 131 records. When I partition the data (50-50-0), I do not get a decision tree; there is just one node with no branches or leaves. However, if I run the decision tree without partitioning, I get a tree with three branches. I suspect that sample size may be an issue.
Can anyone confirm and point me in the direction of a reading on the subject?
You can try tweaking some of the options for growing the tree to be less restrictive: for example, lowering the values for the properties Minimum Categorical Size or Leaf Size, or raising the value for Significance Level if using the ProbF or ProbChisq splitting criteria.
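To illustrate why the significance level matters more once the data is halved, here is a rough sketch in plain Python. It is not how the tree node computes split worth internally; it just runs an ordinary Pearson chi-square test on a 2x2 table. The counts and the 0.05 cutoff are made up for illustration, and the function names are my own.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def p_value_1df(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical counts: rows are the two branches of a candidate split,
# columns are the two target classes.
full = (40, 26, 26, 40)  # 132 records
half = (20, 13, 13, 20)  # same proportions, 66 records

p_full = p_value_1df(chi2_2x2(*full))
p_half = p_value_1df(chi2_2x2(*half))

alpha = 0.05  # illustrative cutoff only, not a claimed software default
print(f"full data: p = {p_full:.4f}, split allowed: {p_full < alpha}")
print(f"half data: p = {p_half:.4f}, split allowed: {p_half < alpha}")
```

With identical proportions, halving the counts halves the chi-square statistic, so a split that clears the cutoff on the full data can fail it on the training half. Raising the significance level lets that split through again.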
I would agree with the sample size issue.
I'm not familiar enough with decision trees to refer you to anything, but in regression a quick rule of thumb is 20 cases per predictor. For a tree, the rough equivalent would be 20 cases per node. Also, when the data is partitioned into small groups, you're more likely to hit extreme cases where all records with a particular value end up in either the test set or the modeling set.
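As a back-of-the-envelope check on that rule of thumb, this small Python sketch works out how many 20-case leaves the data could support with and without the 50-50 partition. The function name is my own, and the 20-cases figure is only the heuristic mentioned above, not a setting taken from the software.

```python
def max_leaves(n_records, train_fraction, cases_per_leaf=20):
    """Return (training records, leaves supportable at cases_per_leaf each).
    The partition is approximated by truncating to a whole record count."""
    n_train = int(n_records * train_fraction)
    return n_train, n_train // cases_per_leaf

for fraction in (1.0, 0.5):
    n_train, leaves = max_leaves(131, fraction)
    print(f"train fraction {fraction}: {n_train} records, at most {leaves} leaves")
```

With 131 records, the training half holds about 65 records, which leaves room for only about three leaves of 20 cases each, so it is not surprising that the tree struggles to split.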