Hi all,
I would like to clarify the actual names of the probability tree, maximal tree and pruned tree that are used throughout the SAS EM book Applied Analytics Using SAS Enterprise Miner Course Notes.
Thank you.
Regards,
Potiu
In the context of the Applied Analytics using SAS Enterprise Miner (AAEM) course:
-Probability tree refers to the decision tree that is optimized (pruned) with respect to average squared error (ASE). This tree is optimized to produce the best "estimates", which in this case are probabilities; hence the name used for it in the instructor-led demonstration.
-The maximal tree is the tree grown before any pruning takes place. Generally the maximal tree is overfit to the training data and does not generalize well. The "stopping rules" in the properties panel of the Decision Tree node control how large the maximal tree grows.
-A pruned tree refers to a tree that has been optimized for complexity: the maximal tree was grown, and then branches were pruned (trimmed) off to optimize some assessment measure on the validation data set.
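To make the three names concrete, here is a minimal Python sketch of how a subtree might be picked from a pruning sequence. It is not Enterprise Miner code: the validation targets, the candidate subtrees (keyed by number of leaves) and the helper names are all made up for illustration, and ties are broken in favor of the smaller tree as an assumption.

```python
def average_squared_error(targets, probs):
    """ASE: mean squared difference between the 0/1 target and the predicted probability."""
    return sum((p - y) ** 2 for y, p in zip(targets, probs)) / len(targets)


def misclassification_rate(targets, probs, cutoff=0.5):
    """Share of cases whose predicted class (probability >= cutoff) differs from the target."""
    wrong = sum((1 if p >= cutoff else 0) != y for y, p in zip(targets, probs))
    return wrong / len(targets)


def select_subtree(candidates, targets, measure):
    """Return the leaf count of the subtree with the lowest validation measure.

    Ties go to the smaller tree (an assumption of this sketch).
    """
    return min(candidates, key=lambda leaves: (measure(targets, candidates[leaves]), leaves))


# Hypothetical validation targets and, for each subtree in the pruning sequence,
# the probabilities it predicts for those same validation cases.
y_valid = [1, 0, 1, 0, 1, 0]
candidates = {
    1: [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],  # root only
    2: [0.7, 0.3, 0.7, 0.3, 0.7, 0.7],
    4: [0.9, 0.1, 0.9, 0.1, 0.6, 0.6],
    6: [1.0, 0.0, 1.0, 0.0, 0.0, 1.0],  # maximal tree, overfit to training data
}

maximal_tree = max(candidates)  # the largest tree, before any pruning
probability_tree = select_subtree(candidates, y_valid, average_squared_error)
misclassification_tree = select_subtree(candidates, y_valid, misclassification_rate)

print("maximal tree leaves:", maximal_tree)
print("pruned on ASE (probability tree):", probability_tree)
print("pruned on misclassification:", misclassification_tree)
```

With these made-up numbers the two assessment measures choose different subtrees, which is why the tree pruned on ASE gets its own name in the course.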
Dear JThompson,
So the pruned, maximal and probability trees are not CART or CHAID trees.
Could you tell me whether these three types of tree are ID3 or C4.5?
Please advise. Thank you.
Regards,
Potiu
CART stands for "classification and regression trees". In this sense, CART refers to any tree whose complexity is optimized by (in other words, pruned using) misclassification rate or average squared error (ASE). Given this, the "probability tree" would be considered a CART tree.
CHAID, ID3 and C4.5 are all GROWTH options, meaning they control how the tree is grown, in other words how splits are determined. So one would need to state how the trees were grown before saying whether they are CHAID, ID3 or C4.5. Technically, each of the trees you asked about (probability tree, maximal tree, pruned tree) could be any one of CHAID, ID3 or C4.5. In the AAEM course specifically, trees are built using only logworth, which is CHAID. So in the context of AAEM, you could say all three of the trees you asked about were grown using logworth (based on chi-squared), so they could be called CHAID trees. Outside the AAEM class, however, one would need to know how the tree was grown.
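For reference, logworth is -log10 of the p-value from the chi-squared test of a candidate split. Below is a minimal Python sketch for a binary split on a binary target. The function names and counts are made up for illustration, and the sketch assumes a plain Pearson chi-squared test with 1 degree of freedom, ignoring any adjustments Enterprise Miner may apply to the p-value.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-squared statistic for a 2x2 table of counts.

    Rows are the two branches of the split, columns are the two target levels.
    Assumes every row and column total is nonzero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


def logworth(table):
    """-log10(p-value) of the chi-squared test with 1 degree of freedom."""
    chi2 = chi_square_2x2(table)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    if p_value == 0.0:  # p-value underflowed; the split is extremely significant
        return math.inf
    return -math.log10(p_value)


# Hypothetical counts: [events, non-events] in the left and right branch.
strong_split = [[30, 10], [10, 30]]
weak_split = [[22, 18], [18, 22]]

print("strong split logworth:", round(logworth(strong_split), 2))
print("weak split logworth:", round(logworth(weak_split), 2))
```

The split with the larger logworth separates the target levels more clearly, so a tree grown on logworth would prefer it.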