Re: Applied Analytics Using SAS Enterprise Miner
I have the following questions on specific aspects of how the Decision Tree node works:
1. Split search for nominal inputs: page 3.32 of the course notes states "[...] if the input is categorical, the average value of the target is taken within each categorical input level. The averages serve the same role as the unique interval input values in the discussion that follows […]"
Does the above mean that, in practice, a nominal input is treated in the same way as an interval one, which implies that, if there are L levels, the algorithm will only consider L-1 splitting points instead of the potential 2^(L-1) different combinations of those levels?
2. Property "Minimum Categorical Size": can this property be used to affect the tree growth? Am I right in saying that it defines the minimum number of cases that a level (of a class input) must have to be considered as a potential separate branch in a split?
3. Properties "Exhaustive" and "Node Sample": are they only used with multi-way splits or do they affect a binary tree with a binary target as well?
4. Property "Use Priors": what is the effect of this property? Would it affect the cut-off used for classifying leaves and derived Misclassification rate?
5. Property "Time of Bonferroni Adjustment": I understand the purpose of the Bonferroni adjustment and how it works, but I am not sure of the purpose of this property or its impact on the growing algorithm.
6. Output variables and metrics with Prior Probabilities: when defining Prior Probabilities, on the output dataset I can see the following variables which are not created by other modelling nodes: "Q_target" and "V_target". Are they related to posterior probabilities non-adjusted for priors (while the probabilities in "P_target" variables are adjusted for priors)? What is the difference between Q_ and V_ variables?
Moreover, the fit statistics show "Average Square Error with Priors" and "Misclassification Rate with Priors": are they calculated on posterior probabilities adjusted for priors? (In which case I assume ASE is based on unadjusted posterior probabilities.)
Thank you for your answers. Everything is clear now, with the only exception of the "Time of Bonferroni Adjustment": if option "After" means the Bonferroni adjustment takes place after the split is chosen, what is the point of applying it? What I mean is that if a split is chosen without considering the adjustment, then I do not understand the point of applying it afterwards. Or does it mean that the best split for a given input is chosen without applying the adjustment and then, before making a comparison across all candidate inputs, the adjustment is applied so as to penalise inputs with a large number of categories?
My Answer:
The difference in decision tree size between applying the Bonferroni adjustment before (default) and after (less conservative) becomes obvious when the maximum branch size is changed from 2 (default) to a higher number such as 5. With the default maximum branch size of 2, no difference in tree size can be observed between the Before and After options.
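To illustrate why the timing might only matter with more than two branches, here is a minimal Python sketch of one possible reading of the two options. It is not taken from the SAS documentation: the function names, the p-values, and the way the multipliers are applied are all assumptions for illustration. The multipliers 127, 966 and 1050 are the numbers of ways to group 8 nominal levels into 2, 3 and 5 branches.

```python
import math

def logworth(p_value):
    """Logworth is -log10 of the split's p-value."""
    return -math.log10(p_value)

def bonferroni_logworth(p_value, n_comparisons):
    """Multiplying p by m is the same as subtracting log10(m) from logworth."""
    return logworth(p_value) - math.log10(n_comparisons)

# Hypothetical candidates for ONE 8-level nominal input:
# number of branches -> (made-up raw p-value, number of groupings tried)
candidates = {
    2: (1e-6, 127),
    3: (5e-7, 966),
    5: (3e-7, 1050),
}

def choose_before(cands):
    """Assumed 'Before': adjust every candidate, then keep the best adjusted value."""
    scored = {b: bonferroni_logworth(p, m) for b, (p, m) in cands.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]

def choose_after(cands):
    """Assumed 'After': keep the best raw logworth, then adjust only the winner."""
    best = max(cands, key=lambda b: logworth(cands[b][0]))
    p, m = cands[best]
    return best, bonferroni_logworth(p, m)

print(choose_before(candidates))  # picks the 2-branch split
print(choose_after(candidates))   # picks the 5-branch split

# With binary splits only, there is a single candidate, so both agree.
binary_only = {2: candidates[2]}
print(choose_before(binary_only) == choose_after(binary_only))
```

Under these assumptions the After option favours splits with more branches, which would be consistent with larger trees when the maximum branch size is above 2, and with identical trees when it is 2.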
1. Split search for nominal inputs: page 3.32 of the course notes states "[...] if the input is categorical, the average value of the target is taken within each categorical input level. The averages serve the same role as the unique interval input values in the discussion that follows […]"
Does the above mean that, in practice, a nominal input is treated in the same way as an interval one, which implies that, if there are L levels, the algorithm will only consider L-1 splitting points instead of the potential 2^(L-1) different combinations of those levels?
My answer:
For a nominal input, depending on the type of split (default: binary split; alternatively, for example, a 3-way split), the software will try all combinations of the potential split points and pick the top one based on logworth (default).
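The two search spaces mentioned in the question can be compared with a short Python sketch. It only counts and lists candidate binary splits; it does not reproduce Enterprise Miner's internals, and the level names and event rates are made up for illustration. A nominal input with L levels has 2^(L-1) - 1 distinct binary groupings, while sorting the levels by target mean leaves L-1 cut points.

```python
from itertools import combinations

def binary_partitions(levels):
    """Yield every way to split `levels` into two non-empty groups, each once."""
    levels = list(levels)
    first, rest = levels[0], levels[1:]
    # Pin the first level to the left branch so mirror-image splits are not repeated.
    for size in range(len(rest) + 1):
        for extra in combinations(rest, size):
            left = {first, *extra}
            right = set(levels) - left
            if right:  # skip the case where every level lands in one branch
                yield left, right

def ordered_cuts(target_mean_by_level):
    """Yield the L-1 splits obtained by sorting levels on their target mean."""
    ordered = sorted(target_mean_by_level, key=target_mean_by_level.get)
    for i in range(1, len(ordered)):
        yield set(ordered[:i]), set(ordered[i:])

# Hypothetical event rates for a four-level nominal input.
rates = {"A": 0.10, "B": 0.40, "C": 0.25, "D": 0.55}

all_splits = list(binary_partitions(rates))
cut_splits = list(ordered_cuts(rates))

print(len(all_splits))  # 2**(4-1) - 1 = 7
print(len(cut_splits))  # 4 - 1 = 3
```

Every ordered cut is also one of the exhaustive groupings, so the ordered search is a subset of the full search rather than a different set of splits.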
2. Property "Minimum Categorical Size": can this property be used to affect the tree growth? Am I right in saying that it defines the minimum number of cases that a level (of a class input) must have to be considered as a potential separate branch in a split?
My Answer:
Yes, it defines the minimum number of cases that a level (of a class input) must have to be considered as a potential separate branch in a split. The default value of 5 is a requirement for the chi-square test.
The property can also be used to affect tree growth: increasing it from 5 to, say, 50 can reduce the size of the split-point search during tree growth.
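As a rough illustration of how a larger minimum size shrinks the search, the following Python sketch filters the levels of a made-up class input by their case counts and counts the binary groupings of the levels that remain. The function names and data are hypothetical, and how Enterprise Miner actually treats the levels below the threshold is not modelled here.

```python
from collections import Counter

def eligible_levels(values, min_categorical_size=5):
    """Return the levels with at least `min_categorical_size` cases."""
    counts = Counter(values)
    return sorted(level for level, n in counts.items() if n >= min_categorical_size)

def n_binary_splits(n_levels):
    """Number of distinct binary groupings of `n_levels` nominal levels."""
    return 2 ** (n_levels - 1) - 1 if n_levels >= 2 else 0

# Hypothetical class input with six levels of very different sizes.
values = ["A"] * 120 + ["B"] * 60 + ["C"] * 48 + ["D"] * 12 + ["E"] * 7 + ["F"] * 3

for min_size in (5, 50):
    levels = eligible_levels(values, min_size)
    print(min_size, levels, n_binary_splits(len(levels)))
```

With a minimum of 5, five levels qualify and give 15 candidate binary groupings; with a minimum of 50, only two levels qualify and a single grouping is left.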
3. Properties "Exhaustive" and "Node Sample": are they only used with multi-way splits or do they affect a binary tree with a binary target as well?
Please see the responses which I copied from SAS EM help:
6. Output variables and metrics with Prior Probabilities: when defining Prior Probabilities, on the output dataset I can see the following variables which are not created by other modelling nodes: "Q_target" and "V_target". Are they related to posterior probabilities non-adjusted for priors (while the probabilities in "P_target" variables are adjusted for priors)? What is the difference between Q_ and V_ variables?