Solved: Re: Questions on Decision Tree node

pvareschi · Posted 05-13-2020 08:45 AM

Re: Applied Analytics Using SAS Enterprise Miner

I have the following questions on specific aspects of how the Decision Tree node works:

1. Split search for nominal inputs: page 3.32 of the course notes states "[...] if the input is categorical, the average value of the target is taken within each categorical input level. The averages serve the same role as the unique interval input values in the discussion that follows […]"

Does the above mean that, in practice, a nominal input is treated in the same way as an interval one, which implies that, if there are L levels, the algorithm will only consider L-1 splitting points instead of the potential 2^(L-1) different combinations of those levels?

2. Property "Minimum Categorical Size": can this property be used to affect the tree growth? Am I right in saying that it defines the minimum number of cases that a level (of a class input) must have to be considered as a potential separate branch in a split?

3. Properties "Exhaustive" and "Node Sample": are they only used with multi-way splits or do they affect a binary tree with a binary target as well?

4. Property "Use Priors": what is the effect of this property? Would it affect the cut-off used for classifying leaves and derived Misclassification rate?

5. Property "Time of Bonferroni Adjustment": I understand the purpose of the Bonferroni Adjustment and how it works, but I am not sure on the purpose of this property and its impact on the growing algorithm

6. Output variables and metrics with Prior Probabilities: when defining Prior Probabilities, on the output dataset I can see the following variables which are not created by other modelling nodes: "Q_target" and "V_target". Are they related to posterior probabilities non-adjusted for priors (while the probabilities in "P_target" variables are adjusted for priors)? What is the difference between Q_ and V_ variables?

Moreover, the fit statistics show "Average Square Error with Priors" and "Misclassification Rate with Priors": are they calculated on posterior probabilities adjusted for priors? (In which case I assume ASE is based on unadjusted posterior probabilities.)

gcjfernandez · Posted 05-13-2020 06:10 PM

1. Split search for nominal inputs: page 3.32 of the course notes states "[...] if the input is categorical, the average value of the target is taken within each categorical input level. The averages serve the same role as the unique interval input values in the discussion that follows […]"

Does the above mean that, in practice, a nominal input is treated in the same way as an interval one, which implies that, if there are L levels, the algorithm will only consider L-1 splitting points instead of the potential 2^(L-1) different combinations of those levels?

My answer:

For a nominal input, depending on the type of split (binary by default, or multi-way such as 3-way), the software will try all combinations of the potential split points and pick the top one based on logworth (the default criterion).
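
To make the size of the two search spaces in the question concrete, here is a minimal Python sketch. It is not SAS Enterprise Miner code: the function names and the per-level target means are hypothetical, and it only counts candidate binary splits. Note that the number of distinct two-group partitions of L levels is 2^(L-1) - 1, while ordering the levels by target mean leaves L - 1 cut points.

```python
from itertools import combinations

def all_binary_partitions(levels):
    """Every way to divide the levels into two non-empty groups."""
    levels = list(levels)
    first, rest = levels[0], levels[1:]
    partitions = []
    # Keep the first level on the left so mirror-image splits are not counted twice.
    for k in range(len(rest) + 1):
        for extra in combinations(rest, k):
            left = {first, *extra}
            right = set(levels) - left
            if right:  # skip the trivial split with an empty right branch
                partitions.append((left, right))
    return partitions

def ordered_cut_points(level_means):
    """Order levels by mean target value and cut between neighbours."""
    ordered = sorted(level_means, key=level_means.get)
    return [(set(ordered[:i]), set(ordered[i:])) for i in range(1, len(ordered))]

# Hypothetical mean target value per level of a nominal input with L = 4 levels.
level_means = {"A": 0.10, "B": 0.35, "C": 0.20, "D": 0.60}

exhaustive = all_binary_partitions(level_means)
ordered = ordered_cut_points(level_means)
print(len(exhaustive), len(ordered))  # 7 3
```

With four levels the exhaustive search has 7 candidates against 3 ordered cut points; the gap grows quickly as L increases.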

2. Property "Minimum Categorical Size": can this property be used to affect the tree growth? Am I right in saying that it defines the minimum number of cases that a level (of a class input) must have to be considered as a potential separate branch in a split?

My Answer:

Yes, that is right: it defines the minimum number of cases that a level of a class input must have to be considered as a potential separate branch in a split. The default value of 5 is a requirement for the chi-square test.

Can this property be used to affect the tree growth? Yes: increasing it, for example from 5 to 50, can reduce the size of the split-point search during tree growth.
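
As a rough illustration of why a larger threshold shrinks the search, the following Python sketch lists which levels of a class input have enough cases in a node to be candidates for their own branch. The function name and the node data are hypothetical, and how the software handles the levels that fall below the threshold is not covered in this thread, so it is not modelled here.

```python
from collections import Counter

def eligible_levels(values, min_categorical_size=5):
    """Levels with at least `min_categorical_size` cases in this node."""
    counts = Counter(values)
    return sorted(level for level, n in counts.items() if n >= min_categorical_size)

# Hypothetical values of one class input for the cases in a node.
node_values = ["A"] * 60 + ["B"] * 12 + ["C"] * 7 + ["D"] * 3

print(eligible_levels(node_values, 5))   # ['A', 'B', 'C']
print(eligible_levels(node_values, 50))  # ['A']
```

Fewer eligible levels means fewer groupings of levels to evaluate as candidate splits.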

3. Properties "Exhaustive" and "Node Sample": are they only used with multi-way splits or do they affect a binary tree with a binary target as well?

Please see the following descriptions, which I copied from the SAS EM help:

Exhaustive: specifies the highest number of candidate splits that you want to find in an exhaustive search. The Exhaustive property applies to multi-way splits and to binary splits on nominal targets with more than two values. Permissible values are integers between 0 and 2,000,000,000. The default setting for the Exhaustive property is 5000.

Node Sample: specifies the maximum within-node sample size n that you want to use to find splits. If the number of training observations in a node is larger than n, then the split search for that node is based on a random sample of size n. Permissible values are integers greater than or equal to 2. The default value for the Node Sample property is 2000.

4. Property "Use Priors": what is the effect of this property? Would it affect the cut-off used for classifying leaves and derived Misclassification rate?

Using priors will affect the split-search algorithm and the derived misclassification rate.

5. Property "Time of Bonferroni Adjustment": I understand the purpose of the Bonferroni Adjustment and how it works, but I am not sure on the purpose of this property and its impact on the growing algorithm.

From the help:

Bonferroni Adjustment: when set to No, the Bonferroni Adjustment property of the Decision Tree node suppresses Bonferroni adjustments to the p-values. The default setting is Yes.

Time of Bonferroni Adjustment: indicates whether the Bonferroni adjustment should take place Before or After the split is chosen. The default setting is Before. The Time of Bonferroni Adjustment property is ignored if the Bonferroni Adjustment property is set to No. The default setting, Before, is more conservative and controls the tree size.

6. Output variables and metrics with Prior Probabilities: when defining Prior Probabilities, on the output dataset I can see the following variables which are not created by other modelling nodes: "Q_target" and "V_target". Are they related to posterior probabilities non-adjusted for priors (while the probabilities in "P_target" variables are adjusted for priors)? What is the difference between Q_ and V_ variables?

P_: prior-adjusted probabilities for the training data
V_: prior-adjusted probabilities for the validation data
Q_: unadjusted probabilities for the validation data

Moreover, the fit statistics show "Average Square Error with Priors" and "Misclassification Rate with Priors": are they calculated on posterior probabilities adjusted for priors? (In which case I assume ASE is based on unadjusted posterior probabilities.)

Yes, that is my understanding also.
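
For readers unsure what "adjusted for priors" means numerically, here is a minimal Python sketch of the textbook adjustment for a sample whose class proportions differ from the population priors: each unadjusted probability is reweighted by prior / sample proportion and the results are renormalised. This is an assumption about the general technique, not a confirmed description of how Enterprise Miner computes its output variables; the function name and all numbers are hypothetical.

```python
def adjust_for_priors(unadjusted, sample_props, priors):
    """Reweight class probabilities by prior / sample proportion, then renormalise."""
    weighted = {k: unadjusted[k] * priors[k] / sample_props[k] for k in unadjusted}
    total = sum(weighted.values())
    return {k: w / total for k, w in weighted.items()}

# Hypothetical leaf: 60% events in a training sample balanced 50/50,
# while the population prior for the event is only 5%.
unadjusted = {"event": 0.60, "nonevent": 0.40}
sample_props = {"event": 0.50, "nonevent": 0.50}
priors = {"event": 0.05, "nonevent": 0.95}

adjusted = adjust_for_priors(unadjusted, sample_props, priors)
print(max(unadjusted, key=unadjusted.get))  # event
print(max(adjusted, key=adjusted.get))      # nonevent
```

In this toy leaf the predicted class flips once the priors are taken into account, which is why statistics such as the misclassification rate can differ with and without priors.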

pvareschi · Posted 05-24-2020 04:49 AM

Thank you for your answers. Everything is clear now, with the only exception of the "Time of Bonferroni Adjustment": if option "After" means the Bonferroni adjustment takes place after the split is chosen, what is the point of applying it? What I mean is that if a split is chosen without considering the adjustment, then I do not understand what is the point of applying it after... or does it mean that the best split for a given input is chosen without applying the adjustment and then, before making a comparison across all candidate inputs, the adjustment is applied so as to penalise inputs with a large number of categories?

gcjfernandez · Posted 05-27-2020 01:03 PM

My Answer:

The difference in decision tree size between applying the Bonferroni adjustment before (the default) and after (less conservative) becomes obvious when the maximum branch size is changed from 2 (the default) to a higher number such as 5. With the default of 2 branches, no difference in tree size can be observed between the Before and After options.
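
One way to see why the timing only matters with more than two branches is the generic Python sketch below. It is not Enterprise Miner's implementation: the exact way the software counts comparisons is not stated in this thread, so the multipliers `m` are assumptions (here, the number of ways to cut 10 ordered values into 2 or 5 branches), and the p-values are made up. The point is only that when every candidate carries the same multiplier, adjusting before or after choosing picks the same split, whereas different multipliers can change the winner.

```python
import math

def logworth(p_value, n_comparisons=1):
    """-log10 of the Bonferroni-adjusted p-value (capped at 1)."""
    return -math.log10(min(1.0, p_value * n_comparisons))

def pick_before(candidates):
    """Adjust every candidate first, then choose the best adjusted logworth."""
    return max(candidates, key=lambda c: logworth(c["p"], c["m"]))

def pick_after(candidates):
    """Choose on the raw p-value; the adjustment is applied to the winner afterwards."""
    return min(candidates, key=lambda c: c["p"])

# Hypothetical candidates on one input: p = raw p-value, m = assumed multiplier.
binary_only = [
    {"name": "cut1", "p": 1e-4, "m": 9},
    {"name": "cut2", "p": 1e-6, "m": 9},
]
mixed_branches = [
    {"name": "2-way", "p": 1e-6, "m": 9},    # C(9, 1) = 9
    {"name": "5-way", "p": 5e-7, "m": 126},  # C(9, 4) = 126
]

for label, cands in [("binary only", binary_only), ("mixed", mixed_branches)]:
    print(label, pick_before(cands)["name"], pick_after(cands)["name"])
```

With binary candidates only, both timings select `cut2`; with mixed branch counts, adjusting first favours the 2-way split while adjusting afterwards lets the 5-way split win on its raw p-value.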
