Solved: Decision Tree subtree properties selection

potiu · Posted 07-23-2018 01:39 AM

Hi,

I would like to ask about subtree selection for the Decision Tree node.

What are the criteria for pairing a Method option (Assessment, Largest and N) with an Assessment Measure option (Decision, Classifications, Average square error and Lift)?

What about the Iteration plot, which offers a similar selection (Average square tree, Misclassification tree, Sum of square error, maximum absolute error, and subtree assessment plot) for deciding the optimal number of leaves?

I find this quite confusing and would appreciate it if someone could explain it to me. Thank you.

Regards,

Potiu

WendyCzika · Posted 07-23-2018 10:09 AM

Here is information about those properties. If you choose Assessment for the subtree method, then you should be able to see in the Iteration plot that the subtree selected has the best value for whatever assessment measure you chose, but you can view the other measures as well (in case you want to re-run using one of the other measures to get a different subtree).

Method: specifies the method that you want to use to select a subtree from the fully grown tree for each possible number of leaves.

The following subtree methods are available:
- Assessment (default): the smallest subtree with the best assessment value is selected. The assessment value depends on the setting that you choose for the Assessment Measure property. The validation data set is used if available; otherwise the training data is used.
- Largest: the largest (full) tree is selected.
- N: the largest subtree with at most N leaves is selected. Use the Number of Leaves property to specify the value of N.

Number of Leaves: when the Method property of the Decision Tree node is set to N, specifies the maximum number of leaves that you want in the selected subtree. Permitted values are integers greater than or equal to 1. The default value for the Number of Leaves property is 1.

Assessment Measure: specifies the measure that you want to use to select the best tree, based on the validation data, when the Method property is set to Assessment. If no validation data is available, training data is used.
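
To make the three Method settings concrete, here is a rough Python sketch of the selection logic as described above. It is not SAS code and not how Enterprise Miner is implemented; the `Subtree` class, the `select_subtree` function and the example numbers are all made up for illustration. It assumes one candidate subtree per leaf count and an assessment value where lower is better (for example, validation misclassification rate).

```python
from dataclasses import dataclass


@dataclass
class Subtree:
    """Hypothetical stand-in for one candidate subtree."""
    n_leaves: int
    assessment: float  # e.g. validation misclassification rate


def select_subtree(candidates, method="assessment", n=1, lower_is_better=True):
    """Pick a subtree following the Method descriptions (illustrative only)."""
    if not candidates:
        raise ValueError("no candidate subtrees")
    if method == "largest":
        # Largest: take the full tree, i.e. the candidate with the most leaves.
        return max(candidates, key=lambda s: s.n_leaves)
    if method == "n":
        # N: the largest subtree with at most n leaves.
        eligible = [s for s in candidates if s.n_leaves <= n]
        if not eligible:
            raise ValueError("no subtree has at most n leaves")
        return max(eligible, key=lambda s: s.n_leaves)
    if method == "assessment":
        # Assessment: best assessment value; ties go to the smallest subtree.
        values = [s.assessment for s in candidates]
        best = min(values) if lower_is_better else max(values)
        tied = [s for s in candidates if s.assessment == best]
        return min(tied, key=lambda s: s.n_leaves)
    raise ValueError(f"unknown method: {method}")


# Made-up assessment values for subtrees with 1 to 5 leaves.
candidates = [
    Subtree(1, 0.40),
    Subtree(2, 0.31),
    Subtree(3, 0.25),
    Subtree(4, 0.25),
    Subtree(5, 0.27),
]

by_assessment = select_subtree(candidates, method="assessment")
by_largest = select_subtree(candidates, method="largest")
by_n = select_subtree(candidates, method="n", n=4)
print(by_assessment.n_leaves, by_largest.n_leaves, by_n.n_leaves)
```

In this toy data the 3-leaf and 4-leaf subtrees tie on the assessment value, so Assessment picks the 3-leaf one (the smallest with the best value), while N with n=4 picks the 4-leaf one regardless of its assessment value.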

The available assessment measures are as follows:
- Decision (default setting): selects the tree that has the largest average profit or the smallest average loss, if a profit or loss matrix is defined. If no profit or loss matrix is defined, the assessment measure is reset during training, depending on the measurement level of the target. If the target is interval, the measure is set to Average Square Error. If the target is categorical, the measure is set to Misclassification.
- Average Square Error: selects the tree that has the smallest average square error.
- Misclassification: selects the tree that has the smallest misclassification rate.
- Lift: evaluates the tree based on the prediction of the top n% of the ranked observations. Observations are ranked based on their posterior probabilities or predicted target values. For an interval target, it is the average predicted target value of the top n% of observations. For a categorical target, it is the proportion of events in the top n% of the data. When you set the Assessment Measure property to Lift, you must use the Assessment Fraction property to specify the proportion for the top n% of cases.
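
As a rough illustration of what these measures compute, here is a small Python sketch. It is not SAS code; the function names and sample data are hypothetical, and the way the top n% cutoff is rounded to a whole number of observations is my own assumption rather than documented behavior.

```python
def average_square_error(actual, predicted):
    """Mean of squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def misclassification_rate(actual, predicted):
    """Share of observations whose predicted class differs from the actual one."""
    return sum(a != p for a, p in zip(actual, predicted)) / len(actual)


def top_fraction_mean(scores, outcomes, fraction):
    """Mean outcome among the top `fraction` of observations ranked by score.

    Categorical target: scores are posterior probabilities and outcomes are
    1/0 event flags, giving the proportion of events in the top group.
    Interval target: pass the predicted values as both scores and outcomes.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Assumption: round to the nearest whole observation, keeping at least one.
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    top = [outcome for _, outcome in ranked[:k]]
    return sum(top) / k


# Made-up data: ten observations scored by posterior probability of the event.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
events = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

ase = average_square_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
misc = misclassification_rate(["a", "b", "a", "b"], ["a", "a", "a", "b"])
top20 = top_fraction_mean(scores, events, 0.2)
top50 = top_fraction_mean(scores, events, 0.5)
print(ase, misc, top20, top50)
```

Here `fraction` plays the role of the Assessment Fraction property: with 0.2, only the two highest-scored observations are considered, and both are events.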

thia1169 · Posted 03-31-2019 09:23 AM

I understand how this works for a Decision Tree, where we can select a subtree.

I also see a subtree option in Gradient Boosting.

Since we have already defined the maximum depth (for example, 2), the tree in one iteration can have at most 4 leaves (given maximum branch = 2).

Since gradient boosting is a sequential algorithm, will it select the subtree before moving on to the next iteration?

Please explain this in terms of gradient boosting. Thanks in advance.
