Difference between boosting through start groups and gradient boosting

Hi All,

Happy New Year to everyone in the community. I have a question: there are two ways to implement a boosting approach for classifier models such as decision trees in SAS E-Miner.

One is to use the Gradient Boosting node, while the other is to use the Start Groups and End Groups nodes (with a Decision Tree node embedded between them).

Can someone help me understand the difference between these two options?

Thanks,

1 ACCEPTED SOLUTION

SAS Employee

Re: Difference between boosting through start groups and gradient boosting

Bagging (Breiman 1996) is a common ensemble algorithm, in which you do the following:

1. Develop separate models on k random samples of the data of about the same size.

2. Fit a classification or regression tree to each sample. I tend to bag only trees, but the Start Groups and End Groups nodes allow other algorithms.

3. Average or vote to derive the final predictions or classifications.
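To make the three bagging steps concrete, here is a minimal sketch in plain Python. It is not how the Start Groups and End Groups nodes work internally. It assumes bootstrap samples drawn with replacement, a single input, a binary target, and one-split trees (stumps); the function names are made up for illustration.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split classification tree on a single input (0/1 target)."""
    best = None
    for t in sorted(set(xs)):
        for left, right in ((0, 1), (1, 0)):
            # Count misclassified rows if we predict `left` for x <= t, else `right`.
            errors = sum((left if x <= t else right) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    return best[1], best[2], best[3]

def predict_stump(stump, x):
    t, left, right = stump
    return left if x <= t else right

def bagging_fit(xs, ys, k, rng):
    """Steps 1 and 2: draw k bootstrap samples and fit one stump to each."""
    n = len(xs)
    models = []
    for _ in range(k):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def bagging_predict(models, x):
    """Step 3: majority vote over the k models."""
    votes = sum(predict_stump(m, x) for m in models)
    return 1 if 2 * votes > len(models) else 0

rng = random.Random(0)
xs = [i / 10 for i in range(40)]
ys = [1 if x >= 2.0 else 0 for x in xs]
models = bagging_fit(xs, ys, k=25, rng=rng)
print(bagging_predict(models, 0.1), bagging_predict(models, 3.9))
```

For an interval target you would average the k predictions instead of voting.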

Boosting (Freund and Schapire 1996), also supported through Start Groups > Decision Tree > End Groups, goes one step further: observations that were misclassified by the previous models are weighted more heavily for inclusion in subsequent samples. The successive samples are thus adjusted to compensate for previously computed inaccuracies.

Gradient boosting (Friedman 2001) also builds a series of trees, each on a resample of the training data, and the final prediction is a weighted combination of those trees. Each tree in the series is fit to the residual of the prediction from the earlier trees in the series. The residual is defined in terms of the derivative of a loss function. For squared error loss and an interval target, the residual is simply the target value minus the predicted value. Because each tree is trained on a random sample of the data, this approach is sometimes called stochastic gradient boosting.

So the key difference is that boosting through the group nodes reweights misclassified observations when it draws the next sample, whereas gradient boosting fits each new tree to the residuals of the current model.
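The residual-fitting idea in gradient boosting can be sketched as follows for squared error loss, where the residual is the target minus the current prediction. This is an illustrative sketch only, not the Gradient Boosting node's implementation. It assumes a single input, one-split regression trees, and a hypothetical `shrinkage` parameter, and it leaves out the random resampling step for brevity.

```python
def fit_stump(xs, targets):
    """Least-squares one-split regression tree on a single input."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [v for x, v in zip(xs, targets) if x <= t]
        right = [v for x, v in zip(xs, targets) if x > t]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((v - lmean) ** 2 for v in left)
               + sum((v - rmean) ** 2 for v in right))
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    return best[1], best[2], best[3]

def gradient_boost(xs, ys, n_trees=50, shrinkage=0.1):
    base = sum(ys) / len(ys)        # start from the mean of the target
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        # Squared error loss: residual = target value minus predicted value.
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lmean, rmean = fit_stump(xs, residuals)
        stumps.append((t, lmean, rmean))
        # Move the running prediction a small step toward the residual fit.
        preds = [p + shrinkage * (lmean if x <= t else rmean)
                 for x, p in zip(xs, preds)]
    return base, shrinkage, stumps

def predict(model, x):
    base, shrinkage, stumps = model
    return base + shrinkage * sum(l if x <= t else r for t, l, r in stumps)

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

xs = [float(i) for i in range(20)]
ys = [x * x for x in xs]
model = gradient_boost(xs, ys)
baseline_mse = mse(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mse(ys, [predict(model, x) for x in xs])
print(baseline_mse, boosted_mse)
```

Each pass shrinks the training error a little, which is why the boosted error ends up below the error of simply predicting the mean.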

Random forests is my favorite data mining algorithm, especially when I have little subject knowledge of the application. You grow many large decision trees at random and vote over all trees in the forest. The algorithm works as follows:

1. You develop random samples of the data and grow k decision trees. The value of k is large, usually greater than or equal to 100. A typical sample size is about two-thirds of the training data.
2. At each split point for each tree, you evaluate a random subset of candidate inputs (predictors). You hold the size of the subset constant across all trees.
3. You grow each tree as large as possible without pruning.

In a random forest, you are perturbing not only the data but also the variables that are used to construct each tree. The error rate is measured on the remaining holdout data not used for training. This remaining one-third of the data is called the out-of-bag sample. Variable importance can also be inferred based on how often an input was used in the construction of the trees.
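The random forest steps and the out-of-bag error can be sketched in plain Python as below. Treat it as a rough illustration under stated assumptions, not a reference implementation. It assumes bootstrap samples drawn with replacement (which leave roughly one-third of the rows out of each sample), Gini impurity as the split criterion, and a 0/1 target; `n_try` is a hypothetical name for the number of inputs tried at each split.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def grow_tree(rows, labels, n_try, rng):
    """Grow an unpruned tree; each split looks at only n_try random inputs."""
    if len(set(labels)) == 1:
        return labels[0]                         # pure leaf
    best = None
    for j in rng.sample(range(len(rows[0])), n_try):
        for t in sorted({r[j] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:                             # chosen inputs were all constant
        return Counter(labels).most_common(1)[0][0]
    _, j, t = best
    li = [i for i, r in enumerate(rows) if r[j] <= t]
    ri = [i for i, r in enumerate(rows) if r[j] > t]
    return (j, t,
            grow_tree([rows[i] for i in li], [labels[i] for i in li], n_try, rng),
            grow_tree([rows[i] for i in ri], [labels[i] for i in ri], n_try, rng))

def predict_tree(node, row):
    while isinstance(node, tuple):               # internal nodes are tuples
        j, t, left, right = node
        node = left if row[j] <= t else right
    return node

def random_forest(rows, labels, k, n_try, rng):
    n = len(rows)
    trees, oob_votes = [], [Counter() for _ in range(n)]
    for _ in range(k):
        idx = [rng.randrange(n) for _ in range(n)]      # bootstrap sample
        tree = grow_tree([rows[i] for i in idx], [labels[i] for i in idx], n_try, rng)
        trees.append(tree)
        for i in set(range(n)) - set(idx):              # rows this tree never saw
            oob_votes[i][predict_tree(tree, rows[i])] += 1
    scored = [(v.most_common(1)[0][0], y) for v, y in zip(oob_votes, labels) if v]
    oob_error = sum(p != y for p, y in scored) / len(scored)
    return trees, oob_error

def forest_predict(trees, row):
    return Counter(predict_tree(t, row) for t in trees).most_common(1)[0][0]

rng = random.Random(1)
rows = [[rng.random(), rng.random(), rng.random()] for _ in range(60)]
labels = [1 if r[0] + r[1] > 1.0 else 0 for r in rows]   # third input is pure noise
trees, oob_error = random_forest(rows, labels, k=100, n_try=2, rng=rng)
print(oob_error, forest_predict(trees, [0.95, 0.95, 0.5]))
```

Each row is scored only by the trees that did not train on it, so the out-of-bag error behaves like a built-in holdout estimate.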
