Hello,
I am building a model to identify customers who are likely to become dormant in the future. We want the model to have the lowest possible misclassification rate so that campaigns can be targeted effectively.
With a Decision Tree, my misclassification rate (MR) is 0.29. To reduce it I tried Gradient Boosting, since the papers I have read say that GB reduces MR over multiple iterations, but GB also shows an MR of 0.29.
I am not able to understand:
1) How is Gradient Boosting different from Decision Tree?
2) When to use Gradient Boosting, Decision Tree & Logistic Regression?
3) How to reduce misclassification rate?
Thank You,
Vishal
Accepted Solutions
1. Gradient boosting in Enterprise Miner creates a series of decision trees. The target for each new tree is the residual from the series of trees already created (see the sketch after point 3 below).
2. Run all three types of models, perhaps multiple times with different settings, and pick the one with the lowest MR on validation data.
3. Trial and error. Try different models, different parameters, possibly create new variables.
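To make point 1 concrete, here is a minimal sketch of the residual-fitting idea in Python with scikit-learn. It uses squared-error loss for simplicity and is only an illustration of the concept, not how the Enterprise Miner Gradient Boosting node is implemented; the function names are made up.

```python
# Minimal gradient boosting sketch with squared-error loss: each new tree
# is fit to the residuals of the ensemble built so far. Illustrative only;
# not the Enterprise Miner implementation.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_boosted_trees(X, y, n_iter=100, learning_rate=0.1, max_depth=3):
    base = y.mean()                        # initial prediction: the mean
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_iter):
        residual = y - pred                # target for the next tree
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        pred += learning_rate * tree.predict(X)  # shrunken contribution
        trees.append(tree)
    return base, trees

def predict_boosted(base, trees, X, learning_rate=0.1):
    pred = np.full(X.shape[0], base)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred
```

This also hints at why a first GB run may not beat a single tree: with too few iterations or an unfavorable learning rate, the ensemble has not yet driven the residuals down much.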
Editor's note: adding other responses worth considering. There are tradeoffs (accuracy vs. readability, for example), and it's worth running multiple models and using the Model Comparison tool to evaluate. From @pengfei:
2) When to use Gradient Boosting, Decision Tree & Logistic Regression?
Classification accuracy depends on both the ML algorithm (Gradient Boosting, Decision Tree, etc.) and the dataset. In my experience, Gradient Boosting is more accurate than a single decision tree most of the time in binary classification. In any case, it's fairly easy to put all these algorithms together and do a model comparison (a side-by-side sketch follows this quote).
3) How to reduce misclassification rate?
Try different parameter settings. For example, increase the default number of iterations, decrease the leaf fraction, or change the default number of bins.
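Putting the comparison suggestion into code: a hedged sketch using scikit-learn classifiers as stand-ins for the Enterprise Miner nodes, on a synthetic dataset. The parameter values are illustrative assumptions, and the misclassification rate is computed as 1 minus accuracy on a held-out validation split.

```python
# Fit the three model types the thread discusses and compare validation MR.
# scikit-learn stands in for the Enterprise Miner nodes; data is synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(max_depth=5),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=200,
                                                    learning_rate=0.1),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    mr = 1 - model.score(X_va, y_va)  # misclassification rate = 1 - accuracy
    print(f"{name}: validation MR = {mr:.3f}")
```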
And from @sinabl:
You have to look at this in terms of the bias/variance tradeoff. A single decision tree typically has low bias and high variance. The more advanced methods, such as random forests (bagged decision trees) and gradient boosting machines, were introduced largely to reduce that variance. Don't look only at the misclassification rate; also consider the generalization ability of the model.
To improve the gradient boosting results, you will have to play with several algorithm parameters, such as the number of iterations, shrinkage (the learning rate), training proportion, and leaf size. If you decrease the shrinkage parameter, don't forget to increase the number of iterations (see the sketch below).
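A minimal sketch of that last tuning rule, again using scikit-learn's GradientBoostingClassifier as a stand-in for the Enterprise Miner node: lower the learning rate and raise the iteration count together, then compare validation MR. All parameter values are assumptions for illustration.

```python
# Trade shrinkage (learning rate) against iteration count and compare
# validation misclassification rate. Illustrative settings only.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)

for lr, n_iter in [(0.1, 100), (0.05, 200), (0.01, 1000)]:
    gb = GradientBoostingClassifier(learning_rate=lr, n_estimators=n_iter,
                                    min_samples_leaf=20)
    gb.fit(X_tr, y_tr)
    mr = 1 - gb.score(X_va, y_va)
    print(f"learning_rate={lr}, n_estimators={n_iter}: "
          f"validation MR = {mr:.3f}")
```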