VISHALKAPASI
Calcite | Level 5

Hello,

 

I am building a model to identify customers who are likely to become dormant in the future. We want the model to have the lowest possible misclassification rate so we can target our campaigning activity effectively.

 

With a decision tree my misclassification rate (MR) is 0.29. To reduce it I used gradient boosting, since the papers I have read say that gradient boosting reduces the MR over multiple iterations, but gradient boosting is also showing an MR of 0.29.

 

I am not able to understand why, and I have a few questions:

 

1) How is Gradient Boosting different from Decision Tree?

2) When to use Gradient Boosting, Decision Tree & Logistic Regression?

3) How to reduce misclassification rate?

 

Thank You,

Vishal

1 ACCEPTED SOLUTION
PadraicGNeville
SAS Employee

1. Gradient boosting in Enterprise Miner creates a series of decision trees. The target for each tree is the residual from the series of trees already created.

2. Run all three types of models, perhaps multiple times with different settings, and pick the one with the lowest MR on validation data.

3. Trial and error. Try different models and different parameters, and possibly create new variables.
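The residual-fitting idea in point 1 can be sketched from scratch. This is a toy Python illustration under squared loss, not the Enterprise Miner implementation: each "tree" is a depth-1 stump fit to the residuals left by the ensemble so far. The data and settings are invented for the example.

```python
def fit_stump(x, residual):
    """Find the split on x whose two leaf means best fit the residual."""
    best = None
    order = sorted(range(len(x)), key=lambda i: x[i])
    for k in range(1, len(x)):
        left, right = order[:k], order[k:]
        threshold = x[order[k]]  # rows with x < threshold go to the left leaf
        lmean = sum(residual[i] for i in left) / len(left)
        rmean = sum(residual[i] for i in right) / len(right)
        sse = sum((residual[i] - lmean) ** 2 for i in left) + \
              sum((residual[i] - rmean) ** 2 for i in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    _, threshold, lmean, rmean = best
    return lambda v: lmean if v < threshold else rmean

def boost(x, y, n_trees=20, shrinkage=0.3):
    """Gradient boosting with squared loss: each stump targets the residual."""
    pred = [0.0] * len(x)
    stumps = []
    for _ in range(n_trees):
        residual = [yi - pi for yi, pi in zip(y, pred)]  # what is still unexplained
        stump = fit_stump(x, residual)                   # the new tree's target
        stumps.append(stump)
        pred = [pi + shrinkage * stump(xi) for pi, xi in zip(pred, x)]
    return lambda v: sum(shrinkage * s(v) for s in stumps)

x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 3.1, 3.0, 3.2]  # step-shaped target
model = boost(x, y)
```

A single stump can only fit a step; the boosted series of stumps, each correcting the previous residual, fits the data much more closely.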

 

Editor's note: adding other responses worth considering. There are tradeoffs (accuracy vs. readability, for example), and it's worth running multiple models and using the Model Comparison tool to evaluate them. From @pengfei:

2) When to use Gradient Boosting, Decision Tree & Logistic Regression?

Classification accuracy depends on both the ML algorithm (gradient boosting, decision tree, etc.) and the dataset. Just from my experience, gradient boosting is more accurate than a decision tree in binary classification most of the time. However, it's fairly easy to put all these algorithms together and do a model comparison.

3) How to reduce misclassification rate?

Try different parameter settings. For example, increase the default number of iterations, decrease the leaf fraction, or change the default number of bins.
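Putting the algorithms together and comparing them, as suggested above, comes down to computing the misclassification rate of each model on the same validation data and keeping the lowest. A minimal sketch, with made-up validation labels and predictions standing in for the three models' scored output:

```python
# Hypothetical validation labels and model predictions (illustrative only)
actual = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
predictions = {
    "decision_tree":       [1, 0, 0, 1, 0, 1, 1, 0, 0, 0],
    "gradient_boosting":   [1, 0, 1, 1, 0, 1, 1, 0, 1, 0],
    "logistic_regression": [1, 1, 1, 1, 0, 0, 1, 0, 0, 0],
}

def misclassification_rate(actual, predicted):
    """Fraction of validation cases the model got wrong."""
    errors = sum(a != p for a, p in zip(actual, predicted))
    return errors / len(actual)

rates = {name: misclassification_rate(actual, p)
         for name, p in predictions.items()}
best = min(rates, key=rates.get)  # the champion model for this validation set
```

This is exactly what the Model Comparison node automates, alongside other fit statistics.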

 

And from @sinabl:

You have to see the scenario in a bias/variance setting. A single decision tree typically has low bias and high variance. The advanced methods, such as random forests (bagged decision trees) and gradient boosting machines, were introduced largely to reduce the variance. One should not look only at the misclassification rate but also at the generalization ability of the model.

 

To improve the gradient boosting machine's results, you will have to experiment with multiple algorithm parameters, such as the number of iterations, shrinkage (the learning rate), training proportion, and leaf size. If you decrease the shrinkage parameter, don't forget to increase the number of iterations.
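The shrinkage/iterations tradeoff can be made concrete with a toy model (an assumption for illustration, not how any real booster behaves exactly): suppose each iteration removes a fixed fraction `lr` of the remaining error, so after `n` iterations a fraction `(1 - lr)**n` of the error is left. Then matching a larger learning rate's result at a smaller one requires proportionally more iterations:

```python
import math

def remaining_error(lr, n):
    """Toy model: each iteration leaves (1 - lr) of the current error."""
    return (1 - lr) ** n

def iterations_to_match(target, lr):
    """Iterations needed at learning rate lr to reach the target error fraction."""
    return math.ceil(math.log(target) / math.log(1 - lr))

target = remaining_error(0.5, 10)        # shrinkage 0.5, 10 iterations
print(iterations_to_match(target, 0.1))  # prints 66
```

So dropping shrinkage from 0.5 to 0.1 calls for roughly 66 iterations instead of 10 in this toy setting, which is why the two parameters should be tuned together.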




