vikrantarora25
Calcite | Level 5

Hi,

 

I would be grateful if an expert on the forum could help me understand how to decide the optimum number of leaves in a decision tree analysis.

I am using SAS. If I supply leaves=6 in my model, the misclassification rates for the validation and training data sets are 18.6% and 18.8% respectively, and SAS lists 5 variables as significant.

 

If I don't supply a leaf count in the code and let SAS decide, then after pruning SAS settles on 10 leaves, and the misclassification rates for the validation and training data sets are 17.5% and 16.9% respectively. SAS then lists 6 variables as significant.

Now that the misclassification rates have dropped and the number of leaves after pruning has increased from 4 to 10, is that a good thing, or does it indicate overfitting?

 

Looking forward to the opinions of the experts in this group.

 

Thanks & Regards

Vikrant

1 REPLY 1
DWilson
Pyrite | Level 9


There is some subjectivity in model building. You need to consider the following questions:

 

1) Are the variables in a given model likely to be related to the outcome? If you are doing exploratory modeling then you may not have a good idea about this.

 

2) What counts as a large misclassification rate? It depends on what you are trying to do. What does being wrong one-fifth of the time mean for your use of the model? Is that an acceptable misclassification rate? No one can answer this for you. Some applications can tolerate only very small misclassification rates, while others can live with larger ones.
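To put the two trees from the question side by side, here is a minimal Python sketch that uses only the four rates quoted above. It computes the gap between validation and training misclassification for each tree and the change in validation error between them. The names `trees` and `gap` are made up for illustration, and how large a gap is "too large" remains a judgment call that the code does not settle.

```python
# Misclassification rates (percent) as quoted in the question.
trees = {
    6: {"train": 18.8, "valid": 18.6},   # leaves=6 supplied by the user
    10: {"train": 16.9, "valid": 17.5},  # leaf count chosen by SAS after pruning
}


def gap(rates):
    """Validation minus training misclassification, in percentage points."""
    return round(rates["valid"] - rates["train"], 1)


for leaves, rates in trees.items():
    print(f"{leaves} leaves: validation {rates['valid']}%, "
          f"gap {gap(rates):+.1f} points")

# How much the validation error fell when moving from 6 to 10 leaves.
valid_gain = round(trees[6]["valid"] - trees[10]["valid"], 1)
print(f"Validation error fell by {valid_gain} points")
```

A widening gap with a validation error that stops falling (or rises) is the usual warning sign of overfitting. Here the validation error still falls as leaves are added, so whether the extra complexity is worth it comes back to the two questions above.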

 
