SAS Data Science

herzizza88 · Posted 12-12-2018 09:12 PM

Hi,

I am out of options to solve this. I built a decision tree and want to check the misclassification rate to see whether there is any overfitting and what the optimal number of leaves is, but it gave me this. It didn't help that all my lecturer said was "Doesn't look good" without any explanation.

[Screenshot: subtree assessment plot of the misclassification rate]

DougWielenga · Posted 12-14-2018 10:15 AM

I built a decision tree and want to check the misclassification rate to see whether there is any overfitting and what the optimal number of leaves is, but it gave me this. It didn't help that all my lecturer said was "Doesn't look good" without any explanation.

What you are looking at is a tree that did not split at all. As a result, the single-node tree predicts every observation as belonging to the most common category. It appears that your rarer event occurs about 34% of the time and your common event about 66% of the time. Because every observation gets the same prediction when no splits are made, every rare event is predicted as the common event, so the 34% misclassification rate is simply the proportion of rare events in the data.
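To make the arithmetic concrete, here is a minimal Python sketch (not SAS code) of a tree with no splits: its single leaf predicts the majority class, so its misclassification rate equals the share of all other classes. The 660/340 class counts are hypothetical, chosen only to match the roughly 66%/34% split described above.

```python
from collections import Counter

def single_node_misclassification(labels):
    """Misclassification rate of a tree with one leaf and no splits.

    The lone leaf predicts the most common class for every observation,
    so every observation of any other class is misclassified.
    """
    counts = Counter(labels)
    majority_class, majority_count = counts.most_common(1)[0]
    errors = len(labels) - majority_count
    return majority_class, errors / len(labels)

# Hypothetical training data: 660 common events (0) and 340 rare events (1).
labels = [0] * 660 + [1] * 340
majority, rate = single_node_misclassification(labels)
print(majority, rate)  # 0 0.34
```

Any model worth keeping should beat this baseline; a subtree assessment plot that never drops below it suggests no split was found to be worthwhile.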

This is common in rare-event scenarios (that is, when only a small percentage of the training data has the target event), but at least one third of your data has the target event. Using decision weights to increase the chance that the rarer event is chosen might achieve some splitting, but it is more likely that you need to determine whether you have one of the following situations:

1 - Very weak input variables that are only slightly related to the outcome, if at all -- if so, consider fitting the tree interactively to see whether there are any useful predictors at all.

2 - A target variable that is difficult to measure accurately (e.g. customer satisfaction) -- if so, consider what you can do to better define and/or measure the target of interest.

3 - Improperly prepared input data (e.g. are you getting all the possible information out of your input data?) -- if so, consider which transformations might help. For example, timestamps are not typically useful outside of forecasting models, but you can transform them into variables such as year, quarter, or month. Likewise, a variable with too many levels, such as SKU number, can be turned into one or more variables that represent meaningful categories of SKU numbers.
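One way to see why weak inputs leave a tree unsplit is to test whether a candidate split separates the classes at all. The Python sketch below assumes a chi-square split criterion (one of the criteria the SAS Enterprise Miner Decision Tree node offers for class targets); the branch counts and the `alpha = 0.2` acceptance threshold are hypothetical, and the code is an illustration, not SAS's implementation.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 degree of freedom).

    table[i][j] = count of observations in branch i (0 = left, 1 = right)
    with target class j (0 = rare event, 1 = common event).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

def split_is_accepted(table, alpha=0.2):
    """Hypothetical rule: keep a split only if its p-value is at most alpha."""
    _, p_value = chi_square_2x2(table)
    return p_value <= alpha

# Rows are branches (left, right); columns are (rare event, common event).
weak_split = [[170, 330], [170, 330]]    # same event rate in both branches
strong_split = [[250, 150], [90, 510]]   # branches differ sharply

print(chi_square_2x2(weak_split))       # (0.0, 1.0)
print(split_is_accepted(weak_split))    # False
print(split_is_accepted(strong_split))  # True
```

If every input behaves like `weak_split`, no split passes the threshold and the tree stays a single node.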
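The transformations in item 3 can be sketched in Python as follows. The calendar fields come straight from the standard `datetime` module; the SKU-to-category mapping is entirely hypothetical and would need to come from your own product data.

```python
from datetime import datetime

def timestamp_features(ts):
    """Break a timestamp into calendar parts a tree can split on."""
    return {
        "year": ts.year,
        "quarter": (ts.month - 1) // 3 + 1,
        "month": ts.month,
    }

# Hypothetical mapping from raw SKU numbers to broader product categories.
SKU_CATEGORY = {
    "10-4471": "beverages",
    "10-4472": "beverages",
    "22-0918": "snacks",
}

def sku_category(sku):
    """Collapse a high-cardinality SKU into a category; unknown SKUs -> 'other'."""
    return SKU_CATEGORY.get(sku, "other")

print(timestamp_features(datetime(2018, 12, 12, 21, 12)))
# {'year': 2018, 'quarter': 4, 'month': 12}
print(sku_category("22-0918"), sku_category("99-0000"))
# snacks other
```

A tree can find a pattern by quarter or category far more easily than in thousands of distinct timestamps or SKU codes.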

For information on how to use decision weights to better fit rare-event scenarios (even though your event is not especially rare), see the solution in the community article linked below:

https://communities.sas.com/t5/SAS-Data-Mining-and-Machine/A-Question-on-Modeling-Rare-Events-Data/m...
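To illustrate what decision weights do, here is a minimal Python sketch of the general expected-profit rule: each leaf picks the decision with the largest expected weight given its class proportions. The weight of 2.5 for correctly flagging the rare event is hypothetical, and this is not how SAS Enterprise Miner implements its decision settings internally.

```python
def best_decision(class_probs, weights):
    """Return the decision with the highest expected weight in a node.

    class_probs: {true_class: proportion of the node's observations}
    weights: {decision: {true_class: weight}}, i.e. a profit matrix
    """
    def expected(decision):
        return sum(p * weights[decision][c] for c, p in class_probs.items())
    return max(weights, key=expected)

# The single-node tree from the plot: about 34% rare events, 66% common.
node = {"rare": 0.34, "common": 0.66}

# Equal weights for correct decisions: this reduces to a majority vote.
equal = {
    "rare":   {"rare": 1.0, "common": 0.0},
    "common": {"rare": 0.0, "common": 1.0},
}
# Hypothetical weights that value catching a rare event 2.5 times as much.
boosted = {
    "rare":   {"rare": 2.5, "common": 0.0},
    "common": {"rare": 0.0, "common": 1.0},
}

print(best_decision(node, equal))    # common
print(best_decision(node, boosted))  # rare
```

With these boosted weights, predicting the rare event pays off whenever its proportion in a leaf exceeds 1 / (1 + 2.5), about 0.29, so leaves that were previously all labeled "common" can now receive different predictions.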

Hope this helps!

Doug

How do I interpret/fix the misclassification rate in the subtree assessment plot?
