
Missings Showing Up in the Strangest of Places - Decision Trees, Consolidation


We are building a very basic Decision Tree (and love EM, too!). We have the following four nodes:

Input Data,

Data Partition,

Consolidation Tree, then

Decision Tree.

The Consolidation Tree is actually a variation of a Decision Tree: we take some nominal variables with many categories (sometimes in the hundreds) and see whether we can collapse them into simplified groupings that relate to our main dependent/target variable. Below is a snapshot of part of our Consolidation Tree:

[Image: Capture.PNG, snapshot of part of the Consolidation Tree]

The origin node looks great, and we split our data 80/20 between training and validation. The first significant split is on NAC_CODE, which groups the variable into two nice nodes. The next level down, under one of those nodes, splits GOVERNING_CLASS into two nodes again. The problem is that one of them is a node for Missing_Values_Only. Normally I would not be too concerned, since many of the variables in our dataset have missing values, but GOVERNING_CLASS has zero. I fully understand that EM automatically groups missing values with other values in a node, even when the present dataset contains none, so that new data can be scored. But a node that holds missing values by themselves does not make sense here at all.

Please help. I have some other questions coming after this one is resolved as well.

Thank you very much.

Zach Feinstein, Statistical Data Modeler

P (952) 838-4289 C (612) 590-4813  F (952) 838-2010

SFM Mutual Insurance Company

3500 American Blvd. W,
Suite 700, Bloomington, MN 55431


Accepted Solutions
Solution
‎10-17-2014 02:33 PM
SAS Super FREQ
Posts: 306

Re: Missings Showing Up in the Strangest of Places - Decision Trees, Consolidation

I think what is happening here is that any category with fewer observations than the value of the Decision Tree node property Minimum Categorical Size is treated as missing, which is why you are seeing that branch for GOVERNING_CLASS even though it has no missing values. One option is to lower that value so that categories with very few observations are not treated as missing. The second thing you can change is the Missing Values property, to something other than Use in search. That will prevent a branch from ever containing only missing values (both true missings and those defined by Minimum Categorical Size). Hope that helps!
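To make the idea concrete, here is a minimal Python sketch of the behavior described above. It is not Enterprise Miner code and not EM's actual algorithm; the function name `mask_rare_categories`, the parameter `min_categorical_size`, and the toy category counts are all made up for illustration. It only shows how a variable with no true missing values can still end up with "missing" observations once rare categories are masked.

```python
from collections import Counter


def mask_rare_categories(values, min_categorical_size):
    """Return a copy of values where rare categories are replaced by None.

    Hypothetical helper: a category counts as rare when it occurs fewer
    than min_categorical_size times. None stands in for "missing".
    """
    counts = Counter(v for v in values if v is not None)
    return [
        v if v is not None and counts[v] >= min_categorical_size else None
        for v in values
    ]


# Toy stand-in for GOVERNING_CLASS: no true missing values at all.
governing_class = ["A"] * 6 + ["B"] * 5 + ["C"] * 2 + ["D"]

# With a threshold of 5, categories C and D are too small and get masked.
masked = mask_rare_categories(governing_class, min_categorical_size=5)
n_missing = sum(v is None for v in masked)
print(n_missing)

# Lowering the threshold to 1 leaves every category intact.
unmasked = mask_rare_categories(governing_class, min_categorical_size=1)
print(sum(v is None for v in unmasked))
```

In this toy data the three observations in the small categories are the only ones that become "missing", which mirrors the first suggestion above: lowering the threshold removes them from the missing group.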



All Replies
Community Manager
Posts: 567

Re: Missings Showing Up in the Strangest of Places - Decision Trees, Consolidation

Welcome to the community, Zach! I hope you find some good advice in this forum. Keep the questions coming!

Anna
