Zachary
Obsidian | Level 7

We are building a very basic Decision Tree - love EM too! We have the four following nodes:

Input Data,

Data Partition,

Consolidation Tree, then

Decision Tree.

The Consolidation Tree is actually a variation of a Decision Tree: we take some nominal variables with many categories (sometimes in the hundreds) and see whether we can collapse them into simplified groupings related to our main dependent/target variable. Below is a snapshot of part of our Consolidation Tree:

Capture.PNG

The origin node looks great, and we split our data 80/20 between training and validation. The first significant split is on NAC_CODE, which grouped the variable into two nice nodes. At the next level down, one of those nodes splits on GOVERNING_CLASS into two nodes again. The problem is that one of them is a node for Missing_Values_Only. Normally I would not be too concerned, since many of the variables in our dataset have missing values, but GOVERNING_CLASS has none. I fully understand that, for scoring purposes, EM automatically groups missing values with other values in some branch even when the present dataset contains none. A branch that holds only missing values, however, does not make sense.

Please help. I have some other questions coming after this one is resolved as well.

Thank you very much.

Zach Feinstein, Statistical Data Modeler

P (952) 838-4289 C (612) 590-4813  F (952) 838-2010

SFM Mutual Insurance Company

3500 American Blvd. W,
Suite 700, Bloomington, MN 55431

Accepted Solutions
WendyCzika
SAS Employee

I think what is happening here is that categories with fewer observations than the value specified for the Decision Tree node property Minimum Categorical Size are treated as missing, which is why you are seeing that branch for GOVERNING_CLASS even though it has no missing values. One option is to lower that value so that categories with extremely small counts are not treated as missing. The second thing you can change is the Missing Values property, setting it to something other than Use in search. This will prevent a branch from ever containing only missing values (true missings and those defined by Minimum Categorical Size). Hope that helps!
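
To make the mechanism concrete, here is a rough Python sketch of the idea described above. It is only an illustration, not Enterprise Miner's actual implementation: the function name `recode_rare_levels`, the sample GOVERNING_CLASS levels, and the threshold of 5 are all made up for the example.

```python
from collections import Counter

def recode_rare_levels(values, min_categorical_size):
    """Replace levels seen fewer than min_categorical_size times with None.

    Hypothetical helper: it mimics the described behavior in which
    low-count categories are treated as missing before the split search.
    """
    counts = Counter(values)
    return [v if counts[v] >= min_categorical_size else None for v in values]

# Made-up data: no true missing values, but two very small categories.
governing_class = ["A"] * 50 + ["B"] * 40 + ["C"] * 3 + ["D"] * 2

# Threshold of 5 is an assumption chosen for illustration.
recoded = recode_rare_levels(governing_class, min_categorical_size=5)
n_missing = sum(v is None for v in recoded)
print(n_missing)

# Lowering the threshold keeps every level, so nothing becomes "missing".
kept = recode_rare_levels(governing_class, min_categorical_size=1)
n_missing_low = sum(v is None for v in kept)
print(n_missing_low)
```

With the higher threshold, the rows from the two small categories end up as "missing" even though the input had no missing values, which is how a missing-only branch could appear for a variable like GOVERNING_CLASS. Lowering the threshold removes that effect in this toy example.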


AnnaBrown
Community Manager

Welcome to the community, Zach! I hope you find some good advice in this forum. Keep the questions coming!

Anna

