wave43
Obsidian | Level 7

I used the "filter" node to exclude specific values from my data set, and this works great. I click on Class Variables, and then select the values I want to exclude.

 

In that same dialog box, there is a "keep missing values" option, and I select "No". I then connect the filter to the "Decision Tree" node. When I run that node, it does not use the values I excluded, but there is still a branch whose label says it used missing values. Example: one branch says "Female or Missing" for Gender. There aren't even any missing values for Gender, yet the label says they were used, even though I turned off missing values in the filter node.

 

Suggestions?

 

1 ACCEPTED SOLUTION

WendyCzika
SAS Employee

Even if you don't have any missing values in your training data, the Decision Tree node is going to include missing values in the rules it creates in case missing values are encountered when scoring new data.  And you can specify how they should be handled with the Missing Values property of the Decision Tree node.  Hope that clears up any confusion!

 


4 REPLIES
M_Maldonado
Barite | Level 11

Hi,

One of the main advantages of decision tree algorithms is that they handle missing values.

If you still want to exclude missing values, you are doing it the right way.

 

The reason you see a label for "or missing" is that it is part of the algorithm to assign missings (if there were any) to a specific branch. In other words, your decision tree is just telling you what it would do if you use this model to score a new data set that has that variable missing.
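To make that concrete, here is a toy Python sketch of a single categorical split. It is only an illustration of the idea, not the algorithm Enterprise Miner actually runs: the names `make_split_rule` and `describe`, and the choice to send missing values to the left branch, are assumptions made up for this example.

```python
from typing import Callable, Iterable, Optional


def make_split_rule(left_values: Iterable[str],
                    missing_goes_left: bool) -> Callable[[Optional[str]], str]:
    """Build a toy rule for one categorical split (hypothetical helper)."""
    left = frozenset(left_values)

    def rule(value: Optional[str]) -> str:
        # A missing value is routed to a fixed branch, whether or not
        # any missing values were present when the rule was built.
        if value is None:
            return "left" if missing_goes_left else "right"
        return "left" if value in left else "right"

    return rule


def describe(left_values: Iterable[str], missing_goes_left: bool) -> str:
    """Label for the left branch, mentioning Missing only if it is routed there."""
    label = " or ".join(sorted(left_values))
    return f"{label} or Missing" if missing_goes_left else label


# The training data here has no missing Gender values at all,
# yet the rule still states where a missing value would go at scoring time.
gender_rule = make_split_rule({"Female"}, missing_goes_left=True)
print(describe({"Female"}, True))   # Female or Missing
print(gender_rule("Female"), gender_rule("Male"), gender_rule(None))  # left right left
```

The point of the sketch is that the "or Missing" part of the label describes the rule, not the training data.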

 

A quick way to confirm that your filter did exclude missings: Select your decision tree node, click on the ellipsis for Imported Data, and browse or explore this partition to confirm that the observations with missings were excluded. Personally I would not exclude observations with missing values, but you can certainly do that if that is your preference.
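If you would rather check this outside the GUI, the same idea can be sketched in Python, assuming you have exported the partition and loaded it as a list of dictionaries (for example with `csv.DictReader`). The helper name `count_missing` and the set of markers treated as missing (blank, a lone period, or `None`) are assumptions for this example, so adjust them to however your export represents missing values.

```python
from typing import Any, Iterable, Mapping, Sequence


def count_missing(rows: Iterable[Mapping[str, Any]],
                  column: str,
                  missing_markers: Sequence[Any] = ("", ".", None)) -> int:
    """Count rows whose value in `column` looks missing (hypothetical helper)."""
    # A row that lacks the column entirely yields None and is counted as missing.
    return sum(1 for row in rows if row.get(column) in missing_markers)


# Small made-up sample standing in for the exported partition.
sample_rows = [
    {"Gender": "Female"},
    {"Gender": "Male"},
    {"Gender": ""},
]
print(count_missing(sample_rows, "Gender"))  # 1
```

A count of zero on the data imported by the Decision Tree node would confirm that the filter removed the observations with missing values.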

 

Good luck!

-Miguel

wave43
Obsidian | Level 7

Thanks, that is helpful.

wave43
Obsidian | Level 7
Thanks, helpful!
