Zachary
Obsidian | Level 7

I am nearly done with a big model we have been working on. Unfortunately, while doing some last-minute checks, I saw that one of the variables used in my Decision Tree was not coded correctly. All the zeros for that variable should instead be missing.

I am looking at using the Code node to fix this, or maybe a Metadata node. But can anyone suggest a quick & dirty way to recode the zeros as missing after I have already done a Data Partition? Please let me know the steps.

Thank you.

1 ACCEPTED SOLUTION
M_Maldonado
Barite | Level 11

Zachary,

You can use the Replacement or Transform Variables node to replace your 0's with missings.

Steps

  1. Drag a Transform Variables node (Modify tab) into your diagram and connect it to your flow.
  2. Click your Transform Variables node, then click the SAS Code ellipsis under the Train properties.
  3. Code a single statement (just one line of code; you don't need DATA or RUN statements). For example:

          if age=0 then age=. ;

  4. Click OK, then run your flow.
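
If you want to sanity-check the recode outside the flow, a minimal sketch along these lines may help. It wraps the same one-line statement in a standalone DATA step and then counts missing values. The data set names `work.train` and `work.train_recoded` are hypothetical placeholders, not names Enterprise Miner creates; `age` is just the example variable from the steps above.

```sas
/* Hypothetical data set names: work.train stands in for your training partition. */
data work.train_recoded;
    set work.train;
    if age = 0 then age = .;   /* same statement used in the Transform Variables node */
run;

/* Compare N and NMISS with the same PROC MEANS run on work.train:   */
/* NMISS should rise by the number of zeros that were recoded.       */
proc means data=work.train_recoded n nmiss min max;
    var age;
run;
```

After the recode, the reported minimum for `age` should no longer be 0 if every zero was caught.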

Compare your results before and after recoding. One of the popular options (possibly also the default) for how a decision tree handles missing values is "Use in search", which means the tree decides whether it is best to keep the missings as a separate level or to add them to one of the branches. For this reason I would not expect your DT model to change much after recoding the zeros as missing, unless the missings truly represent some separation in your data. Try both ways and keep what makes more sense for your business!

Good luck,

M

4 REPLIES
MaikH_Schutze
Quartz | Level 8

Hi,

I am sorry to admit that I am entirely unfamiliar with Enterprise Miner, but you should be able to set any variable, numeric or character, to missing using the CALL MISSING routine.

It's a simple line of code: call missing(var);

Or conditionally: if var=0 then call missing(var);

The CALL MISSING routine assigns missing values to the specified character or numeric variables (see SAS(R) 9.2 Language Reference: Dictionary, Fourth Edition).
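
If more than one variable turns out to have the same zero-coding problem, the conditional CALL MISSING can be looped over an array. This is only a sketch under assumptions: the data set names `work.have` and `work.want` and the variable `income` are hypothetical, and `age` is borrowed from the example elsewhere in this thread.

```sas
/* Hypothetical names: work.have, work.want, and income are placeholders. */
data work.want;
    set work.have;
    array zero_vars {*} age income;        /* numeric variables to recode */
    do i = 1 to dim(zero_vars);
        if zero_vars{i} = 0 then call missing(zero_vars{i});
    end;
    drop i;                                /* remove the loop counter */
run;
```

For a single variable, the one-line form is simpler and does the same thing.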

Zachary
Obsidian | Level 7

Yes, a variation of that worked. I used an HP Transform node in a similar way to what you suggest there.

Is there an appreciable difference between the regular Transform and the HP version?

M_Maldonado
Barite | Level 11

You will notice the biggest difference between Transform and HP Transform if you have a grid setup. If you have EM licensed for a grid environment, you should use only HP nodes, because regular nodes do not handle your whole data set. You can mix and match HP and non-HP nodes, but it gets tricky at some point.

If you are just running on one machine or server, HP Transform takes advantage of all the cores in your machine, so there is still an advantage to using it!

To summarize, HP Transform performs better than the Transform node. If you are in a grid environment, do your best to stick to HP nodes only!
