I am nearly done with a big model we have been working on. Unfortunately, while doing some last-minute checks, I saw that one of the variables used in my Decision Tree was not coded correctly. All the zeros for that variable should instead be missing.
I am looking at using the Code Node to fix this, or maybe a Metadata Node. But can anyone suggest a quick & dirty way to recode the zeros as missing after I have already done a Data Partition? Please let me know the steps.
Thank you.
Zachery,
You can use the Replacement or Transform Variables node to replace your 0's with missings.
Steps
for example:
if age=0 then age=. ;
4. Click OK, then run your flow.
Compare your results before and after recoding. One of the popular options (possibly also the default) for handling missing values in a decision tree is "Use in search", which means that the decision tree decides whether it is best to keep the missings as a separate level or to add them to one of the branches. For this reason I would not expect your DT model to be much different before and after recoding the zeros as missing, unless the missings truly represent some separation in your data. Try both ways and keep what makes more sense for your business!
Good luck,
M
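One way to confirm that the recode did what was intended is to compare missing counts before and after. This is only a sketch: `work.train` and `work.train_fixed` are hypothetical data set names standing in for the data before and after the recode, and `age` is the example variable from the post above.

```sas
/* Hypothetical names: work.train (before), work.train_fixed (after). */
/* N = non-missing count, NMISS = missing count, MIN = smallest value. */
proc means data=work.train n nmiss min;
    var age;
run;

proc means data=work.train_fixed n nmiss min;
    var age;
run;
```

After the recode, NMISS should go up by the number of rows that previously held a zero, and N should drop by the same amount.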
Hi,
I am sorry to admit that I am entirely unfamiliar with Enterprise Miner, but you should be able to set any variable, numeric or character, to missing using the CALL MISSING routine.
It's a simple line of code: call missing(var);
Or conditionally: if var=0 then call missing(var);
The CALL MISSING routine assigns missing values to the specified character or numeric variables: SAS(R) 9.2 Language Reference: Dictionary, Fourth Edition
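As a minimal sketch of how that conditional statement might sit inside a complete DATA step: the data set names `work.train` and `work.train_fixed` and the variables `age` and `income` are hypothetical placeholders, not names from the original flow. Inside an Enterprise Miner SAS Code node, the input and output data sets would be whatever that node provides rather than these `work` tables.

```sas
/* Hypothetical names: work.train, work.train_fixed, age, income. */
data work.train_fixed;
    set work.train;

    /* List every numeric variable whose zeros should become missing. */
    array recode_vars {*} age income;

    do i = 1 to dim(recode_vars);
        /* Zero was used as a placeholder, so set it to missing. */
        if recode_vars{i} = 0 then call missing(recode_vars{i});
    end;

    drop i;
run;
```

Because the flow is already partitioned, the same recode would presumably need to be applied to the training, validation, and test partitions alike so they stay consistent.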
Yes, a variation of that worked. I used an HP Transform node, similar to what you suggest there.
Is there an appreciable difference between the regular Transform node and the HP version?
You will notice the biggest difference between Transform and HP Transform if you have a grid setup. If you have EM licensed for a grid environment, you should use HP nodes only, as regular nodes do not handle all your data set. You can mix and match HP and non-HP nodes, but it gets tricky at some point.
If you are just running on one machine or server, HP Transform takes advantage of all the cores in your machine, so there is still an advantage to using it!
To summarize, HP Transform performs better than the Transform node. If you are in a grid environment, do your best to stick to HP nodes only!