Solved: Easiest and Fastest Way to Remove a Value From a Variable

Zachary · Posted 12-31-2014 09:04 AM

I am nearly done with a big model we have been working on. Unfortunately, while doing some last-minute checks I saw that one of the variables used in my Decision Tree was not coded correctly. All the zeros for that variable should instead be missing.

I am in the middle of looking at maybe using the Code Node to fix this. Or maybe a Metadata Node. But can anyone suggest a quick & dirty way to code the zeros into missings after I have already done a Data Partition? Please let me know the steps.

Thank you.

M_Maldonado · Posted 01-01-2015 12:16 PM

Zachary,

You can use the Replacement or Transform Variables node to replace your 0's with missings.

Steps

1. Drag a Transform Variables node (Modify tab) into your diagram and connect it to your flow.
2. Click on your Transform Variables node, then click on the SAS Code ellipsis under the Train properties.
3. Code a single statement (just one line of code; you don't need DATA or RUN statements). For example:

if age=0 then age=. ;

4. Click OK, then run your flow.
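
If you want to check the statement outside Enterprise Miner first, a stand-alone DATA step along these lines should behave the same way. This is only a sketch: the table names work.demo and work.demo_fixed and the toy values are made up for illustration, and age stands in for whatever your miscoded variable is called.

```sas
/* Hypothetical toy data: the 0 values in age stand in for "not recorded" */
data work.demo;
   input id age;
   datalines;
1 34
2 0
3 51
4 0
;
run;

/* Recode the zeros to numeric missing (.) */
data work.demo_fixed;
   set work.demo;
   if age = 0 then age = .;   /* the same single statement you would paste into the node */
run;

/* Inspect the result: rows 2 and 4 should now show a missing age */
proc print data=work.demo_fixed noobs;
run;
```

Inside the node you would paste only the IF statement, not the surrounding DATA and RUN statements.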

Compare your results before and after recoding. One of the popular options (possibly also the default) for handling missing values in a decision tree is "Use in search", which means the tree decides whether it is best to keep the missings as a separate level or to add them to one of the branches. For this reason I would not expect your DT model to be much different before and after recoding the zeros to missing, unless the missings truly represent some separation in your data. Try both ways and keep what makes more sense for your business!
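
Before comparing the models, it may also be worth confirming that the recode did what you expect. The sketch below counts zeros and missings in a table before and after the change. The table names work.before and work.after and the toy values are hypothetical; substitute your own data sets and variable.

```sas
/* Hypothetical toy data; replace with your own tables */
data work.before;
   input id age;
   datalines;
1 34
2 0
3 51
4 0
;
run;

data work.after;
   set work.before;
   if age = 0 then age = .;
run;

proc sql;
   title 'Before recoding';
   select sum(age = 0) as n_zero,     /* a true comparison counts as 1 */
          nmiss(age)   as n_missing
   from work.before;

   title 'After recoding';
   select sum(age = 0) as n_zero,
          nmiss(age)   as n_missing
   from work.after;
quit;
title;   /* clear the title */
```

After recoding, the zero count should drop to 0 and the missing count should rise by the number of zeros that were there before.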

Good luck,

M

MaikH_Schutze · Posted 12-31-2014 10:11 AM

Hi,

I am sorry to admit that I am entirely unfamiliar with Enterprise Miner but you should be able to set any variable, numeric or character, to missing using the CALL MISSING routine.

It's a simple line of code: call missing(var);

Or conditionally: if var=0 then call missing(var);

The CALL MISSING routine assigns missing values to the specified character or numeric variables. See SAS(R) 9.2 Language Reference: Dictionary, Fourth Edition.
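
As a rough illustration of the conditional form in a complete DATA step, here is a small sketch. The table name, the variables age and status, and the 'NA' code are all invented for the example; only the CALL MISSING routine itself comes from the SAS language.

```sas
/* Hypothetical toy data with one numeric and one character variable */
data work.call_missing_demo;
   input id age status $;
   if age = 0 then call missing(age);            /* numeric: becomes .    */
   if status = 'NA' then call missing(status);   /* character: becomes blank */
   datalines;
1 34 OK
2 0 NA
3 51 OK
;
run;

proc print data=work.call_missing_demo noobs;
run;
```

For a single numeric variable this does the same job as assigning a period directly; CALL MISSING is mainly convenient when you want to clear several variables, or a mix of numeric and character ones, in one call.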

Zachary · Posted 01-02-2015 09:10 AM

Yes, a variation of that worked for me. I used an HP Transform node in much the same way as you suggest.

Is there an appreciable difference between the regular Transform node and the HP version?

M_Maldonado · Posted 01-05-2015 04:41 PM

You will notice the biggest difference between Transform and HP Transform if you have a grid setup. If you have EM licensed for a grid environment, you should use only HP nodes, because the regular nodes do not handle all of your data set. You can mix and match HP and non-HP nodes, but it gets tricky at some point.

If you are just running on one machine or server, HP Transform takes advantage of all the cores in your machine, so there is still an advantage to using it!

To summarize, HP Transform performs better than the Transform node. If you are in a grid environment, do your best to stick to HP nodes only!
