Building models with SAS Enterprise Miner, SAS Factory Miner, SAS Visual Data Mining and Machine Learning or just with programming

Easiest and Fastest Way to Remove a Value From a Variable

Frequent Contributor
Posts: 115

I am nearly done with a big model we have been working on. Unfortunately, while doing some last-minute checks, I saw that one of the variables used in my Decision Tree was not coded correctly: all the zeros for that variable should instead be missing values.

I am looking at possibly using the Code node, or maybe a Metadata node, to fix this. But can anyone suggest a quick-and-dirty way to recode the zeros as missing values after I have already run a Data Partition node? Please let me know the steps.

Thank you.


Accepted Solutions
01-01-2015 12:16 PM
Super Contributor
Posts: 336

Re: Easiest and Fastest Way to Remove a Value From a Variable

Zachery,

You can use the Replacement node or the Transform Variables node to replace your zeros with missing values.

Steps

  1. Drag a Transform Variables node (Modify tab) onto your diagram and connect it to your flow.
  2. Select the Transform Variables node, then click the ellipsis (...) next to the SAS Code property in the Train properties.
  3. Enter a single statement (just one line of code; you don't need any DATA step statements around it).

          For example:

                   if age=0 then age=. ;

  4. Click OK, then run your flow.
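Outside Enterprise Miner, the same one-line recode can be sketched as an ordinary DATA step. This is a minimal sketch, assuming a hypothetical data set `work.model_data` with a numeric variable `age` (plus a hypothetical `income` column to show that only the targeted variable changes); inside the Transform Variables node you would enter only the `if` statement, as in step 3.

```sas
/* Hypothetical sample data: age=0 stands for a miscoded missing value */
data work.model_data;
  input id age income;
  datalines;
1 34 52000
2 0 61000
3 47 0
4 0 38000
;
run;

/* Recode zeros in AGE only; the hypothetical INCOME keeps its zero */
data work.model_data_recoded;
  set work.model_data;
  if age = 0 then age = .;
run;
```

Restricting the `if` condition to one variable matters here: a zero can be a legitimate value in other columns.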

Compare your results before and after recoding. One popular option for handling missing values in the Decision Tree node (I believe it is also the default) is "Use in search", which means the tree decides during the split search whether the missing values are best kept as their own group or added to one of the branches. For this reason, I would not expect your decision tree to be very different before and after recoding, unless the missing values truly represent some separation in your data. Try both ways and keep what makes more sense for your business!
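Before comparing the two trees, it can help to confirm that the recode did what you expected. A minimal sketch, assuming a hypothetical five-row data set in which `age_recoded` holds the recoded copy of `age`; it uses the IFN function as a one-expression alternative to the `if ... then` statement, and PROC SQL to count zeros and missing values before and after.

```sas
/* Hypothetical data: two miscoded zeros and one value that was already missing */
data work.check;
  input age @@;
  /* IFN returns its second argument when the condition is true, else the third */
  age_recoded = ifn(age = 0, ., age);
  datalines;
34 0 47 0 .
;
run;

/* In PROC SQL, (age = 0) evaluates to 1 or 0, so SUM counts the zeros */
proc sql;
  select sum(age = 0)         as zeros_before,
         nmiss(age)           as missing_before,
         sum(age_recoded = 0) as zeros_after,
         nmiss(age_recoded)   as missing_after
    from work.check;
quit;
```

For this sample the query reports 2 zeros and 1 missing value before the recode, and 0 zeros and 3 missing values after it.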

Good luck,

M



All Replies
Contributor
Posts: 45

Re: Easiest and Fastest Way to Remove a Value From a Variable

Hi,

I'm sorry to admit that I am entirely unfamiliar with Enterprise Miner, but you should be able to set any variable, numeric or character, to missing using the CALL MISSING routine.

It's a simple line of code: call missing(var);

Or, conditionally: if var=0 then call missing(var);

The CALL MISSING routine assigns missing values to the specified character or numeric variables (see SAS(R) 9.2 Language Reference: Dictionary, Fourth Edition).
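A minimal sketch of the CALL MISSING approach, assuming a hypothetical data set with a numeric `age` and a character `region` in which `'0'` was also used as a placeholder. The same routine works for both types, so you don't need to remember that numeric missing is `.` and character missing is a blank.

```sas
/* Hypothetical raw data: row 2 uses 0 as a placeholder in both columns */
data work.raw;
  input id age region $;
  datalines;
1 34 East
2 0 0
3 47 West
;
run;

data work.clean;
  set work.raw;
  /* Numeric variable: same effect as age = . */
  if age = 0 then call missing(age);
  /* Character variable: sets region to blank, the character missing value */
  if region = '0' then call missing(region);
run;
```

CALL MISSING also accepts several variables at once, for example `call missing(age, region);`, which is handy when one condition should blank out a group of mixed-type variables.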

Frequent Contributor
Posts: 115

Re: Easiest and Fastest Way to Remove a Value From a Variable

Yes, a variation of that worked. I used an HP Transform node in much the same way you suggest.

Is there an appreciable difference between the regular Transform Variables node and the HP version?

Super Contributor
Posts: 336

Re: Easiest and Fastest Way to Remove a Value From a Variable

You will notice the biggest difference between Transform Variables and HP Transform if you have a grid setup. If you have EM licensed for a grid environment, you should use only HP nodes, because the regular nodes do not handle your entire data set. You can mix and match HP and non-HP nodes, but it gets tricky at some point.

If you are just running on one machine or server, HP Transform takes advantage of all the cores in your machine, so there is still an advantage to using it!

In summary, HP Transform performs better than the Transform Variables node. If you are in a grid environment, do your best to stick to HP nodes only!
