Automatically collapsing levels of a categorical variable in SAS EM

Contributor
Posts: 38

Hi everyone,

Is there a way to automatically collapse the levels of a categorical variable in SAS Enterprise Miner? I don't want to use the Replacement Editor, because that is a manual approach.

One way of doing this in SAS Enterprise Guide is to use Greenacre's method. It collapses the levels whose merger leads to the least reduction in the chi-square statistic, so the resulting categorical variable keeps a strong relationship with the target.

Could Greenacre's method be performed in SAS Enterprise Miner?

Thanks


Accepted Solutions
Solution
‎07-03-2017 06:13 PM
SAS Employee
Posts: 6

Re: Automatically collapsing levels of a categorical variable in SAS EM

The attached code contains a SAS program that can be implemented in a SAS Code node in SAS Enterprise Miner. Use the code at your own risk. It represents an attempt to implement Greenacre's method to consolidate the levels of a categorical variable. You may also wish to consider using decision trees as described in Section 9.4 of the course, "Applied Analytics Using SAS Enterprise Miner."
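The attachment is not reproduced in this thread. As a rough illustration of the idea only (not the attached SAS program), here is a minimal Python sketch of a Greenacre-style collapse for a binary target: at each step it merges the pair of levels whose merger leaves the largest remaining Pearson chi-square, that is, the smallest reduction. The function names, the example counts, and the fixed `n_groups` stopping rule are all assumptions made for this sketch; other stopping rules are possible.

```python
from itertools import combinations


def chi_square(table):
    """Pearson chi-square statistic for a list of [events, non_events] rows."""
    col_totals = [sum(row[j] for row in table) for j in range(2)]
    grand_total = sum(col_totals)
    stat = 0.0
    for row in table:
        row_total = sum(row)
        for j in range(2):
            expected = row_total * col_totals[j] / grand_total
            if expected > 0:
                stat += (row[j] - expected) ** 2 / expected
    return stat


def collapse_levels(counts, n_groups):
    """Greedily merge levels until n_groups remain (hypothetical helper).

    counts maps each level name to (events, non_events).
    Returns a dict mapping each original level to a group label.
    """
    # Each group is keyed by the tuple of original levels it contains.
    groups = {(level,): list(c) for level, c in counts.items()}
    while len(groups) > n_groups:
        best = None
        for a, b in combinations(groups, 2):
            merged = [x + y for x, y in zip(groups[a], groups[b])]
            rest = [v for k, v in groups.items() if k not in (a, b)]
            # Chi-square of the table after merging a and b.
            stat = chi_square(rest + [merged])
            if best is None or stat > best[0]:
                best = (stat, a, b, merged)
        _, a, b, merged = best
        del groups[a], groups[b]
        groups[a + b] = merged
    return {level: "+".join(members) for members in groups for level in members}


# Made-up counts of (events, non_events) per level, for illustration only.
counts = {"A": (10, 90), "B": (12, 88), "C": (40, 60), "D": (45, 55)}
mapping = collapse_levels(counts, n_groups=2)
print(mapping)
```

With these made-up counts, levels with similar event rates end up in the same group. The exhaustive pair search is fine for a handful of levels but grows quickly with the number of levels.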

Attachment

All Replies
Contributor
Posts: 38

Re: Automatically collapsing levels of a categorical variable in SAS EM

Posted in reply to TWoodfield

Hi TWoodfield,

Thanks for taking the time to write out the code. I am even more curious as to why I couldn't use score code in the SAS Code node in Enterprise Miner. I'll check out the course you recommended, but in the meantime, do you have any suggestions as to why the new variable I created didn't show up in subsequent nodes?

In Section 9.4 of "Applied Analytics Using SAS Enterprise Miner," the tutor shows how to consolidate the levels of a single variable using a decision tree. From that point, do I need to use the Replacement Editor to manually create the new levels based on the output of the decision tree? Or is there a way of applying those newly created levels automatically, perhaps by linking the node to another node? I did my Predictive Modeler certification a year ago, but that was not in the content.

Thanks,

Paul

SAS Employee
Posts: 6

Re: Automatically collapsing levels of a categorical variable in SAS EM

The code I provided is a starting point for writing a complete extension node for SAS Enterprise Miner. You can modify the metadata to make the new variable an input variable. The program actually creates score code: it is the code stored in the location identified by the EM_FILE_EMFLOWSCORECODE macro variable. You can learn more about extension nodes by downloading "SAS Enterprise Miner Extension Nodes: Developer's Guide" (search for "SAS Extension Nodes Developer's Guide"). The latest path seems to be:

https://support.sas.com/documentation/cdl/en/emxndg/67980/PDF/default/emxndg.pdf

 

You can also learn more about extension nodes by taking the course "Extending SAS Enterprise Miner with User-Written Nodes." The documentation or the course will teach you about flow score code and publish score code.

 

If you consolidate levels using a Decision Tree node as described in the EM course, change the leaf property to Input; the variable _NODE_ is then exported as an input for subsequent nodes. If done correctly, you do not need a Replacement node.

Note that the input variable created by my code is automatically given the name of the original variable with the suffix "_clus" added. You can, of course, change the code to do anything you like.
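To make the naming convention concrete, here is a small Python sketch of what scoring with a collapsed variable amounts to conceptually: each original level is looked up in a level-to-group mapping and the result is written to a new variable named after the original with "_clus" appended. This is only an illustration; in Enterprise Miner the score code is SAS code, and the function name, the sample rows, and the "_OTHER_" fallback for unseen levels are assumptions of this sketch.

```python
def add_collapsed_variable(rows, variable, level_to_group, other="_OTHER_"):
    """Return copies of rows with a new '<variable>_clus' field (hypothetical helper).

    rows is a list of dicts, level_to_group maps original levels to group labels,
    and levels not present in the mapping fall back to `other` (an assumption).
    """
    new_name = f"{variable}_clus"
    scored = []
    for row in rows:
        new_row = dict(row)  # leave the input rows unchanged
        new_row[new_name] = level_to_group.get(row.get(variable), other)
        scored.append(new_row)
    return scored


# Made-up data and mapping, for illustration only.
level_to_group = {"A": "A+B", "B": "A+B", "C": "C+D", "D": "C+D"}
rows = [{"region": "A"}, {"region": "D"}, {"region": "Z"}]
scored = add_collapsed_variable(rows, "region", level_to_group)
print(scored)
```

The original variable is kept alongside the new one, so whether the new variable is used as an input by later nodes still depends on its metadata role.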

 

Regards,

Terry
