SGhosh
Fluorite | Level 6

I am new to SAS EMiner, so any response on this would be very helpful.

 

How can I select variables in the data node of SAS EMiner? To define it more clearly, I would say if I have ~50 variables, how could I select / determine the strongest variables for my model?

When I run my model package, the log shows "Pr > ChiSq" as 1.0000 for most of the variables, which I take to mean something is wrong with my variables. How can I fix this? For example, I have a variable called claim_count, and instead of keeping its values continuous I grouped them into buckets such as 1-10, 11-20, etc.

 

Thanks in advance

DougWielenga
SAS Employee

How can I select variables in the data node of SAS EMiner? To define it more clearly, I would say if I have ~50 variables, how could I select / determine the strongest variables for my model?

 

By 'data node' I assume you mean the Input Data Source node.  In general, you would be ill-advised to remove variables from consideration there unless you know they are most likely unsuitable for direct use in modeling (e.g. ID information, date/timestamp information, SKU numbers, zip codes, etc.).  It is also not necessary to choose variables in this node, since SAS Enterprise Miner provides a wealth of methods for choosing variables for your model, such as the following:

     * the Variable Selection node provides Regression and Tree-based methods for choosing variables

     * the Tree node performs its own variable selection so it does not need prior variable selection

     * the Regression node allows you to add candidate terms and to use stepwise methods to perform variable selection

     * the Variable Clustering node provides an alternate way of removing variables which carry duplicate or highly similar information
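
For readers who want to see the idea behind these nodes outside the tool, here is a minimal Python sketch. It is not what SAS Enterprise Miner runs internally: it simply ranks candidate inputs by the absolute correlation with a binary target and then skips any input that nearly duplicates one already kept, which loosely mirrors the "remove redundant variables" goal of variable clustering. The data, the `screen_variables` helper, and the 0.9 cutoff are all made up for illustration.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def screen_variables(columns, target, redundancy_cutoff=0.9):
    """Hypothetical helper: order inputs by |correlation with target|,
    then keep an input only if it is not a near-duplicate of one already kept."""
    ranked = sorted(columns,
                    key=lambda name: abs(pearson(columns[name], target)),
                    reverse=True)
    kept = []
    for name in ranked:
        if all(abs(pearson(columns[name], columns[k])) < redundancy_cutoff
               for k in kept):
            kept.append(name)
    return kept

# Synthetic data (an assumption, not from the original post)
random.seed(0)
n = 500
claim_count = [random.randint(0, 40) for _ in range(n)]
claim_count_copy = [c + random.gauss(0, 0.5) for c in claim_count]  # near duplicate
noise = [random.gauss(0, 1) for _ in range(n)]
target = [1 if c + random.gauss(0, 5) > 20 else 0 for c in claim_count]

columns = {
    "claim_count": claim_count,
    "claim_count_copy": claim_count_copy,
    "noise": noise,
}
kept = screen_variables(columns, target)
print(kept)  # strongest input first; only one of the two claim_count versions survives
```

A simple correlation ranking only captures linear relationships, which is one reason the tree-based and stepwise options in the nodes above are usually the better choice in practice.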

 

When I run my model package, the log shows "Pr > ChiSq" as 1.0000 for most of the variables, which I take to mean something is wrong with my variables. How can I fix this? For example, I have a variable called claim_count, and instead of keeping its values continuous I grouped them into buckets such as 1-10, 11-20, etc.

 

Bucketing manually without any numerical evaluation might actually do more harm than good.  SAS Enterprise Miner provides a variety of bucketing algorithms which can take into account the relationship to the target variable.  

      * the Transform Variables node allows you to create buckets with an optimal relationship to the target variable (a Tree-based method)

      * Interactive Grouping allows you to create groups interactively
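
To illustrate what "taking the target into account" means, here is a small Python sketch of a single tree-style split: it scans candidate cut points on an interval variable and keeps the one that minimizes the weighted Gini impurity of the two resulting buckets. This is only an assumption-laden illustration of the general technique, not the algorithm the Transform Variables node actually uses. The toy data (claim_count from 1 to 40, with the target switching at 25) and the helper names are invented.

```python
import bisect

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def best_split(values, labels):
    """Return the cut point that minimizes weighted Gini impurity,
    or None if no split improves on the unsplit data."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_score = None, gini([y for _, y in pairs])
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_cut

# Toy data (hypothetical): the target flips once claim_count exceeds 25
claim_count = list(range(1, 41))
target = [1 if c > 25 else 0 for c in claim_count]

cut = best_split(claim_count, target)
cuts = [cut]

def bucket_of(value):
    """Map a raw value to its bucket index given the learned cut points."""
    return bisect.bisect_right(cuts, value)

# Manual equal-width buckets 1-10, 11-20, 21-30, 31-40 for comparison
manual = {}
for c, y in zip(claim_count, target):
    manual.setdefault((c - 1) // 10, []).append(y)
mixed = [b for b, ys in manual.items() if 0 < sum(ys) < len(ys)]

print(cut)    # learned cut point
print(mixed)  # manual buckets that mix both target classes
```

On this toy data the manual 21-30 bucket blends both outcomes, while the target-aware cut separates them cleanly. Applying `best_split` recursively to each side would produce more than two buckets.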

 

Please note that bucketing summarizes information and can (possibly) result in a predictor that is less capable than the original data.   The buckets, however, do provide the additional capability of helping to model non-linearity which might improve how the variable information can be used.  In practice, I recommend including both the original interval variable and the bucketed version prior to variable selection so that the information is used in the best possible way.

 

Hope this helps!

Doug

 

 

 

SGhosh
Fluorite | Level 6

This information is really helpful. I do appreciate this. I changed a few things after reading the response and am getting better results now.

Truly appreciate all your help

 

Thanks much

Soma
