sonia_qc
Calcite | Level 5

 

Hi SAS,

 

I have a variable with more than 1000 categories (modalities). I want to group them.

 

How can I do it in SAS Enterprise Miner?

 

thanks

Sonia

1 REPLY
DougWielenga
SAS Employee


SAS Enterprise Miner can perform observational clustering (grouping observations that are similar with respect to a set of variables) and variable clustering (grouping variables that tend to vary together). In your case, however, you are looking to group the levels within a single variable.

 

To answer this, the first question to ask is on what basis do I want them grouped?   

 

For example, I could group them based on

  * cardinality (how many observations fall in each level)

  * hierarchy (how similar they are with regard to some more general categorization)

  * response (how similar they are with regard to a particular outcome of interest)

 

Regarding using cardinality -- The Pareto principle often comes into play, where roughly 80% of the data is represented by 20% of the levels. Start by keeping the levels that occur commonly enough as their own categories, and then group the remaining infrequently occurring levels into one or more other categories. Levels with too few observations have little impact on the overall solution, but their estimates can vary wildly, so grouping them together reduces cardinality while providing a more stable solution.
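
The frequency-based idea is not tied to any one tool. As a rough illustration only (this is plain Python, not Enterprise Miner code), the sketch below keeps every level that appears at least `min_count` times and maps the rest to a single catch-all level. The function name, the `min_count` threshold, the "OTHER" label, and the sample data are all assumptions made for the example.

```python
from collections import Counter

def collapse_rare_levels(values, min_count=2, other_label="OTHER"):
    """Keep levels seen at least min_count times; map the rest to other_label.

    min_count and other_label are illustrative choices, not recommended defaults.
    """
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]

# Hypothetical data: two common levels (A, B) and three levels seen only once.
levels = ["A"] * 5 + ["B"] * 4 + ["C", "D", "E"]
grouped = collapse_rare_levels(levels, min_count=2)
print(Counter(grouped))
```

In practice the threshold would be chosen by looking at the frequency distribution of the levels, and more than one catch-all group may be appropriate.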

 

Regarding using hierarchy -- If there are natural groupings of levels that make sense, you might get a better solution using that hierarchy. For example, suppose you wanted to group the SKU numbers in a grocery store. You might look to a higher level of the SKU hierarchy, such as grouping all the types of grapes together. You could also look higher still and group all kinds of fruit, or even all types of produce, together. Variables with a large number of levels can sometimes be better represented by multiple variables, each of which represents a different level of the hierarchy.
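
To make the "multiple variables" idea concrete, here is a small sketch in plain Python (again, not Enterprise Miner code). It assumes you already have a lookup table from each SKU to its place in the hierarchy; the SKU codes, the three hierarchy levels, and the `expand_sku` helper are hypothetical and exist only for illustration.

```python
# Hypothetical lookup: SKU -> (item type, category, department).
SKU_HIERARCHY = {
    "SKU1001": ("grapes", "fruit", "produce"),
    "SKU1002": ("grapes", "fruit", "produce"),
    "SKU2001": ("carrots", "vegetables", "produce"),
}

def expand_sku(sku, hierarchy=SKU_HIERARCHY, unknown="UNKNOWN"):
    """Replace one high-cardinality SKU with one variable per hierarchy level."""
    item_type, category, department = hierarchy.get(sku, (unknown,) * 3)
    return {
        "sku": sku,
        "item_type": item_type,
        "category": category,
        "department": department,
    }

print(expand_sku("SKU1001"))
print(expand_sku("SKU9999"))  # a SKU missing from the lookup falls back to UNKNOWN
```

Each derived variable has far fewer levels than the original SKU, and a model can use whichever level of the hierarchy turns out to be informative.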

 

Regarding using response -- If you want to group the levels using an outcome variable, then you can simply fit a Decision Tree using your response variable of interest as your target variable. Each split the tree makes will partition the levels of your variable, so you can choose your final groupings based on the tree that you build.
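
To show what a single tree split does to the levels, here is a simplified sketch in plain Python. It is not how the Enterprise Miner Decision Tree node is implemented; it assumes a binary (0/1) target, orders the levels by their event rate, and picks the cut point with the lowest weighted Gini impurity. The function names and the sample data are made up for the example.

```python
from collections import defaultdict

def gini(events, total):
    """Gini impurity of a node with a binary target."""
    if total == 0:
        return 0.0
    p = events / total
    return 2 * p * (1 - p)

def best_binary_grouping(levels, targets):
    """Split the levels into two groups using one Gini-based binary split.

    Assumes targets are 0/1. Levels are ordered by event rate and every
    cut point in that ordering is scored.
    """
    stats = defaultdict(lambda: [0, 0])  # level -> [events, count]
    for level, y in zip(levels, targets):
        stats[level][0] += y
        stats[level][1] += 1
    ordered = sorted(stats, key=lambda lv: stats[lv][0] / stats[lv][1])
    if len(ordered) < 2:
        return ordered, []
    total_events = sum(e for e, _ in stats.values())
    total_n = sum(n for _, n in stats.values())
    best_cut, best_score = 1, float("inf")
    left_e = left_n = 0
    for i, lv in enumerate(ordered[:-1], start=1):
        left_e += stats[lv][0]
        left_n += stats[lv][1]
        right_e, right_n = total_events - left_e, total_n - left_n
        score = (left_n * gini(left_e, left_n)
                 + right_n * gini(right_e, right_n)) / total_n
        if score < best_score:
            best_cut, best_score = i, score
    return ordered[:best_cut], ordered[best_cut:]

# Hypothetical data: levels A and B rarely have the event, C and D usually do.
levels = ["A"] * 4 + ["B"] * 4 + ["C"] * 4 + ["D"] * 4
targets = [0, 0, 0, 0,  0, 0, 0, 1,  1, 1, 1, 0,  1, 1, 1, 1]
left, right = best_binary_grouping(levels, targets)
print(left, right)
```

A full tree repeats this kind of split within each resulting group, so deeper trees give finer groupings of the levels.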

 

Hope this helps!

Doug
