Lbind91
Calcite | Level 5

Hey

 

I have a dataset where some of the input variables have a lot of levels, e.g., School_city with 513 levels and School_state with 49 levels. How can I reduce the number of levels in an input variable, or otherwise group levels together, in SAS Enterprise Miner?

 

I'm kind of new to SAS EM, so I need some help figuring this out.

 

Thanks.

 

/LB

2 REPLIES
Reeza
Super User

Usually I'd say use some sort of binning, but you have nominal variables, so there is no natural order to bin on.

Ideally you group the levels in some logical manner, for example cities into states, though that makes the city variable redundant with School_state. Another option is some sort of spatial relationship, especially if you want to be able to interpret the results afterwards.

 

If you're looking for rules to create these groups, it becomes a data mining problem in itself. Decision trees or clustering are two ways to do it.
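To make the rule-based grouping idea concrete, here is a minimal sketch in plain Python rather than SAS Enterprise Miner. It is not what a decision tree node does internally. It is a simpler stand-in that pools rare levels into "OTHER" and buckets the remaining levels by their mean target value. The function name, the `min_count` and `n_groups` parameters, and the sample data are all hypothetical.

```python
from collections import defaultdict


def group_levels_by_target_rate(rows, level_key, target_key,
                                n_groups=3, min_count=2):
    """Map each level of a nominal variable to a coarser group label."""
    totals = defaultdict(lambda: [0, 0.0])  # level -> [row count, target sum]
    for row in rows:
        stats = totals[row[level_key]]
        stats[0] += 1
        stats[1] += row[target_key]

    # Levels with too few rows are pooled: their rates are unreliable.
    mapping = {lvl: "OTHER" for lvl, (n, _) in totals.items() if n < min_count}

    # Rank the remaining levels by mean target (ties broken by name),
    # then cut the ranking into n_groups buckets of roughly equal size.
    kept = sorted(
        (lvl for lvl, (n, _) in totals.items() if n >= min_count),
        key=lambda lvl: (totals[lvl][1] / totals[lvl][0], lvl),
    )
    for rank, lvl in enumerate(kept):
        mapping[lvl] = f"G{rank * n_groups // len(kept) + 1}"
    return mapping


# Hypothetical toy data: one nominal input and a binary target.
rows = [
    {"School_city": "A", "target": 1}, {"School_city": "A", "target": 1},
    {"School_city": "B", "target": 0}, {"School_city": "B", "target": 0},
    {"School_city": "C", "target": 1}, {"School_city": "C", "target": 0},
    {"School_city": "D", "target": 1},  # only one row, so it is pooled
]
mapping = group_levels_by_target_rate(rows, "School_city", "target")
```

The resulting `mapping` dictionary can be applied to the original column to produce the reduced variable. Equal-size buckets are a design shortcut here; a tree would choose split points that best separate the target instead.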

 

 

Lbind91
Calcite | Level 5

Thanks for the input, Reeza

 

I tried using a decision tree to consolidate the levels. It worked for each variable separately and grouped the levels. But I cannot figure out how to get the results of the 5 separate consolidation trees back into the data.

The output from each separate decision tree is just a variable called _NODE_. Can I rename these new derived variables so they won't all have the same name?
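The rename-then-merge step being asked about can be sketched as follows, again in plain Python rather than SAS Enterprise Miner. The sketch assumes every row carries a unique identifier (the hypothetical `row_id`) that survives each tree's scoring. The function name and the new variable names are also hypothetical. In SAS itself the same idea would roughly correspond to a RENAME= data set option followed by a MERGE with a BY statement.

```python
def merge_node_columns(base_rows, tree_outputs, id_key="row_id"):
    """Attach each tree's _NODE_ value to the base rows under a unique name.

    tree_outputs maps a new variable name to that tree's scored rows,
    each of which holds the row identifier and a '_NODE_' value.
    """
    merged = {row[id_key]: dict(row) for row in base_rows}
    for new_name, scored_rows in tree_outputs.items():
        for row in scored_rows:
            # Renaming happens here: '_NODE_' is stored as new_name.
            merged[row[id_key]][new_name] = row["_NODE_"]
    return list(merged.values())


# Hypothetical toy data: two rows scored by two separate trees.
base = [
    {"row_id": 1, "School_city": "A", "School_state": "X"},
    {"row_id": 2, "School_city": "B", "School_state": "Y"},
]
tree_outputs = {
    "city_group": [{"row_id": 1, "_NODE_": 3}, {"row_id": 2, "_NODE_": 4}],
    "state_group": [{"row_id": 1, "_NODE_": 2}, {"row_id": 2, "_NODE_": 2}],
}
combined = merge_node_columns(base, tree_outputs)
```

Each tree contributes one distinctly named column, so the five outputs no longer collide on the name _NODE_.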

 

 

