Lbind91
Calcite | Level 5

Hey

 

I have a dataset where some of the input variables have a lot of levels, for example School_city with 513 levels and School_state with 49 levels. How can I reduce the number of levels in an input variable, or group levels together in some way, in SAS Enterprise Miner?

 

I'm kind of new to SAS EM, so I need some help figuring this out.

 

Thanks.

 

/LB

2 REPLIES
Reeza
Super User

Usually I'd say use some sort of binning, but you have nominal variables, so there is no natural order to bin on.

Ideally you group the levels in some logical manner, for example rolling cities up into states, though with School_state already in the data that becomes redundant. Another option is to group by some sort of spatial relationship, such as a region, especially if you want to be able to interpret the results afterwards.

 

If you're looking for rules to create these groups, that becomes a data mining problem in itself. Using decision trees or clustering on the levels is one way to do it.
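
To make the idea of rule-based grouping concrete, here is a minimal sketch in plain Python (not SAS EM code). It is a crude stand-in for what a decision tree split on a nominal input does: it sorts the levels by their target event rate and cuts them into a few groups of roughly equal row count. The function name, the `n_groups` parameter, and the toy data are all hypothetical, and a binary 0/1 target is assumed.

```python
from collections import defaultdict

def group_levels_by_target_rate(levels, targets, n_groups=3):
    """Map each level of a nominal variable to a group id in 0..n_groups-1.

    Levels are sorted by mean target value and cut into n_groups chunks
    of roughly equal row count. Illustrative only, not a SAS EM algorithm.
    """
    counts = defaultdict(int)
    sums = defaultdict(float)
    for level, y in zip(levels, targets):
        counts[level] += 1
        sums[level] += y

    # Sort levels by event rate; ties are broken by level name.
    ordered = sorted(counts, key=lambda lv: (sums[lv] / counts[lv], lv))

    total = sum(counts.values())
    mapping = {}
    rows_seen = 0
    for level in ordered:
        # The group is decided by how many rows precede this level.
        mapping[level] = min(n_groups - 1, rows_seen * n_groups // total)
        rows_seen += counts[level]
    return mapping

# Hypothetical toy data: four city levels and a binary target.
cities = ["A", "A", "B", "B", "C", "C", "D", "D"]
target = [0, 0, 0, 1, 1, 1, 1, 1]
city_group = group_levels_by_target_rate(cities, target, n_groups=2)
print(city_group)
```

The resulting mapping can then be applied to the original column to get a new input with only `n_groups` levels. A real tree would pick the cut points by a split criterion rather than by equal row counts.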

 

 

Lbind91
Calcite | Level 5

Thanks for the input, Reeza

 

I tried using a decision tree to consolidate the levels. It worked for each variable separately and grouped the levels. But I cannot figure out how to get the outputs of the 5 different consolidation trees back into the data.

The output from each of the separate decision trees is just a variable called _NODE_ holding the new derived grouping. Can I rename these variables so that they won't all have the same name?
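
The logic being asked about, giving each tree's _NODE_ column its own name and then joining everything back to the original rows, can be sketched in plain Python as below. This is not SAS EM code. It assumes each tree's scored output keeps a row identifier (here a hypothetical `row_id`) next to `_NODE_`, and the `_grp` suffix is just an illustrative naming choice.

```python
def merge_tree_nodes(base_rows, tree_outputs, key="row_id"):
    """Attach each tree's _NODE_ value to the base rows under a unique name.

    tree_outputs maps a source variable name to that tree's scored rows;
    each scored row is a dict holding the key column and a "_NODE_" column.
    """
    merged = {row[key]: dict(row) for row in base_rows}
    for var_name, scored_rows in tree_outputs.items():
        new_name = f"{var_name}_grp"  # e.g. School_city_grp
        for row in scored_rows:
            merged[row[key]][new_name] = row["_NODE_"]
    return list(merged.values())

# Hypothetical toy data: two rows and two single-variable trees.
base = [
    {"row_id": 1, "School_city": "Austin", "School_state": "TX"},
    {"row_id": 2, "School_city": "Boston", "School_state": "MA"},
]
trees = {
    "School_city": [{"row_id": 1, "_NODE_": 3}, {"row_id": 2, "_NODE_": 4}],
    "School_state": [{"row_id": 1, "_NODE_": 2}, {"row_id": 2, "_NODE_": 2}],
}
combined = merge_tree_nodes(base, trees)
print(combined[0])
```

In SAS terms this corresponds to renaming each _NODE_ column to something unique before combining the data sets on a common key, so the five columns no longer collide.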

 

 

