05-25-2017 09:56 AM
I have a dataset where some of the input variables have a lot of levels, e.g., School_city with 513 levels and School_state with 49 levels. How can I reduce the number of levels in an input variable, or otherwise group levels together, in SAS Enterprise Miner?
I'm kind of new to SAS EM, so I need some help figuring this out.
05-25-2017 10:18 AM
Usually I'd suggest some sort of binning, but you have nominal variables.
Ideally you need to group them in some logical manner, for example cities into states, though that then becomes redundant with School_state. Another option is to group by some sort of spatial relationship, especially if you want to be able to interpret the results afterwards.
If you're looking for rules to create these groups, it becomes a data mining problem in itself. Using decision trees or clustering is one way to do it.
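To make the rule-based grouping idea concrete, here is a minimal sketch in plain Python rather than SAS EM. It is not a decision tree: it swaps in a simpler stand-in that ranks the levels by their average target value and cuts the ranking into equal-sized groups. The function name, the `target` column, and the toy values are all assumptions made for illustration.

```python
from collections import defaultdict

def group_levels_by_target_rate(rows, var, target, n_groups=3):
    """Map each level of `var` to a group id based on its mean target value."""
    totals = defaultdict(lambda: [0, 0])  # level -> [sum of target, row count]
    for row in rows:
        totals[row[var]][0] += row[target]
        totals[row[var]][1] += 1
    # Rank levels by mean target (ties broken by name), then cut into chunks.
    ranked = sorted(totals, key=lambda lvl: (totals[lvl][0] / totals[lvl][1], lvl))
    size = -(-len(ranked) // n_groups)  # ceiling division
    return {lvl: i // size for i, lvl in enumerate(ranked)}

# Hypothetical toy data: the column name mirrors the thread, the values are made up.
rows = [
    {"School_state": "CA", "target": 1},
    {"School_state": "CA", "target": 1},
    {"School_state": "NY", "target": 1},
    {"School_state": "NY", "target": 0},
    {"School_state": "TX", "target": 0},
    {"School_state": "TX", "target": 0},
    {"School_state": "WA", "target": 0},
    {"School_state": "WA", "target": 0},
    {"School_state": "WA", "target": 1},
]
mapping = group_levels_by_target_rate(rows, "School_state", "target", n_groups=2)
```

With this toy data the two low-rate states land in group 0 and the two high-rate states in group 1. A real tree or clustering step would choose the cut points from the data instead of forcing equal-sized groups.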
05-25-2017 10:28 AM
Thanks for the input, Reeza.
I tried using a decision tree to consolidate the levels. It worked for each variable separately and grouped the levels. But I cannot figure out how to get the 5 separate consolidated trees back into the data.
The output from each separate decision tree is just a _NODE_ variable for the newly derived variable. Can I rename them so they won't all have the same name?
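The rename-and-merge logic being asked about can be sketched in plain Python. This only illustrates the idea and is not how SAS EM does it internally. It assumes each tree's output carries a shared row identifier, here a hypothetical `row_id` column, alongside `_NODE_`; the `<variable>_NODE` naming scheme and the toy values are also assumptions.

```python
def merge_tree_outputs(base_rows, tree_outputs, key="row_id"):
    """Attach each tree's _NODE_ column to the base rows under a unique name."""
    merged = {row[key]: dict(row) for row in base_rows}
    for var, scored_rows in tree_outputs.items():
        new_name = f"{var}_NODE"  # one name per tree, so columns cannot collide
        for row in scored_rows:
            merged[row[key]][new_name] = row["_NODE_"]
    # Return rows in the original order of the base data.
    return [merged[row[key]] for row in base_rows]

# Hypothetical toy data: two input variables, one scored output per tree.
base_rows = [
    {"row_id": 1, "School_city": "Austin", "School_state": "TX"},
    {"row_id": 2, "School_city": "Boston", "School_state": "MA"},
]
tree_outputs = {
    "School_city": [{"row_id": 1, "_NODE_": 3}, {"row_id": 2, "_NODE_": 4}],
    "School_state": [{"row_id": 1, "_NODE_": 2}, {"row_id": 2, "_NODE_": 2}],
}
merged_rows = merge_tree_outputs(base_rows, tree_outputs)
```

Renaming before merging is the key step: once every `_NODE_` column has its own name, the separate outputs can be joined on the row identifier without overwriting one another.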