09-20-2012 11:18 AM
Does anybody have a macro to bin hundreds of character variables based on weight of evidence? I have lots of variants for numeric variables but wondered whether anyone had anything for character variables. Otherwise I'm going to have to spend lots of time trying to develop one (which with my skills could take weeks!).
Thanks in advance!
09-20-2012 11:50 AM
This sounds like a task for Enterprise Miner. Using the Interactive Grouping Node (under the Credit Scoring panel) you will be able to bin variables (character or numeric) according to weight of evidence. I see from your tags that you are also interested in information value and optimal binning. You have those options too, as well as weight of evidence monotonicity, which is the best property a binning can have.
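For readers without Enterprise Miner, here is a minimal sketch of what weight of evidence (WoE) and information value (IV) mean for a single character variable. It is written in Python rather than SAS purely for illustration. The function name `woe_iv`, the `adjust` smoothing constant and the sign convention (log of non-event share over event share) are assumptions of this sketch, not necessarily what IGN does internally.

```python
import math
from collections import Counter

def woe_iv(levels, events, adjust=0.5):
    """Return ({level: woe}, iv) for one character variable.

    levels: iterable of category labels (strings)
    events: iterable of 0/1 flags, 1 = event (for example "bad")
    adjust: small count added to every cell so an empty cell does not
            lead to log(0); an assumption of this sketch
    """
    event_counts = Counter()
    nonevent_counts = Counter()
    for level, event in zip(levels, events):
        if event:
            event_counts[level] += 1
        else:
            nonevent_counts[level] += 1

    all_levels = sorted(set(event_counts) | set(nonevent_counts))
    total_events = sum(event_counts.values()) + adjust * len(all_levels)
    total_nonevents = sum(nonevent_counts.values()) + adjust * len(all_levels)

    woe = {}
    iv = 0.0
    for level in all_levels:
        p_event = (event_counts[level] + adjust) / total_events
        p_nonevent = (nonevent_counts[level] + adjust) / total_nonevents
        # WoE: log ratio of the level's share of non-events to its share of events
        woe[level] = math.log(p_nonevent / p_event)
        # IV: sum over levels of (share difference) * WoE
        iv += (p_nonevent - p_event) * woe[level]
    return woe, iv
```

Calling `woe_iv(df_column, target_column)` on each character variable in turn gives one WoE per level plus a single IV per variable, which is the raw material any binning criterion then works on.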
Building a macro for the grouping will probably not be the challenge. The real difficulty is the criterion you iterate on to reach an optimal solution. If you come up with a concrete definition of your ideal grouping, the macro will be the least of your worries. As a better way to invest your time, I would suggest trying one of the following; both support input variables of any measurement level:
Your own Smart Groups without IGN
If for any reason you do not want to use the EM IGN, the next best thing, emulating one of its most basic features, is to use EM to create many trees (Decision Tree node) or clusters (Cluster node). Try many property settings and benchmark which grouping makes the most sense for your data, e.g. branches and depth for trees, or Ward's method vs. k-nearest neighbors for clusters.
Your own Smart Groups without EM
If for any reason you do not want to use EM at all, then try proc arboretum or proc cluster to create your own trees or clusters.
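As one example of a concrete grouping criterion that needs neither EM nor a tree procedure, the sketch below (Python, for illustration only) starts with one bin per level, orders the bins by WoE, and repeatedly merges the neighbouring pair whose merge costs the least information value until at most `max_groups` bins remain. The name `group_levels`, the `max_groups` and `adjust` parameters and the greedy rule itself are assumptions of this sketch; it is a simple agglomerative scheme, not the algorithm used by IGN, PROC ARBORETUM or PROC CLUSTER.

```python
import math
from collections import defaultdict

def group_levels(levels, events, max_groups=5, adjust=0.5):
    """Greedily merge category levels into at most max_groups bins.

    Returns a list of bins; each bin is a list of the original levels.
    """
    # counts[level] = [non-events, events]
    counts = defaultdict(lambda: [0, 0])
    for level, event in zip(levels, events):
        counts[level][1 if event else 0] += 1

    total_non = sum(c[0] for c in counts.values())
    total_evt = sum(c[1] for c in counts.values())
    if total_non == 0 or total_evt == 0:
        raise ValueError("need at least one event and one non-event")

    def woe(non, evt):
        # adjust keeps log() finite when a bin has no events or no non-events
        return math.log(((non + adjust) / (total_non + adjust)) /
                        ((evt + adjust) / (total_evt + adjust)))

    def iv_part(non, evt):
        # contribution of one bin to the information value
        return (non / total_non - evt / total_evt) * woe(non, evt)

    # Start with one bin per level, ordered by ascending WoE.
    bins = sorted((([lvl], non, evt) for lvl, (non, evt) in counts.items()),
                  key=lambda b: woe(b[1], b[2]))

    while len(bins) > max_groups:
        best_i, best_loss = None, None
        for i in range(len(bins) - 1):
            (_, n1, e1), (_, n2, e2) = bins[i], bins[i + 1]
            loss = iv_part(n1, e1) + iv_part(n2, e2) - iv_part(n1 + n2, e1 + e2)
            if best_loss is None or loss < best_loss:
                best_i, best_loss = i, loss
        m1, n1, e1 = bins[best_i]
        m2, n2, e2 = bins[best_i + 1]
        # replace the two neighbouring bins with their union
        bins[best_i:best_i + 2] = [(m1 + m2, n1 + n2, e1 + e2)]
    return [members for members, _, _ in bins]
```

Because levels are only ever merged with a WoE neighbour, levels with similar event rates end up together, which is the behaviour one would also hope for from a tree or cluster based grouping.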
Either approach is very likely to output a really smart grouping, very close to the optimal solution provided by IGN. You may need to build several of them and benchmark which one gives you the better result (more monotonic, lower misclassification, etc.).
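The benchmarking step can be sketched as follows, again in Python and only as an illustration: each candidate grouping is a mapping from original level to group label, and the candidates are ranked by the information value they retain. The names `grouping_iv` and `rank_groupings`, the `"OTHER"` catch-all bin and the `adjust` constant are assumptions of this sketch. Keep in mind that IV on its own tends to favour finer groupings, so it is best used to compare candidates with a similar number of bins.

```python
import math
from collections import Counter

def grouping_iv(levels, events, mapping, adjust=0.5):
    """IV of a variable after recoding each level through mapping (level -> group)."""
    evt, non = Counter(), Counter()
    for level, event in zip(levels, events):
        group = mapping.get(level, "OTHER")  # unmapped levels share one bin
        (evt if event else non)[group] += 1
    groups = set(evt) | set(non)
    total_evt = sum(evt.values()) + adjust * len(groups)
    total_non = sum(non.values()) + adjust * len(groups)
    iv = 0.0
    for g in groups:
        p_evt = (evt[g] + adjust) / total_evt
        p_non = (non[g] + adjust) / total_non
        iv += (p_non - p_evt) * math.log(p_non / p_evt)
    return iv

def rank_groupings(levels, events, candidates):
    """Return [(name, iv)] sorted by descending IV.

    candidates: dict of name -> mapping (level -> group label)
    """
    levels, events = list(levels), list(events)
    scored = [(name, grouping_iv(levels, events, m))
              for name, m in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The candidate mappings could come from anywhere: exported tree leaves, cluster memberships, or a hand-made grouping.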
I hope it helps.
Good luck, let me know how it goes!
09-20-2012 03:09 PM
Thanks Miguel. Unfortunately I don't have EM (and hence no arboretum), so it looks as if I'm stuck with proc cluster (which I've not used before). I'll be using the SAS support site and online docs for a while. Thanks again, cheers!