Programming the statistical procedures from SAS

weight of evidence for character variables

Posts: 29


Does anybody have a macro to bin hundreds of character variables based on weight of evidence? I have lots of variants for numeric variables, but wondered whether anyone had anything for character variables. Otherwise I'm going to have to spend a lot of time trying to develop one (which, with my skills, could take weeks!).

Thanks in advance!

Super Contributor
Posts: 336

Re: weight of evidence for character variables

This sounds like a task for Enterprise Miner. Using the Interactive Grouping Node (under the Credit Scoring panel), you can bin variables (character or numeric) according to weight of evidence. I see from your tags that you are also interested in information value and optimal binning. The node offers those options too, as well as weight-of-evidence monotonicity, which is the most valuable property a binning can have.

Building a macro for the grouping will probably not be the challenge. The real difficulty is the criterion to iterate on to reach an optimal solution. Once you have a concrete definition of your ideal grouping, the macro will be the least of your worries. As a better way to invest your time, I would suggest trying one of the following; both support input variables of any level:

Your own Smart Groups without IGN

If for any reason you do not want to use the EM IGN, the next best thing, which emulates one of its most basic features, is to use EM to create many trees (Decision Tree node) or clusters (Cluster node). Try many property settings and benchmark which grouping makes the most sense for your data, e.g. number of branches and depth for trees, or Ward's method vs. k-nearest neighbors for clusters.

Your own Smart Groups without EM

If for any reason you do not want to use EM at all, then try PROC ARBORETUM or PROC CLUSTER to create your own trees or clusters.
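As a rough illustration of the PROC CLUSTER route, here is a minimal sketch that groups the levels of one character variable by their event rate. The dataset TRAIN, the character variable REGION, the 0/1 target BAD, and the choice of 5 clusters are all hypothetical placeholders, and clustering on the event rate is just one possible criterion, not the algorithm IGN uses.

```sas
/* Hypothetical inputs: WORK.TRAIN with character variable REGION
   and a binary target BAD (1 = event, 0 = non-event). */

/* Step 1: one row per level, with its event rate and frequency */
proc sql;
  create table levels as
  select region,
         mean(bad) as event_rate,
         count(*)  as n
  from train
  group by region;
quit;

/* Step 2: hierarchical clustering of the levels on event rate.
   FREQ weights each level by the number of rows it represents. */
proc cluster data=levels method=ward outtree=tree noprint;
  var event_rate;
  freq n;
  id region;
run;

/* Step 3: cut the tree into a chosen number of groups
   (5 is an arbitrary assumption; it must not exceed the number of levels) */
proc tree data=tree nclusters=5 out=groups noprint;
run;
```

The OUT= dataset from PROC TREE should hold one row per level, with the level in _NAME_ and its group in CLUSTER. Rerunning with different NCLUSTERS= values or a different METHOD= gives alternative groupings to compare.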

Any similar approach is very likely to produce a really smart grouping, very close to the optimal solution provided by IGN. You may need to build several of them and benchmark which one gives you the better result: more monotonic, lower misclassification, etc.
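For benchmarking candidate groupings without EM, a small macro along these lines could compute weight of evidence per group and the total information value. This is only a sketch: the macro name WOE_IV, its parameters, and the TRAIN / REGION / BAD names are hypothetical, and it assumes the common convention WoE = log(% of non-events / % of events), with IV as the sum of (% non-events - % events) * WoE. Some references flip the sign.

```sas
/* Hypothetical macro: WoE per level of &var and total IV.
   Assumes &target is coded 1 = event, 0 = non-event. */
%macro woe_iv(data=, var=, target=, out=woe_out);
  proc sql;
    create table &out as
    select &var,
           sum(&target = 0) as n_good,
           sum(&target = 1) as n_bad,
           /* share of all non-events / events falling in this level */
           calculated n_good / (select sum(&target = 0) from &data) as pct_good,
           calculated n_bad  / (select sum(&target = 1) from &data) as pct_bad,
           log(calculated pct_good / calculated pct_bad) as woe,
           (calculated pct_good - calculated pct_bad) * calculated woe as iv_part
    from &data
    group by &var;

    /* information value = sum of the per-level contributions */
    select sum(iv_part) as information_value from &out;
  quit;
%mend woe_iv;

%woe_iv(data=train, var=region, target=bad);
```

Levels with zero events or zero non-events end up with a missing WoE (and a note in the log), so they would need to be merged with another level or smoothed first. To cover hundreds of variables, the call could be wrapped in a %do loop over the names returned by DICTIONARY.COLUMNS where type = 'char'.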

I hope it helps.

Good luck, let me know how it goes!


Posts: 29

Re: weight of evidence for character variables

Thanks Miguel. Unfortunately I don't have EM (and hence no PROC ARBORETUM), so it looks as if I'm stuck with PROC CLUSTER, which I've not used before. I'll be using the SAS support site and online docs for a while. Thanks again! Cheers

Occasional Contributor
Posts: 5

Re: weight of evidence for character variables

I would suggest creating the PROC FORMAT code dynamically: write the PROC FORMAT statements out to a plain-text file and then run them with %include.

I know this is not a to-the-point solution.
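To make the idea concrete, here is a minimal sketch. It assumes you already have a mapping table, here the hypothetical WORK.GROUPS with one row per level, the character level in REGION and the character bin label in BIN. The format name $REGBIN and the OTHER catch-all are also just placeholders.

```sas
/* Temporary text file that will hold the generated PROC FORMAT code */
filename fmtcode temp;

data _null_;
  set groups end=last;
  file fmtcode;
  length lhs rhs $200;
  /* quote() wraps the values in double quotes for the generated code */
  lhs = quote(strip(region));
  rhs = quote(strip(bin));
  if _n_ = 1 then put 'proc format;' / '  value $regbin';
  put '    ' lhs '= ' rhs;
  if last then put "    other = 'OTHER';" / 'run;';
run;

/* Run the generated code; SOURCE2 echoes it to the log for checking */
%include fmtcode / source2;

/* Apply the format to assign each row to its bin */
data train_binned;
  set train;
  region_bin = put(region, $regbin.);
run;
```

Because the generated values sit in double quotes, any & or % characters in the levels would be seen by the macro processor when the file is included, so levels like that would need extra care.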
