to_the_point
Calcite | Level 5

Does anybody have a macro to bin hundreds of character variables based on weight of evidence? I have lots of variants for numeric variables, but wondered whether anyone had anything for character variables. Otherwise I'm going to have to spend lots of time trying to develop one (which, with my skills, could take weeks!).

thanks in advance!

3 REPLIES
M_Maldonado
Barite | Level 11

This sounds like a task for Enterprise Miner. Using the Interactive Grouping Node (under the Credit Scoring panel), you will be able to bin variables (character or numeric) according to weight of evidence. I see from your tags that you are also interested in information value and optimal binning. You have those options too, as well as weight of evidence monotonicity, which is the greatest thing a binning can have.
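For readers who want to see the two measures written out, here is a minimal sketch in Python. It is not SAS code and makes no claim about how the Interactive Grouping Node computes things internally. It assumes the target is coded 0/1 with 1 marking the event ("bad"), and uses the common convention WoE = ln(share of non-events / share of events). The function name `woe_table` and the `smoothing` parameter are hypothetical and exist only for this illustration.

```python
import math
from collections import Counter


def woe_table(levels, targets, smoothing=0.5):
    """Weight of evidence per level of a character variable, plus information value.

    levels    : iterable of category labels (strings)
    targets   : iterable of 0/1 flags; 1 is assumed to mark the event ("bad")
    smoothing : count added to every cell so that an empty cell does not
                produce log(0); an assumption of this sketch, set to 0 to disable
    Returns ({level: (n_good, n_bad, woe)}, information_value).
    """
    good = Counter()
    bad = Counter()
    for level, y in zip(levels, targets):
        if y == 1:
            bad[level] += 1
        else:
            good[level] += 1

    all_levels = sorted(set(good) | set(bad))
    n_levels = len(all_levels)
    total_good = sum(good.values()) + smoothing * n_levels
    total_bad = sum(bad.values()) + smoothing * n_levels

    table = {}
    information_value = 0.0
    for level in all_levels:
        share_good = (good[level] + smoothing) / total_good
        share_bad = (bad[level] + smoothing) / total_bad
        woe = math.log(share_good / share_bad)
        table[level] = (good[level], bad[level], woe)
        # IV sums (share of goods - share of bads) * WoE over all levels
        information_value += (share_good - share_bad) * woe
    return table, information_value
```

Calling `woe_table(levels, targets)` on one character variable returns a per-level table that can be sorted by WoE, which is usually the starting point for deciding which levels to group together.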

Building a macro for grouping will probably not be the challenge. The real difficulty is the criterion you iterate on to reach an optimal solution. If you come up with a concrete definition of your ideal grouping, the macro will be the least of your worries. As a better way to invest your time, I would suggest trying one of the following; both support input variables of any measurement level:

Your own Smart Groups without IGN

If for any reason you do not want to use the EM IGN, the next best thing, emulating one of its most basic features, is to use EM to create many trees (Decision Tree node) or clusters (Cluster node). Try many property settings and benchmark which grouping makes the most sense for your data, e.g. branches and depth for trees, or Ward's method vs. k-nearest neighbors for clusters.

Your own Smart Groups without EM

If for any reason you do not want to use EM at all, then try PROC ARBORETUM or PROC CLUSTER to create your own trees or clusters.

Any similar approach is very likely to produce a really smart grouping, very close to the optimal solution provided by IGN. You may need to build several of them and benchmark which one gives you the better result (more monotonic, lower misclassification rate, etc.).
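To make the idea of building a grouping without IGN concrete, here is a deliberately crude Python sketch. It is a stand-in technique, not what PROC CLUSTER, PROC ARBORETUM, or IGN actually do: it starts with one group per level, sorts the groups by weight of evidence, and repeatedly merges the two neighbouring groups whose WoE values are closest until the requested number of groups remains. The function name `group_levels`, the `eps` smoothing constant, and the input layout (good and bad counts per level) are all assumptions made for this illustration.

```python
import math


def group_levels(counts, n_groups, eps=0.5):
    """Greedily merge levels with similar weight of evidence.

    counts   : {level: (n_good, n_bad)} for one character variable
    n_groups : number of groups to keep
    eps      : small constant added to each count so the log stays finite
               (an assumption of this sketch)
    Returns a list of (levels_in_group, woe), ordered by increasing WoE.
    Assumes the variable has at least one good and one bad overall.
    """
    total_good = sum(g for g, _ in counts.values())
    total_bad = sum(b for _, b in counts.values())

    def woe(good, bad):
        return math.log(((good + eps) / total_good) / ((bad + eps) / total_bad))

    def group_woe(group):
        return woe(group[1], group[2])

    # One group per level to begin with: (member levels, goods, bads)
    groups = [([level], g, b) for level, (g, b) in counts.items()]
    groups.sort(key=group_woe)

    while len(groups) > max(n_groups, 1):
        # Find the pair of neighbours with the smallest WoE difference
        gaps = [group_woe(groups[i + 1]) - group_woe(groups[i])
                for i in range(len(groups) - 1)]
        i = gaps.index(min(gaps))
        left, right = groups[i], groups[i + 1]
        groups[i:i + 2] = [(left[0] + right[0], left[1] + right[1], left[2] + right[2])]
        groups.sort(key=group_woe)  # keep the groups ordered after each merge

    return [(members, group_woe((members, g, b))) for members, g, b in groups]
```

Running this for several values of `n_groups` and comparing the results is one simple way to do the benchmarking described above; a tree or clustering procedure would normally replace the greedy merge.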

I hope it helps.

Good luck, let me know how it goes!

Miguel

to_the_point
Calcite | Level 5

Thanks Miguel. Unfortunately I don't have EM (and hence no PROC ARBORETUM), so it looks as if I'm stuck with PROC CLUSTER (which I've not used before), so I'll be using the SAS support/online docs for a while. Thanks again! Cheers

zilok
Calcite | Level 5

I would suggest creating the PROC FORMAT code dynamically: write the PROC FORMAT statements out to a plain text file and then run that file with %include.
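As a rough sketch of that idea, the Python snippet below turns a level-to-group mapping into the text of a PROC FORMAT step, which could then be saved to a file and pulled into a SAS session with %include. The function name `proc_format_source`, the format name, and the 'OTHER' catch-all label are hypothetical choices for this illustration; the same text could just as well be written from a SAS data step.

```python
def proc_format_source(format_name, groups):
    """Build the text of a PROC FORMAT step for a character format.

    format_name : name of the format without the leading $ (hypothetical;
                  it must still follow SAS format naming rules)
    groups      : {group_label: [level, ...]} mapping each bin to its raw levels
    Returns the SAS source as one string.
    """
    def quote(text):
        # Inside a single-quoted SAS literal, a single quote is written twice
        return "'" + text.replace("'", "''") + "'"

    lines = ["proc format;", f"  value ${format_name}"]
    for label, levels in groups.items():
        members = ", ".join(quote(level) for level in levels)
        lines.append(f"    {members} = {quote(label)}")
    # Catch-all for levels not listed above; the label is an assumption
    lines.append("    other = 'OTHER';")
    lines.append("run;")
    return "\n".join(lines)
```

For example, `proc_format_source("grp", {"LOW": ["A", "B"]})` yields a step containing the line `'A', 'B' = 'LOW'`; writing the returned string to a text file gives something that %include can run.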

I know this is not the to_the_point solution.

