Hi!
I am hoping someone can give me a small amount of assistance with the Constrained Optimized Binning procedure in the Interactive Grouping Node of Credit Scoring for SAS® Enterprise Miner™. I wish to run a few brief experiments, but unfortunately we don't license that product, and obtaining an evaluation copy is not a realistic option for us. I have coded the algorithm as described in the introductory SGF presentation, "SAS/OR®: Rigorous Constrained Optimized Binning for Credit Scoring," by Ivan Oliveira, Manoj Chari, and Susan Haller (http://www2.sas.com/proceedings/forum2008/153-2008.pdf), and in the patent application "Constrained optimized binning for scorecards" (https://www.google.com/patents/US8296224). I want to make sure that my results match those of Enterprise Miner.
I have a data set with a single predictor variable and a binary dependent variable. The original data contains about 30K records, but there are only 83 distinct values of the predictor, so I have aggregated the records into one record per predictor value, with the count of records and the count of ones as separate fields. Naturally, the count of zeroes is just the count of records minus the count of ones. I have also precomputed the hit rate (count of ones / count of records), the point log odds (log of count of ones / count of zeroes), and the Weight of Evidence (point log odds minus total log odds). I can provide the data in whatever format is most convenient for use in Enterprise Miner: a SAS data set, an Excel spreadsheet, or a .csv text file.
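For concreteness, the precomputed fields described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `woe_table`, the tuple layout, and the sample counts are hypothetical and are not taken from the actual data set, and values with zero ones or zero zeroes are simply left undefined.

```python
import math

def woe_table(rows):
    """rows: list of (predictor_value, n_records, n_ones), one per distinct value.

    Returns one dict per row with hit rate, point log odds, and WoE,
    where WoE = point log odds - total log odds (the definition used in the post).
    """
    total_n = sum(n for _, n, _ in rows)
    total_ones = sum(k for _, _, k in rows)
    total_zeros = total_n - total_ones
    total_log_odds = math.log(total_ones / total_zeros)

    table = []
    for value, n, ones in rows:
        zeros = n - ones
        if ones == 0 or zeros == 0:
            # Assumption: log odds are undefined here, so report None.
            log_odds = woe = None
        else:
            log_odds = math.log(ones / zeros)
            woe = log_odds - total_log_odds
        table.append({
            "value": value,
            "n": n,
            "ones": ones,
            "zeros": zeros,
            "hit_rate": ones / n,
            "log_odds": log_odds,
            "woe": woe,
        })
    return table

# Hypothetical sample counts, for illustration only.
sample = [(1, 100, 20), (2, 100, 50), (3, 100, 80)]
for record in woe_table(sample):
    print(record)
```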
I've found some interesting features I'd like to verify. It appears that one of the main motivations for the design of this method was to allow binning with Weight of Evidence (WoE) monotonicity. The SGF paper describes both a method that achieves true monotonicity and a simpler method that obtains surrogate monotonicity, which is not guaranteed to be truly monotonic. Enterprise Miner offers only the true monotonicity option, not the surrogate option. But I have found examples in which the surrogate solution is in fact truly monotonic and has a higher information value (and/or chi-square value) than the solution produced by the method with guaranteed monotonicity. I would like to corroborate these results, and would appreciate any assistance I can get in doing so. If you can help, please respond, and I will provide the data in the format you request to run in the EM Interactive Grouping node.
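The comparison described above comes down to two checks on any candidate binning: whether the bin-level WoE values are truly monotonic, and what information value the binning yields. A minimal sketch of those checks in Python follows. It is not the mixed-integer formulation from the SGF paper and not Enterprise Miner's implementation; the function name `evaluate_binning`, the `starts` representation of bins, and the sample counts are all hypothetical, and the information value uses the standard definition, the sum over bins of (share of ones minus share of zeroes) times WoE.

```python
import math

def evaluate_binning(rows, starts):
    """Score a candidate binning of aggregated data.

    rows:   list of (predictor_value, n_records, n_ones).
    starts: index (into the rows sorted by predictor value) at which each
            bin begins, e.g. [0, 2] means bins rows[0:2] and rows[2:].
    Returns (woes, information_value, is_monotonic).
    Assumption: every bin contains at least one 1 and one 0; otherwise
    the logarithm or the division below fails.
    """
    rows = sorted(rows)
    total_ones = sum(k for _, _, k in rows)
    total_zeros = sum(n - k for _, n, k in rows)
    bounds = list(starts) + [len(rows)]

    woes = []
    iv = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        ones = sum(k for _, _, k in rows[lo:hi])
        zeros = sum(n - k for _, n, k in rows[lo:hi])
        p1 = ones / total_ones    # share of all ones falling in this bin
        p0 = zeros / total_zeros  # share of all zeroes falling in this bin
        woe = math.log(p1 / p0)   # equals bin log odds minus total log odds
        woes.append(woe)
        iv += (p1 - p0) * woe

    increasing = all(a <= b for a, b in zip(woes, woes[1:]))
    decreasing = all(a >= b for a, b in zip(woes, woes[1:]))
    return woes, iv, increasing or decreasing

# Hypothetical counts, for illustration only.
sample = [(1, 100, 20), (2, 100, 30), (3, 100, 50), (4, 100, 80)]
woes, iv, monotonic = evaluate_binning(sample, [0, 2])
print(woes, iv, monotonic)
```

Running the same function on the surrogate solution and on the guaranteed-monotonic solution gives a like-for-like comparison of monotonicity and information value.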
Thanks!
Hi @Top_Katz,
It looks like this has been addressed in Tech Support tracks. Thank you for using the community!
Anna
Hi @AnnaBrown! Thank you so much for following up. I understand that not every query in the communities gets an answer. In this case, no one inside SAS was willing to run the test(s), and no one outside volunteered either. I was a bit disappointed by the lack of response, and still feel that such testing might be useful for improving the product. But such is life...