03-01-2018 09:39 AM
I've been given a task at work: implement optimal binning in SAS E-Guide. I'm only a few weeks into learning SAS, so forgive my slowness/stupidity. I'm trying to run any algorithm that performs optimal binning on a single variable with respect to a target variable that may have two or more levels. The single variable can be nominal or numeric and may or may not contain missing values. Once the optimal binning algorithm has been run, I would like to use its output to create a new feature, say FEATURE_BIN, which appears as an extra column in the dataset containing the original feature.
I've managed to create splits with PROC HPSPLIT and PROC HPBIN, adapting code I found online, but I've been unable to map these bins back to the original dataset. I also believe that SAS E-Miner contains an Optimal Binning node, but I would prefer something I could automate in base code and put inside a macro. Ideally, I'd like to reuse these splits on different datasets (e.g. training, testing and scoring datasets) so that the bins are consistent.
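One way to get both "map back to the data" and "reuse on train/test/score" is to let PROC HPSPLIT write its splitting rules as DATA step score code (CODE FILE=) and then %INCLUDE that file in a DATA step on any dataset. The sketch below assumes that a one-input decision tree is an acceptable stand-in for optimal binning; it is tree-based binning, not the E-Miner Optimal Binning node's algorithm. The macro names TREE_BIN and APPLY_BIN, their parameters and the file path are hypothetical. The leaf variable _Leaf_ is the name I recall the generated code using, so check the generated file. With two-way splits, MAXDEPTH=2 gives at most four bins. How missing x1 values are routed is governed by HPSPLIT's ASSIGNMISSING= option (see the documentation).

```sas
/* Hypothetical macros: tree-based (supervised) binning of ONE input.      */
/* Assumes PROC HPSPLIT is licensed; this is not the E-Miner node method. */
%macro tree_bin(data=, target=, var=, nominal=N, maxdepth=2, codefile=);
  proc hpsplit data=&data maxdepth=&maxdepth;
    %if %upcase(&nominal) = Y %then %do;
      class &target &var;  /* nominal input: levels get grouped into bins */
    %end;
    %else %do;
      class &target;       /* numeric input: split into intervals */
    %end;
    model &target = &var;
    code file="&codefile"; /* writes DATA step score code with the rules */
  run;
%mend tree_bin;

/* Apply the saved rules to any dataset (train, test or score). */
%macro apply_bin(in=, out=, var=, codefile=);
  data &out;
    set &in;
    %include "&codefile";  /* generated rules assign each row to a leaf */
    &var._bin = _Leaf_;    /* leaf number = bin; verify name in the file */
  run;
%mend apply_bin;
```

Usage, with hypothetical dataset names and path: run `%tree_bin(data=work.train, target=y, var=x1, codefile=/tmp/x1_bin.sas);` once on the training data, then `%apply_bin(in=work.test, out=work.test_binned, var=x1, codefile=/tmp/x1_bin.sas);` on each dataset. Because every dataset runs the same saved code, the bins stay identical. The generated code may also add prediction columns (P_ and V_ variables), which can be dropped afterwards.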
I found some sample data at the following link, where it is used to demonstrate optimal binning:
data sample;  /* dataset name chosen for illustration */
  input cl1 $ x0 x1 x2 y $ freq id;
  cards;
a 2 . 7 n 2 1
a 2 2 6 . 3 2
a 3 0 1 o 0 3
c 2 3 7 y . 4
c 2 . 4 n -5 5
a 3 6 7 n 3 6
b 1 4 4 y 4 7
b 2 5 6 y 3 8
b 1 6 4 o 1 9
b 2 3 2 n 3 10
;
run;
What I would like is this: if optimal binning were performed on a single variable, say x1, the table would gain a new column, x1_bin, containing the bin that each x1 value belongs to.
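Once cut points are known (for example, read off the HPSPLIT or HPBIN output), a common SAS way to attach them to the data is to store them as a user-defined format built from a dataset with PROC FORMAT CNTLIN=, then create x1_bin with PUT(). This is a sketch under stated assumptions: the cut points 2 and 4 are purely hypothetical, the format name X1BIN and dataset names are made up, and the sample data is assumed to be in a dataset called SAMPLE.

```sas
/* Hypothetical cut points for x1 (2 and 4). In a macro, this dataset   */
/* would be written from the binning step's output instead of typed in. */
data x1_cntlin;
  length fmtname $8 type hlo sexcl eexcl $1 label $8;
  retain fmtname 'X1BIN' type 'N';
  /* bin 1: LOW - 2 (LOW excludes missing for numeric formats) */
  start = .; end = 2; hlo = 'L'; sexcl = 'N'; eexcl = 'N'; label = '1'; output;
  /* bin 2: 2 <- 4 (2 excluded, 4 included) */
  start = 2; end = 4; hlo = ' '; sexcl = 'Y'; eexcl = 'N'; label = '2'; output;
  /* bin 3: 4 <- HIGH */
  start = 4; end = .; hlo = 'H'; sexcl = 'Y'; eexcl = 'N'; label = '3'; output;
  /* OTHER: everything not covered above, i.e. missing x1 */
  start = .; end = .; hlo = 'O'; sexcl = 'N'; eexcl = 'N'; label = 'Missing'; output;
run;

proc format cntlin=x1_cntlin;
run;

/* The same format applied to every dataset gives identical bins. */
data sample_binned;
  set sample;
  length x1_bin $8;
  x1_bin = strip(put(x1, x1bin.));
run;
```

To reuse the bins across sessions, the format can be saved to a permanent library with `proc format library=mylib cntlin=x1_cntlin;` plus `options fmtsearch=(mylib);` (MYLIB being a hypothetical libref). For a nominal variable the same pattern works with a character format (TYPE='C'), mapping each level to its bin label.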
I hope I've explained the problem clearly; if not, let me know and I'll be happy to clarify.