Hello,
I am currently exploring the use of PROC OPTBINNING in SAS Viya and referring to the official documentation:
(https://documentation.sas.com/doc/en/casml/v_041/casml_optbinning_overview.htm). I have a couple of questions regarding the input parameters:
proc optbinning
    data=mylib.datain
    /* This table must be prepared beforehand. It includes the bin number, number of bad observations, number of good observations, weight of evidence (WOE), variable name, lower bound, upper bound, and WOE trend. */
    param=mylib.parms
    /* This table must also be prepared beforehand. Each row contains the constraints for one characteristic variable. */
    output=mylib.outdata
    status=mylib.status
    adjustfactor=0.2;
run;
1. Regarding "woeTrend" in the "datain" table:
The documentation explains "woeTrend" as the change in WOE between consecutive bins. This implies that the first bin of each variable has no previous value to compare with, so in theory its "woeTrend" should be missing (null).
However, in the documentation example, the first bin does have a non-null value, and when I pass my table with "woeTrend = ." in the first bin, the procedure fails.
What is the correct way to calculate and populate "woeTrend" in the input table to ensure compatibility with PROC OPTBINNING? Is there an additional interpretation or preprocessing step I should be aware of?
2. Is there any built-in procedure in CASML that automatically builds the two required input tables ("datain" and "parms")?
I’ve written a macro that generates both tables, but I’m not sure if this is the recommended or most appropriate approach.
I still believe that at least the `datain` table should be possible to generate within PROC OPTBINNING itself, using the training data and some binning parameters, rather than having to prepare everything externally.
3. Regarding the preparation of the inputs ("datain" and "parms"):
The procedure requires two pre-aggregated tables with a specific format, instead of directly accepting raw training data, as other optimal binning libraries do in different environments.
I have developed a macro that generates the datain and parms tables, but I’m not completely sure if this is the most appropriate or recommended approach.
Is there any procedure in SAS/CASML that allows building these inputs internally from a variable and its target (or at least automates the creation of datain)?
From my perspective, at least the datain table which summarizes the fine bins should be generated within the same PROC OPTBINNING using a few parameters, without having to build it separately.
Additionally, what would be the best way to construct the datain table for different situations? Currently, I use PROC RANK to group the observations and then calculate the WOE values. Since PROC OPTBINNING does not accept missing values, I group them into a special bin (for example, by assigning the value -99999). However, the procedure requires a lower and an upper bound for each bin, which creates a complication: for the missing-value bin, those bounds do not exist.
What is the correct way to represent the missing-value bin in the datain table to avoid errors? Separately, categorical variables apparently are not accepted.
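For concreteness, here is a minimal sketch of the "calculate the WOE values" step described above, starting from per-bin counts for a single variable. Everything in it is an assumption for illustration: the table work.fine_counts, the columns bin, n_good, n_bad, woe and woe_change, and the sample counts are hypothetical and are not the column names PROC OPTBINNING expects. The WOE sign convention (log of good share over bad share) is one common choice and may differ from the one the procedure uses. The last column applies the literal "change between consecutive bins" reading, which is exactly the reading that leaves the first bin missing.

```sas
/* Sketch only: hypothetical table, column names and counts. */
data work.fine_counts;
   input bin n_good n_bad;
   datalines;
1 400 100
2 300 100
3 300 50
;
run;

/* Totals over all bins, stored in macro variables */
proc sql noprint;
   select sum(n_good), sum(n_bad)
      into :tot_good, :tot_bad
   from work.fine_counts;
quit;

/* Rows must already be sorted by bin, one variable at a time */
data work.fine_woe;
   set work.fine_counts;
   /* WOE = log(share of goods in the bin / share of bads in the bin) */
   woe = log((n_good / &tot_good) / (n_bad / &tot_bad));
   /* Literal consecutive-bin change: DIF returns missing on the first row */
   woe_change = dif(woe);
run;
```

A bin with zero goods or zero bads would make the LOG call fail, so real data would need a smoothing or merging rule before this step. With several variables in one table, the DIF call would also have to be reset at the start of each variable.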
I would really appreciate any guidance on how to properly prepare these inputs, and whether there is a more integrated or automated way to handle this within SAS.
Greetings,
Hopefully I can be of some help! PROC OPTBINNING is primarily used internally by the Interactive Grouping node in Model Studio, so the information surrounding it can be a bit cryptic, as you've found. Some information on how we prepare the data and use the procedure may be beneficial.
Hope this helps!
Thank you very much for the recommendations. Now it is clear to me how the "PROC OPTBINNING" works.