BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
LuigixD
Calcite | Level 5

 

Hello,
I am currently exploring the use of PROC OPTBINNING in SAS Viya and referring to the official documentation:
(https://documentation.sas.com/doc/en/casml/v_041/casml_optbinning_overview.htm). I have a couple of questions regarding the input parameters:

proc optbinning

data=mylib.datain 

/* It is necessary to generate this table with specific data , includes the bin number, number of bad observations, number of good observations, weight of evidence (WOE), variable name, lower bound, upper bound, and WOE trend */

param=mylib.parms 

/* It is necessary to generate this table with specific data , each row contains the constraints for each characteristic variable.
 */
output=mylib.outdata
status=mylib.status
adjustfactor=0.2;
run;

casml,ca

1. Regarding "woeTrend" in the "datain" table:


The documentation explains "woeTrend" as the change in WOE between consecutive bins, which implies that the first bin of each variable should not have a previous value to compare with  and theoretically, its "woeTrend " should be missing (null).
However, in the documentation example, the first bin does have a non-null value, and when I pass my table with "woeTrend = ." in the first bin, the procedure fails.
What is the correct way to calculate and populate "woeTrend" in the input table to ensure compatibility with PROC OPTBINNING? Is there an additional interpretation or preprocessing step I should be aware of?

LuigixD_0-1754012209208.png

 

2. Is there any built-in procedure in CASML that automatically builds the two required input tables ("datain" and "parms")?


I’ve written a macro that generates both tables, but I’m not sure if this is the recommended or most appropriate approach.
I still believe that at least the `datain` table should be possible to generate within PROC OPTBINNING itself, using the training data and some binning parameters, rather than having to prepare everything externally.

 

3. Regarding the preparation of the inputs ("datain" and "parms") :


The procedure requires two pre-aggregated tables with a specific format, instead of directly accepting raw training data, as other optimal binning libraries do in different environments.
I have developed a macro that generates the datain and parms tables, but I’m not completely sure if this is the most appropriate or recommended approach.

Is there any procedure in SAS/CASML that allows building these inputs internally from a variable and its target (or at least automates the creation of datain)?
From my perspective, at least the datain table which summarizes the fine bins should be generated within the same PROC OPTBINNING using a few parameters, without having to build it separately.

Additionally, what would be the best way to construct the datain table considering different situations? Currently, I use PROC RANK to group and then calculate the WOE values. Since PROC OPTBINNING does not accept missing values, I group them into a special bin (for example, by assigning the value -99999). However, the procedure requires low and high values for each bin, which creates a complication: in the case of the missing value bin, those limits do not exist.
What is the correct way to represent that missing value bin in the datain table to avoid errors? On the other hand, apparently categorical ones are not accepted.

 

I would really appreciate any guidance on how to properly prepare these inputs, and whether there is a more integrated or automated way to handle this within SAS.

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ryan_H
SAS Employee

Greetings,

 

Hopefully I can be of some help! Proc optbinning is primarily used internally in the Interactive Grouping node within Model Studio, and thus information surrounding it can be a bit cryptic, as you’ve found. Maybe getting some information on how we prepare the data and use the proc will be beneficial.

 

  1. Indeed as you’ve found, optbinning does not want missing values in the woeTrend column. I can confirm this - it is even how it was in pre-Viya SAS. It does seem a bit confusing based on what the column represents. I think as long as you have a sane value in that first bin, that corresponds to the monotonic trend of the woe across the bins you are fine. One way of handling this would be to use a regression to estimate the trend of the WOE across the bins, and use a predicted value for woeTrend for all bins (which would give you a sane value for that first bin). You would treat the woe as the dependent variable and use the lower bound of the bin or the bin number as the independent variable.
  2. (and 3). Unfortunately no there are no builtin procedures to facilitate building datain and/or parms. The most expected way of building these tables is to start with the output of something like proc binning which gives you the bin information, calculate the woe, and go from there. It seems like you are already doing this so you are on the right track. And w.r.t. bins for missing values, we eliminate these from datain before call the proc - I don't think proc optbinning can handle the missing value bin.

Hope this helps!

 

View solution in original post

2 REPLIES 2
Ryan_H
SAS Employee

Greetings,

 

Hopefully I can be of some help! Proc optbinning is primarily used internally in the Interactive Grouping node within Model Studio, and thus information surrounding it can be a bit cryptic, as you’ve found. Maybe getting some information on how we prepare the data and use the proc will be beneficial.

 

  1. Indeed as you’ve found, optbinning does not want missing values in the woeTrend column. I can confirm this - it is even how it was in pre-Viya SAS. It does seem a bit confusing based on what the column represents. I think as long as you have a sane value in that first bin, that corresponds to the monotonic trend of the woe across the bins you are fine. One way of handling this would be to use a regression to estimate the trend of the WOE across the bins, and use a predicted value for woeTrend for all bins (which would give you a sane value for that first bin). You would treat the woe as the dependent variable and use the lower bound of the bin or the bin number as the independent variable.
  2. (and 3). Unfortunately no there are no builtin procedures to facilitate building datain and/or parms. The most expected way of building these tables is to start with the output of something like proc binning which gives you the bin information, calculate the woe, and go from there. It seems like you are already doing this so you are on the right track. And w.r.t. bins for missing values, we eliminate these from datain before call the proc - I don't think proc optbinning can handle the missing value bin.

Hope this helps!

 

LuigixD
Calcite | Level 5

Thank you very much for the recommendations. Now it is clear to me how the "PROC OPTBINNING" works.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to choose a machine learning algorithm

Use this tutorial as a handy guide to weigh the pros and cons of these commonly used machine learning algorithms.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 868 views
  • 3 likes
  • 2 in conversation