Binning and Pre-Binning in Interactive Grouping


Posted 08-04-2020 08:03 AM

Hello all, I'm exploring interactive grouping in SAS Enterprise Miner as a method for binning the values of an interval characteristic, and I'd like to ask about the pre-binning process. Why do we need to pre-bin the interval variable's values with the quantile or bucket method rather than apply tree-based binning to the interval values directly?

1 ACCEPTED SOLUTION


Someone from SAS may be able to provide a more accurate response, but, as far as I know, the algorithm behind the Interactive Grouping node uses a two-step approach for interval variables:

1. First, it "discretizes" the variables by creating groups, essentially transforming the variables from interval to nominal.

2. Second, it applies tree-based logic to find the optimal binning based on the groups from step 1.

My understanding is that the above approach is used only for computational efficiency: in general, interval variables may have hundreds, if not thousands, of distinct values, which would make it too computationally intensive for a tree algorithm to evaluate every possible split.

Therefore, by carrying out a pre-binning step, you end up with far fewer categories which then can be optimised based on a Tree-like algorithm.
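To make the two steps concrete, here is a minimal sketch in plain Python. It is not the Interactive Grouping node's actual algorithm, just an illustration of the idea described above: quantile pre-binning followed by a small Gini-based tree that may only split at pre-bin boundaries. The function names, the toy data, the depth limit and the `min_gain` threshold are all assumptions made for the example.

```python
def quantile_edges(values, n_bins):
    """Cut points that put roughly the same number of rows in each pre-bin."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = {ordered[(i * n) // n_bins] for i in range(1, n_bins)}
    return sorted(cuts)  # the set drops repeated cut points caused by ties

def assign_bin(x, edges):
    """Pre-bin index: the number of cut points at or below x."""
    return sum(1 for e in edges if x >= e)

def gini(events, total):
    """Gini impurity of a node with a binary target."""
    p = events / total
    return 2.0 * p * (1.0 - p)

def best_split(stats, lo, hi):
    """Best boundary inside pre-bins lo..hi; only hi - lo candidates exist."""
    total = sum(stats[b][0] for b in range(lo, hi + 1))
    events = sum(stats[b][1] for b in range(lo, hi + 1))
    parent = gini(events, total)
    best_gain, best_cut = 0.0, None
    left_total = left_events = 0
    for cut in range(lo, hi):
        left_total += stats[cut][0]
        left_events += stats[cut][1]
        right_total = total - left_total
        right_events = events - left_events
        if left_total == 0 or right_total == 0:
            continue
        child = (left_total * gini(left_events, left_total)
                 + right_total * gini(right_events, right_total)) / total
        if parent - child > best_gain:
            best_gain, best_cut = parent - child, cut
    return best_gain, best_cut

def tree_groups(stats, lo, hi, depth, min_gain=1e-4):
    """Recursively split a run of pre-bins; returns (first_bin, last_bin) groups."""
    if depth == 0 or lo == hi:
        return [(lo, hi)]
    gain, cut = best_split(stats, lo, hi)
    if cut is None or gain < min_gain:
        return [(lo, hi)]
    return (tree_groups(stats, lo, cut, depth - 1, min_gain)
            + tree_groups(stats, cut + 1, hi, depth - 1, min_gain))

# Toy data (hypothetical): event rate is 50% below x = 300 and 10% from 300 up.
xs = list(range(1000))
ys = [1 if (x < 300 and x % 2 == 0) or (x >= 300 and x % 10 == 0) else 0
      for x in xs]

# Step 1: pre-bin 1000 distinct values into 20 quantile groups.
edges = quantile_edges(xs, 20)
stats = [[0, 0] for _ in range(len(edges) + 1)]  # [rows, events] per pre-bin
for x, y in zip(xs, ys):
    b = assign_bin(x, edges)
    stats[b][0] += 1
    stats[b][1] += y

# Step 2: let the tree search over the 19 pre-bin boundaries only.
groups = tree_groups(stats, 0, len(stats) - 1, depth=2)
print(groups)  # [(0, 5), (6, 19)]
```

On this toy data the tree has 19 candidate boundaries to evaluate instead of 999, and the two final groups meet at `edges[5]`, which is 300, the point where the event rate changes.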

Lastly, from my experience, unless you have a good reason for using "bucket", my advice is to always go for "quantile" (i.e. that should be the default approach unless, for some specific reason, you want to have groups defined by having the same width).

6 REPLIES


The quantile and bucket methods are simple and easy to use.

If you are building a credit scorecard, tree-based binning can't guarantee that the WOE (weight of evidence) is monotonic.
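For anyone who wants to check this on their own bins, here is a small sketch of a WOE monotonicity test. It assumes the convention WOE = ln(share of non-events / share of events) per bin; some references flip the ratio, which only changes the sign. The function names and the counts are made up for illustration.

```python
import math

def woe_per_bin(bins):
    """bins: list of (non_events, events) counts, ordered by the variable's value.

    Assumes every bin has at least one event and one non-event;
    a zero count would need smoothing before taking the log.
    """
    total_non = sum(n for n, _ in bins)
    total_ev = sum(e for _, e in bins)
    return [math.log((n / total_non) / (e / total_ev)) for n, e in bins]

def is_monotonic(values):
    """True if the sequence never changes direction."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)

# Hypothetical bin counts: the same four bins in two different orders.
monotone_bins = [(80, 20), (85, 15), (90, 10), (95, 5)]
bumpy_bins = [(80, 20), (95, 5), (85, 15), (90, 10)]

print(is_monotonic(woe_per_bin(monotone_bins)))  # True
print(is_monotonic(woe_per_bin(bumpy_bins)))     # False
```

A check like this can be run on the output of any binning method; if it fails, adjacent bins are usually merged by hand until the trend is monotonic.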



Thanks Ksharp. If I understand correctly we can use either quantile, bucket OR tree method for binning? Is that correct?

The documentation states that quantile/bucket binning is a pre-binning stage before a tree-based method can be applied:

"The Interactive Grouping node first performs binning on the interval characteristic. You can choose between two binning methods: quantile and bucket. The quantile method generates groups. The groups are formed by ranked quantities with approximately the same frequency in each group. The bucket method generates groups by dividing the data into evenly spaced intervals that are based on the difference between the maximum and minimum values. After the interval variables have been pre-binned, a decision tree model is fitted for each characteristic."
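The difference between the two pre-binning methods in that quote is easiest to see on skewed data. The sketch below is only an illustration in plain Python, not SAS code; the helper names and the sample (squares of 0 to 99) are assumptions chosen so the result is easy to verify by hand.

```python
def bucket_edges(values, n_bins):
    """Evenly spaced cut points between the minimum and maximum value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]

def quantile_edges(values, n_bins):
    """Cut points at ranked positions, so each bin gets about the same count."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]

def bin_counts(values, edges):
    """Number of rows per bin; a value's bin is the count of edges at or below it."""
    counts = [0] * (len(edges) + 1)
    for x in values:
        counts[sum(1 for e in edges if x >= e)] += 1
    return counts

# Right-skewed toy sample: most values are small, a few are large.
values = [i * i for i in range(100)]

bucket_counts = bin_counts(values, bucket_edges(values, 4))
quantile_counts = bin_counts(values, quantile_edges(values, 4))

print(bucket_counts)    # [50, 21, 15, 14]
print(quantile_counts)  # [25, 25, 25, 25]
```

With bucket binning half of the rows land in the first interval, while quantile binning keeps the groups the same size, which is one reason quantile is often the safer choice for skewed variables.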

So is tree binning a sequential process that starts with quantile/bucket pre-binning, or can we use quantile, bucket, and tree as alternative binning methods?


I think quantile, bucket, and tree are just three binning methods; you can use any one of them.

Some people prefer tree; some prefer quantile.

You could first bin into many groups, say 20, with the quantile or bucket method, and then repeatedly merge two groups into one so that the chi-square or Gini statistic is maximized, and so on. I think that is a tree method.
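One way to read the merging idea above is a ChiMerge-style procedure: start from the fine pre-bins and repeatedly merge the adjacent pair whose event rates are least different (smallest chi-square), so the groups that remain are as distinct as possible. Strictly speaking this is bottom-up merging rather than top-down tree splitting, and it is an assumption about what the reply describes, not the node's documented algorithm. The function names and counts below are hypothetical.

```python
def chi_square(a, b):
    """Pearson chi-square for the 2x2 table of two groups of (non_events, events)."""
    col_tot = [a[0] + b[0], a[1] + b[1]]
    grand = sum(col_tot)
    stat = 0.0
    for row in (a, b):
        row_tot = row[0] + row[1]
        for j in range(2):
            expected = row_tot * col_tot[j] / grand
            if expected > 0:
                stat += (row[j] - expected) ** 2 / expected
    return stat

def merge_adjacent(groups, target):
    """Merge the most similar adjacent pair until only `target` groups remain."""
    groups = [tuple(g) for g in groups]
    while len(groups) > target:
        scores = [chi_square(groups[i], groups[i + 1])
                  for i in range(len(groups) - 1)]
        i = scores.index(min(scores))  # least different neighbours
        merged = (groups[i][0] + groups[i + 1][0],
                  groups[i][1] + groups[i + 1][1])
        groups[i:i + 2] = [merged]
    return groups

# Hypothetical pre-bins as (non_events, events), ordered by the variable's value:
# three low-risk bins followed by three high-risk bins.
fine_groups = [(90, 10), (88, 12), (91, 9), (60, 40), (62, 38), (58, 42)]

final_groups = merge_adjacent(fine_groups, target=2)
print(final_groups)  # [(269, 31), (180, 120)]
```

Only adjacent groups are merged, so the final bins stay contiguous ranges of the original variable.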




Yes, that is all correct!
