Interactive Grouping - documentation

01-16-2017 09:51 AM

Hello,

Could anybody help me find the documentation for the Interactive Grouping node in SAS Enterprise Miner? I am interested in the details, such as how precisely the algorithm works. As far as I know, a continuous variable is first binned, after which a decision tree is applied. In particular, which decision tree algorithm is used? Also, is the decision tree run on the already transformed WOE values, or on the original variable?

Finally, this might be a long shot, but is it possible to combine this method with other variables to capture interactions?

Best,

Marin

Accepted Solutions

Solution

01-17-2017 07:20 AM

Posted in reply to Gokirop

01-16-2017 12:50 PM

The EM Reference Help for this node (under SAS Credit Scoring) provides a good amount of detail. You are correct that the continuous (interval) inputs are first binned via bucket or quantile binning. Those bins are then further grouped using either PROC ARBOR or PROC OPTBIN (constrained optimal binning is described here: http://www2.sas.com/proceedings/forum2008/153-2008.pdf). Both procedures work on the bins themselves, not on the WOE for the bins.
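To make the pre-binning step more concrete, here is a rough Python sketch of quantile binning. It is only an illustration of the general idea, not the node's actual implementation: the function names `quantile_cutpoints` and `assign_bin`, the tie handling, and the sample data are all assumptions made for this example.

```python
from bisect import bisect_right


def quantile_cutpoints(values, n_bins):
    """Return up to n_bins - 1 cutpoints giving roughly equal-frequency bins.

    Hypothetical helper for illustration; not how SAS computes its bins.
    """
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for k in range(1, n_bins):
        cut = ordered[(k * n) // n_bins]
        # Skip duplicate cutpoints, which can appear when the data has ties.
        if not cuts or cut > cuts[-1]:
            cuts.append(cut)
    return cuts


def assign_bin(value, cuts):
    """Bin index of a value: the number of cutpoints less than or equal to it."""
    return bisect_right(cuts, value)


# Made-up interval input (for example, applicant age).
ages = [21, 22, 25, 29, 31, 35, 38, 42, 47, 51, 58, 63]
cuts = quantile_cutpoints(ages, 4)
bins = [assign_bin(a, cuts) for a in ages]

print(cuts)  # [29, 38, 51]
print(bins)  # [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
```

The later grouping step then works with these bin indices (and the event counts in each bin) rather than with the raw values.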

Here is a paragraph from the Reference Help that might be useful:

After the interval variables have been pre-binned, a decision tree model is fitted for each characteristic. PROC ARBOR or PROC OPTBIN (if constrained optimal) is used to produce the groups. You can choose among four grouping methods: optimal criterion, quantile, monotonic event rate, and constrained optimal. The optimal criterion method uses one of two criteria: reduction in entropy measure or the p-value of the Pearson Chi-square statistic. The quantile method generates groups with approximately the same frequency in each group. The monotonic event rate method generates groups that result in a monotonic distribution of event rates across all attributes. The event rate is equal to P(event | attribute). This is the conditional probability of an event given that an applicant exhibits a particular attribute. The constrained optimal method finds an optimal set of groups and simultaneously imposes additional constraints, as specified in the node property panel settings.
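As a small illustration of two ideas in that paragraph, the Python sketch below computes the event rate P(event | attribute) for each pre-bin and then picks the single binary split of the ordered bins that gives the largest reduction in entropy. This is a simplified stand-in and not PROC ARBOR or PROC OPTBIN: the function names, the restriction to one binary split, and the counts are assumptions made for this example.

```python
from math import log2


def entropy(events, total):
    """Binary entropy (in bits) of a group with `events` events in `total` cases."""
    if total == 0 or events in (0, total):
        return 0.0
    p = events / total
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def best_split(bin_counts):
    """Find the binary split of ordered bins with the largest entropy reduction.

    bin_counts is an ordered list of (events, total) per pre-bin.
    Returns (split_index, reduction); bins [0:split_index] form the left group.
    """
    all_events = sum(e for e, _ in bin_counts)
    all_total = sum(t for _, t in bin_counts)
    parent = entropy(all_events, all_total)
    best = (None, 0.0)
    left_e = left_t = 0
    for i, (e, t) in enumerate(bin_counts[:-1], start=1):
        left_e += e
        left_t += t
        right_e, right_t = all_events - left_e, all_total - left_t
        # Entropy of the two child groups, weighted by group size.
        child = (left_t * entropy(left_e, left_t)
                 + right_t * entropy(right_e, right_t)) / all_total
        reduction = parent - child
        if reduction > best[1]:
            best = (i, reduction)
    return best


# Made-up (events, total) counts for four ordered pre-bins.
bin_counts = [(8, 10), (7, 10), (2, 10), (1, 10)]

# Event rate per bin: P(event | attribute).
event_rates = [e / t for e, t in bin_counts]
# True when the event rate never increases from one bin to the next.
is_monotonic = all(a >= b for a, b in zip(event_rates, event_rates[1:]))

split, reduction = best_split(bin_counts)
print(event_rates, is_monotonic)
print(split, round(reduction, 3))
```

With these counts the best split falls between the second and third bin, which separates the two high-event-rate bins from the two low-event-rate ones. A real tree would keep splitting the resulting groups and apply stopping rules, which this sketch leaves out.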
