Solved: RD-Tree algorithm in MBR node - SAS Support Communities

🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.

Quartz | Level 8

Quartz | Level 8

RD-Tree algorithm in MBR node

Posted 12-27-2014 01:14 PM (8779 views)

Dears,

I need to know how MBR node work in E-Miner ? what is the difference between (Scan, RD-tree) algorithms in MBR?

Please send me any support materials.

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions

Barite | Level 11

Barite | Level 11

Re: RD-Tree algorithm in MBR node

Posted 12-30-2014 05:30 PM (8562 views) | In reply to husseinmazaar

As you know MBR is also called ""Nearest Neighbor", it assign a score to new data observation using the average of the K nearest neighbor.

You may return to EM Help "F1", to get more information about the MBR node.

This method works in multiple dimensions, but requires that the inputs have equal variances and no correlation.(return to a previous thread for more details).

Regarding your second Question about the difference between Scan and RD-tree:

-Scan: choose the nearest neighbor by the smallest linear squared distance between the observation and all possible k neighbors.

-RD-tree: use a tree structure to organize data observation in dimension space.

>> A brief description is mentioned in the EM help, and also it mention that the RD-tree works better than scan to dimensional of 100 .

>>Also the only referance i find mention RD-tree in more details is Data Mining Using SAS Enterprise Miner By Randall Matignon , which refer to RD-tree as improvement over the famous KD-tree algorithm and mention the differences between the two. But i find SAS refer to the RD-tree as the Patented Reduced Dimensionality Tree in the SAS Enterprise Miner Fact Sheet and i did find a lot about it.

If you are looking for case studies where MBR node were used i found such two papers:

Improving Credit Risk Scorecards with Memory-Based Reasoning to Reject. Inference with SAS. ®. Enter...

Prediction of Rise in Sea Level by Memory-Based Reasoning Model using SAS® Enterprise Miner™ 12.1, where scan and RD-tree used and results compared to each other.

View solution in original post

6 REPLIES 6

Barite | Level 11

Barite | Level 11

Re: RD-Tree algorithm in MBR node

Posted 12-30-2014 05:30 PM (8563 views) | In reply to husseinmazaar

As you know MBR is also called ""Nearest Neighbor", it assign a score to new data observation using the average of the K nearest neighbor.

You may return to EM Help "F1", to get more information about the MBR node.

This method works in multiple dimensions, but requires that the inputs have equal variances and no correlation.(return to a previous thread for more details).

Regarding your second Question about the difference between Scan and RD-tree:

-Scan: choose the nearest neighbor by the smallest linear squared distance between the observation and all possible k neighbors.

-RD-tree: use a tree structure to organize data observation in dimension space.

>> A brief description is mentioned in the EM help, and also it mention that the RD-tree works better than scan to dimensional of 100 .

>>Also the only referance i find mention RD-tree in more details is Data Mining Using SAS Enterprise Miner By Randall Matignon , which refer to RD-tree as improvement over the famous KD-tree algorithm and mention the differences between the two. But i find SAS refer to the RD-tree as the Patented Reduced Dimensionality Tree in the SAS Enterprise Miner Fact Sheet and i did find a lot about it.

If you are looking for case studies where MBR node were used i found such two papers:

Improving Credit Risk Scorecards with Memory-Based Reasoning to Reject. Inference with SAS. ®. Enter...

Prediction of Rise in Sea Level by Memory-Based Reasoning Model using SAS® Enterprise Miner™ 12.1, where scan and RD-tree used and results compared to each other.

Quartz | Level 8

Quartz | Level 8

Re: RD-Tree algorithm in MBR node

Posted 12-31-2014 02:57 AM (8155 views) | In reply to mohamed_zaki

Thanks mohamed zaki

Any comment RalphAbbey

SAS Employee

SAS Employee

Re: RD-Tree algorithm in MBR node

Posted 01-06-2015 10:33 AM (8155 views) | In reply to husseinmazaar

Mohamed made a bunch of good points. The EM help does describe generally how the RD-Tree works, while the Scan method is just and exhaustive search.

I do want to add that for the MBR Node, all of the inputs need to be interval. In addition, the k-nearest neighbors method (in general, not just the MBR Node) assumes that the variables are orthogonal and standardized, so we recommend using principal components prior to MBR unless you know that your variables are orthogonal and standardized. If you don't have orthogonal or standardized variables, then some variables will skew the k-nearest neighbors more than others (the node will still run, but your results may not be as useful as you would like).

I hope this helps.

Calcite | Level 5

Calcite | Level 5

Re: RD-Tree algorithm in MBR node

Posted 06-08-2015 07:50 PM (8155 views) | In reply to RalphAbbey

Hi, Ralph:

Do you know the difference between MBR node and PROC DISCRIM procedure? I used both to run my KNN but I got totally different result, but I don't know which one to use for scoring my new data set.

Thanks for your help.

Obsidian | Level 7

Obsidian | Level 7

Re: RD-Tree algorithm in MBR node

Posted 06-12-2015 02:34 PM (8155 views) | In reply to EricTsai

Hi,

This is a very good question. To echo the comments of many above, the basic difference between PROC DISCRIM in SAS/STAT and the MBR node in SAS Enterprise Miner is that the MBR node can use the RDTREE method to search for nearest neighbor observations with a known class in training data to classify new observations in test data. (The RDTREE method is a proprietary version of the popular KDTREE algorithm.) The tree-based neighbor search can be faster, but you may get different results between PROC DISCRIM and the MBR node.

If you want to have the closest correspondence between PROC DISCRIM and the MBR node, use to the SCAN option in the MBR node's METHOD property to request that the MBR node use a conventional distance calculation to classify new observations. However, the traditional distance calculation may be unsuited for big data.

Here is an example of how different your results could be in a low-dimensional simulated data set. (Your results could be even more different with real data.)

Notice that the classifications made by PROC DISCRIM and PROC PMBR (i.e. the MBR node) using the SCAN method are almost identical. Using the RDTREE method with a small EPSILON, you can also closely replicate the results of PROC DISCRIM on this sample data. But, if you change the value for EPSILON then your results can be noticeably different from PROC DISCRIM using its default settings.

Changing the BUCKETS property does not appear to change the classification results, but changing the EPSILON property does change the classification results. So what is going on here?

The BUCKETS property:

The value of the BUCKETS option should not affect classification results.
It is a parameter that is used to balance the speed vs. memory trade-off for the tree structure. It is basically the number of observations allowed to be in each node of the RDTREE.
Nodes of the tree are searched in O(log N) time; within each node the search for neighbors is slower. However building more nodes requires more memory.
A lower value for BUCKETS will result in a faster calculation, but more memory being used.
A higher value for BUCKETS will result in a slower calculation, but less memory being used.

The EPSILON property:

EPSILON controls the approximate nearest neighbor search; changing EPSILON can affect classification results.
A larger value for EPSILON will allow more points that may not be actual nearest neighbors to be used to classify a new observation.
A smaller value for EPSILON will use more points that are guaranteed to be nearest neighbors to classify a new observation.
Using a larger value for EPSILON should decrease execution time.

If you would like to try this experiment for yourself, the code is available here: PROC DISCRIM vs. the MBR node in Enterprise Miner

Message was edited by: Patrick Hall; updated code and added details for BUCKETS and EPSILON.

Calcite | Level 5

Calcite | Level 5

Re: RD-Tree algorithm in MBR node

Posted 06-12-2015 03:55 PM (8155 views) | In reply to PatrickHall

Thank you very much Patrick for sharing this information and specifically your code.

It really helps in explaining the difference between the output from PROC DISCRIM and MBR node.

744853-sas-innovate-data-ai (1).jpg

Catch up on SAS Innovate 2026

Nearly 200 sessions are now available on demand in the Innovate Hub.

Discussion stats

6 replies
‎12-27-2014 01:14 PM
8780 views
2 likes
5 in conversation