12-30-2014 05:30 PM
As you know MBR is also called ""Nearest Neighbor", it assign a score to new data observation using the average of the K nearest neighbor.
You may return to EM Help "F1", to get more information about the MBR node.
This method works in multiple dimensions, but requires that the inputs have equal variances and no correlation.(return to a previous thread for more details).
Regarding your second Question about the difference between Scan and RD-tree:
-Scan: choose the nearest neighbor by the smallest linear squared distance between the observation and all possible k neighbors.
-RD-tree: use a tree structure to organize data observation in dimension space.
>> A brief description is mentioned in the EM help, and also it mention that the RD-tree works better than scan to dimensional of 100 .
>>Also the only referance i find mention RD-tree in more details is Data Mining Using SAS Enterprise Miner By Randall Matignon , which refer to RD-tree as improvement over the famous KD-tree algorithm and mention the differences between the two. But i find SAS refer to the RD-tree as the Patented Reduced Dimensionality Tree in the SAS Enterprise Miner Fact Sheet and i did find a lot about it.
If you are looking for case studies where MBR node were used i found such two papers:
Prediction of Rise in Sea Level by Memory-Based Reasoning Model using SAS® Enterprise Miner™ 12.1, where scan and RD-tree used and results compared to each other.
01-06-2015 10:33 AM
Mohamed made a bunch of good points. The EM help does describe generally how the RD-Tree works, while the Scan method is just and exhaustive search.
I do want to add that for the MBR Node, all of the inputs need to be interval. In addition, the k-nearest neighbors method (in general, not just the MBR Node) assumes that the variables are orthogonal and standardized, so we recommend using principal components prior to MBR unless you know that your variables are orthogonal and standardized. If you don't have orthogonal or standardized variables, then some variables will skew the k-nearest neighbors more than others (the node will still run, but your results may not be as useful as you would like).
I hope this helps.
06-08-2015 07:50 PM
Do you know the difference between MBR node and PROC DISCRIM procedure? I used both to run my KNN but I got totally different result, but I don't know which one to use for scoring my new data set.
Thanks for your help.
06-12-2015 02:34 PM
This is a very good question. To echo the comments of many above, the basic difference between PROC DISCRIM in SAS/STAT and the MBR node in SAS Enterprise Miner is that the MBR node can use the RDTREE method to search for nearest neighbor observations with a known class in training data to classify new observations in test data. (The RDTREE method is a proprietary version of the popular KDTREE algorithm.) The tree-based neighbor search can be faster, but you may get different results between PROC DISCRIM and the MBR node.
If you want to have the closest correspondence between PROC DISCRIM and the MBR node, use to the SCAN option in the MBR node's METHOD property to request that the MBR node use a conventional distance calculation to classify new observations. However, the traditional distance calculation may be unsuited for big data.
Here is an example of how different your results could be in a low-dimensional simulated data set. (Your results could be even more different with real data.)
Notice that the classifications made by PROC DISCRIM and PROC PMBR (i.e. the MBR node) using the SCAN method are almost identical. Using the RDTREE method with a small EPSILON, you can also closely replicate the results of PROC DISCRIM on this sample data. But, if you change the value for EPSILON then your results can be noticeably different from PROC DISCRIM using its default settings.
Changing the BUCKETS property does not appear to change the classification results, but changing the EPSILON property does change the classification results. So what is going on here?
The BUCKETS property:
The EPSILON property:
If you would like to try this experiment for yourself, the code is available here: PROC DISCRIM vs. the MBR node in Enterprise Miner
Message was edited by: Patrick Hall; updated code and added details for BUCKETS and EPSILON.
06-12-2015 03:55 PM
Thank you very much Patrick for sharing this information and specifically your code.
It really helps in explaining the difference between the output from PROC DISCRIM and MBR node.