Since the time of this original post (over 5 years!), SAS Enterprise Miner has added deep support for random forests, including an HP Forest node.
See Getting the most from your Random Forests in SAS Enterprise Miner. Also, watch this YouTube video about Random Forest and Support Vector Machines.
You might also want to read this paper about ensemble models in SAS Enterprise Miner. From the abstract:
Does the new version of Miner perform random forests?
"New procedures cover data binning, imputation, sampling, decisions, logistic and linear regressions, neural networks, random forests"
It has been a while since I inquired about this, but I found that gradient boosting was very useful! Thanks.
In EM 7.1, try PROC FOREST, which fits random forests in SAS EM. Unfortunately, SAS has not released the syntax or detailed documentation. If you have EM 7.1, you need to use the code-generation function to peek into the secrets.
One advantage of random forests is that they are very easy for the user to parallelize. I can build a random forest of 2,000 small trees by firing up 4 sessions simultaneously, each building 500 small ones.
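To illustrate why this works, here is a minimal sketch in plain Python (not SAS, and not how PROC FOREST is implemented). It assumes toy data, one-split trees ("stumps"), and threads standing in for the 4 separate sessions; the names make_data, fit_stump, build_forest and predict are made up for this example. The point is only that each session needs nothing but its own seed, and merging the results is just concatenating the tree lists before taking a majority vote.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def make_data(n, seed):
    """Toy binary data (an assumption for illustration): label is 1 when x0 + x1 > 1."""
    rng = random.Random(seed)
    rows = [[rng.random(), rng.random()] for _ in range(n)]
    labels = [1 if r[0] + r[1] > 1.0 else 0 for r in rows]
    return rows, labels


def majority(values):
    """Majority label of a list of 0/1 values (0 on ties or empty input)."""
    return int(2 * sum(values) > len(values))


def fit_stump(rows, labels, rng):
    """Fit a one-split tree on a bootstrap sample, using one randomly chosen feature."""
    n = len(rows)
    idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample (with replacement)
    feat = rng.randrange(len(rows[0]))          # random feature for this tree
    best = None
    for k in range(1, 10):                      # candidate thresholds 0.1 .. 0.9
        t = k / 10
        left = [labels[i] for i in idx if rows[i][feat] <= t]
        right = [labels[i] for i in idx if rows[i][feat] > t]
        lo, hi = majority(left), majority(right)
        errors = sum(y != lo for y in left) + sum(y != hi for y in right)
        if best is None or errors < best[0]:
            best = (errors, feat, t, lo, hi)
    return best[1:]                             # (feature, threshold, left label, right label)


def build_forest(rows, labels, n_trees, seed):
    """What one 'session' does: grow n_trees small trees from its own seed."""
    rng = random.Random(seed)
    return [fit_stump(rows, labels, rng) for _ in range(n_trees)]


def predict(forest, row):
    """Majority vote over every tree in the (merged) forest."""
    votes = sum(hi if row[feat] > t else lo for feat, t, lo, hi in forest)
    return int(2 * votes > len(forest))


rows, labels = make_data(100, seed=0)
seeds = [11, 22, 33, 44]  # one distinct seed per session, so the sub-forests differ

# Threads stand in for separate sessions here; they show the structure,
# not a real speed-up (CPython threads share one interpreter lock).
with ThreadPoolExecutor(max_workers=4) as pool:
    parts = list(pool.map(lambda s: build_forest(rows, labels, 500, s), seeds))

forest = [tree for part in parts for tree in part]  # merge: 4 x 500 = 2000 trees
accuracy = sum(predict(forest, r) == y for r, y in zip(rows, labels)) / len(rows)
print(len(forest), "trees, training accuracy", round(accuracy, 2))
```

Because the trees never depend on each other, the only coordination needed is giving each session a different seed and combining the outputs at the end. In real sessions you would save each sub-forest (or its scored output) and average the votes or probabilities afterwards.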
In the major release SAS had in August 2012, EM gained a random forest node. The latest version of EM is 12.2, with an HPFOREST node that essentially runs PROC HPFOREST from SAS's High Performance Analytics offerings.
I posted several usage examples of the HPFOREST node and PROC HPFOREST on my blog, Analytics in Writing.
Random forest modeling typically requires a lot of memory. In the world of large-scale predictive learning, some people invest in building in-memory models and modes of modeling, while others invest in 'smart finesses' such as MapReduce. In in-memory applications, for example, 1.5 TB of RAM distributed across parallel worker nodes is often not considered large for building a random forest.
I once saw SAS programmers writing Base SAS code to build random forests. Over 10 years ago, I first saw Salford Systems' offering, which typically ran on smaller data sets. Complexity naturally goes hand in hand with big data sets. This is where random forests are supposed to 'shine', but learning algorithms from papers is one thing; industrializing them on a large scale is an entirely different game. I have used the SAS HPFOREST capabilities for a while. I believe it is still generation one, but it has crossed the critical threshold into industrialization.
I just came across this animated video on HPFOREST showing an example of how it may work in the academic space. While not getting into detail, it's a quick and artful watch.
Animating Analytics: PROC HPFOREST - YouTube
Anna
Hi Jason,
I found the articles on your website really helpful. Do you have any documentation relating to PROC HPFOREST which you can email me?
Please contact Tech Support (Technical Support Form) to get access to the secure HP procedure documentation that is available from the link:
Hi David,
Is there any reference about Gradient Boosting? Thanks
If you have access to the EM product or its documentation, gradient boosting details are right under the Gradient Boosting node. The PROC version of GB is PROC TREEBOOST. I think the details of all the PROCs behind EM are now public on the SAS support site, although the official policy remains 'as is', meaning they are not supported by SAS Technical Support. I know the high-performance version of GB is under construction. No info on when it will be ready.
Jason Xin
Is there a way to set the HP Forest node to choose the best model based on prediction performance on validation data? I only see in-sample options and no out-of-sample evaluation methods.
Thanks,
Maria