SlutskyFan
Obsidian | Level 7
Via the settings in the decision tree node, is it possible to mimic random forests? I've read SAS Help and Applied Analytics Using SAS Enterprise Miner, and I've done a Google search, but I'm not getting very far.
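For reference, three ingredients distinguish a random forest from a single decision tree. Each tree is grown on a bootstrap sample of the rows, each split considers only a random subset of the inputs, and the forest predicts by majority vote across its trees. The toy Python sketch below shows those three ingredients. It is not SAS code and does not describe how any Enterprise Miner node works internally. To keep it short it uses depth-one trees (stumps), whereas real random forests usually grow much deeper trees. The names `fit_stump`, `random_forest`, `predict` and `m_try` are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, labels, features):
    """Depth-one tree: the (feature, threshold) split among `features` with the
    fewest misclassifications, predicting the majority class on each side."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows})[:-1]:  # drop the max so both sides are non-empty
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            l_cls, l_hits = Counter(left).most_common(1)[0]
            r_cls, r_hits = Counter(right).most_common(1)[0]
            errors = len(labels) - l_hits - r_hits
            if best is None or errors < best[0]:
                best = (errors, f, t, l_cls, r_cls)
    if best is None:  # no usable split: predict the overall majority class
        majority = Counter(labels).most_common(1)[0][0]
        return lambda row: majority
    _, f, t, l_cls, r_cls = best
    return lambda row: l_cls if row[f] <= t else r_cls

def random_forest(rows, labels, n_trees=25, m_try=1, seed=0):
    """Grow n_trees stumps, each on its own bootstrap sample and random input subset."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: n draws with replacement
        features = rng.sample(range(p), m_try)      # random subset of m_try inputs
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], features))
    return forest

def predict(forest, row):
    """Majority vote over all trees in the forest."""
    return Counter(tree(row) for tree in forest).most_common(1)[0][0]

X = [(1, 1), (2, 2), (3, 3), (7, 7), (8, 8), (9, 9)]
y = ["a", "a", "a", "b", "b", "b"]
forest = random_forest(X, y, n_trees=25, m_try=1, seed=42)
print(predict(forest, (0, 0)), predict(forest, (10, 10)))
```

With stumps, drawing the input subset per split and per tree is the same thing. In a deeper tree the subset is redrawn at every split.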


(I think I can do k-fold cross-validation using the cross-validation settings for the decision tree, but I'm not sure I'm doing it correctly.)
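You can use this as a cross-check on what the node reports. K-fold cross-validation splits the rows into k folds and fits the model k times, each time on k-1 folds. It scores the held-out fold on each pass and averages the k scores. A minimal Python sketch of that bookkeeping follows. `kfold_indices`, `cross_validate` and the `train_and_score` callback are hypothetical names, and this is not a description of how the decision tree node implements its cross-validation settings.

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle row indices 0..n-1 and deal them round-robin into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(n, k, train_and_score, seed=0):
    """Call train_and_score(train_idx, valid_idx) once per fold; return the mean score."""
    folds = kfold_indices(n, k, seed)
    scores = []
    for i, valid_idx in enumerate(folds):
        # training rows = every fold except the held-out one
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        scores.append(train_and_score(train_idx, valid_idx))
    return sum(scores) / k

print([len(fold) for fold in kfold_indices(10, 3)])  # fold sizes: [4, 3, 3]
```

Any implementation should satisfy one property. Every row lands in exactly one validation fold, and it never appears among the training rows for that same fold.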

Any suggestions or references? I feel like I've got a good basic feel for Enterprise Miner and a decent theoretical background in various machine learning techniques. I've read a lot of The Elements of Statistical Learning: Data Mining, Inference, and Prediction (http://www-stat.stanford.edu/~tibs/ElemStatLearn/) and watched Andrew Ng's machine learning lectures (http://www.youtube.com/view_play_list?p=A89DCFA6ADACE599).

I need some more advanced references using Enterprise Miner.

Thanks.
Accepted Solution
ChrisHemedinger
Community Manager

Since the time of this original post (over 5 years!), SAS Enterprise Miner has added deep support for random forests, including an HP Forest node. 

See Getting the most from your Random Forests in SAS Enterprise Miner.  Also, watch this YouTube video about Random Forest and Support Vector Machines.

You might also want to read this paper about ensemble models in SAS Enterprise Miner.  From the abstract: 

  • Ensemble models combine two or more models to enable a more robust prediction, classification, or variable selection. This paper describes three types of ensemble models: boosting, bagging, and model averaging. It discusses go-to methods, such as gradient boosting and random forest, and newer methods, such as rotational forest and fuzzy clustering.
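Of the three ensemble types in that abstract, model averaging is the easiest to picture. You score each record with several already-fitted models and average their predicted event probabilities. A minimal Python sketch follows, assuming each model is simply a function returning P(event). The three lambdas are hypothetical stand-ins for fitted models, and the 0.5 cutoff is an assumption.

```python
from statistics import mean

def average_models(models, row):
    """Model averaging: the mean of each model's predicted event probability."""
    return mean(model(row) for model in models)

# Hypothetical fitted models, each returning P(event) for a record
models = [
    lambda row: 0.9 if row["x"] > 5 else 0.2,  # e.g. a decision tree
    lambda row: min(1.0, row["x"] / 10),       # e.g. a regression
    lambda row: 0.6,                           # e.g. a weak neural network
]

for x in (2, 8):
    p = average_models(models, {"x": x})
    print(x, round(p, 3), "event" if p >= 0.5 else "non-event")
```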

It's time to register for SAS Innovate! Join your SAS user peers in Las Vegas on April 16-19 2024.

11 REPLIES
Ajay
Fluorite | Level 6

Does the new version of Miner perform random forests?

http://support.sas.com/documentation/cdl/en/whatsnew/64209/HTML/default/viewer.htm#emwhatsnew71m1.ht...

"New procedures cover data binning, imputation, sampling, decisions, logistic and linear regressions, neural networks, random forests"

SlutskyFan
Obsidian | Level 7

It has been a while since I inquired about this, but I found that gradient boosting was very useful! Thanks.

oloolo
Fluorite | Level 6

In EM 7.1, try PROC FOREST, which builds random forests in SAS EM. Unfortunately, SAS doesn't publish the syntax or detailed documentation. If you have EM 7.1, you need to use the code-generation function to peek into the secrets.

One advantage of random forests is that the user can parallelize them very easily. I can build a random forest of 2000 small trees by firing up 4 sessions simultaneously, each building 500 of them.
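This works because the trees in a random forest are grown independently of one another. A 2000-tree forest is therefore simply the union of four 500-tree forests, provided each session uses a different random seed. Sessions that shared a seed would grow identical trees. A minimal Python sketch of the idea follows; it is not SAS code. `grow_subforest` is a hypothetical stand-in for one session. It grows bootstrap-sampled stumps on a single numeric input, and the merged forest scores by pooling every tree's vote.

```python
import random
from collections import Counter

def grow_stump(xs, ys):
    """Best single threshold on one numeric input (majority class on each side)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # drop the max so both sides are non-empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        (lc, lh), (rc, rh) = Counter(left).most_common(1)[0], Counter(right).most_common(1)[0]
        if best is None or lh + rh > best[0]:
            best = (lh + rh, t, lc, rc)
    if best is None:  # bootstrap sample had a single distinct value
        c = Counter(ys).most_common(1)[0][0]
        return lambda x: c
    _, t, lc, rc = best
    return lambda x: lc if x <= t else rc

def grow_subforest(xs, ys, n_trees, seed):
    """One hypothetical 'session': n_trees stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]
        trees.append(grow_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees

def vote(trees, x):
    """Pool every tree's vote and return the majority class."""
    return Counter(tree(x) for tree in trees).most_common(1)[0][0]

xs = [1, 2, 3, 7, 8, 9]
ys = ["a", "a", "a", "b", "b", "b"]
# Four independent sessions with distinct seeds; each call could run in its own process.
subforests = [grow_subforest(xs, ys, n_trees=500, seed=s) for s in range(4)]
forest = [tree for sub in subforests for tree in sub]  # 2000 trees in total
print(len(forest), vote(forest, 0), vote(forest, 10))
```

Here the four calls run one after another, but none depends on another's result. The only coordination the sessions need is distinct seeds and a final step that pools their votes.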

JasonXin
SAS Employee

With the major release SAS shipped in August 2012, EM gained a random forest node. The latest version, EM 12.2, includes the HPFOREST node, which essentially runs PROC HPFOREST from SAS's High-Performance Analytics offerings.

I posted in my blog Analytics in Writing several use examples on HPFOREST node and PROC HPFOREST.

Random forest modeling typically requires a lot of memory. In the world of large-scale predictive learning, some people invest in building in-memory models and modes of modeling, while others invest in 'smart finesses' such as MapReduce. In in-memory applications such as building a random forest, 1.5 TB of RAM distributed across parallel worker nodes is often not considered LARGE or MUCH.

I once saw SAS programmers writing Base SAS code to build random forests. Over 10 years ago, I first saw Salford Systems' offering, which typically ran on smaller data sets. Complexity naturally comes with big data sets, and that is where random forests are supposed to 'shine'. But learning an algorithm from papers is one thing; industrializing it at large scale is an entirely different game. I have used SAS HPFOREST capabilities for a while. I believe it is still generation ONE, but it has crossed the critical threshold into industrialization.

AnnaBrown
Community Manager

I just came across this animated video on HPFOREST showing an example of how it may work in the academic space. While not getting into detail, it's a quick and artful watch.

Animating Analytics: PROC HPFOREST - YouTube

Anna


AarushIssar
Calcite | Level 5

Hi Jason,

I found the articles on your website really helpful. Do you have any documentation relating to PROC HPFOREST which you can email me?

WendyCzika
SAS Employee

Please contact Tech Support (Technical Support Form) to get access to the secure HP procedure documentation that is available from the link:

http://support.sas.com/documentation/onlinedoc/miner/

TomiKong
Fluorite | Level 6

Hi David,

Is there any reference about Gradient Boosting? Thanks

JasonXin
SAS Employee

If you have access to the EM product or its documentation, gradient boosting details are under the Gradient Boosting node. The PROC version of GB is PROC TREEBOOST. I think the details of all the PROCs behind EM are now public on the SAS support site, although the official policy remains 'as is', meaning they are not supported by SAS Technical Support. I know the high-performance version of GB is under construction; there is no information yet on when it will be ready.
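This is for readers who want the mechanics rather than the node documentation. Gradient boosting with squared-error loss starts from a constant. It then repeatedly fits a small tree to the current residuals (the negative gradient of that loss) and adds a shrunken copy of that tree to the model. The Python sketch below shows that loop with regression stumps on one input. It is a generic textbook version under those assumptions, not what PROC TREEBOOST or the Gradient Boosting node does internally. `fit_stump`, `boost`, `n_iter` and `shrinkage` are hypothetical names.

```python
def fit_stump(xs, rs):
    """Regression stump: the split on x that minimizes squared error of residuals rs."""
    best = None
    for t in sorted(set(xs))[:-1]:  # drop the max so both sides are non-empty
        left = [r for x, r in zip(xs, rs) if x <= t]
        right = [r for x, r in zip(xs, rs) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # a single distinct x value: predict the mean residual
        m = sum(rs) / len(rs)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, n_iter=100, shrinkage=0.1):
    """Squared-error gradient boosting: start at the mean, then repeatedly fit a
    stump to the residuals and add a shrunken copy of it to the predictions."""
    f0 = sum(ys) / len(ys)
    stumps, pred = [], [f0] * len(xs)
    for _ in range(n_iter):
        residuals = [y - p for y, p in zip(ys, pred)]
        s = fit_stump(xs, residuals)
        stumps.append(s)
        pred = [p + shrinkage * s(x) for p, x in zip(pred, xs)]
    return lambda x: f0 + shrinkage * sum(st(x) for st in stumps)

xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
model = boost(xs, ys)
print([round(model(x), 2) for x in xs])
```

A smaller `shrinkage` generally needs more iterations but tends to generalize better, which is why the two are usually tuned together.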

Jason Xin

weeseml
Calcite | Level 5

Is there a way to set the HP Forest node to choose the best model based on prediction performance on validation data? I only see in-sample options, not out-of-sample evaluation methods.

Thanks,

Maria
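This sketch does not depend on which options the node exposes. The out-of-sample selection described above amounts to scoring each candidate model on the validation partition, which is data not used to fit it, and keeping the candidate with the lowest validation error. Below is a minimal Python sketch under that assumption. Misclassification rate is the assumed criterion, and the three candidate models are hypothetical stand-ins for forests grown with different settings.

```python
def misclassification(model, rows, labels):
    """Share of validation records the model gets wrong."""
    wrong = sum(model(r) != y for r, y in zip(rows, labels))
    return wrong / len(labels)

def select_by_validation(candidates, valid_rows, valid_labels):
    """Return (name, error) for the candidate with the lowest validation error."""
    errors = {name: misclassification(m, valid_rows, valid_labels)
              for name, m in candidates.items()}
    best = min(errors, key=errors.get)
    return best, errors[best]

# Hypothetical candidates, e.g. forests grown with different settings
candidates = {
    "always_a": lambda r: "a",
    "threshold_5": lambda r: "a" if r < 5 else "b",
    "threshold_8": lambda r: "a" if r < 8 else "b",
}
valid_rows = [1, 3, 4, 6, 7, 9]
valid_labels = ["a", "a", "a", "b", "b", "b"]
print(select_by_validation(candidates, valid_rows, valid_labels))  # ('threshold_5', 0.0)
```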
