I’m working with the HP forest node on an imbalanced training set where the ratio of non-events to events is 6:1. I’m using approximately 60 trees and want the training data for each tree to be balanced 50:50 (non-event:event).
Do I need to use a sample node to adjust the training data beforehand, or does the HP forest node automatically select a balanced sample for each iteration/bag?
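To be concrete about what I mean by a balanced bag, here is a rough sketch in Python with NumPy (not SAS, and not a claim about what the HP forest node does internally). The function name `balanced_bootstrap_indices` and the toy 6:1 data are made up purely for illustration.

```python
import numpy as np

def balanced_bootstrap_indices(y, rng):
    """Draw one bag with equal numbers of events (1) and non-events (0).

    Hypothetical helper: illustrates the sampling I have in mind,
    not the HP forest node's actual behaviour.
    """
    y = np.asarray(y)
    event_idx = np.flatnonzero(y == 1)
    nonevent_idx = np.flatnonzero(y == 0)
    # Size each class to the smaller (event) class so the bag is 50:50.
    n = min(len(event_idx), len(nonevent_idx))
    bag_events = rng.choice(event_idx, size=n, replace=True)
    bag_nonevents = rng.choice(nonevent_idx, size=n, replace=True)
    return np.concatenate([bag_events, bag_nonevents])

rng = np.random.default_rng(0)
y = np.array([0] * 600 + [1] * 100)   # toy target with a 6:1 ratio
bags = [balanced_bootstrap_indices(y, rng) for _ in range(60)]  # one bag per tree
```

Each of the 60 bags would then feed one tree, so every tree sees a 50:50 sample while the forest as a whole can still draw on all of the non-event observations.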
I currently have two models set up: the first uses the pre-sampling approach (throwing away a large proportion of the non-event observations), and the second feeds the imbalanced training set directly into the HP forest node. The second approach gives me the better ROC/lift on my holdout sample, so I’m guessing the HP forest node is doing something smart under the hood.
I’ve looked at the limited documentation, but unfortunately it doesn’t cover this.
Any help would be greatly appreciated.
Thanks, Jason, for such a comprehensive answer; it’s very much appreciated.
Just one follow-up question, if I may. I’ve built a model using HP forest and I’m now trying to evaluate the variable importance.
The variable importance table (within the HP forest results) reports a number of different metrics, including “Number of Splitting Rules”, “Train: Gini Reduction”, “Train: Margin Reduction”, “OOB: Gini Reduction” and “OOB: Margin Reduction”.
I’m trying to find out how these are calculated. For “OOB: Margin Reduction” I’m getting some negative values, which is a little concerning. Is there any SAS documentation available that explains them?
Many thanks in advance.