StanW1
Calcite | Level 5

Hello, I'm running the following steps across multiple independent iterations for testing purposes: (1) run HPForest against the same input dataset with the same parameters, including the SEED value, (2) save the binary score code, and (3) score the same target dataset with that binary score code. In this case the target variable is categorical. I notice that the predicted class probabilities are almost always different across runs, and that the final class prediction, based on the maximum class probability, occasionally differs as well. I would have expected the same result each time, given that the input dataset, all parameters, and the random seed are fixed. I'm running 9.04.01M3P062415 on WX64_SV. Has anyone else seen this behavior? Is this expected? Thanks!

2 REPLIES
BethEbersole
SAS Employee

Yes, I would expect different results.  The random forest is an ensemble model of many decision trees.  Each tree is built on a randomly selected subset of observations (rows) AND at each node, only a randomly selected subset of variables is available for splitting.

PadraicGNeville
SAS Employee

If the observations are being read in parallel, then the order of observations in memory is somewhat randomized: the fastest thread of the moment gets its block of observations to be the first in HPFOREST memory. The Out-Of-Bag sample is chosen by in-memory observation number. Different Out-Of-Bag samples produce different trees. To check this theory, you can set THREADS=1 on the PERFORMANCE statement.
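
The mechanism described above can be illustrated with a small Python sketch. This is not HPFOREST code and makes no claim about its internals; it only assumes, as the reply states, that the out-of-bag sample is picked by in-memory position. The helper names (`load_in_blocks`, `out_of_bag_ids`), the block size, and the seed value are all hypothetical and chosen for illustration.

```python
import random


def load_in_blocks(row_ids, block_size, block_order):
    """Simulate reader threads: blocks land in memory in the order they finish."""
    blocks = [row_ids[i:i + block_size] for i in range(0, len(row_ids), block_size)]
    return [row for b in block_order for row in blocks[b]]


def out_of_bag_ids(rows_in_memory, seed, n_oob=4):
    """Pick the out-of-bag rows by in-memory position using a fixed seed."""
    rng = random.Random(seed)
    positions = rng.sample(range(len(rows_in_memory)), n_oob)
    return sorted(rows_in_memory[p] for p in positions)


SEED = 12345                 # hypothetical seed, identical in every run
row_ids = list(range(12))    # original observation numbers in the dataset

# One thread: blocks always arrive in file order.
single_thread = load_in_blocks(row_ids, 4, [0, 1, 2])
# Several threads: the third block happened to finish first in this run.
multi_thread = load_in_blocks(row_ids, 4, [2, 0, 1])

oob_a = out_of_bag_ids(single_thread, SEED)
oob_b = out_of_bag_ids(single_thread, SEED)
oob_c = out_of_bag_ids(multi_thread, SEED)

print("single thread, run 1:", oob_a)
print("single thread, run 2:", oob_b)
print("multi thread        :", oob_c)
print("same seed, same order  -> identical:", oob_a == oob_b)
print("same seed, other order -> identical:", oob_a == oob_c)
```

The seed fixes which positions are drawn, not which observations sit at those positions. If the in-memory order changes, the same positions map to different observations, so the out-of-bag sample (and everything built from it) changes even though the seed did not.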

 

When HPFOREST runs on a cluster of machines (MPP mode, that is), then reproducibility is foiled by the system that gives random machine numbers to HPFOREST.   HPFOREST includes the machine number in some random choices so as not to have different machines doing identical things.
