StanW1
Calcite | Level 5

Hello, I'm running the following steps across multiple independent iterations for testing purposes: (1) run HPForest against the same input dataset with the same parameters, including the SEED value; (2) save the binary score code; and (3) score the same target dataset with that binary score code. In this case the target variable is categorical. I notice that the predicted class probabilities are almost always different across runs, and that the final class prediction, based on the maximum class probability, is occasionally different. I would have expected the same result each time, given that the input dataset, all parameters, and the random seed are fixed. I'm running 9.04.01M3P062415 on WX64_SV. Has anyone else seen this behavior? Is this expected? Thanks!

BethEbersole
SAS Employee

Yes, I would expect different results.  The random forest is an ensemble model of many decision trees.  Each tree is built on a randomly selected subset of observations (rows) AND at each node, only a randomly selected subset of variables is available for splitting.
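The two sources of randomness described above (a random subset of rows per tree, and a random subset of candidate variables at each node) can be sketched in a few lines of Python. This is a toy illustration, not HPFOREST's actual sampling scheme: the function name `draw_tree_randomness`, the in-bag fraction of 0.5, and the fixed number of nodes are all hypothetical simplifications (in a real tree the set of nodes depends on the data). It also shows that a seeded generator makes the same draws when given the same seed and the same inputs.

```python
import random

def draw_tree_randomness(n_rows, n_vars, n_nodes, vars_per_node, seed,
                         in_bag_fraction=0.5):
    """Hypothetical sketch of the random choices one tree in a forest makes."""
    rng = random.Random(seed)  # private generator seeded for this tree

    # Random subset of observations (rows) used to grow this tree,
    # drawn without replacement; the fraction is an assumption.
    n_in_bag = int(n_rows * in_bag_fraction)
    in_bag_rows = sorted(rng.sample(range(n_rows), n_in_bag))

    # At every node, only a random subset of variables is eligible for splitting.
    node_vars = [sorted(rng.sample(range(n_vars), vars_per_node))
                 for _ in range(n_nodes)]
    return in_bag_rows, node_vars

a = draw_tree_randomness(n_rows=10, n_vars=6, n_nodes=3, vars_per_node=2, seed=42)
b = draw_tree_randomness(n_rows=10, n_vars=6, n_nodes=3, vars_per_node=2, seed=42)
c = draw_tree_randomness(n_rows=10, n_vars=6, n_nodes=3, vars_per_node=2, seed=7)
print(a == b)  # True: same seed and same inputs give the same draws
print(a == c)  # compares against a run with a different seed
```

In this sketch the randomness is fully determined by the seed, so any run-to-run variation has to come from something outside the generator, such as the inputs it is applied to.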

PadraicGNeville
SAS Employee

If the observations are being read in parallel, then the order of observations in memory is somewhat randomized: whichever thread is fastest at that moment gets its block of observations placed first in HPFOREST memory. The Out-Of-Bag sample is chosen by in-memory observation number, and different Out-Of-Bag samples produce different trees. To check this theory, you can set THREADS=1 on the PERFORMANCE statement.
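The mechanism above can be modeled with a short Python sketch. It assumes, hypothetically, that the in-bag/Out-Of-Bag split is drawn as a set of in-memory positions from a seeded generator, and that each reader thread appends its block of observations in the order the threads finish; the names `load_in_memory` and `in_bag_observations` and the block layout are illustrative only, not HPFOREST internals. The seed fixes which positions are drawn, but the thread arrival order decides which observations sit at those positions.

```python
import random

def load_in_memory(blocks, arrival_order):
    """Concatenate per-thread blocks in the order the threads finished reading."""
    memory = []
    for thread_id in arrival_order:
        memory.extend(blocks[thread_id])
    return memory

def in_bag_observations(memory, seed, n_in_bag):
    """Draw in-bag rows by in-memory position with a fixed seed, then
    report which original observation IDs occupy those positions."""
    rng = random.Random(seed)
    positions = rng.sample(range(len(memory)), n_in_bag)
    return sorted(memory[p] for p in positions)

# Hypothetical layout: observation IDs 0..11 split into three blocks,
# one block per reader thread.
blocks = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7], 2: [8, 9, 10, 11]}

run1 = in_bag_observations(load_in_memory(blocks, [0, 1, 2]), seed=123, n_in_bag=6)
run2 = in_bag_observations(load_in_memory(blocks, [2, 0, 1]), seed=123, n_in_bag=6)
run3 = in_bag_observations(load_in_memory(blocks, [0, 1, 2]), seed=123, n_in_bag=6)

print(run1 == run3)  # True: same seed and same in-memory order
print(run1, run2)    # same positions drawn, possibly holding different observations
```

In this model, forcing a single fixed arrival order (which is what reading with one thread amounts to) makes the mapping from positions to observations, and therefore the trees, repeatable.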

 

When HPFOREST runs on a cluster of machines (that is, in MPP mode), reproducibility is foiled by the system assigning machine numbers to HPFOREST at random. HPFOREST includes the machine number in some of its random choices so that different machines do not end up doing identical work.
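A minimal Python sketch of why mixing the machine number into random choices defeats a fixed seed when the numbers are handed out arbitrarily. The formula `seed * 1_000_003 + machine_number` and the function name `machine_stream` are hypothetical choices for illustration, not how HPFOREST actually combines them; the point is only that each node's random stream follows its machine number rather than the data it holds.

```python
import random

def machine_stream(seed, machine_number, n_draws=5):
    """Draws made on one cluster node. The machine number is mixed into the
    seed (hypothetical formula) so that nodes do not repeat each other's choices."""
    rng = random.Random(seed * 1_000_003 + machine_number)
    return [rng.random() for _ in range(n_draws)]

SEED = 12345
# Run 1: the grid happens to number the nodes hostA=0, hostB=1.
run1 = {"hostA": machine_stream(SEED, 0), "hostB": machine_stream(SEED, 1)}
# Run 2: same SEED, but the machine numbers come back swapped.
run2 = {"hostA": machine_stream(SEED, 1), "hostB": machine_stream(SEED, 0)}

print(run1["hostA"] == run2["hostB"])  # True: the stream follows the machine number
print(run1["hostA"] == run2["hostA"])  # compares hostA's draws across the two runs
```

If each host keeps the same portion of the data from run to run but receives a different machine number, it applies a different random stream to that data, so the trees it grows can differ even though SEED is unchanged.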
