Hello,
Yes,
I would say that observing (very) small differences is not unexpected with the SAS® Enterprise Miner™ High-Performance Procedures (like HPFOREST, HPCLUS, ...), even if the same seed is used.
The reason for the difference is the random variation that is associated with multi-threading.
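To illustrate one plausible mechanism (an assumption on my part, not something taken from the procedure documentation): floating-point addition is not associative, so if partial sums are split across threads differently from run to run, the combined total can differ in the last digits. The sketch below is plain Python, not SAS, and the helper names `partial_sum` and `combine` are hypothetical.

```python
from functools import reduce
import operator


def partial_sum(chunk):
    # Plain left-to-right floating-point addition, as one worker might do it.
    return reduce(operator.add, chunk, 0.0)


def combine(chunks):
    # Add up the per-chunk partial sums, again left to right.
    return reduce(operator.add, (partial_sum(c) for c in chunks), 0.0)


values = [0.1, 0.2, 0.3]

# Same three numbers, split between two hypothetical "threads" in two ways.
split_a = combine([values[:2], values[2:]])  # (0.1 + 0.2) + 0.3
split_b = combine([values[:1], values[1:]])  # 0.1 + (0.2 + 0.3)

print(split_a, split_b, split_a == split_b)
```

The two totals differ only at the level of rounding error, which matches the "(very) small differences" described above.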
You can get fully reproducible results by disabling multi-threading with the following statement:
performance nthreads=1;
[ NOTE: The SAS system options THREADS | NOTHREADS apply to the client machine on which the
SAS high-performance analytical procedures execute. They do not apply to the compute nodes in a
distributed environment. ]
If you prefer to have repeatability | reproducibility over performance, then try NTHREADS=1 until you encounter a situation in which doing so is not a practical solution. At that time, you can remove the NTHREADS=1 specification and take advantage of multi-threading.
I no longer have access to Enterprise Miner (I use Model Studio on SAS Viya now), so I do not know what the equivalent of
performance nthreads=1;
is in the Enterprise Miner properties panel.
Anyway, k-means (HPCLUS algorithm) is a very special case. If you shuffle the observations (i.e. change the order), you will also get different results. But that's inherent to the k-means algorithm and how initial seeds are chosen.
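The order dependence can be seen in a minimal k-means sketch. This is plain Python, not HPCLUS, and it assumes the simplest possible initialization (the first k observations become the initial seeds), which is not necessarily how HPCLUS picks its seeds. The function name `kmeans_first_k` and the four data points are made up for the illustration.

```python
def kmeans_first_k(points, k, max_iter=100):
    """Lloyd's k-means with the first k points as initial centers (illustrative assumption)."""
    centers = [tuple(p) for p in points[:k]]
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[dists.index(min(dists))].append(p)
        # Update step: each center moves to the mean of its cluster.
        new_centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    # Sort so that results can be compared regardless of cluster labels.
    return sorted(sorted(cl) for cl in clusters)


# The same four observations (corners of a 4-by-1 rectangle) in two orders.
order_a = [(0, 0), (0, 1), (4, 0), (4, 1)]
order_b = [(0, 0), (4, 0), (0, 1), (4, 1)]

result_a = kmeans_first_k(order_a, k=2)
result_b = kmeans_first_k(order_b, k=2)

print(result_a)
print(result_b)
print(result_a == result_b)
```

With the first order, the two seeds sit on the same short side of the rectangle and the algorithm settles into a poorer local solution; with the second order, it finds the two tight pairs. Nothing here involves threads or random numbers, only the order of the rows.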
Koen