
Why do machine learning models in SAS Viya for Learners sometimes yield different answers?

Started ‎03-17-2023 by
Modified ‎07-26-2023 by

This is a common question from academics, especially those trained on modeling procedures such as Ordinary Least Squares (OLS), which yield the same answer every time the code is submitted.  Put another way: I submit a machine learning model (like a gradient boosting model) and get one answer.  I then submit it a second time and get a slightly different answer.  What gives?

 

The simple answer: machine learning models are special.  Very special.

 

But that’s not a sufficient response.  So, let’s start with a slide from a SAS training course and then break it down a bit:

 

[Image: slide from a SAS training course (LGroves_0-1679065701790.png)]

 

With machine learning and predictive modeling, we’re typically moving away from inferential statistics, which often provide deterministic results (e.g., b2 = 0.22… and we should expect that value every time we run the model). When departing from the traditional tools of inferential statistics, we need to adjust our mindset.  Machine learning models often produce nondeterministic results, which means that results may change a bit each time we press the submit button.  With nondeterministic results, we essentially think about “converging on” or “estimating” a model, rather than computing an exact value.  Yes, machine learning and predictive models with their nondeterministic results are arguably in the Bayesian spirit of things.
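To make the contrast concrete, here is a minimal Python sketch (a stand-in for illustration, not SAS code and not how SAS Viya trains its models). The data values, the function names `ols_slope` and `sgd_slope`, and the learning-rate and epoch settings are all assumptions made up for this example. The closed-form OLS slope is a pure computation, while the stochastic gradient descent version visits rows in a random order and so only approximates the same answer.

```python
import random

# Small fixed data set, roughly y = 2x + 1 (made-up numbers for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8, 11.0]


def ols_slope(xs, ys):
    """Closed-form OLS slope: the same inputs always give the same output."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def sgd_slope(xs, ys, rng, epochs=20, lr=0.01):
    """Stochastic gradient descent: rows are visited in a random order each epoch,
    so the estimate depends on the random number generator that is passed in."""
    slope, intercept = 0.0, 0.0
    rows = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(rows)
        for x, y in rows:
            error = (slope * x + intercept) - y
            slope -= lr * error * x
            intercept -= lr * error
    return slope


print(ols_slope(xs, ys), ols_slope(xs, ys))   # identical on every call
print(sgd_slope(xs, ys, random.Random()))     # unseeded: varies from run to run
print(sgd_slope(xs, ys, random.Random(42)))   # seeded: repeatable
```

Running the script twice should show the OLS line unchanged, the unseeded line moving slightly, and the seeded line staying put.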

 

To be a bit less abstract, there are a few reasons why we’ll get marginally different results each time we run a machine learning algorithm:

 

  • The underlying training, validation, and testing samples can change each time the model is run
    • Unless, of course, we fix the samples with a seed
  • The algorithm purposely adds randomness into the equation
    • A great example of this is gradient boosting, which in many implementations trains each tree on a random subsample of the rows.  The randomness is part of how it improves… and of how the machine actually learns. (See what I did there?)
  • Distributing data across several CPUs can also lead to slightly different results
    • Why?  Because some statistics combine cleanly across partitions and others don’t: the mean of the partition means (weighted by partition size) is the global mean, but the mode of the partition modes is not always the global mode. Ponder that for a bit.
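Two of the points above can be sketched in a few lines of Python (again a stand-in for illustration, not SAS code). The helper `partition`, the seed value 12345, and the three made-up "worker" lists are assumptions for this example only. The first part shows that a fixed seed reproduces the same training/validation split; the second shows equally sized partitions whose means combine to the global mean while their modes do not combine to the global mode.

```python
import math
import random
from statistics import mean, mode


def partition(rows, train_fraction, seed=None):
    """Randomly split rows into training and validation lists.

    seed=None draws a fresh split on each call; an integer seed
    reproduces the same split every time.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(100))
train_a, valid_a = partition(rows, 0.7, seed=12345)
train_b, valid_b = partition(rows, 0.7, seed=12345)
print(train_a == train_b)  # same seed, same split

# Data spread over three equally sized workers (made-up values).
workers = [
    [1, 1, 2],
    [1, 1, 2],
    [2, 2, 2],
]
all_values = [value for chunk in workers for value in chunk]

mean_of_means = mean(mean(chunk) for chunk in workers)
global_mean = mean(all_values)
print(math.isclose(mean_of_means, global_mean))  # means combine across workers

mode_of_modes = mode(mode(chunk) for chunk in workers)
global_mode = mode(all_values)
print(mode_of_modes, global_mode)  # modes do not: 1 wins locally twice, 2 wins overall
```

The value 1 is the most common value on two of the three workers, so it wins the vote among local modes, yet 2 appears five times overall against four for 1. Any statistic that cannot be recombined exactly this way may shift when the same data land on the workers differently.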

 

So, to academics and industry hackers alike, I say: don’t fret if your models change slightly from run to run.  The goal of machine learning and predictive models isn’t to have precise estimates for the right-hand-side (i.e., explanatory/independent) variables.  Instead, the objective is to best predict future events, and several models could do that equally well.
