pvareschi
Quartz | Level 8

Re: Applied Analytics Using SAS Enterprise Miner -> Lesson 3: Introduction to Predictive Modeling Using SAS Enterprise Miner -> Model Complexity

 

I am not sure I fully understand and appreciate the meaning and implications of the concepts of bias and variance as presented in the above lesson:

1. I understand that bias would occur with model underfitting, because, essentially, the model would not be flexible/complex enough to capture "the signal"; could bias occur with overfitting too?

2. What does "variance" refer to when talking about overfitted models? The lesson text reads: "[...] An overly complex model might be too flexible, which can lead to overfitting, that is, accommodating nuances of the random noise in the particular sample (high variance)"

Does "high variance" refer to the fact that an overfitted model would produce highly variable/erratic predictions/results on a new data set (i.e. would not generalise well to new data)?

1 ACCEPTED SOLUTION

TheresaStemler
SAS Moderator

Hi pvareschi, 

Here's the reply from the instructor: 

  1. Bias would not occur with an overfit model.  Only an underfit model would “systematically” predict values either larger than or smaller than the true target value.  That’s what bias in this sense means.
  2. When we talk about a model being “high variance” we mean that it is modeling or capturing the unpredictable and unrepeatable random variation in the data.  All data has such random variation, but only when models are overfit do they try to capture this random variation.  A model that fits the data “just right” is only capturing the predictable and repeatable patterns in the data.
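
A small simulation can make both points concrete. The sketch below is not from the course material: the sine-shaped signal, the noise level, the polynomial degrees, and the function name `bias_variance` are all assumptions chosen for illustration. It refits the same kind of model on many samples that share one signal but carry fresh random noise, then measures how far the average prediction sits from the signal (squared bias) and how much the predictions move from sample to sample (variance).

```python
import numpy as np

def bias_variance(degree, n_train=20, n_sims=300, noise_sd=0.3, seed=0):
    """Estimate squared bias and variance of a polynomial fit of the given degree."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n_train)        # fixed training inputs
    x_test = np.linspace(0.05, 0.95, 50)      # fixed evaluation points
    truth = np.sin(2 * np.pi * x_test)        # the "signal" (assumed for this demo)
    preds = np.empty((n_sims, x_test.size))
    for i in range(n_sims):
        # New training sample: same signal, new random noise.
        y = np.sin(2 * np.pi * x) + rng.normal(0.0, noise_sd, n_train)
        model = np.polynomial.Polynomial.fit(x, y, degree)
        preds[i] = model(x_test)
    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - truth) ** 2)  # systematic miss of the average fit
    variance = np.mean(preds.var(axis=0))        # sample-to-sample spread of predictions
    return bias_sq, variance

# Degree 1 is too rigid (underfit), degree 9 is very flexible (prone to overfit).
results = {d: bias_variance(d) for d in (1, 3, 9)}
for d, (b, v) in results.items():
    print(f"degree {d}: bias^2 = {b:.4f}, variance = {v:.4f}")
```

In a run like this, the straight line should show by far the largest squared bias, since it cannot follow the curve no matter which sample it sees, while the degree-9 fit should show the largest variance, since it chases the noise that differs from one sample to the next.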

Best, 

theresa

