Solved: Effect of precision of parameter estimates on predictions

pvareschi · Posted 11-02-2020 01:06 PM

Re: Predictive Modeling Using Logistic Regression

When fitting a parametric model, such as Logistic Regression (or Linear Regression), what is the impact of obtaining parameter estimates with a high precision, in terms of developing a predictive model?

(1) In other words, is it correct to say that if the precision is low (i.e. large standard errors of the coefficients), then the model is expected to perform poorly on new data because the fitted response surface may not be close enough to (representative of) the true relationship in the underlying population?

(2) When working with very large datasets, is it correct to say that precision becomes less of a concern because large samples always result in small standard errors?

(3) Lastly, is multicollinearity a problem because of its adverse effect on the precision of the parameter estimates?

PaigeMiller · Posted 11-02-2020 04:09 PM

@pvareschi wrote:

Re: Predictive Modeling Using Logistic Regression

When fitting a parametric model, such as Logistic Regression (or Linear Regression), what is the impact of obtaining parameter estimates with a high precision, in terms of developing a predictive model?

(1) In other words, is it correct to say that if the precision is low (i.e. large standard errors of the coefficients), then the model is expected to perform poorly on new data because the fitted response surface may not be close enough to (representative of) the true relationship in the underlying population?

There's accuracy (lack of bias), and there's precision (low standard errors). These are not the same, and one does not imply the other. Your question #1 combines the two and makes it sound as if low precision also implies poor accuracy (in your words, "may not be close enough"), but that's not the case. You can have accuracy with low precision. Some statistical methods trade off the two: you can get higher precision if you accept some bias, and vice versa. A related concept is mean squared error (MSE), which is the bias squared plus the variance. Low MSE is good, which means you can't just look at precision by itself, and you can't just look at accuracy (lack of bias) by itself.
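To make the bias/variance trade-off concrete, here is a minimal simulation sketch in Python (standard library only). It is not from the original reply: the estimator (a sample mean multiplied by a hypothetical `shrink` factor) and all numeric settings are assumptions chosen purely for illustration. It estimates a known mean over many repeated samples and reports bias, variance, and MSE, so you can check that MSE equals bias squared plus variance and that a biased estimator can still have the lower MSE.

```python
import random
import statistics


def simulate_estimator(shrink, true_mean=10.0, sigma=20.0, n=5, reps=20000, seed=1):
    """Estimate true_mean with shrink * (sample mean) over many repeated samples.

    All numbers here are made-up illustration values, not taken from the thread.
    Returns (bias, variance, mse) of the estimator across the repetitions.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        estimates.append(shrink * statistics.fmean(sample))
    bias = statistics.fmean(estimates) - true_mean          # accuracy
    variance = statistics.pvariance(estimates)              # precision
    mse = statistics.fmean([(e - true_mean) ** 2 for e in estimates])
    return bias, variance, mse


for label, shrink in [("unbiased (shrink=1.0)", 1.0), ("shrunken (shrink=0.5)", 0.5)]:
    bias, variance, mse = simulate_estimator(shrink)
    print(f"{label}: bias={bias:.2f} variance={variance:.2f} "
          f"bias^2+variance={bias ** 2 + variance:.2f} mse={mse:.2f}")
```

With these particular settings the shrunken estimator is clearly biased but much less variable, and its MSE comes out lower than that of the unbiased sample mean. With a different true mean or noise level the comparison could go the other way, which is why both components have to be considered together.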

(2) When working with very large datasets, is it correct to say that precision becomes less of a concern because large samples always result in small standard errors?

Other things being equal, any statistical method will give more precise estimates with more data. A large sample does not by itself remove other sources of imprecision, such as multicollinearity.
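As a rough illustration of how precision improves with sample size, the following Python sketch (standard library only) fits a simple linear regression to simulated data and computes the usual standard error of the slope. The data-generating model and its parameter values (`beta0`, `beta1`, `sigma`) are assumptions made up for this example. For ordinary least squares the standard error shrinks roughly in proportion to 1/sqrt(n), so quadrupling the sample size should roughly halve it.

```python
import math
import random


def slope_standard_error(n, seed=0, beta0=2.0, beta1=3.0, sigma=5.0):
    """Fit y = a + b*x by least squares on simulated data.

    Returns (estimated slope, standard error of the slope).
    The true parameters are illustration values only.
    """
    rng = random.Random(seed)
    x = [rng.uniform(0, 10) for _ in range(n)]
    y = [beta0 + beta1 * xi + rng.gauss(0, sigma) for xi in x]
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    residual_variance = rss / (n - 2)          # two estimated parameters
    return slope, math.sqrt(residual_variance / sxx)


for n in (100, 400, 1600):
    slope, se = slope_standard_error(n)
    print(f"n={n:5d}  slope={slope:.3f}  se(slope)={se:.4f}")
```

The diminishing returns are worth noting: each halving of the standard error costs four times as much data.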

(3) Lastly, is multicollinearity a problem because of its adverse effect on the precision of the parameter estimates?

Yes. For example, if a coefficient has a variance inflation factor (VIF) of 5 because of multicollinearity, then its variance is 5 times larger, and its standard error about 2.2 times larger (the square root of 5), than if there were no multicollinearity. So you can spend time and effort selecting variables such that none has a high VIF, or you can use a method like Partial Least Squares (PROC PLS in SAS), which is much less affected by multicollinearity. With PLS the precision is high but there is some bias; in terms of mean squared error, a study showed that PLS generally produced great improvements compared to unbiased methods like linear regression.
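For readers who want to see where a VIF comes from, here is a small Python sketch (standard library only). It covers only the two-predictor case, where the VIF of either predictor is 1 / (1 - r^2) with r the correlation between them. The simulated data and the target correlation are assumptions chosen so that the VIF lands near 5. Note that the VIF multiplies the variance of the coefficient estimate, so the standard error is inflated by the square root of the VIF.

```python
import math
import random


def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def vif_two_predictors(x1, x2):
    """VIF for either predictor when the model has exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


def se_inflation(vif):
    """Factor by which the standard error grows: sqrt of the variance inflation."""
    return math.sqrt(vif)


# Simulated, illustrative data: two predictors with correlation near sqrt(0.8).
rng = random.Random(7)
rho = math.sqrt(0.8)
x1 = [rng.gauss(0, 1) for _ in range(5000)]
x2 = [rho * v + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for v in x1]

vif = vif_two_predictors(x1, x2)
print(f"VIF = {vif:.2f}, standard error inflated by a factor of {se_inflation(vif):.2f}")
```

With more than two predictors the same formula applies, but r^2 is replaced by the R-squared from regressing the predictor of interest on all the other predictors.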

--
Paige Miller

pvareschi · Posted 11-04-2020 01:05 PM

@PaigeMiller thank you for your explanations.

First of all, my apologies, I mistyped my question: in the first paragraph I meant "what is the impact of obtaining parameter estimates with a ~~high~~ low precision".

What I am struggling with is how to link low precision to its effect on the estimates/predictions from a model, i.e. what it means in practice.

For example, for simplicity, consider a simple linear regression model (i.e. 1 predictor). Low precision means that, although the coefficient estimate may not be biased (assuming the regression model is appropriate), the estimate obtained from the sample at hand may be far from the true population parameter (e.g. b-hat=5 while beta=2).

In that scenario, it looks to me that predictions from that model would be inaccurate when applied to a new sample (assuming it was drawn from, and is representative of, the same underlying population).

Is that correct?

PaigeMiller · Posted 11-04-2020 01:14 PM

In that scenario, it looks to me that predictions from that model would be inaccurate when applied to a new sample (assuming it was drawn from, and is representative of, the same underlying population).

I re-word this as:

In that scenario, it looks to me that predictions from that model would be ~~inaccurate~~ low precision when applied to a new sample (assuming it was drawn from, and is representative of, the same underlying population). Accuracy is not precision. Precision is not accuracy.

Just as the predictions from the original data might have high variance (low precision), the predictions for a new sample will have high variance (low precision). So if you have an observation in a new sample, and the true underlying value that you'd like to predict is 100, low precision might mean your confidence interval around the prediction is very wide, let's say from 10 to 190. Accuracy means that over repeated samples, the mean of these predictions will be 100. High precision might mean a confidence interval of 90 to 110, and in repeated samples, the mean of these predictions will still be 100.
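The repeated-sampling idea can be sketched with a short simulation in Python (standard library only). Everything in it is an assumption for illustration: a straight-line model whose true value at the new point `x0` is 100, fitted many times on fresh samples, once with low noise and once with high noise. In both cases the predictions should average close to 100 (accurate), but their spread differs greatly (precise versus imprecise).

```python
import random
import statistics


def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def repeated_predictions(sigma, x0=10.0, n=20, reps=2000, seed=42):
    """Refit on `reps` fresh samples and predict at x0 each time.

    Illustrative true model: y = 50 + 5*x + noise, so the true value at x0=10 is 100.
    Returns (mean of predictions, standard deviation of predictions).
    """
    rng = random.Random(seed)
    predictions = []
    for _ in range(reps):
        x = [rng.uniform(0, 20) for _ in range(n)]
        y = [50.0 + 5.0 * xi + rng.gauss(0, sigma) for xi in x]
        intercept, slope = fit_line(x, y)
        predictions.append(intercept + slope * x0)
    return statistics.fmean(predictions), statistics.stdev(predictions)


for label, sigma in [("high precision (sigma=10)", 10.0), ("low precision (sigma=100)", 100.0)]:
    mean_pred, sd_pred = repeated_predictions(sigma)
    print(f"{label}: mean prediction={mean_pred:.1f}, spread (sd)={sd_pred:.1f}")
```

Any single low-precision fit can give a prediction far from 100, which is the practical cost of low precision, even though the procedure is unbiased on average.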

--
Paige Miller
