pvareschi
Quartz | Level 8

Re: Predictive Modeling Using Logistic Regression

When fitting a parametric model, such as Logistic Regression (or Linear Regression), what is the impact of obtaining parameter estimates with a high precision, in terms of developing a predictive model?

(1) In other words, is it correct to say that if the precision is low (i.e. wide standard errors of coefficients) then the model is expected to perform poorly on new data because the fitted response surface may not be close enough to (representative of) the true relationship in the underlying population?

(2) When working with very large datasets, is it correct to say that precision becomes less of a concern because large samples always result in narrow standard errors?

(3) Lastly, is multicollinearity a problem because of its adverse effect on the precision of the parameter estimates?

3 REPLIES
PaigeMiller
Diamond | Level 26

@pvareschi wrote:

Re: Predictive Modeling Using Logistic Regression

When fitting a parametric model, such as Logistic Regression (or Linear Regression), what is the impact of obtaining parameter estimates with a high precision, in terms of developing a predictive model?

(1) In other words, is it correct to say that if the precision is low (i.e. wide standard errors of coefficients) then the model is expected to perform poorly on new data because the fitted response surface may not be close enough to (representative of) the true relationship in the underlying population?

There's accuracy (or lack of bias), and there's precision (low standard errors). These are not the same, and one does not imply the other. Your question #1 combines the two and makes it sound as if low precision also implies poor accuracy (in your words, "may not be close enough"), but that's not the case. You can have accuracy with low precision. Some statistical methods trade off the two: you can get higher precision if you accept some bias, and vice versa. There's another mathematical concept called Mean Squared Error, which is the bias squared plus the variance. Low mean squared error is good, and this implies that you can't just look at precision by itself, and you can't just look at accuracy (or lack of bias) by itself.
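To make the bias/variance trade-off concrete, here is a minimal simulation sketch in Python (not SAS). Everything in it is an illustrative assumption rather than something from the thread: the true mean of 10, the noise level, the sample size, and the hypothetical shrinkage factor of 0.85 applied to the sample mean. It estimates bias, variance, and mean squared error over repeated samples and prints bias squared plus variance next to the MSE.

```python
import random
import statistics

def simulate_estimator(shrink, true_mean=10.0, sigma=10.0, n=5, reps=5000, seed=0):
    """Return (bias, variance, mse) of the estimator shrink * sample_mean.

    All parameter values are made up for illustration.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        estimates.append(shrink * statistics.fmean(sample))
    bias = statistics.fmean(estimates) - true_mean          # accuracy
    variance = statistics.pvariance(estimates)              # precision
    mse = statistics.fmean((e - true_mean) ** 2 for e in estimates)
    return bias, variance, mse

unbiased = simulate_estimator(shrink=1.0)   # plain sample mean
shrunk = simulate_estimator(shrink=0.85)    # biased, but less variable

for name, (bias, variance, mse) in (("unbiased", unbiased), ("shrunk", shrunk)):
    print(f"{name:9s} bias={bias:+.2f} variance={variance:.2f} "
          f"mse={mse:.2f} bias^2+variance={bias ** 2 + variance:.2f}")
```

Under these particular assumptions the shrunken estimator is biased yet ends up with the lower mean squared error, which is the kind of trade-off described above. With other settings (for example, much less noise) the unbiased estimator can win instead.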

 

(2) When working with very large datasets, is it correct to say that precision becomes less of a concern because large samples always result in narrow standard errors?

In general, any statistical method will give more precise estimates with more data. For the usual estimators, standard errors shrink roughly in proportion to 1/sqrt(n).
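A small sketch of how the standard error of a slope behaves as the sample grows, written in Python with a closed-form simple linear regression. The true intercept, slope, noise level, and the sample sizes are all assumptions chosen for illustration.

```python
import math
import random

def slope_and_se(n, beta0=1.0, beta1=2.0, sigma=5.0, seed=1):
    """Fit y = b0 + b1*x by ordinary least squares on n simulated points
    and return (b1, standard error of b1). Parameter values are made up."""
    rng = random.Random(seed)
    x = [rng.uniform(0, 10) for _ in range(n)]
    y = [beta0 + beta1 * xi + rng.gauss(0, sigma) for xi in x]
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    se_b1 = math.sqrt(rss / (n - 2) / sxx)
    return b1, se_b1

results = {n: slope_and_se(n) for n in (50, 500, 5000)}
for n, (b1, se) in results.items():
    print(f"n={n:5d}  slope estimate={b1:.3f}  standard error={se:.4f}")
```

With 100 times more data the standard error should come out roughly 10 times smaller, in line with the 1/sqrt(n) rule of thumb for this kind of estimator.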

 

(3) Lastly, is multicollinearity a problem because of its adverse effect on the precision of the parameter estimates?

 

Yes. If, for example, you have a variance inflation factor of 5 because of multicollinearity, then the variance of the estimate is 5 times larger (so its standard error is about 2.2 times larger, the square root of 5) than it would be with no multicollinearity. So you can spend time and effort trying to select variables such that none has a high variance inflation factor, or you can use a method like Partial Least Squares (PROC PLS in SAS), which is much less affected by multicollinearity. With PLS the precision is high, but there is some bias. In terms of mean squared error, a study showed that PLS generally produced great improvements compared to unbiased methods like linear regression.
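As a quick arithmetic sketch (Python, helper names are hypothetical): under the usual definition, the variance inflation factor for predictor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. It multiplies the variance of the coefficient estimate, so the standard error scales with its square root.

```python
import math

def vif_from_r_squared(r_squared):
    """VIF for one predictor, given the R^2 from regressing it on the others."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    return 1.0 / (1.0 - r_squared)

def se_inflation(vif):
    """Factor by which the coefficient's standard error is inflated."""
    return math.sqrt(vif)

for r2 in (0.0, 0.5, 0.8, 0.9):
    vif = vif_from_r_squared(r2)
    print(f"R^2={r2:.1f}  VIF={vif:.2f}  standard error x{se_inflation(vif):.2f}")
```

So a VIF of 5 corresponds to an R_j^2 of 0.8 and a standard error a little more than twice as large as it would be with uncorrelated predictors.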

--
Paige Miller
pvareschi
Quartz | Level 8

@PaigeMiller thank you for your explanations.

First of all, my apologies, I mistyped my question. In the first paragraph I meant "what is the impact of obtaining parameter estimates with low precision" (not "high precision").

What I am struggling to get my head around is to link low precision to its effects on the estimates/predictions from a model, i.e. what does it mean in practice.

For example, for simplicity, if we consider a simple linear regression model (i.e. 1 predictor), low precision means that, although the coefficient estimate may not be biased (assuming the regression model is appropriate), the coefficient estimate obtained from the sample used may be well off from the true population parameter (e.g. b-hat=5 while beta=2).

In that scenario, it looks to me that predictions from that model would be inaccurate when applied to a new sample (assuming it was drawn from, and representative of, the same underlying population).

Is that correct?

PaigeMiller
Diamond | Level 26

@pvareschi wrote:

In that scenario, it looks to me that predictions from that model would be inaccurate when applied to a new sample (assuming it was drawn/representative of the same underlying population).

I re-word this as:

In that scenario, it looks to me that predictions from that model would have low precision (rather than "be inaccurate") when applied to a new sample (assuming it was drawn/representative of the same underlying population). Accuracy is not precision. Precision is not accuracy.

Just as the predictions from the original data might have high variance (low precision), the predictions from a new sample will have high variance (low precision). So if you have an observation in a new sample, and the true underlying value that you'd like to predict is 100, low precision might mean your confidence interval around the prediction is very wide, let's say from 10 to 190. Accuracy would be that over repeated samples, the mean of these predictions will be 100. High precision might have a confidence interval of 90 to 110, and in repeated samples, the mean of these predictions will be 100.
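The repeated-sampling picture can be sketched with a small Python simulation. All numbers are assumptions picked so the true value at the prediction point is 100: a true line y = 10x, a prediction point x0 = 10, 20 observations per sample, and two made-up noise levels. Each replicate draws a fresh sample, refits a simple linear regression, and records the fitted value at x0.

```python
import random
import statistics

def predictions_at(x0, sigma, n=20, reps=2000, seed=42):
    """Refit a simple linear regression on `reps` fresh samples and
    return the fitted value at x0 from each fit (illustrative settings)."""
    rng = random.Random(seed)
    preds = []
    for _ in range(reps):
        x = [rng.uniform(0, 20) for _ in range(n)]
        y = [10.0 * xi + rng.gauss(0, sigma) for xi in x]  # true line: y = 10x
        x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
        sxx = sum((xi - x_bar) ** 2 for xi in x)
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        b1 = sxy / sxx
        b0 = y_bar - b1 * x_bar
        preds.append(b0 + b1 * x0)
    return preds

noisy = predictions_at(x0=10.0, sigma=200.0)  # low-precision setting
quiet = predictions_at(x0=10.0, sigma=20.0)   # high-precision setting

for name, preds in (("low precision", noisy), ("high precision", quiet)):
    print(f"{name:14s} mean of predictions={statistics.fmean(preds):7.2f}  "
          f"std dev of predictions={statistics.stdev(preds):6.2f}")
```

In both settings the predictions should average out close to 100 (accuracy), while the spread of individual predictions around 100 differs by about a factor of ten (precision).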

--
Paige Miller

 
