My p-value increases when I account for multicollinearity in my MLR?

LucyB · Posted 04-04-2017 10:31 PM

I am running a multiple linear regression model with 8 covariates, 4 of which are highly correlated (r > 0.7). So I created z-scores and then created a composite. When I re-ran the model with this composite, the p-value for my predictor became considerably larger and my R-squared went down. Why is this happening? I thought p-values decreased after accounting for multicollinearity?

Reeza · Posted 04-04-2017 11:36 PM

It's more than just the p-value.

An excerpt from Wikipedia that's relevant here:

So long as the underlying specification is correct, multicollinearity does not actually bias results; it just produces large standard errors in the related independent variables. More importantly, the usual use of regression is to take coefficients from the model and then apply them to other data. Since multicollinearity causes imprecise estimates of coefficient values, the resulting out-of-sample predictions will also be imprecise. And if the pattern of multicollinearity in the new data differs from that in the data that was fitted, such extrapolation may introduce large errors in the predictions.
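The "large standard errors" in the excerpt can be made concrete with the standard OLS variance formula. This is a textbook identity, not something stated in the thread. The symbols are assumptions for illustration: \sigma^2 is the error variance, n the sample size, s_j^2 the sample variance of predictor j, and R_j^2 the R-squared from regressing predictor j on the other predictors.

```latex
\operatorname{Var}(\hat\beta_j)
  = \frac{\sigma^2}{(n-1)\,s_j^2}\cdot\frac{1}{1-R_j^2},
\qquad
\mathrm{VIF}_j = \frac{1}{1-R_j^2}
```

For example, R_j^2 = 0.9 gives VIF_j = 10, so the standard error of that coefficient is about \sqrt{10} \approx 3.16 times what it would be if predictor j were uncorrelated with the others.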

PaigeMiller · Posted 04-05-2017 08:43 AM

@LucyB wrote:

I am running a multiple linear regression model with 8 covariates, 4 of which are highly correlated (r > 0.7). So I created z-scores and then created a composite. When I re-ran the model with this composite, the p-value for my predictor became considerably larger and my R-squared went down. Why is this happening? I thought p-values decreased after accounting for multicollinearity?

I'm not 100% sure what you mean by "I created z scores and then created a composite", but whatever this means, it could be that the new variables you are using to account for the multicollinearity are not as predictive of the response variable as the original variables are.

In any event, in the presence of multicollinearity, I always recommend using Partial Least Squares regression (PROC PLS) instead of ordinary least squares regression. Partial Least Squares is generally less affected by multicollinearity, and it results in model coefficients that have less variability (lower mean squared error) and predicted values that have lower mean squared error than you would get using ordinary least squares. See http://amstat.tandfonline.com/doi/abs/10.1080/00401706.1993.10485033.
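A minimal sketch of what a PROC PLS call along these lines might look like. The data set work.demo and the variables y, q1-q4 (the correlated block), and x5-x8 are simulated and hypothetical; they are not the original poster's data. Leave-one-out cross validation is just one way to choose the number of factors.

```sas
/* Hypothetical data: q1-q4 share a latent factor (pairwise r near 0.8), x5-x8 are independent */
data work.demo;
   call streaminit(2017);
   array q[4] q1-q4;
   array x[4] x5-x8;
   do id = 1 to 200;
      latent = rand('normal');
      do j = 1 to 4;
         q[j] = latent + 0.5*rand('normal');
         x[j] = rand('normal');
      end;
      y = 1 + 2*q1 - q2 + 0.5*x5 + rand('normal');
      output;
   end;
   drop j latent;
run;

/* PLS with leave-one-out cross validation; CVTEST compares models with fewer factors */
proc pls data=work.demo method=pls cv=one cvtest(seed=12345);
   model y = q1-q4 x5-x8 / solution;   /* SOLUTION prints the final coefficients */
run;
```

The SOLUTION output gives coefficients on the original eight variables, which keeps the individual predictors interpretable instead of folding them into one score.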

Also, I agree 100% with @Reeza's quote from Wikipedia.

--
Paige Miller

LucyB · Posted 04-05-2017 09:04 PM

Well, from what I remember from a course, if you have multicollinearity among some covariates (these are questionnaires), you can convert them to z-scores and then average them into one variable. Is this not correct?

Reeza · Posted 04-05-2017 09:11 PM

@LucyB wrote:

Well, from what I remember from a course, if you have multicollinearity among some covariates (these are questionnaires), you can convert them to z-scores and then average them into one variable. Is this not correct?

No, what you're referring to is standardization, which puts all variables on the same scale. It prevents variables that are larger in scale from being too influential in the model.

LucyB · Posted 04-05-2017 09:54 PM

But doesn't it still address multicollinearity?

Reeza · Posted 04-05-2017 10:32 PM

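Not really; standardized variables can still be correlated. Standardizing only shifts and rescales each variable, so the pairwise correlations are exactly the same as before.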

http://stats.stackexchange.com/questions/16710/does-standardising-independent-variables-reduce-colli...
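A small sketch to check this directly, assuming two hypothetical questionnaire scores q1 and q2 on very different scales (simulated here, not taken from the thread). The two PROC CORR outputs should show the same correlation before and after standardizing.

```sas
/* Hypothetical data: two correlated scores on different scales */
data work.scores;
   call streaminit(2017);
   do id = 1 to 200;
      latent = rand('normal');
      q1 = 50 + 10*latent + 5*rand('normal');
      q2 = 3 + 0.5*latent + 0.3*rand('normal');
      output;
   end;
   drop latent;
run;

/* Convert both variables to z-scores (mean 0, standard deviation 1) */
proc standard data=work.scores mean=0 std=1 out=work.scores_z;
   var q1 q2;
run;

/* Correlation on the raw scale */
proc corr data=work.scores nosimple;
   var q1 q2;
run;

/* Correlation on the z-score scale: same value as above */
proc corr data=work.scores_z nosimple;
   var q1 q2;
run;
```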

LucyB · Posted 04-05-2017 11:11 PM

Yes, the standardization itself did not address the collinearity, but because of the standardization a composite can be calculated, which will address the collinearity, from my understanding.

Reeza · Posted 04-05-2017 11:30 PM

When you say composite, are you talking about a principal component or eigenvector? Then yes, the eigenvectors are orthogonal by definition, so the component scores are uncorrelated with each other. But not all of the components are used in the model, which also helps.

Reeza · Posted 04-05-2017 11:33 PM

Just because they're not correlated with each other doesn't mean they'll be correlated with the dependent variable. Your initial assumption that you'd get a 'better' model is based on a comparison of the p-values, but as we mentioned earlier, it's not just p-values.
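A rough sketch of the principal component route, under the assumption that this is what "composite" could mean. The data set and variable names (work.demo, y, q1-q4, x5-x8) are simulated and hypothetical. PROC PRINCOMP works from the correlation matrix by default, so the inputs are effectively standardized.

```sas
/* Hypothetical data: q1-q4 share a latent factor, x5-x8 are independent */
data work.demo;
   call streaminit(2017);
   array q[4] q1-q4;
   array x[4] x5-x8;
   do id = 1 to 200;
      latent = rand('normal');
      do j = 1 to 4;
         q[j] = latent + 0.5*rand('normal');
         x[j] = rand('normal');
      end;
      y = 1 + 2*q1 - q2 + 0.5*x5 + rand('normal');
      output;
   end;
   drop j latent;
run;

/* Keep the first two components of the correlated block; scores are Prin1 and Prin2 */
proc princomp data=work.demo out=work.pc_scores n=2;
   var q1-q4;
run;

/* The component scores are uncorrelated with each other ... */
proc corr data=work.pc_scores nosimple;
   var prin1 prin2;
run;

/* ... but whether they predict y is a separate question answered by the fit itself */
proc reg data=work.pc_scores;
   model y = prin1 prin2 x5-x8 / vif;
run;
quit;
```

The components are chosen to explain variance among q1-q4 only, without looking at y, which is why low collinearity among them says nothing about how well they predict the response.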

PaigeMiller · Posted 04-06-2017 09:44 AM

@LucyB wrote:

Yes, the standardization itself did not address the collinearity, but because of the standardization a composite can be calculated, which will address the collinearity, from my understanding.

Sure, there is less (or no) collinearity if you replace four correlated variables with one "composite" variable, but this is somewhat meaningless because you can no longer tell how each of the original four variables relates to the response. And as has been stated, this new "composite" variable may not be a good predictor.
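One way to see why R-squared dropped: an equal-weight average of four z-scores is the full model with the four coefficients forced to be equal, so its in-sample R-squared cannot exceed that of the full model. The sketch below illustrates this on simulated, hypothetical data (work.demo, y, q1-q4, x5-x8 are assumptions), where y deliberately depends on q1 and q2 with unequal weights.

```sas
/* Hypothetical data: y loads unequally on q1 and q2 by construction */
data work.demo;
   call streaminit(2017);
   array q[4] q1-q4;
   array x[4] x5-x8;
   do id = 1 to 200;
      latent = rand('normal');
      do j = 1 to 4;
         q[j] = latent + 0.5*rand('normal');
         x[j] = rand('normal');
      end;
      y = 1 + 2*q1 - q2 + 0.5*x5 + rand('normal');
      output;
   end;
   drop j latent;
run;

/* z-score the correlated block, then average into one composite */
proc standard data=work.demo mean=0 std=1 out=work.demo_z;
   var q1-q4;
run;

data work.demo_comp;
   set work.demo_z;
   composite = mean(of q1-q4);
run;

proc reg data=work.demo_comp;
   /* Full model on the standardized variables, with VIFs */
   full: model y = q1-q4 x5-x8 / vif;
   /* Tests the equal-coefficient restriction that the composite imposes */
   equal_wt: test q1 = q2, q2 = q3, q3 = q4;
   /* Composite model: lower VIFs, but R-squared no higher than the full model */
   comp: model y = composite x5-x8 / vif;
run;
quit;
```

If the equal_wt test rejects, the data disagree with the equal-weight assumption, and the composite is discarding information that the separate variables carried.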

Overall, I'd say this is not an approach I would recommend in this case.

--
Paige Miller

Ksharp · Posted 04-05-2017 09:42 AM

Did you check the variance inflation factors (VIF)?

proc reg;
   model ......... / vif;
run;
