Contributor
Posts: 39

# My p-value increases when I account for multicollinearity in my MLR?

I am running a multiple linear regression model with 8 covariates, 4 of which are highly correlated (r > 0.7). So I created z-scores and then created a composite. When I re-ran the model with this composite, my predictor's p-value became much larger and my R² went down. Why is this happening? I thought p-values decreased after accounting for multicollinearity?

Super User
Posts: 20,688

## Re: My p-value increases when I account for multicollinearity in my MLR?

It's about more than just the p-value.

An excerpt from Wikipedia that's relevant here:

So long as the underlying specification is correct, multicollinearity does not actually bias results; it just produces large standard errors in the related independent variables. More importantly, the usual use of regression is to take coefficients from the model and then apply them to other data. Since multicollinearity causes imprecise estimates of coefficient values, the resulting out-of-sample predictions will also be imprecise. And if the pattern of multicollinearity in the new data differs from that in the data that was fitted, such extrapolation may introduce large errors in the predictions.

Posts: 2,044

## Re: My p-value increases when I account for multicollinearity in my MLR?


LucyB wrote:

I am running a multiple linear regression model with 8 covariates, 4 of which are highly correlated (r > 0.7). So I created z-scores and then created a composite. When I re-ran the model with this composite, my predictor's p-value became much larger and my R² went down. Why is this happening? I thought p-values decreased after accounting for multicollinearity?

I'm not 100% sure what you mean by "I created z scores and then created a composite", but whatever this means, it could be that the new variables you are using to account for the multicollinearity are not as predictive of the response variable as the original variables are.

In any event, in the presence of multicollinearity, I always recommend using Partial Least Squares regression (PROC PLS) instead of ordinary least squares regression. Partial Least Squares is generally less affected by multicollinearity, and it results in model coefficients that have less variability (lower mean squared error) and predicted values that have lower mean squared error than you would get using ordinary least squares. See http://amstat.tandfonline.com/doi/abs/10.1080/00401706.1993.10485033.
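A minimal sketch of what a PROC PLS fit could look like. The data set name `mydata`, the response `y`, and the covariates `x1`-`x8` are hypothetical placeholders, not names from the original post, and the cross-validation settings are just one reasonable choice:

```sas
/* Hypothetical names: mydata (data set), y (response), x1-x8 (covariates) */
proc pls data=mydata method=pls cv=one cvtest;
   /* cv=one requests leave-one-out cross validation;           */
   /* cvtest asks for a test to help pick the number of factors */
   model y = x1-x8 / solution;   /* solution prints the final coefficients */
run;
```

With only a handful of covariates, the number of extracted factors will be small, so it is worth looking at the cross-validation table before trusting the chosen model.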

Also, I agree 100% with @Reeza's quote from Wikipedia.

--
Paige Miller
Contributor
Posts: 39

## Re: My p-value increases when I account for multicollinearity in my MLR?

Well, from what I remember from a course, if you have multicollinearity among some covariates (these are questionnaires), you can convert them to z-scores and then average them into one variable. Is this not correct?
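For readers following along, the approach described here could be sketched in SAS roughly as follows. All names are hypothetical: `mydata` is the input data set, `y` the response, `q1`-`q4` the four correlated questionnaire scores, and `x5`-`x8` the remaining covariates.

```sas
/* Hypothetical names: mydata, y, q1-q4 (correlated scores), x5-x8 */

/* Step 1: replace q1-q4 with z-scores (mean 0, standard deviation 1) */
proc standard data=mydata mean=0 std=1 out=zdata;
   var q1-q4;
run;

/* Step 2: average the four z-scores into a single composite */
data comp;
   set zdata;
   composite = mean(of q1-q4);   /* mean() skips missing values */
run;

/* Step 3: refit with the composite in place of the four originals */
proc reg data=comp;
   model y = composite x5-x8 / vif;
run;
quit;
```

Note that the equal-weight average forces all four scores to share one coefficient, which is one reason the fit can get worse even though the collinearity is gone.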

Super User
Posts: 20,688

## Re: My p-value increases when I account for multicollinearity in my MLR?

LucyB wrote:

Well, from what I remember from a course, if you have multicollinearity among some covariates (these are questionnaires), you can convert them to z-scores and then average them into one variable. Is this not correct?

No, what you're referring to is standardization, which puts all variables on the same scale. It prevents variables with larger values from being too influential in the model.

Contributor
Posts: 39

## Re: My p-value increases when I account for multicollinearity in my MLR?

But doesn't it still address multicollinearity?

Contributor
Posts: 39

## Re: My p-value increases when I account for multicollinearity in my MLR?

Yes, the standardization itself did not address the collinearity, but because of the standardization, a composite can be calculated, which will address the collinearity, from my understanding.

Super User
Posts: 20,688

## Re: My p-value increases when I account for multicollinearity in my MLR?

When you say composite, are you talking about a principal component or eigenvector? If so, then yes, the eigenvectors are by definition orthogonal, so the resulting components are uncorrelated. But not all eigenvectors are used in the model, which also helps.
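If a principal component is what is meant, one way to build it is sketched below. The names `mydata`, `y`, `q1`-`q4`, and `x5`-`x8` are hypothetical, and keeping two components is an arbitrary choice for illustration:

```sas
/* Hypothetical names: mydata, y, q1-q4 (correlated scores), x5-x8 */

/* PROC PRINCOMP works from the correlation matrix by default,  */
/* so the variables are effectively standardized for you.       */
proc princomp data=mydata out=pcs n=2;
   var q1-q4;
run;

/* The component scores are written to pcs as Prin1 and Prin2 */
proc reg data=pcs;
   model y = Prin1 Prin2 x5-x8 / vif;
run;
quit;
```

Unlike an equal-weight average of z-scores, the component weights come from the data, but the components are still chosen without looking at the response.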

Super User
Posts: 20,688

## Re: My p-value increases when I account for multicollinearity in my MLR?

Just because they're not correlated with each other doesn't mean they'll correlate with the dependent variable. Your initial assumption that you'd get a 'better' model is based on a comparison of the p-values, but as we mentioned earlier, it's not just p-values.
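A quick way to check this point is to compare how strongly the composite and each original variable correlate with the response. The sketch assumes a hypothetical data set `comp` that holds the response `y`, the four original scores `q1`-`q4`, and a variable named `composite`:

```sas
/* Hypothetical names: comp (data set), y, composite, q1-q4 */
proc corr data=comp;
   var composite q1-q4;   /* candidate predictors          */
   with y;                /* correlate each one with y only */
run;
```

If the composite's correlation with `y` is noticeably weaker than that of one or more of the original scores, a drop in R² after the swap is not surprising.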

Posts: 2,044

## Re: My p-value increases when I account for multicollinearity in my MLR?

LucyB wrote:

Yes, the standardization itself did not address the collinearity, but because of the standardization, a composite can be calculated, which will address the collinearity, from my understanding.

Sure, there is less (or no) collinearity if you replace four correlated variables with one "composite" variable, but this is somewhat meaningless as you don't know how the original 4 variables can be used to predict the output. And as has been stated, this new "composite" variable may not be a good predictor.

Overall, I'd say this is not an approach I would recommend in this case.

--
Paige Miller
Super User
Posts: 10,197

## Re: My p-value increases when I account for multicollinearity in my MLR?

Did you check the variance inflation factor (VIF)?

```
proc reg;
   /* the vif option prints a variance inflation factor for each covariate */
   model ......... / vif;
run;
```