My p-value increases when I account for multicolli...

04-04-2017 10:31 PM

I am running a multiple linear regression model with 8 covariates, 4 of which are highly correlated (r > 0.7). So I converted those 4 to z-scores and averaged them into a composite. When I re-ran the model with this composite, the p-value for my predictor became much larger and my R² went down. Why is this happening? I thought p-values decreased after accounting for multicollinearity?
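For reference, a minimal SAS sketch of what I did (the dataset name mydata, response y, and covariate names x1-x8 are placeholders):

```
/* Standardize the four highly correlated covariates to
   z-scores (mean 0, standard deviation 1) */
proc stdize data=mydata method=std out=zdata;
   var x1 x2 x3 x4;
run;

/* Average the four z-scores into a single composite */
data zdata;
   set zdata;
   composite = mean(of x1-x4);
run;

/* Re-fit the model with the composite in place of x1-x4 */
proc reg data=zdata;
   model y = composite x5 x6 x7 x8;
run;
```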

04-04-2017 11:36 PM

It's about more than just the p-value.

An excerpt from Wikipedia that's relevant here:

*So long as the underlying specification is correct, multicollinearity does not actually bias results; it just produces large standard errors in the related independent variables. More importantly, the usual use of regression is to take coefficients from the model and then apply them to other data. Since multicollinearity causes imprecise estimates of coefficient values, the resulting out-of-sample predictions will also be imprecise. And if the pattern of multicollinearity in the new data differs from that in the data that was fitted, such extrapolation may introduce large errors in the predictions.*

04-05-2017 08:43 AM - edited 04-05-2017 09:01 AM

I'm not 100% sure what you mean by "I created z scores and then created a composite," but whatever it means, it could be that the new variables you are using to account for the multicollinearity are not as predictive of the response variable as the original variables are.

In any event, in the presence of multicollinearity, I always recommend using Partial Least Squares regression (PROC PLS) instead of ordinary least squares regression. Partial Least Squares is generally less affected by multicollinearity, and it produces model coefficients and predicted values with lower mean squared error than you would get from ordinary least squares. See http://amstat.tandfonline.com/doi/abs/10.1080/00401706.1993.10485033.
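A minimal PROC PLS sketch, assuming a hypothetical dataset mydata with response y and covariates x1-x8 (CV=ONE requests leave-one-out cross-validation to choose the number of extracted factors):

```
/* Partial Least Squares regression; cross-validation picks the
   number of factors, and CVTEST tests whether a model with fewer
   factors is significantly worse than the minimum-PRESS model.
   Dataset and variable names are hypothetical placeholders. */
proc pls data=mydata cv=one cvtest;
   model y = x1-x8 / solution;   /* SOLUTION prints the coefficients */
run;
```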

Also, I agree 100% with @Reeza's quote from Wikipedia.

04-05-2017 09:04 PM

Well, from what I remember from a course, if you have multicollinearity among some covariates (these are questionnaires), you can convert them to z-scores and then average them into one variable. Is this not correct?

04-05-2017 09:11 PM

No, what you're referring to is standardization, which puts all variables on the same scale. It prevents variables that are larger in magnitude from being unduly influential in the model.

04-05-2017 09:54 PM

But doesn't it still address multicollinearity?

04-05-2017 10:32 PM

Not really; standardized variables can still be correlated. Standardization only shifts and rescales each variable, and correlations are unchanged by that kind of linear transformation.
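A quick way to see this in SAS, assuming a hypothetical dataset mydata with variables x1-x4: the correlation matrix is identical before and after standardization.

```
/* Correlations among the raw variables... */
proc corr data=mydata;
   var x1-x4;
run;

/* ...match the correlations among their z-scores exactly,
   since standardizing only shifts and rescales each variable */
proc stdize data=mydata method=std out=zdata;
   var x1-x4;
run;

proc corr data=zdata;
   var x1-x4;
run;
```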

04-05-2017 11:11 PM

Yes, the standardization itself did not address the collinearity, but because of the standardization a composite can be calculated, which, from my understanding, will address the collinearity.

04-05-2017 11:30 PM

When you say composite, are you talking about a principal component or eigenvector? If so, then yes: the eigenvectors are by definition orthogonal and independent. And typically not all of the components are used in the model, which also helps.
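If that's the goal, a hedged PROC PRINCOMP sketch (dataset and variable names are hypothetical placeholders):

```
/* Extract principal components from the four correlated
   covariates; the output scores Prin1-Prin4 are mutually
   uncorrelated by construction */
proc princomp data=mydata out=pcs;
   var x1-x4;
run;

/* Keep, say, only the first component in place of x1-x4 */
proc reg data=pcs;
   model y = Prin1 x5 x6 x7 x8;
run;
```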

04-05-2017 11:33 PM

Just because they're not correlated with each other doesn't mean they'll correlate with the dependent variable either. Your initial assumption that you'd get a "better" model is based on a comparison of p-values, but as mentioned earlier, it's not just about p-values.

04-06-2017 09:44 AM

Sure, there is less (or no) collinearity if you replace four correlated variables with one "composite" variable, but this is somewhat meaningless, as you no longer know how the original 4 variables individually predict the output. And as has been stated, this new "composite" variable may not be a good predictor.

Overall, I'd say this is not an approach I would recommend in this case.

04-05-2017 09:42 AM

Did you check the Variance Inflation Factor (VIF)? proc reg; model ......... / vif; run;
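A fuller sketch with hypothetical names (a common rule of thumb flags VIF values above about 10 as problematic collinearity):

```
/* Request a variance inflation factor for each predictor;
   dataset and variable names are hypothetical placeholders */
proc reg data=mydata;
   model y = x1-x8 / vif;
run;
```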