- Is there a limit to number of independent variable...

04-09-2014 11:23 AM

I am getting the error below while checking for collinearity with PROC REG (as discussed here: Collinearity Diagnostics):

ERROR: Eigenvalues failed in collinear option.

I have 387 numeric independent variables in total, and when I use only a subset of them, I do not get that error. Any advice on how to check for collinearity among all of those variables?

Furthermore, I will be modeling all variables that pass my collinearity check against a dependent variable with values 0 and 1 using PROC LOGISTIC, and I'm concerned I will have a similar problem if there is a variable limit for PROC REG.

Thanks,

-Lee
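For readers who have not seen the check being described, here is a minimal sketch of collinearity diagnostics in PROC REG. The dataset name `work.train`, the response `y`, and the predictor names `x1-x387` are placeholders, not the poster's actual names. The diagnostics depend only on the predictors, so any numeric response can be used for this step.

```sas
/* Sketch only: work.train, y, and x1-x387 are hypothetical names. */
proc reg data=work.train plots=none;
   /* VIF and TOL report variance inflation factors and tolerances;   */
   /* COLLIN requests the eigenvalue-based diagnostics, which is the  */
   /* step that raised the error quoted above.                        */
   model y = x1-x387 / vif tol collin;
run;
quit;
```

Dropping COLLIN and keeping only VIF and TOL may be a way to get partial diagnostics when the eigenvalue step fails, though that is an assumption to verify on your own data.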

Accepted Solutions

Solution

04-09-2014 11:26 AM

The variable limit doesn't come from PROC REG; it comes from the concept of regression. If you have more unknowns than data points, you can't solve the equations for the parameter estimates.

How many observations does your data have?

All Replies

04-09-2014 11:47 AM

My current dataset is about 21K observations. However, I am debugging the code right now; I plan to run on a much larger dataset, maybe 2-2.5M observations.

04-09-2014 11:57 AM

Look into PROC VARCLUS, perhaps?
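A minimal sketch of what that suggestion might look like, assuming the same placeholder names (`work.train`, `x1-x387`). The `maxeigen=0.7` threshold is an illustrative value, not something given in this thread.

```sas
/* Sketch only: dataset and variable names are hypothetical. */
proc varclus data=work.train maxeigen=0.7 short;
   /* Groups correlated predictors into clusters; a cluster is split  */
   /* while its second eigenvalue exceeds the MAXEIGEN= threshold.    */
   var x1-x387;
run;
```

One common way to use the output is to keep a single representative variable from each cluster, which reduces redundancy before fitting the final model.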

04-14-2014 10:25 AM

Just to close out this thread, the short answer to my question is to use more observations. When I ran PROC REG with my 2M+ dataset, I did not get the error and was able to identify/eliminate collinear variables.

04-14-2014 11:16 AM

I guess that "works" if you have more observations that you can use, which isn't the case for everyone. It "works" only in the sense that SAS can now do the mathematical calculations and you don't get the error; it doesn't work in the sense that you get good estimates or predictions.

The problem with 387 predictor variables remains. These 387 predictor variables are still partially correlated with one another, possibly highly correlated, and this causes regression to produce predictions and parameter estimates with very high mean square error, meaning that they are probably not good predictions and estimates. In that case, partial least squares provides better estimates and predictions (lower mean square error, often dramatically lower).

04-09-2014 01:02 PM

400 variables shouldn't present a problem for SAS regression procedures. I can't imagine why the eigenvalue computation is failing. I've never seen that error message before, but I don't think you should ignore it. It is probably telling you something important about your variables. I'd try to determine what variables are collinear. Maybe look into PROC CORR or PROC PRINCOMP?
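A rough sketch of how PROC CORR and PROC PRINCOMP could be used for that kind of investigation. All dataset and variable names here are placeholders. Eigenvalues at or near zero in the PROC PRINCOMP output point to exact or near-exact linear dependencies among the predictors.

```sas
/* Sketch only: work.train and x1-x387 are hypothetical names. */

/* Save the pairwise Pearson correlations to a dataset for inspection. */
proc corr data=work.train noprint outp=work.corr_out;
   var x1-x387;
run;

/* Save the eigenvalues and eigenvectors of the correlation matrix. */
proc princomp data=work.train noprint outstat=work.pc_stats;
   var x1-x387;
run;

/* The eigenvalues are stored in the row with _TYPE_ = 'EIGENVAL'. */
proc print data=work.pc_stats;
   where _type_ = 'EIGENVAL';
run;
```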

04-09-2014 01:34 PM

While the "hard" limits have been discussed already, there are practical limits (or perhaps I should say drawbacks) to using 387 predictor variables in a model. Even if they do not show exact collinearity, they may show partial collinearity; in other words, some of the X-variables are highly (but not perfectly) correlated with other X-variables. In that case, regression is a poor choice for fitting the model and making predictions. Partial least squares is a better method in the sense that it has been shown to produce model predictions and parameter estimates with lower mean squared error than you would get from regression. Also, if you use partial least squares, the issue of exact collinearity goes away: PLS doesn't care if there are multiple variables showing exact collinearity.
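As a minimal sketch of the partial least squares approach described above, assuming the placeholder names `work.train`, `y`, and `x1-x387`. The 10-fold split for cross validation is an illustrative choice. Note that PROC PLS treats the response as continuous, so applying it to the 0/1 outcome from the original question is an assumption that would need its own justification.

```sas
/* Sketch only: dataset and variable names are hypothetical. */
proc pls data=work.train method=pls cv=split(10);
   /* CV=SPLIT(10) uses split-sample cross validation to help choose  */
   /* the number of extracted factors.                                */
   model y = x1-x387;
run;
```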