Solved
Contributor
Posts: 20

# Is there a limit to number of independent variables that can be used in PROC REG?

I am getting the error below while attempting to check for collinearity using PROC REG (as discussed here: Collinearity Diagnostics):

ERROR: Eigenvalues failed in collinear option.

I have 387 numeric independent variables in total; when I use only a subset of them, I do not get the error. Any advice on how to check for collinearity among all of those variables?

Furthermore, I will be modeling a dependent variable with values 0 and 1 on all variables that pass my collinearity check using PROC LOGISTIC, and I'm concerned I will have a similar problem there if there is a variable limit in PROC REG.
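For context, a minimal sketch of this kind of PROC REG collinearity check (dataset and variable names here are placeholders, not my actual data):

```sas
/* Collinearity diagnostics via PROC REG; MYDATA, Y, and X1-X387
   are placeholder names. COLLIN requests the eigenvalue-based
   diagnostics, VIF the variance inflation factors. */
proc reg data=mydata;
   model y = x1-x387 / vif collin;
run;
quit;
```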

Thanks,

-Lee

Accepted Solutions
Solution
‎04-09-2014 11:26 AM
Super User
Posts: 23,761

## Re: Is there a limit to number of independent variables that can be used in PROC REG?

The variable limit isn't from PROC REG; it comes from regression itself. If you have more unknowns than data points, you can't solve the equations for the parameter estimates.

How many observations does your data have?

predictor - Maximum number of independent variables that can be entered into a multiple regression e...
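To make the counting concrete: with 387 predictors plus an intercept, PROC REG must invert a 388-by-388 X'X matrix, which requires at least 388 linearly independent observations. A toy sketch with made-up data (more predictors than rows, so the fit is degenerate):

```sas
/* Toy data: 3 predictors but only 2 observations, so the
   normal equations are underdetermined and SAS will report
   singularities and set some estimates to zero. */
data toosmall;
   input y x1 x2 x3;
   datalines;
1 2 3 4
5 6 7 8
;
run;

proc reg data=toosmall;
   model y = x1 x2 x3;
run;
quit;
```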

All Replies

Contributor
Posts: 20

## Re: Is there a limit to number of independent variables that can be used in PROC REG?

My current dataset has about 21K observations. However, I am debugging the code right now; I plan to run it on a much larger dataset, maybe 2-2.5M observations.

Super User
Posts: 23,761

## Re: Is there a limit to number of independent variables that can be used in PROC REG?

Look into PROC VARCLUS, perhaps?

SAS/STAT(R) 9.2 User's Guide, Second Edition
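A minimal VARCLUS sketch (dataset and variable names are placeholders; MAXEIGEN= controls how finely the clusters are split):

```sas
/* Cluster the 387 candidate predictors into groups of mutually
   correlated variables, then keep one representative per cluster.
   MYDATA and X1-X387 are placeholder names. */
proc varclus data=mydata maxeigen=0.7 short;
   var x1-x387;
run;
```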

Contributor
Posts: 20

## Re: Is there a limit to number of independent variables that can be used in PROC REG?

Just to close out this thread: the short answer to my question is to use more observations. When I ran PROC REG against my 2M+ observation dataset, I did not get the error and was able to identify and eliminate collinear variables.

Posts: 3,052

## Re: Is there a limit to number of independent variables that can be used in PROC REG?

I guess that "works" if you have more observations you can use, which isn't the case for everyone. It "works" only in the sense that SAS can now do the mathematical calculations and you don't get the error; it doesn't work in the sense that you get good estimates or predictions.

The problem with 387 predictor variables remains. These 387 predictors are still partially, and possibly highly, correlated with one another, and this causes regression to produce predictions and parameter estimates with very high mean squared error, meaning they are probably not good predictions or estimates. In that case, partial least squares provides better (lower mean squared error, often dramatically lower) estimates and predictions.
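A minimal sketch of what this looks like in SAS (dataset and variable names are placeholders):

```sas
/* Partial least squares: extracts a small number of factors from
   the correlated predictors, so collinearity among the Xs is not
   a problem. CV=ONE chooses the number of factors by leave-one-out
   cross validation. MYDATA, Y, and X1-X387 are placeholder names. */
proc pls data=mydata cv=one;
   model y = x1-x387;
run;
```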

--
Paige Miller
SAS Super FREQ
Posts: 4,242

## Re: Is there a limit to number of independent variables that can be used in PROC REG?

400 variables shouldn't present a problem for SAS regression procedures, and I can't imagine why the eigenvalue computation is failing. I've never seen that error message before, but I don't think you should ignore it; it is probably telling you something important about your variables. I'd try to determine which variables are collinear. Maybe look into PROC CORR or PROC PRINCOMP?
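Two quick ways to hunt for the collinear variables (MYDATA and X1-X387 are placeholder names): PROC CORR writes the correlation matrix to a dataset you can scan for pairs with |r| near 1, and near-zero eigenvalues from PROC PRINCOMP point to near-linear dependencies among the predictors.

```sas
/* Write the correlation matrix to CORRMAT for inspection. */
proc corr data=mydata outp=corrmat noprint;
   var x1-x387;
run;

/* Eigenvalues near zero indicate near-collinear combinations. */
proc princomp data=mydata;
   var x1-x387;
run;
```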

Posts: 3,052

## Re: Is there a limit to number of independent variables that can be used in PROC REG?

While the "hard" limits have been discussed already, there are practical limits (or perhaps I should say drawbacks) to using 387 predictor variables in a model. Even if they do not show exact collinearity, they may show partial collinearity; in other words, some of the X-variables are highly (but not perfectly) correlated with other X-variables. In that case, regression is a poor choice for fitting the model and making predictions; partial least squares is a better method in the sense that it has been shown to produce model predictions and parameter estimates with lower mean squared error than regression would give you. Also, if you use partial least squares, the issue of exact collinearity goes away; PLS doesn't care if there are multiple variables showing exact collinearity.

--
Paige Miller
🔒 This topic is solved and locked.