Is there a limit to number of independent variables that can be used in PROC REG?

Contributor
Posts: 20

Is there a limit to number of independent variables that can be used in PROC REG?

I am getting the below error while attempting to check for collinearity using PROC REG (as discussed here: Collinearity Diagnostics)

ERROR: Eigenvalues failed in collinear option.

I have 387 numeric independent variables in total. When I use only a subset, I do not get that error. Any advice on how to check for collinearity among all of those variables?

Furthermore, I will use PROC LOGISTIC to model a dependent variable with values 0 and 1 on all variables that pass my collinearity check, and I'm concerned I will have a similar problem if PROC REG has a variable limit.

Thanks,

-Lee


Accepted Solutions
Solution
‎04-09-2014 11:26 AM
Super User
Posts: 19,855

Re: Is there a limit to number of independent variables that can be used in PROC REG?

Posted in reply to leeklammer

The variable limit doesn't come from PROC REG; it comes from the concept of regression itself. If you have more unknowns (parameters) than data points, you can't solve the equations for the parameter estimates.
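To spell out that point, here is the standard least-squares argument, written with notation that is assumed here and not taken from the thread: n observations, p predictors plus an intercept, and design matrix X.

```latex
\hat{\beta} = (X^{\top} X)^{-1} X^{\top} y,
\qquad X \in \mathbb{R}^{n \times (p+1)}

\operatorname{rank}(X^{\top} X) = \operatorname{rank}(X) \le \min(n,\; p+1)
```

X'X is a (p+1) x (p+1) matrix, so it can only be inverted when its rank is p+1. That requires at least p+1 observations (388 for 387 predictors plus an intercept) and no predictor that is an exact linear combination of the others.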

How many observations does your data have?

predictor - Maximum number of independent variables that can be entered into a multiple regression e...



All Replies

Contributor
Posts: 20

Re: Is there a limit to number of independent variables that can be used in PROC REG?

My current dataset is about 21K observations. However, I am debugging the code right now; I plan to run on a much larger dataset, maybe 2-2.5M observations.

Super User
Posts: 19,855

Re: Is there a limit to number of independent variables that can be used in PROC REG?

Posted in reply to leeklammer

Perhaps look into PROC VARCLUS?

SAS/STAT(R) 9.2 User's Guide, Second Edition
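A minimal sketch of what that might look like, assuming a dataset named work.train and predictors named x1-x387 (both hypothetical names; substitute your own). The MAXEIGEN= threshold below is only an illustrative choice, not a recommendation.

```sas
/* Group correlated predictors into clusters.                    */
/* work.train and x1-x387 are hypothetical names.                */
proc varclus data=work.train maxeigen=0.7 short;
   var x1-x387;   /* predictors only; leave the response out     */
run;
```

One common way to use the result is to keep one representative variable from each cluster and carry only those forward into the model.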

Contributor
Posts: 20

Re: Is there a limit to number of independent variables that can be used in PROC REG?

Posted in reply to leeklammer

Just to close out this thread, the short answer to my question is to use more observations.  When I ran PROC REG with my 2M+ dataset, I did not get the error and was able to identify/eliminate collinear variables.
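For anyone finding this later, a check of this kind would look roughly like the sketch below. The dataset name work.full, the response y, and the predictor names x1-x387 are all hypothetical, and this is not necessarily the exact code I ran.

```sas
/* Collinearity diagnostics from PROC REG.                       */
/* work.full, y and x1-x387 are hypothetical names.              */
proc reg data=work.full;
   /* VIF and TOL give per-variable diagnostics;                 */
   /* COLLIN requests the eigenvalue-based diagnostics.          */
   model y = x1-x387 / vif tol collin;
run;
quit;
```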

Trusted Advisor
Posts: 1,931

Re: Is there a limit to number of independent variables that can be used in PROC REG?

Posted in reply to leeklammer

I guess that "works" if you have more observations you can use, which isn't the case for everyone. It "works" only in the sense that SAS can now do the mathematical calculations and you don't get the error; it doesn't work in the sense of giving you good estimates or predictions.

The problem with 387 predictor variables remains. These 387 predictors are still partially correlated with one another, possibly highly so, and this causes regression to produce predictions and parameter estimates with very high mean square error, meaning they are probably not good predictions and estimates. In that case, partial least squares provides better estimates and predictions (lower mean square error, often dramatically lower).
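To make the effect of correlated predictors concrete, here is the textbook variance formula for an ordinary least-squares slope. The notation is assumed for illustration: R_j^2 is the R-square from regressing predictor x_j on all the other predictors, and sigma^2 is the error variance.

```latex
\operatorname{Var}(\hat{\beta}_j)
  = \frac{\sigma^{2}}{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^{2}}
    \cdot \frac{1}{1 - R_j^{2}},
\qquad
\mathrm{VIF}_j = \frac{1}{1 - R_j^{2}}
```

For example, if the other predictors explain 99% of the variation in x_j (R_j^2 = 0.99), then VIF_j = 100, so the variance of that coefficient estimate is 100 times what it would be if x_j were uncorrelated with the rest. Adding observations shrinks the first factor but leaves the second unchanged.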

SAS Super FREQ
Posts: 3,755

Re: Is there a limit to number of independent variables that can be used in PROC REG?

Posted in reply to leeklammer

400 variables shouldn't present a problem for SAS regression procedures. I can't imagine why the eigenvalue computation is failing.  I've never seen that error message before, but I don't think you should ignore it.  It is probably telling you something important about your variables.  I'd try to determine what variables are collinear. Maybe look into PROC CORR or PROC PRINCOMP?
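Something along these lines might be a starting point. It is only a sketch: the dataset work.train, the predictor names x1-x387, and the 0.9 cutoff are all assumptions chosen for illustration.

```sas
/* Save the correlation matrix instead of printing 387 x 387 cells. */
/* work.train and x1-x387 are hypothetical names.                   */
proc corr data=work.train noprint outp=work.corr_out;
   var x1-x387;
run;

/* List each pair of predictors whose |correlation| is at least 0.9. */
data work.high_corr;
   set work.corr_out(where=(_type_ = 'CORR'));
   array r{*} x1-x387;
   length var1 var2 $32;
   var1 = _name_;
   do j = 1 to dim(r);
      /* j > _n_ keeps only the upper triangle, so each pair appears once */
      if j > _n_ and abs(r{j}) >= 0.9 then do;
         var2 = vname(r{j});
         corr = r{j};
         output;
      end;
   end;
   keep var1 var2 corr;
run;

/* Eigenvalues of the correlation matrix: values near zero point */
/* to near-linear dependencies among the predictors.             */
proc princomp data=work.train;
   var x1-x387;
run;
```

Pairwise correlations only catch dependencies between two variables; the eigenvalues from PROC PRINCOMP can also reveal dependencies that involve three or more variables.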

Trusted Advisor
Posts: 1,931

Re: Is there a limit to number of independent variables that can be used in PROC REG?

Posted in reply to leeklammer

While the "hard" limits have been discussed already, there are also practical limits (or perhaps I should say drawbacks) to using 387 predictor variables in a model. Even if they do not show exact collinearity, they may show partial collinearity; in other words, some of the X-variables are highly (but not perfectly) correlated with other X-variables. In that case, regression is a poor choice for fitting the model and making predictions; partial least squares is a better method in the sense that it has been shown to produce model predictions and parameter estimates with lower mean squared error than you would get from regression. Also, if you use partial least squares, the issue of exact collinearity goes away: PLS doesn't care if multiple variables show exact collinearity.
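A minimal sketch of a PLS fit, assuming a dataset named work.train, a response y, and predictors x1-x387 (all hypothetical names). The cross-validation settings are illustrative choices only.

```sas
/* Partial least squares with cross-validation to help choose    */
/* the number of factors.                                        */
/* work.train, y and x1-x387 are hypothetical names.             */
proc pls data=work.train method=pls cv=split cvtest(seed=12345);
   model y = x1-x387;   /* y is treated as a numeric response     */
run;
```

The cross-validation output indicates how many PLS factors are worth keeping, and it is that number, not the count of original predictors, that controls model complexity.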

🔒 This topic is solved and locked.
