leeklammer
Fluorite | Level 6

I am getting the below error while attempting to check for collinearity using PROC REG (as discussed here: Collinearity Diagnostics)

ERROR: Eigenvalues failed in collinear option.

I have 387 numeric independent variables in total. When I use only a subset, I do not get the error. Any advice on how to check for collinearity among all of those variables?

Furthermore, I will be modeling all variables that pass my collinearity check against a dependent variable with values 0 and 1 using PROC LOGISTIC, and I'm concerned I will have a similar problem if there is a variable limit for PROC REG.

Thanks,

-Lee

1 ACCEPTED SOLUTION

Reeza
Super User

The variable limit doesn't come from PROC REG; it comes from regression itself. If you have more unknowns (parameters) than data points, you can't solve the equations for the parameter estimates.

How many observations does your data have?
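
One thing worth checking alongside the raw row count: PROC REG drops any observation that has a missing value in a model variable, so with hundreds of predictors the number of usable rows can be much smaller than the size of the data set. A minimal sketch of that check, assuming a data set named work.train and predictors named x1-x387 (both hypothetical names, not taken from the original post):

```sas
/* Count observations with no missing values across the predictors.     */
/* work.train and x1-x387 are placeholder names; substitute your own.   */
data _null_;
   set work.train end=last;
   array xs {*} x1-x387;
   /* NMISS returns the number of missing numeric arguments */
   if nmiss(of xs[*]) = 0 then n_complete + 1;
   if last then put "Complete cases: " n_complete "of " _n_ "observations";
run;
```

If the complete-case count is close to or below the number of predictors, the regression cannot be solved regardless of how many rows the data set holds.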

predictor - Maximum number of independent variables that can be entered into a multiple regression e...

leeklammer
Fluorite | Level 6

My current dataset is about 21K observations. However, I am debugging the code right now; I plan to run on a much larger dataset, maybe 2-2.5M observations.

leeklammer
Fluorite | Level 6

Just to close out this thread, the short answer to my question is to use more observations.  When I ran PROC REG with my 2M+ dataset, I did not get the error and was able to identify/eliminate collinear variables.
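
For readers who land here, a call of roughly the following shape requests the collinearity diagnostics in PROC REG. This is a sketch only: the data set name, response name, and predictor names are assumptions, not the code actually run in this thread.

```sas
/* Collinearity diagnostics from PROC REG.                        */
/* work.train, y, and x1-x387 are placeholder names.              */
proc reg data=work.train;
   /* VIF    = variance inflation factors                         */
   /* TOL    = tolerances (1 / VIF)                               */
   /* COLLIN = eigenvalues, condition indices, and variance       */
   /*          decomposition proportions                          */
   model y = x1-x387 / vif tol collin;
run;
quit;
```

Predictors with very large variance inflation factors, or that load heavily on a component with a large condition index, are the usual candidates for removal.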

PaigeMiller
Diamond | Level 26

I guess that "works" if you have more observations that you can use, which isn't the case for everyone. It "works" only in the sense that SAS can now do the mathematical calculations and you don't get the error; it doesn't work in the sense that you get good estimates or predictions.

The problem with 387 predictor variables remains. These 387 predictors are still partially correlated with one another, possibly highly correlated, and this causes regression to produce predictions and parameter estimates with very high mean square error, meaning that they are probably not good predictions and estimates. In that case, partial least squares provides better estimates and predictions (lower mean square error, often dramatically lower).

--
Paige Miller
Rick_SAS
SAS Super FREQ

400 variables shouldn't present a problem for SAS regression procedures. I can't imagine why the eigenvalue computation is failing.  I've never seen that error message before, but I don't think you should ignore it.  It is probably telling you something important about your variables.  I'd try to determine what variables are collinear. Maybe look into PROC CORR or PROC PRINCOMP?
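
A rough sketch of what that could look like, assuming a data set named work.train and predictors named x1-x387 (hypothetical names, not from the original post):

```sas
/* Pairwise Pearson correlations among the predictors, written to a */
/* TYPE=CORR data set instead of the listing.                       */
proc corr data=work.train noprint outp=work.corr_out;
   var x1-x387;
run;

/* Principal components of the same predictors. Eigenvalues at or   */
/* near zero indicate exact or near-linear dependencies.            */
ods select Eigenvalues;   /* show only the eigenvalue table */
proc princomp data=work.train;
   var x1-x387;
run;
```

The rows of work.corr_out with _TYPE_ = 'CORR' hold the correlation matrix, which can then be scanned for off-diagonal values close to 1 or -1.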

PaigeMiller
Diamond | Level 26

While the "hard" limits have been discussed already, there are practical limits (or perhaps I should say drawbacks) to using 387 predictor variables in a model. Even if they do not show exact collinearity, they may show partial collinearity; in other words, some of the X-variables are highly (but not perfectly) correlated with other X-variables. In that case, regression is a poor choice for fitting the model and making predictions. Partial least squares is a better method in the sense that it has been shown to produce model predictions and parameter estimates with lower mean squared error than you would get from regression. Also, if you use partial least squares, the issue of exact collinearity goes away; PLS doesn't care if multiple variables show exact collinearity.
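
A minimal sketch of a partial least squares fit in SAS, assuming a data set named work.train, a response named y, and predictors named x1-x387 (all hypothetical names). Note that PROC PLS treats the response as a numeric variable, so with a 0/1 response the predictions are scores rather than probabilities.

```sas
/* Partial least squares with cross validation to help choose the  */
/* number of factors. All data set and variable names are          */
/* placeholders.                                                   */
proc pls data=work.train method=pls cv=split;
   model y = x1-x387;
run;
```

CV=SPLIT holds out interleaved groups of observations in turn; other cross-validation choices such as CV=ONE or CV=RANDOM are available if that scheme does not suit the data.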

--
Paige Miller
