Obsidian | Level 7

## Logistic Regression convergence

I have run a huge logistic regression with about 900 independent variables in the model. All variables in the model, including the dependent variable, are binary (1 or 0).

The log states that:

WARNING: The information matrix is singular and thus the convergence is questionable.

while also stating that:

NOTE: Convergence criterion (GCONV=1E-8) satisfied.

I am just using this model to identify potentially significant independent factors that predict the dependent outcome; I will then use those significant variables in further modeling with other covariates not included in this model.

Therefore, do I have to fix this issue by potentially removing variables (perhaps there is collinearity?), or can I rely on the model?

It would be difficult to try and pick and choose from 900 variables.

Diamond | Level 26

## Re: Logistic Regression convergence

sasnewbie12 wrote:

> Therefore, do I have to fix this issue by potentially removing variables (perhaps there is collinearity?), or can I rely on the model?
>
> It would be difficult to try and pick and choose from 900 variables.

Not "perhaps". There is collinearity. As in, one (or more) of the 900 variables is a perfect linear combination of the others.
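To make the "perfect linear combination" point concrete, here is a minimal sketch in Python (not SAS) that flags columns of a small 0/1 design matrix that are exact linear combinations of earlier columns. The function name `dependent_columns` and the toy data are hypothetical, and the sketch assumes the singularity comes from exact linear dependence rather than from some other cause.

```python
from fractions import Fraction

def dependent_columns(rows):
    """Return indices of columns that are exact linear combinations of
    earlier columns, using Gaussian elimination with exact arithmetic."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    pivot_row = 0
    dependent = []
    for col in range(n_cols):
        # Look for a row at or below pivot_row with a nonzero entry here.
        pivot = next((r for r in range(pivot_row, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            dependent.append(col)  # no pivot: spanned by earlier columns
            continue
        m[pivot_row], m[pivot] = m[pivot], m[pivot_row]
        for r in range(pivot_row + 1, n_rows):
            if m[r][col] != 0:
                factor = m[r][col] / m[pivot_row][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
    return dependent

# Toy data (hypothetical): an intercept plus three mutually exclusive,
# exhaustive 0/1 indicators, so intercept = x1 + x2 + x3.
design = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
]
print(dependent_columns(design))  # [3]: the last indicator is redundant
```

Dropping the flagged columns leaves a full-rank design, but that only removes exact dependence; strongly correlated variables that are not exactly dependent would still be there.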

I wouldn't do this. Even if you can trust the model (which you probably can't), logistic regression is a poor choice of technique when you have 900 correlated variables.

You would be better off using a technique that is much less affected by the presence of collinearity. One such method is Partial Least Squares regression, which in SAS is PROC PLS.
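As a rough illustration of why Partial Least Squares tolerates collinearity, the sketch below computes only the first PLS component with the textbook NIPALS step, in plain Python. It is not PROC PLS and not its implementation; the function name `pls1_first_component` and the toy data are hypothetical, and survey weights are ignored. The step involves no matrix inversion, so a duplicated column does not break it.

```python
import math

def pls1_first_component(X, y):
    """First PLS1 component: returns the weight vector w and the scores t."""
    n, p = len(X), len(X[0])
    # Center each column of X and the response.
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weights are proportional to the covariance of each column with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores: projection of each observation onto the weight vector.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    return w, t

# Toy data (hypothetical): column 2 is an exact copy of column 0.
X = [
    [1, 0, 1],
    [1, 1, 1],
    [0, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
    [0, 1, 0],
]
y = [1, 1, 0, 0, 1, 0]

w, t = pls1_first_component(X, y)
print(w)  # the two identical columns receive identical weights
```

The duplicated columns simply share the weight instead of producing a singular matrix. A full PLS fit would go on to deflate `X` and `y` and extract further components.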

--
Paige Miller
Obsidian | Level 7

## Re: Logistic Regression convergence

This is also survey data. I don't think there is any proc for PLS with survey data.

Diamond | Level 26

## Re: Logistic Regression convergence

That doesn't change any of my comments. Logistic regression in this case is a nightmare. The collinearity will make your results meaningless.

You could modify the data to weight things as the survey requires, and then run PROC PLS.

--
Paige Miller
Super User

## Re: Logistic Regression convergence

Since you have a huge number of variables for logistic regression, I suggest you use PROC HPGENSELECT to select the dozen or so most significant variables.
Diamond | Level 26

## Re: Logistic Regression convergence

In my opinion, HPGENSELECT fails for the same reason as LOGISTIC: it is not meant to account for the collinearity of the 900 variables. Forward and stepwise methods are widely regarded by the statistical community as having major drawbacks.

--
Paige Miller
Super User

## Re: Logistic Regression convergence

There are other selection methods, such as LASSO and CV, in PROC HPGENSELECT.
Diamond | Level 26

## Re: Logistic Regression convergence

With regard to Lasso, there is this long thread in which many people argue that Lasso is not a good choice with a large number of correlated variables. https://stats.stackexchange.com/questions/7935/what-are-disadvantages-of-using-the-lasso-for-variabl...

I don't know enough about CV to comment.

--
Paige Miller