multivariate logistic regression: variable troubleshooting


Posted 12-07-2017 03:46 PM
(2522 views)

I am assessing the outcome "eventX" using survey data.

One variable, "diseaseX", is associated with the outcome at p = 0.023 in a univariate chi-square test.

When it is placed in the multivariate regression model with several other variables, its p-value drops to 0.0003. SAS does not give any messages about correlation, and the model converges.

If this is due to some kind of association where one variable reinforces another (I forgot what that's called), how can I find which variable it is? Otherwise, how should I deal with this? Is it OK to leave the variable in the model if there is such an association?

Please explain. Thanks

Accepted Solution


This is a major drawback to having multiple independent variables which are correlated with one another. If you add (or subtract) a variable from the model, the estimated regression coefficient can change (sometimes dramatically) and the p-value can change (sometimes dramatically).
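As an illustration of how far a coefficient can move, here is a small calculation for ordinary least squares with standardized variables. This is a simplification: the thread is about logistic regression, which shows the same kind of behavior but has no closed form like this. Given the pairwise correlations, the standardized slopes in y ~ x1 + x2 are (r_y1 - r_y2·r_12)/(1 - r_12²) and (r_y2 - r_y1·r_12)/(1 - r_12²), while the slope of x1 on its own is just r_y1. The correlation values and the helper name `standardized_coefficients` are hypothetical, chosen only for illustration.

```python
def standardized_coefficients(r_y1, r_y2, r_12):
    """Standardized OLS slopes (b1, b2) for y ~ x1 + x2, from the pairwise correlations."""
    # Sylvester's criterion (order x1, x2, y): |r_12| < 1 and a positive 3x3
    # determinant mean the three correlations form a valid correlation matrix.
    det = 1 + 2 * r_y1 * r_y2 * r_12 - r_y1**2 - r_y2**2 - r_12**2
    if abs(r_12) >= 1 or det <= 0:
        raise ValueError("not a valid correlation matrix, or x1 and x2 are perfectly correlated")
    denom = 1 - r_12**2
    b1 = (r_y1 - r_y2 * r_12) / denom
    b2 = (r_y2 - r_y1 * r_12) / denom
    return b1, b2


# Hypothetical correlations: x1 is weakly related to y on its own,
# but strongly correlated with x2, which is negatively related to y.
r_y1, r_y2, r_12 = 0.10, -0.30, 0.60
print("x1 alone:  ", r_y1)  # one-predictor standardized slope equals r_y1
print("x1 with x2:", standardized_coefficients(r_y1, r_y2, r_12)[0])
```

With these numbers the slope of x1 goes from 0.10 on its own to 0.4375 once x2 is in the model, the same direction of change as the jump in significance described in the question.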

How can you deal with this? Well, fundamentally, I think that variable selection strategies are flawed in the case where the independent variables are highly correlated, and further, there is no logical way to determine the effect of variable x1 independently of the other variables.

So that leads me to a method that is different conceptually. It includes all variables, so there is no issue of variable selection. It does not try to determine the unique independent effect of each variable; it tries to determine a good predictive model. That method is Partial Least Squares Regression (PROC PLS in SAS). For the logistic case, you could either model 0/1 responses, or you could use this method: https://cedric.cnam.fr/fichiers/RC906.pdf
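To make the Partial Least Squares idea concrete, here is a minimal PLS1 sketch (the NIPALS algorithm for a single response) in plain Python rather than SAS. It is not PROC PLS and not the method in the linked paper: it simply treats a 0/1 response as numeric, which corresponds to the first option above, and it ignores weights, strata, and clusters. The function names `pls1_fit` and `pls1_predict` are hypothetical.

```python
from statistics import fmean


def _dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def pls1_fit(X, y, n_components):
    """Fit PLS1 (one response) with NIPALS.

    X is a list of rows; y is a list of numbers (e.g. a 0/1 outcome treated as
    numeric). Returns centering constants plus, per component, the weight
    vector w, the X-loading p and the y-loading q.
    """
    n, k = len(X), len(X[0])
    x_means = [fmean(row[j] for row in X) for j in range(k)]
    y_mean = fmean(y)
    E = [[row[j] - x_means[j] for j in range(k)] for row in X]  # centered X residuals
    f = [v - y_mean for v in y]  # centered y residuals
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(k)]  # X'y direction
        norm = _dot(w, w) ** 0.5
        if norm < 1e-12:  # nothing left in X that covaries with y
            break
        w = [wj / norm for wj in w]
        t = [_dot(row, w) for row in E]  # component scores
        tt = _dot(t, t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(k)]
        q = _dot(f, t) / tt
        E = [[E[i][j] - t[i] * p[j] for j in range(k)] for i in range(n)]  # deflate X
        f = [f[i] - q * t[i] for i in range(n)]  # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    return {"x_means": x_means, "y_mean": y_mean, "W": W, "P": P, "Q": Q}


def pls1_predict(model, x):
    """Predict y for one new row x by replaying the same centering and deflation."""
    e = [xj - mj for xj, mj in zip(x, model["x_means"])]
    y_hat = model["y_mean"]
    for w, p, q in zip(model["W"], model["P"], model["Q"]):
        t = _dot(e, w)
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat


# Hypothetical toy data in which y = 1 + 2*x1 - x2 exactly.
X = [[0, 1], [1, 0], [2, 3], [3, 1], [4, 4]]
y = [0, 3, 2, 6, 5]
full = pls1_fit(X, y, n_components=2)  # as many components as predictors
print(pls1_predict(full, [3, 5]))  # should be very close to 1 + 6 - 5 = 2
```

With as many components as predictors, PLS reproduces ordinary least squares. The point of PLS is to stop at fewer components, each a weighted combination of all the predictors, so no variable has to be dropped.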

--

Paige Miller


7 replies


How do I use this for survey data?

I need to account for strata, clusters, and weights.

I am currently using PROC SURVEYLOGISTIC.


As there is no WEIGHT statement in PROC PLS, it's not going to fit your problem, but SURVEYLOGISTIC doesn't really do a good job either in the case of many correlated x-variables.

--

Paige Miller



I haven't seen the warning "WARNING: The information matrix is singular and thus the convergence is questionable", and I am not getting any errors in the log. However, there may be some other association between the variables.

I wonder if there is any way to see which variables have such an association, so that I can find the problem variables and remove them manually.


@sasnewbie12 wrote:

> I haven't seen the warning "WARNING: The information matrix is singular and thus the convergence is questionable", and I am not getting any errors in the log. However, there may be some other association between the variables.

If no variable is an exact linear combination of the others (for two variables, a correlation of exactly 1 or –1), you will not get such a warning. If the correlation between two independent variables is, for example, 0.99, you will not get the warning, but you will still get the problem you described above: the model coefficients and their significance can change drastically when you add or remove variables from the model.
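A quick worked calculation shows why a correlation of 0.99 passes the singularity check yet still causes trouble. It uses two predictors with correlation $r$ and the variance inflation factor (VIF) from ordinary least squares as a rough guide; logistic regression behaves similarly but not identically.

$$
R = \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix}, \qquad \det R = 1 - r^2, \qquad \mathrm{VIF} = \frac{1}{1 - r^2}
$$

$$
r = 0.99:\quad \det R = 1 - 0.9801 = 0.0199, \qquad \mathrm{VIF} = \frac{1}{0.0199} \approx 50.25, \qquad \sqrt{\mathrm{VIF}} \approx 7.09
$$

Only $r = \pm 1$ makes $\det R = 0$, so no singularity warning appears at $r = 0.99$, but each coefficient's standard error is roughly seven times what it would be if the two predictors were uncorrelated.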

> I wonder if there is any way to see which variables have such an association, so that I can find the problem variables and remove them manually.

Maybe the idea from @Reeza can be modified so that PROC CORR shows you the correlations, but then the problem remains: which of the correlated variables do you remove? How would you choose? What if you get better predictions by leaving the correlated variables in the model?

This is why I like the concept of Partial Least Squares: it has none of these difficulties, and it handles correlated predictor variables better than most other methods. It does not handle the strata, clusters, and weights, but after some Googling (is that a real word?) I found this article: Optimized sample-weighted partial least squares, which apparently is a version of Partial Least Squares that would work with survey data (if I am understanding the abstract properly). Of course, there is no SAS code for this, which is a major drawback.
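A rough sketch of the screening idea, in plain Python rather than PROC CORR: compute Pearson correlations (optionally weighted), list the pairs above a cutoff, and compute each variable's variance inflation factor as the diagonal of the inverse correlation matrix. The function names, the cutoff, and the use of survey weights only as simple weights in the correlations are assumptions for illustration; strata and clusters are ignored, and the sketch does not settle which correlated variable, if any, to drop.

```python
import math


def weighted_corr_matrix(columns, weights=None):
    """Pearson correlation matrix of equal-length columns, with optional nonnegative weights."""
    n = len(columns[0])
    w = weights if weights is not None else [1.0] * n
    total = sum(w)
    means = [sum(wi * xi for wi, xi in zip(w, col)) / total for col in columns]
    devs = [[xi - m for xi in col] for col, m in zip(columns, means)]

    def cov(a, b):
        return sum(wi * ai * bi for wi, ai, bi in zip(w, a, b)) / total

    sd = [math.sqrt(cov(d, d)) for d in devs]  # a constant column would divide by zero
    k = len(columns)
    return [[cov(devs[i], devs[j]) / (sd[i] * sd[j]) for j in range(k)] for i in range(k)]


def invert(M):
    """Gauss-Jordan inverse with partial pivoting; raises if M is (numerically) singular."""
    n = len(M)
    A = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular: some variable is an exact linear combination of others")
        A[col], A[pivot] = A[pivot], A[col]
        piv = A[col][col]
        A[col] = [v / piv for v in A[col]]
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [row[n:] for row in A]


def vifs(R):
    """Variance inflation factors: diagonal of the inverse correlation matrix."""
    inv = invert(R)
    return [inv[j][j] for j in range(len(R))]


def flag_pairs(names, R, cutoff=0.8):
    """Pairs of variables whose absolute correlation is at least `cutoff`."""
    return [(names[i], names[j], R[i][j])
            for i in range(len(names)) for j in range(i + 1, len(names))
            if abs(R[i][j]) >= cutoff]


# Hypothetical example data (not from the thread).
data = {"a": [1, 2, 3, 4, 5], "b": [2, 1, 4, 3, 5]}
R = weighted_corr_matrix(list(data.values()))
print(flag_pairs(list(data), R, cutoff=0.7))
print(vifs(R))
```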

--

Paige Miller



The reply above was updated, apparently after @sasnewbie12 had read it and clicked Like. I added the results of my Google search for a PLS method that could be used with survey data.

--

Paige Miller

