juanvg1972
Pyrite | Level 9

Hi,

 

I am using proc logistic to make predictions.

 

I have found that two input variables are highly correlated (Spearman = 0.78).

These two variables are important in the model, and the target variable depends on both.

How can I manage the correlation effect?

 

- Eliminate one of them from the model, the less important one based on the Wald chi-square

- Use both in the model and add an interaction effect between the variables (var1*var2)

Are these good solutions? Any advice will be greatly appreciated.

 

Thanks

4 REPLIES
PaigeMiller
Diamond | Level 26

@juanvg1972 wrote:

Hi,

 

I am using proc logistic to make predictions.

 

These two variables are important in the model, and the target variable depends on both.

How can I manage the correlation effect?



I'm not sure this is a meaningful question. What do you mean by "manage"?

A better question would be: what is the best-fitting model? You could fit a model with the interaction and see if it is significant; if it is, you probably ought to leave the interaction in the model. You could also compare the single-predictor models with the two-predictor model to see whether adding the second term improves the fit noticeably.
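
One way to make the comparison of nested models concrete is a likelihood-ratio test, which is a different check from the Wald test on the interaction term. The following is only a sketch in Python rather than SAS. The function name and the two -2 Log L values in the usage line are made up for illustration, and it assumes the two models differ by exactly one parameter (for example, the var1*var2 term).

```python
import math

def lr_test_1df(neg2logl_reduced, neg2logl_full):
    """Likelihood-ratio test for nested models that differ by ONE parameter.

    Arguments are the -2 Log L values of the reduced and the full model
    (hypothetical names). Returns (chi_square, p_value).
    """
    chi_square = neg2logl_reduced - neg2logl_full
    if chi_square < 0:
        raise ValueError("the full model cannot fit worse than the reduced model")
    # Upper tail of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value

# Made-up -2 Log L values, for illustration only
chi_square, p_value = lr_test_1df(512.4, 505.1)
print(f"chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```

A small p-value would suggest the extra term improves the fit; whether the improvement is "noticeable" for prediction is still a judgment call.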

--
Paige Miller
juanvg1972
Pyrite | Level 9

I have heard that correlation between input variables can cause problems in a model, so I am asking whether there is any way to deal with this problem.

PaigeMiller
Diamond | Level 26

Yes, correlation between the inputs can cause problems, specifically that the variability of your regression estimates, and the variability of the predicted values, can be inflated. Sometimes the problem is so severe that your regression coefficients can have the wrong sign. If you have only two variables in the model, the fit is good, and the signs of the regression coefficients are in the right direction, then I think that's all the checking you have to do. You could also check the Variance Inflation Factor from PROC REG on your data; large values indicate problems.
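
For a model with exactly two predictors, the Variance Inflation Factor has a simple closed form, VIF = 1 / (1 - r²), where r is the Pearson correlation between the two inputs. The sketch below (Python, not SAS; the function name is made up) just evaluates that formula. Plugging in 0.78 is an assumption for illustration, since the value reported in the question is a Spearman correlation and the Pearson value may differ.

```python
def vif_two_predictors(r):
    """VIF shared by both predictors in a two-predictor model.

    r is the Pearson correlation between the two predictors.
    """
    return 1.0 / (1.0 - r * r)

# 0.78 is borrowed from the question (reported there as Spearman, so
# treating it as a Pearson correlation is an assumption).
print(round(vif_two_predictors(0.78), 2))  # 2.55
# A much stronger correlation inflates the variance far more.
print(round(vif_two_predictors(0.95), 2))  # 10.26
```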

 

However, you still have to be careful using such a model. If you have a new data point and you want to predict its value, and the inputs are not in the same region as the data you used to create the model, then you are extrapolating and shouldn't trust the prediction.
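
With correlated inputs, a new point can sit inside the range of each variable separately and still be far from the region covered by the data. One possible way to flag this (my choice of technique, not something from the thread) is the squared Mahalanobis distance from the centre of the training data. This is a minimal Python sketch with made-up toy data and a hypothetical function name.

```python
def mahalanobis_sq_2d(x1, x2, point):
    """Squared Mahalanobis distance of `point` from the centre of (x1, x2)."""
    n = len(x1)
    m1 = sum(x1) / n
    m2 = sum(x2) / n
    # Sample variances and covariance (divisor n - 1)
    s11 = sum((a - m1) ** 2 for a in x1) / (n - 1)
    s22 = sum((b - m2) ** 2 for b in x2) / (n - 1)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / (n - 1)
    det = s11 * s22 - s12 * s12
    d1 = point[0] - m1
    d2 = point[1] - m2
    # Quadratic form d' S^-1 d written out for the 2x2 case
    return (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det

# Toy training data (made up): two strongly correlated inputs
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 1.9, 3.2, 3.8, 5.0]

# Largest distance seen among the training points themselves
largest_seen = max(mahalanobis_sq_2d(x1, x2, p) for p in zip(x1, x2))

# (1, 5) is inside both marginal ranges but far from the joint pattern
for new_point in [(3.0, 3.0), (1.0, 5.0)]:
    d2 = mahalanobis_sq_2d(x1, x2, new_point)
    print(new_point, "extrapolating" if d2 > largest_seen else "inside the data region")
```

Comparing against the largest training distance is a crude cutoff chosen for the sketch; any threshold would need to be justified for real data.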

 

 

--
Paige Miller
StatDave
SAS Super FREQ

Correlation between two variables is not necessarily a problem at all. As discussed in this note, problems with instability of the model occur when there is strong collinearity among the weighted predictors. As shown there, this can be checked using a weighted regression in PROC REG. 
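
To illustrate what "collinearity among the weighted predictors" means, here is a rough Python sketch (not the SAS code from the note, which is not reproduced in this thread). It assumes the weights are the usual logistic-regression weights w = p(1 - p), where p is the predicted probability from the fitted model, and it handles only the two-predictor case, where the VIF reduces to 1 / (1 - r²) with r the weighted correlation. The function and argument names are hypothetical.

```python
def weighted_vif_two_predictors(x1, x2, p_hat):
    """VIF for a two-predictor model, using weights w = p * (1 - p).

    p_hat holds the predicted probabilities from the fitted logistic model.
    """
    w = [p * (1.0 - p) for p in p_hat]
    total = sum(w)
    # Weighted means of each predictor
    m1 = sum(wi * a for wi, a in zip(w, x1)) / total
    m2 = sum(wi * b for wi, b in zip(w, x2)) / total
    # Weighted, mean-corrected sums of squares and cross-products
    s11 = sum(wi * (a - m1) ** 2 for wi, a in zip(w, x1))
    s22 = sum(wi * (b - m2) ** 2 for wi, b in zip(w, x2))
    s12 = sum(wi * (a - m1) * (b - m2) for wi, a, b in zip(w, x1, x2))
    r_squared = s12 * s12 / (s11 * s22)
    return 1.0 / (1.0 - r_squared)

# Toy inputs and made-up predicted probabilities, for illustration only
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
p_hat = [0.10, 0.20, 0.50, 0.70, 0.90]
print(weighted_vif_two_predictors(x1, x2, p_hat))
```

Observations with predicted probabilities near 0 or 1 get small weights, so the weighted collinearity can differ from what the raw correlation suggests.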
