Obsidian | Level 7



In logistic regression, we have a dichotomous dependent variable and continuous or nominal independent variables. How do we assess the relationship between them when selecting variables for the model?



Jade | Level 19

Hi Vishal,


Whenever I built logistic regression models for dichotomous outcomes, I first performed univariate analyses for each independent variable. Basically, you can use PROC LOGISTIC to fit these univariable models. If the resulting p-value of a predictor (corresponding to the null hypothesis that the regression coefficient is zero) was less than some threshold value (typically 0.25), it would be a candidate for inclusion in multiple logistic regression later. Of course, important predictors from a content point of view (previous knowledge) should not be excluded just on the grounds of this purely statistical criterion.
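
As a rough illustration of this univariable screening step, here is a minimal sketch. The data set WORK.MYDATA, the binary outcome Y (event coded 1), the continuous predictor AGE and the nominal predictor REGION are hypothetical names chosen for the example, not taken from the question.

```sas
/* Hypothetical names: WORK.MYDATA, Y (1 = event), AGE, REGION */

/* Univariable model for a continuous predictor.
   The p-value for AGE appears in the
   "Analysis of Maximum Likelihood Estimates" table. */
proc logistic data=work.mydata;
   model y(event='1') = age;
run;

/* Univariable model for a nominal predictor.
   With a CLASS variable, the "Type 3 Analysis of Effects" table
   gives one overall test for REGION across all its levels. */
proc logistic data=work.mydata;
   class region / param=ref;
   model y(event='1') = region;
run;
```

Each candidate predictor gets its own run; the resulting p-value is then compared with the chosen screening threshold (such as the 0.25 mentioned above).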


For categorical independent variables (with k levels) a 2xk contingency table analysis with PROC FREQ will provide additional insight beyond the p-value (of the likelihood ratio chi-square test). For example, you can easily spot empty cells and sparse categories, which you may want to consolidate with other categories prior to further analysis.
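
A possible way to get such a 2xk table, again assuming the hypothetical data set WORK.MYDATA with outcome Y and nominal predictor REGION:

```sas
/* Hypothetical names: WORK.MYDATA, Y, REGION */
proc freq data=work.mydata;
   /* Rows = outcome (2 levels), columns = predictor (k levels).
      CHISQ adds the Pearson and likelihood ratio chi-square tests.
      NOPERCENT and NOROW leave only counts and column percentages,
      so the event rate per category is easy to read off. */
   tables y*region / chisq nopercent norow;
run;
```

Cells with a count of zero or only a handful of observations point to the categories you may want to combine.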


After this preselection you can employ the built-in effect selection methods of PROC LOGISTIC (see SELECTION= option of the MODEL statement), in order to get down from, e.g., 20 to 5 model variables. Forward, backward, stepwise and best subset selection are available. If more than, say, 40 effects passed the preselection criteria, you may have to narrow these down by building preliminary multiple logistic regression models with manually selected subsets of independent variables. The results can give you hints as to which predictors should be filtered out, for example because they are almost redundant due to strong dependencies among the predictors.
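
A sketch of what such a selection run could look like. The predictor names (AGE, REGION, X1 to X3) and the entry/stay significance levels are placeholders for illustration only, not recommendations.

```sas
/* Hypothetical names: WORK.MYDATA, Y, AGE, REGION, X1-X3 */
proc logistic data=work.mydata;
   class region / param=ref;
   /* Stepwise selection among the preselected candidates.
      SLENTRY= is the significance level to enter the model,
      SLSTAY= the level to remain in it (illustrative values). */
   model y(event='1') = age region x1 x2 x3
         / selection=stepwise slentry=0.25 slstay=0.10;
run;
```

Replacing SELECTION=STEPWISE with SELECTION=FORWARD or SELECTION=BACKWARD switches the method; the selection summary table in the output shows the order in which effects entered or left the model.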


You can find more details on variable selection, e.g., in Chapter 4 of Hosmer/Lemeshow: Applied Logistic Regression.

@pearsoninst: I think Vishal's question is about logistic regression with a dichotomous dependent variable, not linear regression with a continuous dependent variable.

Obsidian | Level 7

Sorry, I am replying quite late.


I have a question here. In linear regression, we deal with collinearity among predictors by checking the VIF value of each predictor.


How is this step done in logistic regression? The model I am working on has WOE-binned variables as predictors, so how would collinearity be checked? Can we apply the same concept of VIF to the underlying continuous variables (since collinearity cannot be measured between nominal variables), or should we compute the chi-square statistic between the WOE variables?



Jade | Level 19

This is a very specific question. Please open a new thread for it. Then it will find a much broader audience. I have never worked with Weight-of-Evidence (WOE) binned variables.

Super User

In linear regression, SAS uses the least squares method to estimate the coefficients, so you can use VIF to check collinearity.

But in logistic regression, SAS uses the maximum likelihood method to estimate the coefficients. SAS automatically checks for exact linear dependence among the predictors; once it finds one, it sets the coefficient of one of the variables involved to zero. Note that this only catches exact dependence, not predictors that are strongly but imperfectly correlated.
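
Since VIF is computed from the predictors alone, one workaround sometimes used is to request the collinearity diagnostics from PROC REG with the same predictors and ignore the rest of the linear fit. A minimal sketch, assuming numeric WOE-coded predictors with the hypothetical names WOE_X1 to WOE_X3 in WORK.MYDATA:

```sas
/* Hypothetical names: WORK.MYDATA, Y, WOE_X1-WOE_X3 */
proc reg data=work.mydata;
   /* Only the VIF and tolerance columns are of interest here;
      the coefficient estimates of this linear model are not used. */
   model y = woe_x1 woe_x2 woe_x3 / vif tol;
run;
quit;
```

This is only a diagnostic aid for the predictor set, not a replacement for the logistic model itself.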

