11-29-2015 06:25 AM
In logistic regression, we have a dichotomous dependent variable and continuous/nominal independent variables. How do we assess the relationship between them in order to select variables for the model?
11-29-2015 09:14 AM - edited 11-29-2015 09:38 AM
You will mostly find everything there...
11-30-2015 04:47 AM
Whenever I built logistic regression models for dichotomous outcomes, I first performed univariate analyses for each independent variable. Basically, you can use PROC LOGISTIC to fit these univariable models. If the resulting p-value of a predictor (corresponding to the null hypothesis that the regression coefficient is zero) was less than some threshold value (typically 0.25), it would be a candidate for inclusion in multiple logistic regression later. Of course, important predictors from a content point of view (previous knowledge) should not be excluded just on the grounds of this purely statistical criterion.
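To make the screening step concrete outside of SAS, here is a minimal Python sketch using only the standard library. It is not what PROC LOGISTIC does internally: it fits each univariable model with a plain Newton-Raphson loop and uses the Wald test for the slope. The function names, the toy data, and the predictor names `x1` and `x2` are all made up for illustration, and the code assumes numeric predictors without complete separation.

```python
import math

def fit_univariable_logistic(x, y, max_iter=50, tol=1e-10):
    """Fit logit(P(y=1)) = b0 + b1*x by Newton-Raphson.

    Returns (b1, p_value), where p_value is the two-sided Wald
    p-value for the null hypothesis b1 = 0.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter + 1):
        # Score vector (g) and information matrix (h) at the current estimates
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: inverse information matrix times score vector
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        if abs(step0) < tol and abs(step1) < tol:
            break
        b0 += step0
        b1 += step1
    se_b1 = math.sqrt(h00 / det)  # from the inverse information matrix
    z = b1 / se_b1
    return b1, math.erfc(abs(z) / math.sqrt(2.0))

def screen_predictors(predictors, y, threshold=0.25):
    """Return the names of predictors whose univariable p-value is below threshold."""
    candidates = []
    for name, x in predictors.items():
        _, p_value = fit_univariable_logistic(x, y)
        if p_value < threshold:
            candidates.append(name)
    return candidates

# Hypothetical toy data: x1 is related to the outcome, x2 is not.
outcome = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
predictors = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "x2": [1, 2, 3, 5, 4, 4, 5, 3, 2, 1],
}
print(screen_predictors(predictors, outcome))
```

The 0.25 threshold is only the default of the sketch; predictors that matter on subject-matter grounds would still be kept by hand regardless of what the screen returns.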
For categorical independent variables (with k levels) a 2xk contingency table analysis with PROC FREQ will provide additional insight beyond the p-value (of the likelihood ratio chi-square test). For example, you can easily spot empty cells and sparse categories, which you may want to consolidate with other categories prior to further analysis.
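As a rough illustration of what such a table shows, the following Python sketch (standard library only, not PROC FREQ) builds the 2xk table, computes the likelihood ratio chi-square statistic G = 2 * sum(O * ln(O / E)) with k - 1 degrees of freedom, and flags empty or sparse cells. The helper names and the toy data are hypothetical, and the "expected count below 5" cutoff is just a common rule of thumb assumed here. The p-value is left out because the standard library has no chi-square distribution function.

```python
import math
from collections import Counter

def crosstab_2xk(outcome, category):
    """Return {level: (count with outcome 0, count with outcome 1)}."""
    counts = Counter(zip(category, outcome))
    return {lvl: (counts[(lvl, 0)], counts[(lvl, 1)])
            for lvl in sorted(set(category))}

def lr_chi_square(table, min_expected=5):
    """Return (G statistic, degrees of freedom, list of sparse cells).

    A cell is reported as sparse if it is empty or if its expected
    count under independence is below min_expected.
    """
    col_totals = [sum(row[j] for row in table.values()) for j in (0, 1)]
    n = sum(col_totals)
    g = 0.0
    sparse = []
    for lvl, row in table.items():
        row_total = sum(row)
        for j in (0, 1):
            expected = row_total * col_totals[j] / n
            if row[j] > 0:  # empty cells contribute nothing to G
                g += 2.0 * row[j] * math.log(row[j] / expected)
            if row[j] == 0 or expected < min_expected:
                sparse.append((lvl, j))
    return g, len(table) - 1, sparse

# Hypothetical data: level "C" is rare and has no events at all.
category = ["A"] * 20 + ["B"] * 20 + ["C"] * 3
outcome = [0] * 10 + [1] * 10 + [0] * 10 + [1] * 10 + [0, 0, 0]

table = crosstab_2xk(outcome, category)
g, df, sparse = lr_chi_square(table)
print(table, df, sparse)
```

In this toy example both cells of level "C" are flagged, which is the kind of category one might merge with a neighbouring level before modelling.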
After this preselection you can employ the built-in effect selection methods of PROC LOGISTIC (see SELECTION= option of the MODEL statement), in order to get down from, e.g., 20 to 5 model variables. Forward, backward, stepwise and best subset selection are available. If more than, say, 40 effects passed the preselection criteria, you may have to narrow these down by building preliminary multiple logistic regression models with manually selected subsets of independent variables. The results can give you hints as to which predictors should be filtered out, for example because they are almost redundant due to strong dependencies among the predictors.
You can find more details on variable selection, e.g., in Chapter 4 of Hosmer/Lemeshow: Applied Logistic Regression.
@pearsoninst: I think Vishal's question is about logistic regression with a dichotomous dependent variable, not linear regression with a continuous dependent variable.
05-15-2016 01:30 AM
Sorry, I am replying quite late.
I have a question here. In linear regression, we detect collinearity among predictors by checking the VIF value of each predictor.
How is this step done in logistic regression? The model I am working on has WOE-binned variables as predictors, so how would collinearity be checked? Can we apply the same concept of VIF to the underlying continuous variables (since collinearity cannot be measured between nominal variables), or do we compute chi-square statistics between the WOE variables?
05-15-2016 05:39 AM
This is a very specific question. Please open a new thread for it. Then it will find a much broader audience. I have never worked with Weight-of-Evidence (WOE) binned variables.
05-16-2016 03:54 AM
In linear regression, SAS uses the least squares method to estimate the coefficients, so you can use VIF to check for collinearity.
But in logistic regression, SAS uses the maximum likelihood method to estimate the coefficients. SAS automatically checks for linear dependencies among the predictors; once it finds one, it sets the coefficient of one of the variables involved to zero. Note that this only catches exact linear dependencies, not predictors that are merely highly correlated.
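For reference, VIF is computed from the predictors alone (the outcome and the estimation method do not enter the formula), so it can in principle be computed for any set of numeric predictors, including WOE-coded ones. The Python sketch below (standard library only) uses the fact that the VIFs are the diagonal elements of the inverse of the predictors' correlation matrix. The function names and the toy columns are hypothetical, and the 1e-12 pivot tolerance is an arbitrary choice for the sketch.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long numeric columns."""
    n = len(columns[0])
    scaled = []
    for col in columns:
        mean = sum(col) / n
        dev = [v - mean for v in col]
        norm = math.sqrt(sum(d * d for d in dev))
        scaled.append([d / norm for d in dev])
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in scaled]
            for ci in scaled]

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("exact linear dependency among the predictors")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """Variance inflation factor of each column: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]

# Hypothetical predictors: the first two are strongly correlated, the third is not.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
x3 = [2.0, -1.0, 0.5, 0.5, -1.0, 2.0]
print(vif([x1, x2, x3]))
```

With only two predictors this reduces to 1 / (1 - r^2), where r is their correlation. Which VIF value counts as too high is a judgment call that the sketch does not make.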