Obsidian | Level 7

## Correlation between Dichotomous & Continuous / Nominal variables: PROC LOGISTIC

Hi,

In logistic regression, we have a dichotomous dependent variable and continuous or nominal independent variables. How do we assess the relationship between them when selecting variables for the model?

Vishal

Pyrite | Level 9

## Re: Correlation between Dichotomous & Continuous / Nominal variables: PROC LOGISTIC

You will find most of what you need here:

http://www.ats.ucla.edu/stat/sas/webbooks/reg/chapter3/sasreg3.htm

## Re: Correlation between Dichotomous & Continuous / Nominal variables: PROC LOGISTIC

Hi Vishal,

Whenever I built logistic regression models for dichotomous outcomes, I first performed univariate analyses for each independent variable. Basically, you can use PROC LOGISTIC to fit these univariable models. If the resulting p-value of a predictor (corresponding to the null hypothesis that the regression coefficient is zero) was less than some threshold value (typically 0.25), it would be a candidate for inclusion in multiple logistic regression later. Of course, important predictors from a content point of view (previous knowledge) should not be excluded just on the grounds of this purely statistical criterion.
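A minimal sketch of such a univariable screen, assuming a hypothetical data set `work.train` with a 0/1 outcome `y`, a continuous predictor `age` and a nominal predictor `region` (none of these names come from the original question):

```sas
/* Univariable screening: one model per candidate predictor.      */
/* work.train, y, age and region are hypothetical placeholders.   */

/* Continuous predictor */
proc logistic data=work.train;
   model y(event='1') = age;
run;

/* Nominal predictor: declare it in a CLASS statement */
proc logistic data=work.train;
   class region / param=ref;
   model y(event='1') = region;
run;
```

In each run, the "Testing Global Null Hypothesis: BETA=0" table reports the likelihood ratio, score and Wald tests. With a single effect in the model, these test exactly that predictor, so the p-value can be compared against the chosen threshold (e.g. 0.25).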

For categorical independent variables (with k levels) a 2xk contingency table analysis with PROC FREQ will provide additional insight beyond the p-value (of the likelihood ratio chi-square test). For example, you can easily spot empty cells and sparse categories, which you may want to consolidate with other categories prior to further analysis.
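The contingency table check could look roughly like this, again assuming the hypothetical data set `work.train` with outcome `y` and nominal predictor `region`:

```sas
/* 2 x k cross-tabulation of a nominal predictor against the outcome. */
/* work.train, region and y are hypothetical placeholders.            */
proc freq data=work.train;
   tables region*y / chisq nocol nopercent;
run;
```

The CHISQ option prints the Pearson and likelihood ratio chi-square tests. The cell counts and row percentages in the table itself are where empty or sparse categories show up.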

After this preselection you can employ the built-in effect selection methods of PROC LOGISTIC (see SELECTION= option of the MODEL statement), in order to get down from, e.g., 20 to 5 model variables. Forward, backward, stepwise and best subset selection are available. If more than, say, 40 effects passed the preselection criteria, you may have to narrow these down by building preliminary multiple logistic regression models with manually selected subsets of independent variables. The results can give you hints as to which predictors should be filtered out, for example because they are almost redundant due to strong dependencies among the predictors.
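As an illustration, a stepwise run on the preselected candidates might be set up as follows. The data set and variable names are hypothetical, and the SLENTRY= and SLSTAY= values are only example significance levels, not recommendations:

```sas
/* Stepwise effect selection among preselected candidates.          */
/* work.train and all variable names are hypothetical placeholders. */
proc logistic data=work.train;
   class region / param=ref;
   model y(event='1') = age income tenure region
         / selection=stepwise
           slentry=0.25    /* example significance level to enter */
           slstay=0.05;    /* example significance level to stay  */
run;
```

Replacing `stepwise` with `forward`, `backward` or `score` switches to the other selection methods mentioned above.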

You can find more details on variable selection, e.g., in Chapter 4 of Hosmer/Lemeshow: Applied Logistic Regression.

@pearsoninst: I think Vishal's question is about logistic regression with a dichotomous dependent variable, not linear regression with a continuous dependent variable.

Obsidian | Level 7

## Re: Correlation between Dichotomous & Continuous / Nominal variables: PROC LOGISTIC

I have a follow-up question. In linear regression, we detect collinearity among predictors by computing the VIF of each predictor and then removing the offending ones.

How is this step done in logistic regression? The model I am working on has WOE-binned variables as predictors, so how should collinearity be checked? Can we apply the same concept of VIF to the underlying continuous variables (since collinearity cannot be measured between nominal variables), or should we compute chi-square statistics between the WOE variables?

Vishal

## Re: Correlation between Dichotomous & Continuous / Nominal variables: PROC LOGISTIC

This is a very specific question. Please open a new thread for it. Then it will find a much broader audience. I have never worked with Weight-of-Evidence (WOE) binned variables.

Super User

## Re: Correlation between Dichotomous & Continuous / Nominal variables: PROC LOGISTIC

In linear regression, SAS uses the least squares method to estimate the coefficients, so you can use VIF to check collinearity.

But in logistic regression, SAS uses the maximum likelihood method to estimate the coefficients. SAS automatically checks for exact linear dependence among the predictors; once it finds one, it sets the coefficient of one of the variables involved to zero. Strong but inexact collinearity is not flagged this way.
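Because VIF depends only on the predictors and not on how the coefficients are estimated, one common workaround is to run the same predictors through PROC REG purely for the collinearity diagnostics. A minimal sketch, assuming a hypothetical data set `work.train` and hypothetical numeric WOE-coded predictors `woe_age`, `woe_income` and `woe_tenure`:

```sas
/* Collinearity diagnostics for the predictors of a logistic model.  */
/* The regression of y itself is ignored here; only the VIF, TOL and */
/* COLLIN output is of interest. All names are hypothetical.         */
proc reg data=work.train;
   model y = woe_age woe_income woe_tenure / vif tol collin;
run;
quit;

/* Pairwise correlations among the same predictors */
proc corr data=work.train;
   var woe_age woe_income woe_tenure;
run;
```

These are unweighted diagnostics, so treat them as a rough screen rather than an exact statement about the fitted logistic model.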
