vishal_prof_gmail_com
Obsidian | Level 7

Hi,

In logistic regression, we have a dichotomous dependent variable and continuous or nominal independent variables. How do we assess the relationship between them when selecting variables for the model?

Vishal

FreelanceReinh
Jade | Level 19

Hi Vishal,

 

Whenever I built logistic regression models for dichotomous outcomes, I first performed univariate analyses for each independent variable. Basically, you can use PROC LOGISTIC to fit these univariable models. If the resulting p-value of a predictor (corresponding to the null hypothesis that the regression coefficient is zero) was less than some threshold value (typically 0.25), it would be a candidate for inclusion in multiple logistic regression later. Of course, important predictors from a content point of view (previous knowledge) should not be excluded just on the grounds of this purely statistical criterion.
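The univariable screening step can be sketched outside SAS as follows. This is a minimal Python illustration, not PROC LOGISTIC: it assumes a 0/1 outcome with both values present and a numeric predictor, fits logit(p) = b0 + b1*x by Newton-Raphson, and tests b1 = 0 with a likelihood ratio chi-square on 1 degree of freedom. The function name `univariable_screen` and the default 0.25 threshold are illustrative choices.

```python
import math


def _prob(b0, b1, xi):
    """Fitted probability P(y = 1) under logit(p) = b0 + b1 * xi."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))


def univariable_screen(x, y, threshold=0.25, max_iter=50):
    """Hypothetical helper: fit a one-predictor logistic model by Newton-Raphson
    and test b1 = 0 with a likelihood ratio chi-square (1 df).

    Returns (b1, lr_chisq, p_value, passes_screen)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = i00 = i01 = i11 = 0.0
        for xi, yi in zip(x, y):
            p = _prob(b0, b1, xi)
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            i00 += w                # entries of the 2x2 information matrix
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 * i01
        d0 = (i11 * g0 - i01 * g1) / det  # Newton step = inverse(info) @ score
        d1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < 1e-10:
            break
    ll_full = 0.0
    for xi, yi in zip(x, y):
        p = _prob(b0, b1, xi)
        ll_full += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    n1 = sum(y)
    n0 = len(y) - n1
    ybar = n1 / len(y)
    ll_null = n1 * math.log(ybar) + n0 * math.log(1.0 - ybar)  # intercept-only model
    lr = max(0.0, 2.0 * (ll_full - ll_null))  # clamp tiny negative rounding error
    p_value = math.erfc(math.sqrt(lr / 2.0))   # upper tail of chi-square, 1 df
    return b1, lr, p_value, p_value < threshold
```

Running it once per candidate predictor and keeping those with `passes_screen` true reproduces the preselection rule in spirit; as noted above, predictors that matter on subject-matter grounds should be kept regardless.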

 

For a categorical independent variable with k levels, a 2xk contingency table analysis with PROC FREQ provides additional insight beyond the p-value of the likelihood ratio chi-square test. For example, you can easily spot empty cells and sparse categories, which you may want to merge with other categories before further analysis.
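The kind of check a 2xk table gives can be sketched in a few lines. This is a hypothetical Python helper, not PROC FREQ output: it assumes a 0/1 outcome, counts each level/outcome combination, flags empty cells and levels whose total count falls below an arbitrary `min_total`, and computes the likelihood ratio chi-square G^2 = 2 * sum(O * ln(O / E)) with k - 1 degrees of freedom.

```python
import math
from collections import Counter


def two_by_k(levels, y, min_total=5):
    """Hypothetical helper: build a 2 x k table as {level: [count y=0, count y=1]},
    list empty cells and sparse levels, and compute the likelihood ratio
    chi-square G^2 for independence together with its k - 1 degrees of freedom."""
    counts = Counter(zip(levels, y))
    cats = sorted(set(levels))
    table = {c: [counts[(c, 0)], counts[(c, 1)]] for c in cats}
    n = len(y)
    outcome_totals = [sum(t[0] for t in table.values()),
                      sum(t[1] for t in table.values())]
    empty = [(c, k) for c in cats for k in (0, 1) if table[c][k] == 0]
    sparse = [c for c in cats if sum(table[c]) < min_total]
    g2 = 0.0
    for c in cats:
        level_total = sum(table[c])
        for k in (0, 1):
            observed = table[c][k]
            if observed > 0:  # empty cells contribute 0 to G^2
                expected = level_total * outcome_totals[k] / n
                g2 += 2.0 * observed * math.log(observed / expected)
    return table, empty, sparse, g2, len(cats) - 1
```

A level that shows up in `empty` or `sparse` is a candidate for merging with a neighboring category before modeling.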

 

After this preselection you can employ the built-in effect selection methods of PROC LOGISTIC (see SELECTION= option of the MODEL statement), in order to get down from, e.g., 20 to 5 model variables. Forward, backward, stepwise and best subset selection are available. If more than, say, 40 effects passed the preselection criteria, you may have to narrow these down by building preliminary multiple logistic regression models with manually selected subsets of independent variables. The results can give you hints as to which predictors should be filtered out, for example because they are almost redundant due to strong dependencies among the predictors.
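To make the idea of forward selection concrete, here is a minimal Python sketch. It is not PROC LOGISTIC's SELECTION=FORWARD: the entry test here is a likelihood ratio chi-square and the 0.05 entry level is an assumption, while PROC LOGISTIC uses its own entry statistic and options such as SLENTRY=. All function names are hypothetical, and the data are assumed to have a 0/1 outcome with no separation.

```python
import math


def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def _loglik(rows, y, max_iter=50):
    """Newton-Raphson logistic fit (rows include a leading 1 for the intercept);
    returns the maximized log-likelihood."""
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for x, t in zip(rows, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
            w = mu * (1.0 - mu)
            for i in range(p):
                grad[i] += (t - mu) * x[i]
                for j in range(p):
                    info[i][j] += w * x[i] * x[j]
        step = _solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    ll = 0.0
    for x, t in zip(rows, y):
        mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
        ll += t * math.log(mu) + (1 - t) * math.log(1.0 - mu)
    return ll


def forward_select(data, y, slentry=0.05):
    """data: {name: values}. At each step, add the candidate with the smallest
    likelihood ratio p-value, as long as it is below slentry."""
    selected = []

    def design(names):
        return [[1.0] + [data[nm][i] for nm in names] for i in range(len(y))]

    current_ll = _loglik(design(selected), y)
    while True:
        best = None
        for name in data:
            if name in selected:
                continue
            ll = _loglik(design(selected + [name]), y)
            g = max(0.0, 2.0 * (ll - current_ll))
            p_value = math.erfc(math.sqrt(g / 2.0))  # chi-square, 1 df
            if p_value < slentry and (best is None or p_value < best[1]):
                best = (name, p_value, ll)
        if best is None:
            return selected
        selected.append(best[0])
        current_ll = best[2]
```

Each candidate adds one parameter here, so the 1-df test is appropriate; a categorical effect with several dummy columns would need a test with more degrees of freedom.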

 

You can find more details on variable selection, e.g., in Chapter 4 of Hosmer/Lemeshow: Applied Logistic Regression.


@pearsoninst: I think Vishal's question is about logistic regression with a dichotomous dependent variable, not linear regression with a continuous dependent variable.

vishal_prof_gmail_com
Obsidian | Level 7

Sorry for replying so late.

 

I have a question here. In linear regression, we remove collinearity between predictors by checking the variance inflation factor (VIF) of each predictor.
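For reference, the VIF of predictor j is 1 / (1 - R_j^2), where R_j^2 comes from an ordinary least squares regression of predictor j on all the other predictors. The calculation uses only the predictor columns, not the response. A minimal Python sketch (the `vif` function is a hypothetical helper, and columns are assumed to be non-constant):

```python
import math


def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def vif(columns):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from an OLS regression (with intercept)
    of column j on all other columns. Returns one value per column."""
    n = len(columns[0])
    out = []
    for j, target in enumerate(columns):
        others = [c for k, c in enumerate(columns) if k != j]
        rows = [[1.0] + [c[i] for c in others] for i in range(n)]
        p = len(rows[0])
        xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
        xty = [sum(r[a] * t for r, t in zip(rows, target)) for a in range(p)]
        beta = _solve(xtx, xty)  # normal equations for the auxiliary regression
        mean = sum(target) / n
        sst = sum((t - mean) ** 2 for t in target)
        sse = sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
                  for r, t in zip(rows, target))
        r2 = 1.0 - sse / sst
        out.append(math.inf if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return out
```

Uncorrelated predictors give VIF values of 1; the more a column can be predicted from the others, the larger its VIF becomes.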

 

How is this step done in logistic regression? The model I am working on uses WOE-binned variables as predictors, so how should collinearity be checked? Can we apply the same VIF concept to the underlying continuous variables (since collinearity cannot be measured between nominal variables), or should we compute chi-square statistics between the WOE variables?
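For readers unfamiliar with WOE binning, each bin of a predictor is replaced by its weight of evidence. The sketch below assumes the convention WOE = ln((events in bin / all events) / (non-events in bin / all non-events)); some sources use the reciprocal ratio, which only flips the sign. The helper names are hypothetical, and a bin with no events or no non-events is rejected because its WOE is undefined (in practice such bins are merged with a neighboring bin).

```python
import math
from collections import defaultdict


def woe_table(bins, y):
    """Weight of evidence per bin for a 0/1 outcome y."""
    events = defaultdict(int)
    nonevents = defaultdict(int)
    for b, t in zip(bins, y):
        if t == 1:
            events[b] += 1
        else:
            nonevents[b] += 1
    total_e = sum(events.values())
    total_n = sum(nonevents.values())
    woe = {}
    for b in set(bins):
        if events[b] == 0 or nonevents[b] == 0:
            raise ValueError(f"bin {b!r} has no events or no non-events; merge it first")
        woe[b] = math.log((events[b] / total_e) / (nonevents[b] / total_n))
    return woe


def woe_encode(bins, y):
    """Replace each observation's bin label by that bin's WOE value."""
    woe = woe_table(bins, y)
    return [woe[b] for b in bins]
```

After encoding, each WOE variable is an ordinary numeric column.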

 

Vishal

FreelanceReinh
Jade | Level 19

This is a very specific question. Please open a new thread for it. Then it will find a much broader audience. I have never worked with Weight-of-Evidence (WOE) binned variables.

Ksharp
Super User

In linear regression, SAS uses the least squares method to estimate the coefficients, so you can use VIF to check collinearity.

In logistic regression, SAS uses the maximum likelihood method to estimate the coefficients. SAS automatically checks for linear dependencies among the predictors; once it finds an exact linear dependency, it sets the coefficient of one of the dependent variables to zero.
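The automatic handling described above concerns exact linear dependencies (one column is an exact linear combination of others), not merely high correlation. The following Python sketch illustrates that kind of check with Gram-Schmidt orthogonalization; it is an assumption-labeled illustration, not SAS's internal algorithm, and the function name and tolerance are hypothetical. A column whose residual, after projecting out the previously kept columns, is numerically zero is reported as aliased; a strongly but not perfectly correlated column is not flagged.

```python
import math


def aliased_columns(columns, names, tol=1e-9):
    """Return the names of columns that are (numerically) exact linear
    combinations of earlier columns, using modified Gram-Schmidt."""
    basis = []    # orthonormal vectors spanning the kept columns
    aliased = []
    for name, col in zip(names, columns):
        v = [float(a) for a in col]
        norm0 = math.sqrt(sum(a * a for a in v))
        for q in basis:
            proj = sum(a * b for a, b in zip(v, q))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm0 == 0.0 or norm <= tol * norm0:
            aliased.append(name)   # nothing left after projection: redundant column
        else:
            basis.append([a / norm for a in v])
    return aliased
```

With columns Intercept = [1, 1, 1, 1], x1 = [1, 2, 3, 4], x2 = [0, 1, 0, 1] and x3 = x1 + 2*x2, only x3 is reported; this mirrors the idea of setting one redundant coefficient to zero while near-collinearity still has to be assessed separately.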
