Missing values in logistic regression

01-24-2013 10:14 AM

Hello,

I am trying to run a logistic regression and several of my observations were automatically deleted because they had missing values for the explanatory variables. Is there a way to avoid this?

Thanks!

Accepted Solutions

Solution

01-24-2013 11:16 AM

To avoid losing cases when independent variables are missing, you can try recoding them as categorical variables and adding a "Missing" category to each.

For example, if you have 200 cases and 20 are missing for a variable with 2 levels, A (n=100) and B (n=80), you can create a new variable with levels A (n=100), B (n=80), and Missing (n=20). This way you do not need to impute (prone to bias), and you make full use of your sample. Make sure that you do not set the "Missing" level as the reference category. This way, you also adjust for missingness in that particular variable (missing values might be non-random).

For missing values in the dependent variable, there's nothing easy to do, in my opinion. (I once used a sort of propensity score, estimating for each case the likelihood of the dependent variable being missing, and then used it as a covariate in my logistic regression.)
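A minimal sketch of the missing-category recoding described in this post. The dataset `have`, the 0/1 outcome `y`, and the character predictor `grp` (values 'A', 'B', or blank) are hypothetical names chosen for illustration, not taken from the original question.

```sas
/* Hypothetical names: dataset HAVE, binary outcome Y (0/1),          */
/* character predictor GRP with values 'A', 'B', or blank (missing).  */
data want;
    set have;
    length grp_m $7;
    /* Give missing values their own explicit level */
    if missing(grp) then grp_m = 'Missing';
    else grp_m = grp;
run;

/* Check the level counts before modelling */
proc freq data=want;
    tables grp_m;
run;

/* Use an observed level, not 'Missing', as the reference category */
proc logistic data=want;
    class grp_m (ref='A') / param=ref;
    model y(event='1') = grp_m;
run;
```

The new variable `grp_m` takes the place of `grp` in the MODEL statement, so cases with a blank `grp` are no longer dropped.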

All Replies

01-24-2013 10:39 AM

What would you expect SAS to do with the missing values?

You could impute them but that has issues of its own.

01-24-2013 11:04 AM

I'd like SAS to ignore the missing values instead of deleting a participant for having a missing value for a predictive variable.

I've lost about 20% of my participants in the analysis because they have missing values for some (non-outcome) variable.

01-24-2013 11:13 AM

Hi.

You could run

**proc mi data=your_data;**

**var your_list_of_variables;**

**ods output MissPattern=miss_pattern;**

**run;**

to get the pattern of missingness in your data. Then you could run your model without the variable that has the most missing values (if it's not the most important predictor).

Or you could impute.
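If only the missing-data pattern is wanted, the PROC MI call above can be run with `NIMPUTE=0`, which skips the imputation step. The sketch below assumes the variables in the list are numeric (character variables would also need a CLASS statement) and reuses the placeholder names from the post.

```sas
/* NIMPUTE=0 skips imputation and reports the missing data patterns only */
proc mi data=your_data nimpute=0;
    var your_list_of_variables;
    ods output MissPattern=miss_pattern;
run;

/* Inspect the saved pattern table: one row per distinct pattern */
proc print data=miss_pattern;
run;
```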

09-01-2016 08:40 AM - edited 09-01-2016 08:41 AM

Would you mind explaining the new variable with the levels A (n=100), B (n=20 or 80?), and Missing (n=20)? Do I replace the original variable with the new variable, or do I use both of them in my regression? Also, if I'm doing multinomial logistic regression, can I use this method for more than one variable? Many thanks in advance.

09-01-2016 08:55 AM

@Lneri's suggestion is equivalent to creating a "missing level" for a categorical variable. You don't need to create the dummy variables manually as Lneri suggests. You can use the MISSING option on the CLASS statement in PROC LOGISTIC. This treats the missing values in classification variables as valid values.
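A minimal sketch of the MISSING option, assuming a hypothetical dataset `have` with a 0/1 outcome `y`, a classification variable `grp` with an observed level 'A', and a numeric covariate `x1`.

```sas
/* Hypothetical names: HAVE, Y, GRP, X1.                           */
/* MISSING keeps observations whose CLASS variable is missing and  */
/* treats the missing value as its own level.                      */
proc logistic data=have;
    class grp (ref='A') / missing param=ref;
    model y(event='1') = grp x1;
run;
```

The option applies only to variables listed on the CLASS statement; observations with a missing value for a continuous predictor such as `x1` are still excluded.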

01-24-2013 10:48 AM

Yes, but it's not simple. The technique is called multiple imputation. It involves replacing your dataset with multiple datasets in which the missing values are replaced with plausible values drawn at random, fitting the model to each dataset, and then combining the multiple sets of parameter estimates. The MI and MIANALYZE procedures handle the imputation and the combining steps.

hth

PG
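A rough sketch of the multiple-imputation workflow described above, under several assumptions: a hypothetical dataset `have`, a fully observed 0/1 outcome `y`, and two numeric predictors `x1` and `x2` with missing values. The number of imputations and the seed are arbitrary, and PROC MI's default imputation method is used, which may not suit every dataset.

```sas
/* Step 1: create 20 completed datasets, stacked and indexed by _Imputation_ */
proc mi data=have out=mi_out nimpute=20 seed=12345;
    var y x1 x2;
run;

/* Step 2: fit the same logistic model to each completed dataset */
proc logistic data=mi_out;
    by _Imputation_;
    model y(event='1') = x1 x2;
    ods output ParameterEstimates=lgs_parms;
run;

/* Step 3: combine the per-imputation estimates into one set of results */
proc mianalyze parms=lgs_parms;
    modeleffects Intercept x1 x2;
run;
```

Categorical predictors would need a CLASS statement and a suitable imputation method in PROC MI, which this sketch leaves out.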
