sasnewbie12
Obsidian | Level 7

Dear all,

 

I am running a multivariate logistic regression model assessing the occurrence of dependent event X (0 = did not occur, 1 = occurred) with about 200,00 weighted observations using survey data.

 

I have multiple independent variables (both continuous and a few categorical).

Many of the independent variables are missing in about 5% of the observations (the same 5% of observations are missing the data).

 

Will this make my conclusions inaccurate? I am not sure how this will affect the overall model, or how it will affect the variables that do not have missing information.

 

Please clarify whether I can run the analysis or whether this will cause a major issue. I would prefer to include those 5% of observations if possible, because they have information for some variables.

 

Thank You

1 ACCEPTED SOLUTION

stat_sas
Ammonite | Level 13

Hi,

 

By default, SAS excludes an observation if even one of the variables in the model has a missing value. A model that excludes the 5% of observations with missing values will not shift the conclusions significantly if the goal is only to draw inferences. If your project involves predictive modeling that requires scoring new data sets, then you need to impute the missing values. Please look into PROC STDIZE, which provides various imputation options.
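
As a rough illustration of the PROC STDIZE suggestion, here is a minimal sketch that replaces missing values with the median. It uses sashelp.class with a few weights blanked out as stand-in data; the data set names class_miss and class_imputed are made up for the example, and median replacement is only one of the available methods. PROC STDIZE works on numeric variables only, so categorical predictors would need a different approach.

```sas
/* Stand-in data: blank out weight for the first 4 rows of sashelp.class */
data class_miss;
   set sashelp.class;
   if _n_ < 5 then weight = .;
run;

/* REPONLY replaces missing values only and does not standardize the data. */
/* METHOD=MEDIAN uses the median of the nonmissing values as the fill-in.   */
proc stdize data=class_miss out=class_imputed method=median reponly;
   var weight;
run;
```

After this step, class_imputed has no missing values for weight, so a model fit on it uses every row. Keep in mind that filling in a single value treats the imputed numbers as if they were observed.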


4 REPLIES 4
sasnewbie12
Obsidian | Level 7

Just want to make sure: will observations that are missing a value for a variable that is not included in the model as an independent or dependent variable also be excluded?

 

 

PeterClemmensen
Tourmaline | Level 20

No. Only for the variables included in your model. For example, this regression 

 

data class;
   set sashelp.class;
   if _n_<5 then age=.;
run;

proc glm data=class;
   model height=weight;
run;quit;

uses all 19 observations in the data set even though age is missing for 4 observations (_n_<5 selects rows 1 through 4). Because age is not included in the model, those observations are not excluded.

 

This regression however

 

data class1;
   set sashelp.class;
   if _n_<5 then weight=.;
run;

proc glm data=class1;
   model height=weight;
run;quit;

uses only 15 observations because weight is missing for 4 observations and weight is included in the model.
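
If you want to know ahead of time how many observations a model will lose, one option is to count missing values for the model variables first. This is a small sketch using the same class1 construction as above; N and NMISS are standard PROC MEANS statistic keywords.

```sas
/* Same setup as the second regression: weight is missing for the first rows */
data class1;
   set sashelp.class;
   if _n_ < 5 then weight = .;
run;

/* N = count of nonmissing values, NMISS = count of missing values */
proc means data=class1 n nmiss;
   var height weight;
run;
```

With only one variable containing missing values, NMISS for weight should equal the number of rows the model drops. You can confirm this against the "Number of Observations Read" and "Number of Observations Used" lines in the PROC GLM output. When several model variables have missing values in different rows, the per-variable counts will not add up to the number of excluded observations, so the procedure output is the safer check.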
