ROLuke91
Obsidian | Level 7

Hi all,

 

I am confused about the way PROC MIXED handles missing data.

 

I am trying to use PROC MIXED with a maximum-likelihood specification to run a multiple linear regression (neither repeated measures nor mixed effects; just a straightforward multiple linear regression). I have substantial and varying amounts of missing data across my variables. I have a total N of 761, and the distribution of missing values is as follows:

 

Var    N     N Miss
Y      323   438
X1     757     4
X2     321   440
X3     321   440
X4     761     0
X5     384   377
X6     547   214
X7     752     9
X8     761     0
X9     319   442
X10    319   442

 

My understanding is that PROC MIXED is not supposed to perform listwise deletion. However, when I run the analysis, it excludes 667 observations and uses only 94. Is there any way to specify PROC MIXED, or to use another relevant procedure, so that a multiple linear regression does NOT listwise delete an observation when it encounters a missing value for an IV?

 

Thanks,

Luke

Rick_SAS
SAS Super FREQ

I think what you are (mis)remembering is that for repeated measures ANOVA, the MIXED procedure does not perform listwise deletion, unlike the GLM formulation. That is because GLM uses the "wide" data format, whereas MIXED uses a "long" data format. But when you have continuous covariates in a linear regression, even PROC MIXED deletes observations for which a covariate has a missing value.
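To illustrate why the layout matters for repeated measures, here is a minimal Python sketch with made-up data (the subjects `s1` to `s3` and the helper functions are hypothetical, and it only counts usable rows; it does not reproduce what GLM or MIXED do internally). In wide format, one missing time point removes the whole subject; in long format, only that single measurement is lost.

```python
# Hypothetical repeated-measures data in "wide" format: one row per subject,
# one column per time point. None marks a missing measurement.
wide = {
    "s1": [5.0, 6.0, 7.0],
    "s2": [4.0, None, 6.5],
    "s3": [5.5, 6.1, 7.2],
}


def complete_wide_rows(data):
    """Listwise deletion on wide data: keep a subject only if every time point is present."""
    return [sid for sid, values in data.items() if all(v is not None for v in values)]


def to_long(data):
    """Reshape to "long" format: one (subject, time, response) row per measurement."""
    return [(sid, t, v) for sid, values in data.items() for t, v in enumerate(values, start=1)]


def complete_long_rows(rows):
    """Deletion on long data drops only the individual rows whose response is missing."""
    return [row for row in rows if row[2] is not None]


long_rows = to_long(wide)
wide_used = complete_wide_rows(wide)
long_used = complete_long_rows(long_rows)
print(f"wide: {len(wide_used)} of {len(wide)} subjects used")          # wide: 2 of 3 subjects used
print(f"long: {len(long_used)} of {len(long_rows)} measurements used")  # long: 8 of 9 measurements used
```

Note that the long layout only helps when the missing value is a repeated response. In an ordinary regression, a row with a missing covariate is still unusable in either layout.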

 

No method will be able to use the 438 observations that have missing response values, so the best you can do is use the remaining 323 observations to fit the model.

ROLuke91
Obsidian | Level 7
Thanks very much for the clarification; that makes sense.

However, it looks to me as if it is dropping every observation that is missing one or more IV values, assuming that the 94 observations used are exclusively those with NO missing values on any of the IVs.

Is there any way to avoid this listwise deletion?
Rick_SAS
SAS Super FREQ

Correct. If an observation has a missing value in any IV, that observation cannot be used to fit the model. That is a mathematical fact. It has nothing to do with SAS or any other software.

 

Think about fitting a line to the following two-dimensional (X,Y) data:

X  Y
0  0
.  1
.  2

 

Try to plot these points and then determine the line of best fit. It's impossible: only one observation has complete data, and a single point does not determine a unique line.

 

Similarly, the procedure cannot use the observations that are not complete. It's not that SAS doesn't like missing values; it is simply that those observations do not provide any useful information for the fit.
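The three-point example can be checked numerically. This is a minimal Python sketch for illustration only: `math.nan` stands in for the SAS missing value `.`, and `ols_line` is a hypothetical helper that computes the ordinary least-squares line. After complete-case filtering only one point remains, so the sum of squared deviations of X is zero and the slope is undefined.

```python
import math

# The (X, Y) data from the example; math.nan plays the role of SAS's "." missing value.
data = [(0.0, 0.0), (math.nan, 1.0), (math.nan, 2.0)]

# Complete-case (listwise) filtering: keep only rows where both X and Y are present.
complete = [(x, y) for x, y in data if not (math.isnan(x) or math.isnan(y))]


def ols_line(points):
    """Return the least-squares (slope, intercept), or None when no unique line exists."""
    n = len(points)
    if n == 0:
        return None
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        # All X values are identical (for example, a single point),
        # so infinitely many lines fit equally well.
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


print(len(complete), ols_line(complete))  # 1 None
```

With a second complete point, for example `ols_line([(0.0, 0.0), (1.0, 1.0)])`, the function returns `(1.0, 0.0)`.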

 

This is why some practitioners perform imputation to reduce the impact of missing data. Imputation replaces a missing value with a plausible nonmissing value. You can read the article "Mean imputation in SAS" to learn more about simple imputation methods.
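As a rough illustration of the simplest form, mean imputation, here is a minimal Python sketch (not SAS code, and not taken from the referenced article). It assumes missing values are encoded as `math.nan`. The function `mean_impute` is hypothetical and fills each gap with the mean of the observed values in the same column.

```python
import math
from statistics import mean


def mean_impute(values):
    """Replace each missing value (NaN) with the mean of the observed values in the column."""
    observed = [v for v in values if not math.isnan(v)]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    return [fill if math.isnan(v) else v for v in values]


# Hypothetical covariate column with two missing entries.
x = [2.0, math.nan, 4.0, math.nan]
print(mean_impute(x))  # [2.0, 3.0, 4.0, 3.0]
```

The imputed values are treated as if they had been observed, so mean imputation understates the variability of the column. It keeps rows in the analysis but does not recover the information that was actually missing.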

ROLuke91
Obsidian | Level 7

Thank you very much for all of your help and clarity. I moved on to PROC CALIS with FIML (full information maximum likelihood), but this was very helpful!
