Hi all,
I am confused about the way PROC MIXED handles missing data.
I am trying to use PROC MIXED with maximum-likelihood specification to run a multiple linear regression (neither repeated measures nor mixed effects; just a straightforward multiple linear regression). I have significant and varying amounts of missing data across my independent variables. I have a total N of 761, and the missing distribution is as follows:
Var | N | N Miss |
Y | 323 | 438 |
X1 | 757 | 4 |
X2 | 321 | 440 |
X3 | 321 | 440 |
X4 | 761 | 0 |
X5 | 384 | 377 |
X6 | 547 | 214 |
X7 | 752 | 9 |
X8 | 761 | 0 |
X9 | 319 | 442 |
X10 | 319 | 442 |
My understanding is that PROC MIXED is not supposed to delete listwise. However, when I run the analysis, it excludes 667 observations and uses only 94. Is there any way I can specify PROC MIXED, or use any other relevant procedure, to run a multiple linear regression that will NOT listwise delete an observation when it encounters a missing value in one of the IVs?
Thanks,
Luke
I think what you are (mis)remembering is that for repeated measures ANOVA, the MIXED procedure does not perform listwise deletion, unlike the GLM formulation. That is because GLM uses the "wide" data format, whereas MIXED uses a "long" data format. But when you have continuous covariates for linear regression, even the MIXED procedure will delete observations for which a covariate has missing data.
No method will be able to use the 438 observations that have missing response values, so the best you can do is use the 323 observations with a nonmissing response to fit the model.
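One way to see how many observations survive listwise deletion before fitting anything is to count complete cases directly. The sketch below assumes the data are in a data set called WORK.HAVE (a hypothetical name) and that the variables are named Y and X1-X10 as in the table in the question.

```sas
/* Assumption: WORK.HAVE is a placeholder name for the analysis data set. */
data complete_flag;
   set have;
   /* CMISS returns the number of missing values among its arguments */
   n_missing = cmiss(of Y X1-X10);
   /* complete = 1 only when the response and every covariate are present */
   complete = (n_missing = 0);
run;

/* The frequency of complete=1 is the number of usable observations */
proc freq data=complete_flag;
   tables complete;
run;

/* The same regression fit by maximum likelihood in PROC MIXED */
proc mixed data=have method=ml;
   model Y = X1-X10 / solution;
run;
```

The "Number of Observations" table that PROC MIXED prints (read, used, not used) should agree with the count of complete cases from PROC FREQ.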
Correct. If an observation has a missing value in any IV, that observation cannot be used to fit the model. That is a mathematical fact. It has nothing to do with SAS or any other software.
Think about fitting a line to the following two-dimensional (X,Y) data:
X Y
0 0
. 1
. 2
Try to plot these points and then determine the line of best fit. It's impossible because only one observation has complete data.
Similarly, the procedure cannot use the observations that are not complete. It's not that SAS doesn't like missing values; it is simply that those observations do not provide any useful information for the fit.
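The three-point example can be tried directly. This is a minimal sketch; the data set name TOY is made up for illustration.

```sas
/* The toy data from the example: only the first row has both X and Y */
data toy;
   input X Y;
   datalines;
0 0
. 1
. 2
;
run;

/* Any regression procedure drops the two rows where X is missing */
proc reg data=toy;
   model Y = X;
run;
quit;
```

The observation summary at the top of the PROC REG output shows how many of the three rows were read and how many were actually used. With a single usable point, there is no information from which to estimate a slope.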
This is why some practitioners perform imputation to reduce the impact of missing data. Imputation is replacing a missing value by a plausible nonmissing value. You can read the article "Mean imputation in SAS" to learn more about simple imputation methods.
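As a rough sketch of simple mean imputation, PROC STDIZE with the REPONLY option replaces missing values without standardizing the nonmissing ones. The data set names HAVE and HAVE_IMPUTED are placeholders, and the variable names follow the table in the question.

```sas
/* Assumption: WORK.HAVE holds Y and X1-X10 as described in the question. */
/* REPONLY: only replace missing values; do not rescale the variables.    */
/* METHOD=MEAN: use each variable's mean as the replacement value.        */
proc stdize data=have out=have_imputed method=mean reponly;
   var X1-X10;   /* impute the covariates only, not the response */
run;

/* Refit the regression on the imputed covariates */
proc mixed data=have_imputed method=ml;
   model Y = X1-X10 / solution;
run;
```

Observations with a missing Y are still excluded, so at most 323 observations can be used. Mean imputation also shrinks the variance of the imputed variables, so treat the resulting standard errors with caution.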
Thank you very much for all of your help and clarity. I moved on to work with PROC CALIS and FIML, but this was very helpful!
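The thread does not show the PROC CALIS code that was used. For readers who want a starting point, a full-information maximum-likelihood (FIML) regression might look roughly like the sketch below. The data set name HAVE is a placeholder, and the model is just the regression of Y on the ten covariates from the question.

```sas
/* Assumption: WORK.HAVE holds Y and X1-X10; this is not the poster's code. */
/* METHOD=FIML uses every observation that has at least one nonmissing      */
/* analysis variable, under a multivariate normal model for all variables.  */
proc calis data=have method=fiml;
   path
      Y <=== X1 X2 X3 X4 X5 X6 X7 X8 X9 X10;
run;
```

Because FIML models the covariates jointly with the response, it relies on distributional assumptions (multivariate normality, data missing at random) that ordinary regression on complete cases does not need.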