PROC MIXED Missing Data

ROLuke91 · Posted 11-15-2019 01:28 PM

Hi all,

I am confused about the way PROC MIXED handles missing data.

I am trying to use PROC MIXED with maximum-likelihood specification to run a multiple linear regression (neither repeated measures nor mixed effects; just a straightforward multiple linear regression). I have significant and varying amounts of missing data across my independent variables. I have a total N of 761, and the missing distribution is as follows:

Var	N	N Miss
Y	323	438
X1	757	4
X2	321	440
X3	321	440
X4	761	0
X5	384	377
X6	547	214
X7	752	9
X8	761	0
X9	319	442
X10	319	442

My understanding, however, is that PROC MIXED is not supposed to perform listwise deletion. Yet when I run the analysis, it excludes 667 observations and uses only 94. Is there any way to specify PROC MIXED, or to use another relevant procedure, so that a multiple linear regression does NOT delete an observation listwise when it encounters a missing value for one IV?

Thanks,

Luke

Rick_SAS · Posted 11-15-2019 02:19 PM

I think what you are (mis)remembering is that for repeated measures ANOVA, the MIXED procedure does not perform listwise deletion, unlike the GLM formulation. That is because GLM uses the "wide" data format, whereas MIXED uses a "long" data format. But when you have continuous covariates for linear regression, even the MIXED model will delete observations for which a covariate has missing data.
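A minimal Python sketch of the wide-versus-long distinction (illustrative only, not SAS code; the data, variable names, and function names are hypothetical). In the wide layout a subject with any missing time point is dropped entirely, whereas in the long layout only the missing measurement is dropped.

```python
# Hypothetical repeated-measures data: one row per subject ("wide" layout),
# with None marking a missing measurement.
wide = [
    {"subject": 1, "y1": 5.0, "y2": 6.0, "y3": 7.0},
    {"subject": 2, "y1": 4.0, "y2": None, "y3": 6.5},
    {"subject": 3, "y1": 5.5, "y2": 6.5, "y3": None},
]
TIMES = ["y1", "y2", "y3"]


def wide_complete_cases(rows):
    """Listwise deletion on the wide layout: a subject is kept only if
    every repeated measurement is present."""
    return [r for r in rows if all(r[t] is not None for t in TIMES)]


def to_long(rows):
    """Reshape to one row per (subject, time) and drop only the rows
    whose response is missing."""
    long_rows = []
    for r in rows:
        for t in TIMES:
            if r[t] is not None:
                long_rows.append({"subject": r["subject"], "time": t, "y": r[t]})
    return long_rows


print(len(wide_complete_cases(wide)), "of", len(wide), "subjects usable (wide)")
print(len(to_long(wide)), "of", len(wide) * len(TIMES), "measurements usable (long)")
```

With this toy data the wide layout keeps 1 of 3 subjects, while the long layout keeps 7 of 9 measurements. The advantage disappears for a regression with missing covariates, because each row still needs every covariate.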

No method will be able to use the 438 observations that have missing response values, so the best you can do is use the 323 remaining observations to fit the model.

ROLuke91 · Posted 11-16-2019 09:08 AM

Thanks very much for the clarification; that makes sense.

However, it looks to me as if it's dropping every observation that is missing one or more IV values, assuming that the 94 observations used are exactly those with NO missing values on any of the IVs.

Is there any way to avoid this listwise deletion?

Rick_SAS · Posted 11-17-2019 06:24 AM

Correct. If an observation has a missing value in any IV, that observation cannot be used to fit the model. That is a mathematical fact. It has nothing to do with SAS or any other software.
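The rule can be sketched in Python (illustrative only; the helper names and toy data are hypothetical). Because a row is kept only when the response and every covariate are nonmissing, the usable sample is the intersection of the per-variable nonmissing sets. It can therefore be far smaller than the N of any single variable when the missing values fall in different rows.

```python
import math


def is_missing(v):
    """Treat None or NaN as a missing value (analogous to SAS '.')."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def complete_cases(rows, response, covariates):
    """Keep only rows with a nonmissing response and nonmissing covariates,
    which is the set of observations a regression fit can use."""
    needed = [response] + list(covariates)
    return [r for r in rows if not any(is_missing(r[v]) for v in needed)]


# Hypothetical toy data: each row is missing at most one variable,
# yet only one row survives listwise deletion.
rows = [
    {"y": 1.0, "x1": 2.0, "x2": 3.0},
    {"y": None, "x1": 2.5, "x2": 3.5},
    {"y": 2.0, "x1": None, "x2": 4.0},
    {"y": 3.0, "x1": 3.0, "x2": float("nan")},
]
used = complete_cases(rows, "y", ["x1", "x2"])
print(len(used), "of", len(rows), "rows used")
```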

Think about fitting a line to the following two-dimensional (X,Y) data:

X Y
0 0
. 1
. 2

Try to plot these points and then determine the line of best fit. It's impossible, because only one observation has complete data and a single point does not determine a line.
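A small Python sketch of the same point (illustrative; `ols_line` is a hypothetical helper, and `None` stands in for the SAS missing value `.`). After incomplete pairs are dropped, one point remains, the sum of squares of X about its mean is zero, and the least-squares slope is undefined.

```python
def ols_line(points):
    """Least-squares slope and intercept from complete (x, y) pairs.
    Returns None when the slope is not identifiable."""
    n = len(points)
    if n == 0:
        return None
    xbar = sum(x for x, _ in points) / n
    ybar = sum(y for _, y in points) / n
    sxx = sum((x - xbar) ** 2 for x, _ in points)
    if sxx == 0:
        return None  # only one distinct x value: infinitely many lines fit
    sxy = sum((x - xbar) * (y - ybar) for x, y in points)
    slope = sxy / sxx
    return slope, ybar - slope * xbar


data = [(0, 0), (None, 1), (None, 2)]  # None plays the role of SAS "."
complete = [(x, y) for x, y in data if x is not None and y is not None]
print(complete)            # [(0, 0)]
print(ols_line(complete))  # None
```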

Similarly, the procedure cannot use the observations that are not complete. It's not that SAS doesn't like missing values; it is simply that those observations do not provide any useful information for the fit.

This is why some practitioners perform imputation to reduce the impact of missing data. Imputation replaces a missing value with a plausible nonmissing value. You can read the article "Mean imputation in SAS" to learn more about simple imputation methods.
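A minimal sketch of mean imputation in Python (illustrative only; this is not the code from the article, and `mean_impute` is a hypothetical helper):

```python
def mean_impute(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: every value is missing")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


x = [None, 1.0, 2.0, None, 3.0]
print(mean_impute(x))  # [2.0, 1.0, 2.0, 2.0, 3.0]
```

Mean imputation keeps every row, but it understates the variability of the imputed variable, so standard errors computed from the filled-in data tend to be too small. Multiple imputation (for example, PROC MI in SAS) is the usual way to account for that uncertainty.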

ROLuke91 · Posted 11-18-2019 09:43 AM

Thank you very much for all of your help and clarity. I moved on to work with PROC CALIS and FIML, but this was very helpful!
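For readers curious about the FIML approach mentioned above, here is a minimal Python sketch of the idea for a bivariate normal model (illustrative only; it is not how PROC CALIS is implemented, and the function names and data are hypothetical). Each observation contributes the likelihood of whichever values it actually has, so incomplete rows still enter the fit instead of being deleted. Maximizing the total over the means, variances, and covariance would give the FIML estimates; the sketch only evaluates the likelihood. FIML relies on the missing-at-random assumption and on the assumed distribution being appropriate.

```python
import math


def casewise_loglik(x, y, mu, sigma):
    """FIML contribution of one observation under a bivariate normal model.

    mu = (mu_x, mu_y); sigma = (var_x, var_y, cov_xy).
    Missing coordinates (None) are integrated out, which for the normal
    distribution means using the marginal of the observed coordinates.
    """
    mx, my = mu
    vx, vy, cxy = sigma
    if x is None and y is None:
        return 0.0  # a fully missing row contributes nothing
    if y is None:
        return -0.5 * (math.log(2 * math.pi * vx) + (x - mx) ** 2 / vx)
    if x is None:
        return -0.5 * (math.log(2 * math.pi * vy) + (y - my) ** 2 / vy)
    det = vx * vy - cxy ** 2
    dx, dy = x - mx, y - my
    # Quadratic form d' * inverse(Sigma) * d for the 2x2 covariance matrix
    quad = (vy * dx * dx - 2 * cxy * dx * dy + vx * dy * dy) / det
    return -0.5 * (2 * math.log(2 * math.pi) + math.log(det) + quad)


def fiml_loglik(rows, mu, sigma):
    """Total log-likelihood: every row counts, whether complete or not."""
    return sum(casewise_loglik(x, y, mu, sigma) for x, y in rows)


# Hypothetical data: two complete rows and two partially missing rows.
rows = [(0.0, 0.5), (1.0, 1.5), (None, 2.0), (2.0, None)]
print(fiml_loglik(rows, mu=(1.0, 1.0), sigma=(1.0, 1.0, 0.5)))
```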
