Solved: Re: normality assessment after multiple imputation

superbibi · Posted 02-05-2019 09:19 AM

Hi Friends,

I am doing multiple imputation for a repeated-measures randomised trial, and I plan to use PROC MIXED for the analysis. My question is how to assess the model fit, in particular how to run a Shapiro-Wilk test for residual normality.

What is the process? Should I run PROC MIXED and then run the normality test on the combined imputed data sets?

Thank you.

Rick_SAS · Posted 02-05-2019 11:12 AM

If you are running a procedure that supports the normality tests, you can just run the normality test for each of the imputed data sets. This will happen automatically if you are using the BY statement to analyze the imputed data.

However, I don't think that PROC MIXED supports an option to run a normality test. Therefore, you need to output the residuals manually. On the MODEL statement, specify either the OUTP= option or the OUTPM= option and include the RESIDUAL option. See the PROC MIXED documentation for details.

You can then run PROC UNIVARIATE (using a BY statement) on the residuals.
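
A minimal sketch of those two steps, under stated assumptions: the data set name mi_out, the variables id, trt, time, and y, and the unstructured covariance are hypothetical placeholders, not taken from the original question. _Imputation_ is the index variable that PROC MI writes to its OUT= data set, and Resid is the residual variable that OUTP= creates.

```sas
/* Fit the same model to each imputed data set and save the residuals. */
/* mi_out, id, trt, time, and y are hypothetical names.                */
proc mixed data=mi_out;
   by _Imputation_;
   class id trt time;
   model y = trt time trt*time / residual outp=resid_out;
   repeated time / subject=id type=un;
run;

/* Run the normality tests on the residuals, one set per imputation.   */
/* ODS SELECT keeps only the table of normality tests.                 */
ods select TestsForNormality;
proc univariate data=resid_out normal;
   by _Imputation_;
   var Resid;
run;
```

Using OUTPM= instead of OUTP= would give marginal rather than conditional residuals. Note that PROC UNIVARIATE computes the Shapiro-Wilk statistic only when the sample size is 2000 or less, so for larger data sets the table shows the other normality tests instead.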

superbibi · Posted 02-05-2019 02:54 PM

Thank you for the response.

Then what if the outcome of the normality test is not consistent among the imputed data sets (suppose the cutoff for the Shapiro-Wilk p-value is 0.01)? Should I transform the imputed data sets before running PROC MIXED?

Or should I use the original data set (the one before imputation) to test for normality and decide whether the data should be transformed (log, rank, etc.)?

Thank you.

Rick_SAS · Posted 02-05-2019 03:22 PM

What are you trying to do? Are you concerned about the normality of residuals because you are concerned about the assumption of the linear regression model? If so, read this article about the assumptions and misconceptions of linear regression.

> what if the normality test outcome is not consistent among the imputed datasets?

I think you should do the analysis and use the PLOTS= option to create diagnostic plots. If you have a concern about the results, write back and post the results that concern you. Only by seeing the diagnostic plots can we know where to focus attention. If there is a problem, it might be that the model is misspecified, that the data are heteroscedastic, or many other issues.
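
As a rough sketch of how the diagnostic plots could be requested, assuming the same hypothetical data set and variable names (mi_out, id, trt, time, y) as placeholders for the real model:

```sas
/* ODS Graphics must be enabled for the PLOTS= option to produce output. */
ods graphics on;

/* mi_out, id, trt, time, and y are hypothetical names. */
proc mixed data=mi_out plots=(residualpanel studentpanel);
   by _Imputation_;
   class id trt time;
   model y = trt time trt*time / residual;
   repeated time / subject=id type=un;
run;
```

With the BY statement, this produces one set of panels per imputation, so the histogram and Q-Q plot can be compared across the imputed data sets.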

SAS_Rob · Posted 02-06-2019 08:44 AM

What method are you using for imputation? If you are using the MCMC method, it assumes that your data come from a multivariate normal distribution, which means you would want to check the normality of the data before running MI.

If you get convergence in the MCMC then the data that is generated also ought to be multivariate normal so, if you assume MVN at the beginning, then you can check MVN at the end by looking at the plots MI gives for assessing convergence.
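
A minimal sketch of requesting those convergence plots, assuming a hypothetical wide-format data set trial_wide with one column per visit (y1, y2, y3). The seed and number of imputations are arbitrary illustration values.

```sas
ods graphics on;

/* trial_wide and y1-y3 are hypothetical names; seed and nimpute are arbitrary. */
proc mi data=trial_wide seed=20190205 nimpute=5 out=mi_out;
   /* Trace and autocorrelation plots for the mean of y1. */
   mcmc plots=(trace(mean(y1)) acf(mean(y1)));
   var y1 y2 y3;
run;
```

A trace plot without an obvious trend and an autocorrelation plot that drops off quickly are the usual informal signs that the chain has converged.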

superbibi · Posted 02-07-2019 10:47 AM

Thank you for the response. If I want to check the normality of the residuals with a Shapiro-Wilk test, do I need to check it before imputation or after imputation? If after imputation, how can I do it?

SAS_Rob · Posted 02-07-2019 11:11 AM

As far as I know, the literature does not describe any combined test for normality for data that have already been multiply imputed, so if you are going to check normality, you would need to do it before imputation. I suppose that if you are comfortable that the MCMC (again, assuming that is what you are using) has converged, it would be sufficient to check the normality of the residuals from a single imputation, but that is more of an intuition than something backed by existing theory.
