Hi Friends,
I am doing multiple imputation for a repeated-measures randomised trial, and I plan to use PROC MIXED for the analysis. My question is how to assess model fit, in particular how to run the Shapiro-Wilk test for normality of the residuals.
What is the process? Should I run PROC MIXED and then test normality on the combined imputed data sets?
Thank you.
If you are running a procedure that supports the normality tests, you can just run the normality test for each of the imputed data sets. This will happen automatically if you are using the BY statement to analyze the imputed data.
However, I don't think that PROC MIXED supports an option to run a normality test. Therefore you need to output the residuals manually. On the MODEL statement, specify either the OUTP= option or the OUTPM= option and include the RESIDUAL option. Here is a link to the doc.
You can then run PROC UNIVARIATE (using a BY statement) on the residuals.
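As a rough sketch of the two steps above, something like the following should work. The data set and variable names (mi_out, id, trt, visit, y) and the TYPE=UN covariance structure are placeholders I made up, so substitute your own. It assumes mi_out is the stacked output of PROC MI in long format, with the usual _Imputation_ variable.

```sas
/* Hypothetical names: mi_out, id, trt, visit, y. Adjust to your data. */
proc sort data=mi_out;
   by _Imputation_ id visit;
run;

/* Fit the same model to each imputed data set and save the residuals. */
proc mixed data=mi_out;
   by _Imputation_;
   class id trt visit;
   model y = trt visit trt*visit / residual outp=resid_out;
   repeated visit / subject=id type=un;   /* TYPE=UN is only an example */
run;

/* Normality tests (including Shapiro-Wilk) per imputation. */
ods output TestsForNormality=norm_tests;
proc univariate data=resid_out normal;
   by _Imputation_;
   var Resid;   /* conditional residual written by OUTP= */
run;
```

The norm_tests data set then holds one set of test results per imputation, which makes it easy to compare them side by side. If you use OUTPM= instead of OUTP=, the residuals are marginal rather than conditional.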
Thank you for the response.
Then what if the normality test results are not consistent across the imputed data sets (suppose the cutoff for the Shapiro-Wilk p-value is 0.01)? Should I transform the imputed data sets before running PROC MIXED?
Or should I use the original data set (the one before imputation) to test for normality and decide whether the data should be transformed (log, rank, etc.)?
Thank you.
What are you trying to do? Are you concerned about the normality of residuals because you are concerned about the assumption of the linear regression model? If so, read this article about the assumptions and misconceptions of linear regression.
> what if the normality test outcome is not consistent among the imputed datasets?
I think you should do the analysis and use the PLOTS= option to create diagnostic plots. If you have a concern about the results, write back and post the results that concern you. Only by seeing the diagnostic plots can we know where to focus attention. If there is a problem, it might be that the model is misspecified, that the data are heteroscedastic, or many other issues.
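For example, a minimal sketch of requesting the residual panels might look like this. The data set and variable names are again placeholders, and the covariance structure is only an assumption for illustration.

```sas
/* Hypothetical names: mi_out, id, trt, visit, y. */
ods graphics on;

proc mixed data=mi_out plots=(residualpanel studentpanel);
   by _Imputation_;
   class id trt visit;
   model y = trt visit trt*visit / residual;
   repeated visit / subject=id type=un;   /* example structure only */
run;
```

With the BY statement you get one set of panels per imputation. Adding a WHERE statement such as `where _Imputation_ = 1;` in place of the BY statement is one way to look at a single imputation first.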
What method are you using for imputation? If you are using the MCMC method, it assumes your data come from a multivariate normal distribution, which means you would want to check the normality of the data before running MI.
If the MCMC converges, the generated data also ought to be multivariate normal. So if you assume MVN at the beginning, you can check MVN at the end by looking at the plots that PROC MI gives for assessing convergence.
As far as I know, the literature does not describe any combined test for normality on data that have already been multiply imputed, so if you are going to check normality, you would need to check it before imputation. I suppose that if you are comfortable that the MCMC (again assuming that is what you are using) has converged, it would be sufficient to check the normality of the residuals from a single imputation, but that is more of an intuition than something backed by existing theory.
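If it helps, here is a sketch of how the convergence plots mentioned above can be requested in PROC MI. It assumes the data are in wide format with one column per visit. The names wide, y1-y3, and mi_out, as well as the seed and number of imputations, are made up for illustration.

```sas
/* Hypothetical names: wide (one row per subject), y1-y3 (repeated measures). */
ods graphics on;

proc mi data=wide seed=12345 nimpute=5 out=mi_out;
   /* Trace and autocorrelation plots for the mean of y1 */
   mcmc plots=(trace(mean(y1)) acf(mean(y1)));
   var y1 y2 y3;
run;
```

Keep in mind that these plots speak to whether the chain has settled down, so treat them as one input alongside the residual checks rather than a formal test of normality.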