11-07-2012 06:25 AM
I have non-normal data to analyse with a repeated measures model. I plan to use PROC MIXED on rank-transformed data. Is this a sound approach?
Thank you in advance for your answers.
11-07-2012 08:44 AM
I have done this. Generate the ranks separately by time point. But if you are analyzing the ranks using PROC MIXED, beware of unbalanced data. If you have 20 observations at the first time point and only 10 at some later time point, there is no way the mean ranks at each time could be equal: ranks 1 to 20 average 10.5, while ranks 1 to 10 average 5.5. Look up Friedman's Test on Google as an example of repeated measures on ranks.
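To make the unbalanced-data point concrete, here is a minimal sketch in Python (not SAS, purely for illustration). The function `rank_within_timepoint` and the response values are made up for the example. It ranks each time point separately, using average ranks for ties, and compares the mean ranks.

```python
from statistics import mean

def rank_within_timepoint(values):
    """Return 1-based ranks of `values`; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # average of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks

# Hypothetical responses: 20 subjects at time 1, only 10 still observed at time 2.
time1 = [12.1, 9.4, 15.0, 7.7, 11.3, 8.8, 14.2, 10.6, 13.5, 6.9,
         16.4, 5.2, 17.8, 4.1, 18.3, 3.6, 19.9, 2.5, 20.7, 1.8]
time2 = [11.0, 8.2, 14.9, 6.3, 12.7, 9.5, 13.1, 7.4, 10.8, 5.6]

mean_rank_t1 = mean(rank_within_timepoint(time1))
mean_rank_t2 = mean(rank_within_timepoint(time2))
print(mean_rank_t1, mean_rank_t2)  # (n + 1) / 2 for n = 20 and n = 10
```

Whatever the data values are, the mean rank at a time point with n observations is (n + 1) / 2, so a time effect on the ranks appears purely from dropout.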
Remember, the normality assumption in PROC MIXED applies to the residuals, not to the data itself. For example, if you plotted the data and saw a bimodal distribution, you might conclude that the distribution is non-normal. Drill a little deeper, and it might be that you have an unaccounted-for covariate (say gender) that leads to two peaks. If you can identify the process that generates the errors/residuals, you could specify the distribution and use PROC GLIMMIX. This is especially the case for errors from various exponential family distributions.
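A small simulation shows the difference between non-normal data and non-normal residuals. This is only a sketch in Python with invented numbers: two groups labelled "F" and "M" with means 10 and 20 and unit-variance normal noise. All function names are hypothetical. The raw response is clearly bimodal, but once the group mean is removed the residuals look like ordinary normal noise.

```python
import random
from statistics import mean, pstdev

def excess_kurtosis(xs):
    """Near 0 for normal data; strongly negative for two well-separated peaks."""
    m, s = mean(xs), pstdev(xs)
    return mean([((x - m) / s) ** 4 for x in xs]) - 3.0

def simulate(n_per_group=200, seed=1):
    """Hypothetical data: normal noise around a group-specific mean."""
    rng = random.Random(seed)
    group_means = {"F": 10.0, "M": 20.0}  # assumed values, for illustration only
    groups, y = [], []
    for g, mu in group_means.items():
        for _ in range(n_per_group):
            groups.append(g)
            y.append(rng.gauss(mu, 1.0))
    return groups, y

def residuals_by_group(groups, y):
    """Residuals from the simplest model with the covariate: one mean per group."""
    fitted = {}
    for g in set(groups):
        fitted[g] = mean([v for gg, v in zip(groups, y) if gg == g])
    return [v - fitted[g] for g, v in zip(groups, y)]

groups, y = simulate()
resid = residuals_by_group(groups, y)

raw_kurt = excess_kurtosis(y)
resid_kurt = excess_kurtosis(resid)
print("raw data:  sd", pstdev(y), "excess kurtosis", raw_kurt)
print("residuals: sd", pstdev(resid), "excess kurtosis", resid_kurt)
```

In practice you would inspect the residuals from the fitted MIXED model itself rather than from a hand-rolled group-mean fit like this one.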
11-07-2012 09:00 AM
There are many issues to consider when analyzing factorial designs (including repeated measures) using ranks. By definition, the ranks will have unequal variances and an unstructured covariance matrix. Full details, including SAS code, can be found in the book:
Brunner, E., Domhof, S. and Langer, F. (2002). Nonparametric Analysis of Longitudinal Data in Factorial Experiments. Wiley.
Without taking some precautions in the analysis, you will get grossly inflated type I error rates.
11-07-2012 09:12 AM
I definitely agree with this, and it is easy to see if you transform all observations to ranks. However, with equal numbers of observations at each time point, and ranks computed separately at each time point, at least some of the concerns regarding unequal variances are addressed (i.e., for 20 observations per time point, the variance of the numbers 1 to 20 is asymptotically constant under a null of no group effect). I'm not so sure about the covariances, though, and this gives me reason to think again about some of the analyses we have been doing. Generally, autoregressive models have yielded lower information criterion values than the unstructured models, given the by-timepoint ranking.
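For reference, the arithmetic behind the "variance of the numbers 1 to 20" remark, assuming no ties and n observations ranked 1 to n within a time point (the symbols here are mine, not from any of the cited references):

```latex
\bar{R} = \frac{1}{n}\sum_{i=1}^{n} i = \frac{n+1}{2},
\qquad
\frac{1}{n}\sum_{i=1}^{n}\left(i - \bar{R}\right)^{2} = \frac{n^{2}-1}{12}
```

With n = 20 this gives a mean rank of 10.5 and a variance of 399/12 = 33.25 at every time point. This is the variance of the whole set of ranks at a time point. It says nothing about the covariances between time points or about the within-group variances when groups differ.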
Anyway, I am willing to wager that the OP's original question could be better answered by using GLIMMIX with a proper selection of error distribution. I have never been a big fan of nonparametrics, going back to a FORTRAN program I wrote as an undergraduate to do Spearman correlation.
11-16-2012 06:44 AM
I agree with Steve's idea to use GLIMMIX or possibly NLMIXED, after checking the residuals from the models fitted by MIXED. GLIMMIX has a lot of distributions and with NLMIXED - well, there's almost nothing you can't do. There was an interesting paper at the most recent NESUG on fitting W-shaped distributions.
06-01-2015 08:15 AM
Please find some answers to your question in the following papers:
Akritas, M. G., Arnold, S. F. and Brunner, E. (1997). Nonparametric hypotheses and rank statistics for unbalanced factorial designs. Journal of the American Statistical Association 92, 258-265.
Brunner, E. and Puri, M.L. (2001). Nonparametric Methods in Factorial Designs. Statistical Papers 42, 1-52.
Shah, D.A. and Madden, L.V. (2004). Nonparametric Analysis of Ordinal Data in Designed Factorial Experiments. Phytopathology 94, 33-43. Electronic Appendix: Instructions on the Use of Software and Applications (e-extra).