AnalytX
Fluorite | Level 6

Hello everybody,

I have non-normal data to analyse using a repeated measures model. I plan to use PROC MIXED on rank-transformed data. Is this a sound approach?

Thank you in advance for your answers.

7 REPLIES
SteveDenham
Jade | Level 19

I have done this. Generate the ranks separately by time point. But if you analyze the ranks using PROC MIXED, beware of unbalanced data. The mean of the ranks 1 to n is (n + 1)/2, so if you have 20 observations at the first time point and only 10 at some later time point, there is no way the mean ranks at each time could be equal. Look up Friedman's Test on Google as an example of repeated measures on ranks.
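A quick numeric sketch of the unbalanced-data point, written in Python purely for illustration (it is not SAS code from this thread). The helper `midranks` and the simulated values are assumptions made up for the example; the only thing it is meant to show is that ranking separately by time point forces the mean rank to (n + 1)/2, whatever the data look like.

```python
import random
from statistics import mean

def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

rng = random.Random(0)
# Hypothetical data: 20 observations at the first time point, 10 at a later one.
time_first = [rng.gauss(0, 1) for _ in range(20)]
time_later = [rng.gauss(0, 1) for _ in range(10)]

mean_rank_first = mean(midranks(time_first))
mean_rank_later = mean(midranks(time_later))
print(mean_rank_first, mean_rank_later)  # 10.5 5.5
```

So a "time effect" on the by-timepoint ranks here would reflect nothing but the drop in sample size.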

Remember, the normality assumption in PROC MIXED applies to the residuals, not to the data themselves. For example, if you plotted the data and saw a bimodal distribution, you would conclude that the distribution is non-normal. Drill a little deeper, and it might be that an unaccounted-for covariate (say, gender) produces the two peaks. If you can identify the process that generates the errors/residuals, you could specify the distribution and use PROC GLIMMIX. This is especially the case for errors from various exponential family distributions.
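To make the bimodal example concrete, here is a small simulation sketch in Python (again only for illustration, not SAS). Everything in it is an assumption for the example: a hypothetical two-level covariate with group means of -5 and +5, unit-variance normal errors, and excess kurtosis as a rough summary of shape (near 0 for a normal distribution, strongly negative for two well-separated peaks).

```python
import random
from statistics import mean

def excess_kurtosis(xs):
    """Moment-based excess kurtosis: m4 / m2^2 - 3 (about 0 for normal data)."""
    m = mean(xs)
    m2 = mean([(x - m) ** 2 for x in xs])
    m4 = mean([(x - m) ** 4 for x in xs])
    return m4 / m2 ** 2 - 3

rng = random.Random(1)
# Hypothetical covariate with two levels; the errors themselves are normal.
level_a = [rng.gauss(-5, 1) for _ in range(2000)]
level_b = [rng.gauss(5, 1) for _ in range(2000)]

# Pooled raw data: two peaks, so clearly non-normal.
raw = level_a + level_b

# Residuals after removing each level's own mean (i.e., fitting the covariate).
mean_a, mean_b = mean(level_a), mean(level_b)
residuals = [x - mean_a for x in level_a] + [x - mean_b for x in level_b]

raw_kurtosis = excess_kurtosis(raw)
residual_kurtosis = excess_kurtosis(residuals)
print(round(raw_kurtosis, 2), round(residual_kurtosis, 2))
```

The raw data look badly non-normal, while the residuals from the model that includes the covariate do not, which is the distinction that matters for the assumption.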

Steve Denham

Kip1
Calcite | Level 5

I want to resurrect this topic, if you don't mind. According to the Overview section of the SAS documentation for PROC MIXED:

"The primary assumptions underlying the analyses performed by PROC MIXED are as follows: 

  • The data are normally distributed (Gaussian).

  • The means (expected values) of the data are linear in terms of a certain set of parameters.

  • The variances and covariances of the data are in terms of a different set of parameters, and they exhibit a structure matching one of those available in PROC MIXED."

Here it states explicitly "the data", not "the residuals". Would anyone care to elaborate?

lvm
Rhodochrosite | Level 12

There are many issues to consider when analyzing factorial designs (including repeated measures) using ranks. By definition, the ranks will have unequal variances and an unstructured covariance matrix. Full details, including SAS code, can be found in the book:

Brunner, E., Domhof, S. and Langer, F. (2002). Nonparametric Analysis of Longitudinal Data in Factorial Experiments. Wiley.

Without taking some precautions in the analysis, you will get grossly inflated type I error rates.

SteveDenham
Jade | Level 19

I definitely agree with this, and it is easy to see if you transform all observations to ranks. However, with equal numbers of observations at each time point, and ranks computed separately at each time point, at least some of the concerns regarding unequal variances are addressed (i.e., for 20 observations per time point, the variance of the numbers 1 to 20 is asymptotically constant under a null of no group effect). I'm not so sure about the covariances, though, and this gives me something to think about regarding some of the analyses we have been doing. Generally, autoregressive models have yielded lower information criterion values than the unstructured models, given the by-timepoint ranking.
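To put a number on the "variance of 1 to 20" remark, here is a minimal sketch in Python (for illustration only; `rank_variance` is a name made up for this example). It assumes no ties, so the ranks at a time point with n observations are exactly 1 to n, and it only addresses the overall variance at each time point, not the within-group variances or the covariances.

```python
from statistics import variance

def rank_variance(n):
    """Sample variance of the untied ranks 1..n, which equals n(n + 1)/12."""
    return variance([float(r) for r in range(1, n + 1)])

var_20 = rank_variance(20)
var_10 = rank_variance(10)
print(var_20)            # 35.0
print(round(var_10, 4))  # 9.1667
```

With 20 untied observations at every time point, the ranks at each time have the same variance of 35 regardless of the raw values; once the counts differ (20 versus 10, say), the rank variances differ too, which ties back to the warning about unbalanced data.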

Anyway, I am willing to wager that the OP's question could be better answered by using GLIMMIX with an appropriate choice of error distribution. I have never been a big fan of nonparametrics, going back to a FORTRAN program I wrote as an undergraduate to do Spearman correlation.

Steve Denham

AnalytX
Fluorite | Level 6

Thanks a lot for your answers, which gave me helpful information.

Regards,

plf515
Lapis Lazuli | Level 10

I agree with Steve's idea to use GLIMMIX or possibly NLMIXED, after checking the residuals from the models fitted by MIXED. GLIMMIX offers a lot of distributions, and with NLMIXED, well, there's almost nothing you can't do. There was an interesting paper at the most recent NESUG on fitting W-shaped distributions.

Edgar_Brunner
Calcite | Level 5

Please find some answers to your question in the following papers:

Akritas, M. G., Arnold, S. F. and Brunner, E. (1997). Nonparametric hypotheses and rank statistics for unbalanced factorial designs. Journal of the American Statistical Association 92, 258-265.

Brunner, E. and Puri, M. L. (2001). Nonparametric Methods in Factorial Designs. Statistical Papers 42, 1-52.

Shah, D. A. and Madden, L. V. (2004). Nonparametric Analysis of Ordinal Data in Designed Factorial Experiments. Phytopathology 94, 33-43. Electronic Appendix: Instructions on the Use of Software and Applications (e-extra).
