06-14-2012 04:47 AM
I have one within-subjects (fixed) factor, condition (there is no between-subjects factor).
Specifically, I test immunomodulatory activities of different substances added
to immune cell culture. So, I measure response in different conditions (not time).
The experimental units (donors of the lymphocytes) are treated as a random factor.
So the general data set could be presented as:
Subject | Condition 1 | Condition j | Condition t
1 | y_{m11} | y_{m1j} | y_{m1t}
i | y_{mi1} | y_{mij} | y_{mit}
n | y_{mn1} | y_{mnj} | y_{mnt}
My experiment is described here.
Please help to choose.
JUNE 02, 2013 UPDATE:
Interestingly, the substances analyzed share molecular structure to some extent.
Can it affect the model? I mean that in my experiment I definitely know that
the substances are derivatives, so they resemble each other partially.
Can/Should it be reflected in the model parameters (i.e., the variance-covariance matrix),
compared with the situation where entirely distinct molecules are tested?
AUGUST 06, 2013 UPDATE:
The discussion continues, and the correct answer by Steve Denham is here.
06-14-2012 11:16 AM
Yes. Unfortunately/fortunately, there are several ways to set it up. There are also choices to make, such as whether the repeated measures share a variance or each have their own. There will be several combinations of the RANDOM and REPEATED statements, in combination with SUBJECT= and possibly GROUP= options. Take a look at: http://www.stat.ncsu.edu/people/arellano/courses/ST524/Fall08/Homeworks/Homework7/articles/188-29_Re...
06-14-2012 02:50 PM
thanks, deb193, but the question is still open...
06-14-2012 03:12 PM
Several published methods articles suggest that random coefficient models are superior to RMANOVA models because of fewer assumptions, better handling of imbalance, and even missing data handling.
The examples I provided show that a between-subjects factor is not required.
I am unclear what remains unanswered. If I am missing the point, please rephrase the question or provide a more specific followup question.
06-24-2012 03:34 AM
deb193, thank you for your response. I've rephrased my question. I meant: which of the two procedures is better (more powerful/robust)? I'm studying the article you provided.
06-28-2012 09:26 AM
I think multilevel (ML) models (aka mixed, aka random coefficient, aka hierarchical) will generally have more power. The citation below reports simulation work on power in some circumstances.
Also, having conditions instead of time is not a problem. Just make sure that condition enters the model as discrete instead of continuous. In SAS you declare it with the CLASS statement. Other languages (e.g., Stata) use notation in the model command. You also need to be careful to determine how mixed model coefficients map onto ANOVA output. In some cases, follow-up contrasts of LSMeans (i.e., marginal means) are needed to reproduce some ANOVA F-tests when there are more than two levels of a factor.
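To illustrate what "discrete instead of continuous" means in practice, here is a minimal Python sketch (plain standard library, not SAS) of expanding a condition factor into 0/1 indicator columns. The function name, the condition labels, and the choice of a reference level are all assumptions made for illustration; statistical packages differ in detail, for example in which level they treat as the reference.

```python
def indicator_columns(levels, reference):
    """Expand a categorical factor into 0/1 indicator columns.

    The reference level gets no column of its own (all zeros),
    so each coefficient is a difference from the reference.
    """
    names = sorted(set(levels) - {reference})
    rows = [[1 if level == name else 0 for name in names] for level in levels]
    return names, rows


# Hypothetical condition labels for six observations.
conditions = ["control", "A", "B", "A", "control", "B"]
names, rows = indicator_columns(conditions, reference="control")

print("columns:", names)
for condition, row in zip(conditions, rows):
    print(condition, row)
```

Entering condition this way gives each substance its own mean, whereas a numeric code (1, 2, 3, ...) would force a straight-line trend across arbitrarily ordered substances.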
Quene, H. and H. van den Bergh (2004). "On multi-level modeling of data from repeated measures designs: a tutorial." Speech Communication 43(1-2): 103-121. (PMCID:
Data from repeated measures experiments are usually analyzed with conventional ANOVA. Three well-known problems with ANOVA are the sphericity assumption, the design effect (sampling hierarchy), and the requirement for complete designs and data sets. This tutorial explains and demonstrates multi-level modeling (MLM) as an alternative analysis tool for repeated measures data. MLM allows us to estimate variance and covariance components explicitly. MLM does not require sphericity, it takes the sampling hierarchy into account, and it is capable of analyzing incomplete data. A fictitious data set is analyzed with MLM and ANOVA, and analysis results are compared. Moreover, existing data from a repeated measures design are re-analyzed with MLM, to demonstrate its advantages. Monte Carlo simulations suggest that MLM yields higher power than ANOVA, in particular under realistic circumstances. Although technically complex, MLM is recommended as a useful tool for analyzing repeated measures data from speech research. (C) 2004 Elsevier B.V. All rights reserved.
06-28-2012 01:09 PM
deb193, thanks for your hint about the CLASS statement. As far as I understand, I am doing as you said:
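PROC MIXED DATA = my_data_in_a_long_format;
CLASS IDN SUBSTANCE; /* donor ID and condition are both categorical */
MODEL VALUEE = SUBSTANCE; /* fixed effect of condition */
REPEATED SUBSTANCE / SUBJECT = IDN TYPE = UN R RCORR; /* unstructured within-donor covariance */
LSMEANS SUBSTANCE / ADJUST = TUKEY CL; /* Tukey-adjusted comparisons of condition LS-means */
RUN;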
The topic on MLM seems beyond me.... But many thanks for the paper !
06-14-2012 03:12 PM
Actually, GLM is typically not the best program to use for repeated measures (within-subjects) designs, unless you can assume compound symmetry. Compound symmetry assumes that observations taken at weeks 1 and 2 correlate as strongly as those taken at weeks 1 and 52. This has been false in all the data I've seen. Proc Mixed will handle this. The general issue is that the errors are correlated, which violates a major assumption of linear models (independent errors). [That is, one can often ignore non-normality and heteroscedasticity, but even a moderate autocorrelation (e.g., 0.30) will turn an alpha of 0.05 into 0.25, according to Scheffe.] My favorite way to handle this is to assume an AR(1) model for the autocorrelated data or, if you have large N and few time points, allow each pair of times to have unique autocorrelations.
06-24-2012 04:07 AM
AllenFleishman, thank you for the response. It seems I can assume compound symmetry for some data sets but not for others: for some data sets the "Convergence criteria met but final Hessian is not positive definite" notification appears, while for others I get "Convergence criteria met.".
08-09-2012 12:15 PM
I use Type=UN (unstructured) when the N is large and the number of time points is low. In essence you are estimating a variance for every time point and a covariance for every pair of time points (t*(t+1)/2 parameters in total, where t is the number of times). I still suggest Type=AR(1) when you have moderate N and a reasonable number of equally spaced time points. If the time points are irregularly spaced, then I would suggest Type=SP(POW), which is identical to AR(1) when the distances are all equal. The idea behind AR(1) is that the correlation between any two time points decreases exponentially as the time between them grows. From my experience this gives a useful model ("No model is true, but some are useful" - G. Box). AR(1) and CS (compound symmetry, a sufficient condition for sphericity) both use the same number of parameters: 2. But if you have the df to spare, use UN. For example, with 5 times (baseline and Weeks 2, 4, 6, and 12), you would need 15 parameters for UN but 2 with SP(POW).
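The parameter counts above can be checked with a few lines of Python. This is only an arithmetic sketch; the function name and the dictionary of structure labels are made up for illustration and are not part of any SAS interface.

```python
def covariance_parameter_count(structure, t):
    """Number of covariance parameters for t repeated measures.

    UN estimates t variances plus t*(t-1)/2 covariances, i.e. t*(t+1)/2.
    CS, AR(1) and SP(POW) each use one variance and one correlation-type
    parameter, regardless of t.
    """
    if t < 2:
        raise ValueError("need at least two repeated measures")
    counts = {
        "UN": t * (t + 1) // 2,
        "CS": 2,
        "AR(1)": 2,
        "SP(POW)": 2,
    }
    return counts[structure]


# Five time points: baseline and weeks 2, 4, 6, 12.
for structure in ("UN", "CS", "AR(1)", "SP(POW)"):
    print(structure, covariance_parameter_count(structure, 5))
```

The unstructured count grows quadratically with the number of time points, which is why it needs a large N relative to t.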
Note: the last time point is 6 weeks after the previous one, whereas the others are all 2 weeks apart. The variable named in the c-list would need to hold the values 0, 2, 4, 6, and 12. The week 6 and 12 correlation should be similar to the baseline and week 6 correlation, and both are lower than the week 2 v 4 or 4 v 6 correlations. For example, if the correlation between weeks 0 and 2 (or 2 and 4, or 4 and 6) were 0.80, then the correlation between weeks 0 and 6, or 6 and 12, would be 0.8^3 or 0.512.
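The spacing argument can be made concrete with a small Python sketch of a power-type correlation, where the correlation between two visits is rho raised to the distance between them. The function name is hypothetical, and expressing rho per week (so that a 2-week gap gives 0.80) is an assumption chosen to match the example numbers.

```python
def power_correlation(times, rho_per_unit):
    """Correlation matrix with entries rho_per_unit ** |time_a - time_b|."""
    return [[rho_per_unit ** abs(a - b) for b in times] for a in times]


weeks = [0, 2, 4, 6, 12]
rho_per_week = 0.80 ** 0.5  # chosen so that visits 2 weeks apart correlate 0.80
corr = power_correlation(weeks, rho_per_week)

# Print the matrix with three decimals, one row per visit.
for week, row in zip(weeks, corr):
    print(week, " ".join(f"{value:.3f}" for value in row))
```

In this sketch weeks 0 v 6 and 6 v 12 are both three 2-week steps apart, so both come out as 0.8^3 = 0.512, while weeks 0 v 12 are six steps apart and give 0.8^6, about 0.262.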
The underlying model for AR(1) is that the value of an observation i at time t can be predicted by a linear function of its value at time t-1. That is, X{i,t} = a + b*X{i, t-1} + e{i,t}.
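A short derivation connects this regression form to the exponential decay of the correlation. It relies on assumptions not stated above: |b| < 1, a stationary series (constant variance over time), and errors e{i,t} that have mean zero and are uncorrelated with earlier values of X. For a lag of k >= 1 time steps:

```latex
\begin{align*}
\operatorname{Cov}(X_{i,t},\, X_{i,t-k})
  &= \operatorname{Cov}(a + b\,X_{i,t-1} + e_{i,t},\; X_{i,t-k}) \\
  &= b\,\operatorname{Cov}(X_{i,t-1},\, X_{i,t-k}) \\
  &= \dots = b^{k}\,\operatorname{Var}(X_{i,t-k}), \\[4pt]
\operatorname{Corr}(X_{i,t},\, X_{i,t-k})
  &= \frac{b^{k}\,\operatorname{Var}(X_{i,t-k})}
          {\sqrt{\operatorname{Var}(X_{i,t})\,\operatorname{Var}(X_{i,t-k})}}
   = b^{k}.
\end{align*}
```

With b = 0.8 and k = 3 this gives 0.8^3 = 0.512.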