Hi,
my design is like:
| Subject | Condition1 | Condition2 | Condition3 | Condition4 | Condition5 |
|---|---|---|---|---|---|
| 1 | replicate1 replicate2 replicate3 ... replicate15 | replicate1 replicate2 replicate3 | replicate1 replicate2 replicate3 | replicate1 . . | replicate1 replicate2 replicate3 |
| 2 | replicate1 replicate2 replicate3 ... replicate15 | replicate1 replicate2 . | replicate1 replicate2 replicate3 | replicate1 replicate2 replicate3 | replicate1 replicate2 replicate3 |
| 3 | replicate1 replicate2 replicate3 ... replicate12 | replicate1 replicate2 replicate3 | replicate1 replicate2 replicate3 | replicate1 replicate2 . | replicate1 replicate2 replicate3 |
| ... | | | | | |
| n | | | | | |
By "replicate" I mean what is described here and here.
For most subject/condition combinations I have 3 replicates; for some, only 2 or even 1 (because of outliers).
Each subject's parameter of interest was measured in 5 conditions (not times).
For a similar design with only one "replicate", I was advised to use the following code:
PROC GLIMMIX DATA = ff_long_sorted ORDER = DATA MAXOPT = 500 PCONV = 1E-8;
VALUEp = VALUEE/100;
CLASS ExpID Condition;
MODEL VALUEp = Condition / DISTRIBUTION = BINOMIAL DDFM = KENWARDROGER;
RANDOM Condition / RESIDUAL SUBJECT = ExpID TYPE = CSH;
*RANDOM _RESIDUAL_ / SUBJECT = ExpID TYPE = CSH;
NLOPTIONS TECHNIQUE = NMSIMP MAXITER = 500;
LSMEANS Condition / ADJDFE = ROW DIFF ILINK ADJUST = TUKEY CL
PLOTS = DIFFOGRAM(NOABS CENTER);
ODS SELECT ConvergenceStatus FitStatistics Tests3 DiffPlot;
RUN;
But what about the situation when I have 1-3 replicates per subject/condition?
Thank you in advance.
Here the replicates are an additional, within-subject source of variability. Thus modify the code to:
PROC GLIMMIX DATA = ff_long_sorted ORDER = DATA MAXOPT = 500 PCONV = 1E-8;
VALUEp = VALUEE/100;
CLASS ExpID Condition Replicate;
MODEL VALUEp = Condition / DISTRIBUTION = BINOMIAL DDFM = KENWARDROGER;
RANDOM Condition / RESIDUAL SUBJECT = ExpID TYPE = CSH;
RANDOM Replicate/ SUBJECT = ExpID*Condition;
NLOPTIONS TECHNIQUE = NMSIMP MAXITER = 500;
LSMEANS Condition / ADJDFE = ROW DIFF ILINK ADJUST = TUKEY CL
PLOTS = DIFFOGRAM(NOABS CENTER);
ODS SELECT ConvergenceStatus FitStatistics Tests3 DiffPlot;
RUN;
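You may want to change to method=laplace to get conditional estimates, rather than the marginal ones, which are known to be biased. METHOD=LAPLACE does not support R-side (RESIDUAL) random effects or DDFM=KENWARDROGER, so both are dropped. That code would look like: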
PROC GLIMMIX DATA = ff_long_sorted ORDER = DATA method=laplace;
VALUEp = VALUEE/100;
CLASS ExpID Condition Replicate;
MODEL VALUEp = Condition / DISTRIBUTION = BINOMIAL;
RANDOM Condition / SUBJECT = ExpID TYPE = CSH;
RANDOM Replicate/ SUBJECT = ExpID*Condition;
NLOPTIONS TECHNIQUE = NMSIMP MAXITER = 500;
LSMEANS Condition / ADJDFE = ROW DIFF ILINK ADJUST = simulate CL
PLOTS = DIFFOGRAM(NOABS CENTER);
ODS SELECT ConvergenceStatus FitStatistics Tests3 DiffPlot;
RUN;
I also moved to a different adjustment (Edwards and Berry's simulation method as opposed to Tukey) as it provides better control of experiment-wise error rates.
Steve Denham
Steve,
following the idea of modelling repeated measures data in REplicates I've simulated log-normally
distributed data (with known arithmetic mean and SD) [1] and tried to implement your suggestions.
The data set and code are attached.
I changed the distribution to lognormal as there is some evidence in the literature for that ([2]). I (naively)
guess that CSH is an adequate variance-covariance matrix type, as different conditions may cause
different dispersion/variance. (Right?) The number of experiments (four) was chosen because we usually
have 3-5 experiments.
I checked different optimization techniques in the NLOPTIONS statement. With the simulated data
and the code I get strange output: negative values in the "Fit Statistics", strange numbers in the
"Fit Statistics for Conditional Distribution", empty cells in the "Covariance Parameter Estimates",
"0.0" values in "Pearson Chi-Square / DF" and large F-values. Sometimes the SAS System stopped
processing because of errors, or the optimization could not be completed, etc.
To get a balanced data set I also tried using only TRIplicates in my dependent variable, but with no
success. The same happened with monoplicates (entered into PROC GLIMMIX as the means of the REplicates).
How should I handle this kind of data set?
Sincerely,
Stan
-----------------
P.S.
References:
[1] thanks to 's post/replies at
How to generate random numbers in SAS - The DO Loop
[2]
1) "The logarithmic transformation and the geometric mean in reporting experimental
IgE results..."
http://www.annallergy.org/article/S1081-1206(10)60595-9/abstract
2) Figure_S1.tif at
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0046423
3) "Cytokine data were log-normally distributed. The values were therefore expressed
as geometric means ± standard errors of the means"
http://iai.asm.org/content/73/6/3462.full
4) http://www.biomedcentral.com/1471-2172/8/27
5) "Statistical analyses were performed using SAS 9.1.3 software (SAS Institute Inc.,
Cary, NC, USA). Cytokine data were log-transformed due to the non-normal distribution
of plasma cytokines"
http://arthritis-research.com/content/11/5/r147
6) "Because cytokine and chemokine data showed skewing from the normal distribution,
statistical analyses were completed after logarithmic (base 10) transformation of
data, which established a normal distribution. ... Values of zero were converted to 1
before logarithmic transformation for statistical analysis. Data are presented in the
figures and tables as the mean ± SE of the log10 values of individual cytokines and
chemokines or of their ratios. To enable comparisons with other studies, we also
provide the geometric mean values after transformation back from the log10 value"
One thing that is important to note is that for the lognormal distribution the mean and variance are functionally independent. Given that, I would move back to the pseudo-likelihood method, and try (untested):
TITLE "----- GLIMMIX for REplicates -----";
PROC GLIMMIX DATA = REplicates ORDER = DATA;
CLASS EXP CONDITION REPLICATA;
MODEL VALUEE = CONDITION / DISTRIBUTION = LOGNORMAL;
RANDOM CONDITION / SUBJECT = EXP;/* TYPE = CSH; For this, I would fit a simpler variance component only model */
RANDOM REPLICATA / residual type=ar(1) SUBJECT = EXP*CONDITION ; /* Fit marginal model, with AR(1) for repeated factor*/
NLOPTIONS MAXITER = 2000;
/* if TECHNIQUE =
DBLDOG,NMSIMP,NEWRAP,NRRIDG then optimizations cannot be completed
NONE,QUANEW,CONGRA,QUANEW then empty cells | negatives in "Fit Statistics" | "0.0" value for the "Pearson Chi-Square / DF".
LEVMAR then the SAS System stopped processing because of errors.
*/
LSMEANS CONDITION / DIFF ILINK ADJUST = SIMULATE CL PLOTS = DIFFOGRAM(NOABS CENTER);
RUN;
I would apply the same model to the triplicates. Note that with dist=lognormal, ILINK will still report on the log scale. You can get geometric means by using the EXP option, or you can get back-transformed least squares means on the original scale using the formulas in the documentation (search for the omega symbol in the DIST= option material).
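For reference, a sketch of that back-transformation, assuming the usual lognormal setup in which the model is fitted to log(Y), with a log-scale mean μ (the reported least squares mean) and a single log-scale variance φ. The symbols μ and φ are notation chosen here for illustration; how variance components from the RANDOM statements should enter φ is not covered by this sketch.

```latex
\[
\log Y \sim N(\mu, \phi), \qquad \omega = e^{\phi}
\]
\[
\text{geometric mean (median)} = e^{\mu}, \qquad
\mathrm{E}[Y] = e^{\mu}\sqrt{\omega}, \qquad
\mathrm{Var}[Y] = e^{2\mu}\,\omega\,(\omega - 1)
\]
```

Because ω > 1 whenever φ > 0, the arithmetic mean on the original scale is always larger than the geometric mean obtained by simply exponentiating the log-scale estimate.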
Steve Denham
Stan,
You might be interested in this blog post that I wrote that is based on our discussion: Simulate lognormal data with specified mean and variance - The DO Loop
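As a rough sketch of the idea in that title: to simulate lognormal data with a target arithmetic mean m and standard deviation s on the original scale, one first converts them to the mean μ and variance φ of log(Y). The symbols m, s, μ, φ and the numbers in the worked example are illustrative assumptions, not values taken from the attached data set.

```latex
\[
\phi = \ln\!\left(1 + \frac{s^{2}}{m^{2}}\right), \qquad
\mu = \ln(m) - \frac{\phi}{2}, \qquad
Y = \exp\!\left(\mu + \sqrt{\phi}\,Z\right), \quad Z \sim N(0, 1)
\]
\[
\text{Example: } m = 100,\ s = 50 \;\Rightarrow\;
\phi = \ln(1.25) \approx 0.2231, \qquad
\mu \approx 4.6052 - 0.1116 = 4.4936
\]
```

Drawing Z from a standard normal and exponentiating then gives values whose arithmetic mean and SD approach m and s as the sample size grows; with only a few replicates per cell the sample statistics can differ noticeably from the targets.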