stan
Quartz | Level 8

Hi,

My design looks like this:

Subject | Condition1 | Condition2 | Condition3 | Condition4 | Condition5
1 | replicate1, replicate2, replicate3, ..., replicate15 | replicate1, replicate2, replicate3 | replicate1, replicate2, replicate3 | replicate1, ., . | replicate1, replicate2, replicate3
2 | replicate1, replicate2, replicate3, ..., replicate15 | replicate1, replicate2, . | replicate1, replicate2, replicate3 | replicate1, replicate2, replicate3 | replicate1, replicate2, replicate3
3 | replicate1, replicate2, replicate3, ..., replicate12 | replicate1, replicate2, replicate3 | replicate1, replicate2, replicate3 | replicate1, replicate2, . | replicate1, replicate2, replicate3
...
n

By "replicate" I mean what is described here and here.

For most subjects/conditions I have 3 replicates; for some, only 2 or even 1 (because of outliers).

Each subject's parameter of interest was measured in 5 conditions (not times).

For a similar design with only one "replicate", I was advised to use the following code:

PROC GLIMMIX DATA = ff_long_sorted ORDER = DATA MAXOPT = 500 PCONV = 1E-8;
  VALUEp = VALUEE/100;
  CLASS ExpID Condition;
  MODEL VALUEp = Condition / DISTRIBUTION = BINOMIAL DDFM = KENWARDROGER;
  RANDOM Condition / RESIDUAL SUBJECT = ExpID TYPE = CSH;
  *RANDOM _RESIDUAL_ / SUBJECT = ExpID TYPE = CSH;
  NLOPTIONS TECHNIQUE = NMSIMP MAXITER = 500;
  LSMEANS Condition / ADJDFE = ROW DIFF ILINK ADJUST = TUKEY CL
                      PLOTS = DIFFOGRAM(NOABS CENTER);
  ODS SELECT ConvergenceStatus FitStatistics Tests3 DiffPlot;
RUN;

But what about the situation when I have 1-3 replicates per subject/condition?

Thank you in advance.

SteveDenham
Jade | Level 19

Here replicates provide an additional source of variability, and are a within-subject source.  Thus modify the code to:

PROC GLIMMIX DATA = ff_long_sorted ORDER = DATA MAXOPT = 500 PCONV = 1E-8;
  VALUEp = VALUEE/100;
  CLASS ExpID Condition Replicate;
  MODEL VALUEp = Condition / DISTRIBUTION = BINOMIAL DDFM = KENWARDROGER;
  RANDOM Condition / RESIDUAL SUBJECT = ExpID TYPE = CSH;
  RANDOM Replicate / SUBJECT = ExpID*Condition;
  NLOPTIONS TECHNIQUE = NMSIMP MAXITER = 500;
  LSMEANS Condition / ADJDFE = ROW DIFF ILINK ADJUST = TUKEY CL
                      PLOTS = DIFFOGRAM(NOABS CENTER);
  ODS SELECT ConvergenceStatus FitStatistics Tests3 DiffPlot;
RUN;

You may want to change to method=laplace to get conditional estimates, rather than the marginals which are known to be biased.  That code would look like:

PROC GLIMMIX DATA = ff_long_sorted ORDER = DATA method=laplace;
  VALUEp = VALUEE/100;
  CLASS ExpID Condition Replicate;
  MODEL VALUEp = Condition / DISTRIBUTION = BINOMIAL;
  RANDOM Condition / SUBJECT = ExpID TYPE = CSH;
  RANDOM Replicate / SUBJECT = ExpID*Condition;
  NLOPTIONS TECHNIQUE = NMSIMP MAXITER = 500;
  LSMEANS Condition / ADJDFE = ROW DIFF ILINK ADJUST = simulate CL
                      PLOTS = DIFFOGRAM(NOABS CENTER);
  ODS SELECT ConvergenceStatus FitStatistics Tests3 DiffPlot;
RUN;

I also moved to a different adjustment (Edwards and Berry's simulation method as opposed to Tukey) as it provides better control of experiment-wise error rates.

Steve Denham

stan
Quartz | Level 8

Steve,

Following the idea of modelling repeated measures data in REplicates, I've simulated log-normally distributed data (with known arithmetic mean and SD) [1] and tried to implement your suggestions.

The data set and code are attached.
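
The attachment is not reproduced here. As a sketch only, one standard way to choose log-scale parameters so that simulated log-normal data have a given arithmetic mean and SD is moment matching. The symbols m (target arithmetic mean), s (target SD), μ and σ (mean and SD on the log scale) are introduced here for illustration and are not taken from the attached code:

```latex
\begin{align*}
\sigma^2 &= \ln\!\left(1 + \frac{s^2}{m^2}\right), \\
\mu &= \ln m - \frac{\sigma^2}{2}, \\
Y &= \exp(\mu + \sigma Z), \quad Z \sim N(0, 1).
\end{align*}
```

With these choices, E[Y] = exp(μ + σ²/2) = m and Var[Y] = m²(exp(σ²) - 1) = s².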

I changed the distribution to lognormal, as there is some evidence in the literature for that ([2]). I (naively) guess that CSH is an adequate variance-covariance matrix type, as different conditions may cause different dispersion/variance. (Right?) The number of experiments (four) was chosen because we usually have 3-5 experiments.

I checked different optimization techniques in the NLOPTIONS statement. With the simulated data and the code I get strange output: negative values in the "Fit Statistics", strange numbers in the "Fit Statistics for Conditional Distribution", empty cells in the "Covariance Parameter Estimates", "0.0" values in "Pearson Chi-Square / DF", and large F-values. Sometimes the SAS System stopped processing because of errors, or the optimization could not be completed, etc.

To have a balanced data set I also tried using only TRIplicates in my dependent variable, but with no success. The same happened for monoplicates (introduced in PROC GLIMMIX as the means of the REplicates).

How should this kind of data set be handled?

Sincerely,

Stan




-----------------

P.S.

References:

[1] Thanks to the post/replies at

How to generate random numbers in SAS - The DO Loop

[2]

(1) "The logarithmic transformation and the geometric mean in reporting experimental IgE results..."
http://www.annallergy.org/article/S1081-1206(10)60595-9/abstract

(2) Figure_S1.tif at
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0046423

(3) "Cytokine data were log-normally distributed. The values were therefore expressed as geometric means ± standard errors of the means"
http://iai.asm.org/content/73/6/3462.full

(4) http://www.biomedcentral.com/1471-2172/8/27

(5) "Statistical analyses were performed using SAS 9.1.3 software (SAS Institute Inc., Cary, NC, USA). Cytokine data were log-transformed due to the non-normal distribution of plasma cytokines"
http://arthritis-research.com/content/11/5/r147

(6) "Because cytokine and chemokine data showed skewing from the normal distribution, statistical analyses were completed after logarithmic (base 10) transformation of data, which established a normal distribution. ... Values of zero were converted to 1 before logarithmic transformation for statistical analysis. Data are presented in the figures and tables as the mean±SE of the log10 values of individual cytokines and chemokines or of their ratios. To enable comparisons with other studies, we also provide the geometric mean values after transformation back from the log10 value"
http://jid.oxfordjournals.org/content/184/4/393.long#sec-1

SteveDenham
Jade | Level 19

One thing that is important to note is that for the lognormal distribution the mean and variance are functionally independent.  Given that, I would move back to the pseudo-likelihood method, and try (untested):

TITLE "----- GLIMMIX for REplicates -----";
PROC GLIMMIX DATA = REplicates ORDER = DATA;
CLASS EXP CONDITION REPLICATA;
MODEL VALUEE = CONDITION / DISTRIBUTION = LOGNORMAL;            
RANDOM CONDITION / SUBJECT = EXP;/* TYPE = CSH;  For this, I would fit a simpler variance component only model */             
RANDOM REPLICATA / residual type=ar(1) SUBJECT = EXP*CONDITION ; /* Fit marginal model, with AR(1) for repeated factor*/
NLOPTIONS MAXITER = 2000;
  /* if TECHNIQUE =
  DBLDOG,NMSIMP,NEWRAP,NRRIDG then optimizations cannot be completed
  NONE,QUANEW,CONGRA,QUANEW then empty cells | negatives in "Fit Statistics" | "0.0" value for the "Pearson Chi-Square / DF".
  LEVMAR then the SAS System stopped processing because of errors.
  */
LSMEANS CONDITION / DIFF ILINK ADJUST = SIMULATE CL PLOTS = DIFFOGRAM(NOABS CENTER);
RUN;

I would apply the same model for triplicates.  Note that with dist=lognormal, ILINK will still report on the log scale.  You can get geometric means by using the EXP option, or you can get back-transformed least squares means on the original scale using the formulas in the documentation (search for the omega symbol in the DIST= option material).
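
For orientation, the back-transformation referred to above can be sketched as follows. This assumes the model is fit to log(Y), with μ the estimate on the log scale (for example, a least squares mean) and φ the residual variance on the log scale. The notation is written from memory of the GLIMMIX documentation and should be checked against it before use:

```latex
\begin{align*}
\omega &= \exp(\phi), \\
\text{geometric mean (median) of } Y &= \exp(\mu), \\
\mathrm{E}[Y] &= \exp(\mu)\sqrt{\omega} = \exp\!\left(\mu + \tfrac{\phi}{2}\right), \\
\mathrm{Var}[Y] &= \exp(2\mu)\,\omega\,(\omega - 1).
\end{align*}
```

The first quantity is what exponentiating the log-scale estimate gives; the arithmetic mean on the original scale is larger by the factor √ω.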

Steve Denham
