BlueNose
Quartz | Level 8

Hello all,

I am trying to analyze data that was collected in a trial. The design of the trial is bad in my opinion, but it is way too late and the data is already gathered. To make a long story short, this is the situation:

A new kind of surgical stitch was invented and compared to 2 existing kinds of stitches. For convenience, I will call these groups "New", "Control 1" and "Control 2" from now on.

A few patients were enrolled, each needing to be stitched up in 1 or more places, so for every patient there are 1 or more observations. Each stitch in each person was done using one of the 3 kinds. For some patients, all stitches were of the same kind; for others, different kinds were used. My experimental unit is the patient, and the observational unit is the "gap" needing stitching up (I hope I'm not mixing up experimental and observational).

In addition to this whole mess, there are two different stitching techniques, A and B. A look at the data reveals that each patient was treated with either A or B for all of their stitches, regardless of the stitch kind (New, Control 1 or Control 2). It is possible, even quite likely, that there are other techniques not used here, but is this factor random or fixed? I am not sure; I'll get back to it later.

The response variable is the time until the bleeding stopped. It is recorded in minutes, taking values 0, 0.5, 1, 1.5, ..., with all the inaccuracies involved.

Here is an illustration of the data (with faked data):

Stitch ID  Subject ID  Treatment  Technique  Time
1          1           New        A          0
2          1           Control 1  A          1
3          2           New        B          3
4          3           New        A          6
5          3           New        A          3.5
6          3           Control 1  A          4.5
7          4           New        B          2
8          4           Control 2  B          1.5

I tried fitting a mixed model in several ways; these are my attempts:

proc mixed data=Data;
  class Treatment SubjectID;
  model Time = Treatment;
  random SubjectID;
  lsmeans Treatment;
run;

Rationale: The subjects are the blocks, ignoring Technique as if it weren't there (the clinicians claimed it is not important; I wouldn't bet on it!).

Treatment is statistically significant

AIC=250, BIC=253

----------------------------------------------------------------------

proc mixed data=Data;
  class Treatment SubjectID Technique;
  model Time = Treatment;
  random Technique;
  random SubjectID(Technique);
  lsmeans Treatment / cl pdiff=control('New') adjust=dunnett;
run;

Rationale: The subjects are still the blocks, but since each subject received only one technique, I pictured two blocks for Technique (two rectangles), patients within each (smaller rectangles), and within each patient the stitches, each of which could be a different kind.

In other words, subjects are nested within technique.

Treatment is statistically significant

AIC=249, BIC=245

----------------------------------------------------------------------

proc mixed data=Data;
  class Treatment SubjectID Technique;
  model Time = Treatment Technique;
  random SubjectID(Technique);
  lsmeans Treatment / cl pdiff=control('New') adjust=dunnett;
run;

Rationale: Maybe there are differences due to technique. Maybe the fact that each patient had one and only one technique doesn't mean it has to be a blocking factor?

I used it as a fixed effect, and surprise surprise, it was statistically significant! Treatment was still strongly significant.

AIC=244, BIC=246

My questions are:

1. Is my syntax correct? How do I know whether to treat Technique as a random effect or a fixed effect? How do I choose the best model? Am I correct to choose the Dunnett adjustment?

2. I tried adding a plot to my LSMEANS statement; I wanted a means plot with either standard errors or confidence intervals, or maybe even a box plot. But for some reason, despite reading in the SAS documentation that this is possible, the keyword PLOT (or PLOTS) did not turn blue, and I got an error message ("expecting one of the following...."). Any ideas why?

example: lsmeans Treatment / cl pdiff=control('New') adjust=dunnett plot=means;

3. Something basic: when we have a single sample, correlation between observations changes the variance, the standard error and the CI, but it does not affect the mean. So why do the adjusted means (LS-means) differ from the descriptive means I calculate by treatment without specifying SubjectID (using PROC MEANS or UNIVARIATE)?
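For reference, this is roughly how I computed the descriptive means (a sketch; PROC UNIVARIATE with a CLASS statement would work too):

proc means data=Data mean stderr n;
  class Treatment;   /* one mean per treatment, ignoring SubjectID */
  var Time;
run;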


Thank you in advance!


Edit: I want to add that I just tried one more thing, namely adding the interaction between Treatment and Technique, and it is statistically significant. The AIC is now down to 223 and the BIC to 225. In case you tell me that THIS is now the preferred model, how do I interpret the interaction LS-means? By the way, the LS-means of the Treatment groups are now MUCH closer to the "descriptive" ones (which do not take SubjectID into account).

15 REPLIES
SteveDenham
Jade | Level 19

And the edit says it all, to me. The model fitted there describes the experimental design most closely--two fixed effects (Treatment, Technique) and their interaction, with the subject as a random effect. It looks like a split-plot design with unequal replication. Now, as to interpreting the LS-means of the interaction, I would look at the so-called 'simple effects', comparing treatments at each level of technique. Not sure what version you are working with, but running this in GLIMMIX means you could use the SLICEDIFF and SLICEDIFFTYPE options in the LSMEANS statement to get these tests.

Steve Denham

BlueNose
Quartz | Level 8

Thank you Steve.

I agree that the last model looks good; I ran it also in JMP, where R2 and adjusted R2 are reported, and it's impressive (0.8). However, JMP gave me different AIC and BIC values, while all the other output numbers were identical...

I have a dilemma regarding the interpretation. When I asked SAS for the LS-means of Treatment, it gave me: New: 0.2; Control 1: 2.6; Control 2: 3.9 - which is good and what I was hoping for. But when I look at the interaction, I get the following graph (from JMP):

[JMP graph: Treatment*Technique interaction LS-means]

And when I asked SAS for PDIFF with the Tukey adjustment, the difference between the middle group (the new treatment) and the red technique of Control 1 was not significant. So what is the conclusion in such a case: are there significant differences between the new treatment and Control 1 or not?

My code was:

proc mixed data=Data;
  class Treatment SubjectID Technique;
  model Time = Treatment Technique Treatment*Technique;
  random SubjectID;
  lsmeans Treatment / cl pdiff=control('New') adjust=dunnett;
  lsmeans Treatment*Technique / cl pdiff=all adjust=tukey;
run;

I still don't understand why PLOTS= doesn't work. How can I generate a graph of the LS-means like JMP does?

I use SAS 9.4, so I think GLIMMIX should work here. What is the syntax for the SLICEDIFF approach you refer to?

Thank you again!

SteveDenham
Jade | Level 19

To get plots, make sure ODS Graphics is enabled. The documentation lists all sorts of plots, so I will leave that for now. I just looked at the documentation for GLIMMIX, and it looks like it can all be coded in the LSMEANS statement.

The syntax for the slice differences and the plot would look like:

lsmeans Treatment*Technique / slicediff=Technique slicedifftype=control
        adjust=dunnett plot=meanplot(sliceby=Technique join);

Steve Denham

lvm
Rhodochrosite | Level 12

You cannot use AIC or BIC to compare models with different fixed effects. If you are using REML (the default), comparisons of AIC or BIC are valid only for models that differ in their random effects. You asked many broad questions in your original posting, but the answers would take entire chapters. You should read the book SAS for Mixed Models, 2nd edition (2006), by Littell et al.
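If you want information criteria that are comparable across fixed-effects specifications, refit under ML while holding the random effects fixed, for example (a sketch based on the code you posted):

proc mixed data=Data method=ml;   /* ML instead of the default REML */
  class Treatment SubjectID Technique;
  model Time = Treatment Technique Treatment*Technique;
  random SubjectID;
run;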

SteveDenham
Jade | Level 19

lvm, what am I missing here? I was under the impression that one could compare models with different fixed effects, provided that the estimation method was ML and not REML, and the data were identical (emphasis added). The last part is what blows up GLIMMIX comparisons under pseudo-likelihood.

Steve Denham

lvm
Rhodochrosite | Level 12

Yes, but the person is using REML in their code (no ML option specified).

SteveDenham
Jade | Level 19

In the words of that most existential of philosophers, Homer Simpson: D'oh.

Steve Denham

BlueNose
Quartz | Level 8

Thank you both for your significant help!

I have a few questions if I may.

1. Steve, your code for GLIMMIX is wonderful, and the slices are brilliant: it divided the differences by Technique. However, within each technique the control was "Control 1", and I want it to be the new treatment. So I tried slicedifftype=control('New') but got this error message: ERROR: Incorrect number of control levels for Treatment*Technique. (What I mean is that for each technique I got 2 comparisons, New vs. Control 1 and Control 2 vs. Control 1, whereas I want to compare everything to the new treatment; I have no interest in Control 2 vs. Control 1.)

2. The graphics issue is weird: in GLIMMIX it worked like a charm without turning ODS Graphics on; I can't understand why it gives me grief in MIXED.

3. I take your recommendation about reading the book. In the meanwhile, I am surprised; I didn't realize there was an issue with AIC and BIC under REML. Is it possible to explain in one line why there is a problem, or is it too complicated? Does this issue appear in the book you recommended? And given this issue, how do I choose the best model in a hypothetical case where I have plenty of fixed effects to choose from? Is it valid, for example, to use ML with AIC and BIC until I find my best model, and then run the chosen model under REML to select the random effects and get better estimates? What about GLIMMIX? I ran Steve's code without specifying a method, so I used restricted maximum likelihood. Does this problem also exist with the Laplace and quadrature methods?

Thank you!

SteveDenham
Jade | Level 19

Point 1 may have to be addressed by a format. The control level has to be an exact match to the formatted value, so that is the only thing I can offer; the syntax you have looks as if it should work.
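One more guess, going by the wording of that error message: for an interaction LS-means effect, the CONTROL specification may need one quoted level per factor in the effect (as with PDIFF=CONTROL), something like:

lsmeans Treatment*Technique / slicediff=Technique
        slicedifftype=control('New' 'A')   /* 'A' is just a placeholder level for Technique */
        adjust=dunnett;

Since the comparisons are made within each slice, the level supplied for Technique should not matter, but I have not verified that.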

Point 2.  Not sure either, but the documentation is pretty clear that you need ODS GRAPHICS ON; for the plots in MIXED to work.
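In GLIMMIX, at least, the LSMEANS statement takes PLOT= directly, and this kind of combination should work (a sketch; adapt it to your model):

ods graphics on;
proc glimmix data=Data;
  class Treatment SubjectID Technique;
  model Time = Treatment Technique Treatment*Technique;
  random SubjectID;
  lsmeans Treatment*Technique / plot=meanplot(sliceby=Technique join cl);
run;
ods graphics off;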

Point 3. REML corrects for the number of fixed effects fitted by projection, so a different fixed-effects model means a different projection, which means a different data set. It all comes down to the residuals, which differ depending on the fixed effects chosen. Now, it is valid to do model selection under ML and then follow with a REML fit to select the covariance structure (at least there are a lot of folks out there, especially in R, doing this). For GLIMMIX you can run into another problem. If the distribution selected is such that the mean and variance are functionally related (meaning distributions other than the normal, lognormal or t), both ML and REML are implemented through linearization and pseudo-likelihoods. Consequently, GLIMMIX will not report ICs for distributions such as the binomial if fit by these methods. The Laplace and quadrature methods approximate the true marginal likelihood, and thus can generate ICs. However, these methods do not allow for R-side estimation. I am sure somebody someplace (Doug Bates or Ben Bolker, maybe) is working on a way to get something that might reflect information criteria in this case, so watch the R mailing lists.
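For instance, to get ICs from GLIMMIX under the Laplace likelihood, the model could be written like this (a sketch; note that METHOD=LAPLACE requires the SUBJECT= form of the RANDOM statement):

proc glimmix data=Data method=laplace;
  class Treatment SubjectID Technique;
  model Time = Treatment Technique Treatment*Technique / dist=normal;
  random intercept / subject=SubjectID;   /* subject-level random intercept */
run;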

And now comes my question--what do you mean by best model?  That's always a fun discussion.

Steve Denham

lvm
Rhodochrosite | Level 12

REML works with the fixed-effects residuals (by definition); therefore the AIC (etc.) quantifies only the random effects in the model. That is, the fixed effects are eliminated before the random effects are fitted; then generalized least squares is used to estimate the fixed effects (using the random-effects estimates from the REML step). REML is the default because it produces unbiased (or the least biased) estimates of the parameters and their standard errors. ML could be used, but there is a bias problem, especially with small-to-moderate sample sizes. Quadrature and Laplace are ML-based methods; these are mostly relevant for non-normal distributions.
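In symbols (the standard REML criterion, not anything specific to this thread), REML minimizes

  -2 logL_REML(theta) = log|V(theta)| + log|X' V(theta)^-1 X| + r' V(theta)^-1 r + constant,

where r = y - X*betahat(theta) is the residual vector. Because the fixed-effects design matrix X enters the criterion, REML "likelihoods" from models with different X are likelihoods of different error contrasts, so their AIC/BIC values are not comparable.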

Why aren't you using Type 3 tests to determine the significance of main effects and interactions?

BlueNose
Quartz | Level 8

You asked me two very difficult questions...

Regarding the best model, I will separate it into the narrow view and the broad view. In the narrow view, what is the purpose of the analysis? The purpose is to estimate the differences between the 3 treatments (and their significance), taking into account the correlation due to multiple observations per subject. The best model is then the one that estimates these differences most accurately, so that when I say the new treatment is better than the control by 5 units in the population, this is as close as possible to reality. In the broader view, the data in hand was somehow created; some mechanism generated it, and the model should explain this mechanism as closely as possible. So the best model is the one that is closest to the real mechanism of "nature" that created the data. Naturally, no model will ever be perfect; otherwise it wouldn't be a statistical model.

As far as I know, GLIMMIX doesn't have Type 3 tests. As for MIXED, I honestly don't know. Are these tests robust to violations of the model assumptions?

lvm
Rhodochrosite | Level 12

Of course GLIMMIX and MIXED have Type 3 tests of hypotheses (and also Type 1). They are the default output.
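For example, in MIXED you can request both types with the HTYPE= option on the MODEL statement:

proc mixed data=Data;
  class Treatment SubjectID Technique;
  model Time = Treatment Technique Treatment*Technique / htype=1,3;
  random SubjectID;
run;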

BlueNose
Quartz | Level 8

If I am mainly interested in new treatment vs. Control 1 and new treatment vs. Control 2, does it make sense NOT to make all possible comparisons but to use some sort of contrast, in order to increase the power to detect significant differences? Is it possible to tell SAS to compare the new treatment to the two controls (at each technique) but not to compare the controls to one another, and thereby reduce the number of tests?

JMP also reports R2 and adjusted R2; do these have any meaning in a mixed model (SAS doesn't report them)? If they are reliable, then the model fit doesn't look too bad.

If SAS gave me covariance parameter estimates of 1.1 for SubjectID and 1.2 for the residual, how would you interpret them verbally? (My units are minutes, but these should be variances if I am not mistaken.)

lvm
Rhodochrosite | Level 12

R2 has no unique meaning with multiple random effects. There have been several discussions of this on this community website. Many people are frustrated that there is no simple measure of goodness of fit for mixed models.

Regarding your two estimated variances, you can say that variability is of the same order of magnitude between and within subjects.
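If you want a single number, the implied intraclass correlation is 1.1 / (1.1 + 1.2) ≈ 0.48; that is, roughly half of the total variance is between subjects. (Both estimates are in squared minutes, since your response is in minutes.)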

You use individual contrasts to compare specific means. I suggest you read the User's Guide entry for the LSMESTIMATE statement (in GLIMMIX) to learn how to use it. You will need to read about the general syntax of the statement before we can give specific help. Good luck.
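To give a flavor, here is a sketch of what it might look like for your design. It assumes the default alphabetical level ordering, so that the Treatment*Technique cells come out in the order (Control 1, A), (Control 1, B), (Control 2, A), (Control 2, B), (New, A), (New, B); verify this against your own LSMeans table before trusting the coefficients:

proc glimmix data=Data;
  class Treatment SubjectID Technique;
  model Time = Treatment Technique Treatment*Technique;
  random SubjectID;
  /* compare New to each control within each technique, and nothing else */
  lsmestimate Treatment*Technique
    'New vs Control 1, Technique A'  -1  0  0  0  1  0,
    'New vs Control 2, Technique A'   0  0 -1  0  1  0,
    'New vs Control 1, Technique B'   0 -1  0  0  0  1,
    'New vs Control 2, Technique B'   0  0  0 -1  0  1
    / adjust=simulate cl;
run;

This makes only the four comparisons you care about, so the multiplicity adjustment covers four tests instead of all fifteen pairwise ones.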
