Kastchei
Pyrite | Level 9

Hello!

 
I have an experiment where triplicate* measurements are taken for each subject twice, either
  • before, and after 4 hours, or
  • before, and after 24 hours.
Subjects are randomized by the number of hours after (4 vs. 24).  Should I set up my fixed effects as:
 
1) two fixed effects, each with 2 levels
  • (before) vs. (after)
  • (4 h) vs. (24 h)
 
2) one fixed effect with 3 levels
  • (before) vs. (after 4 h) vs. (after 24 h)?
 
Thanks!
 
*More detail.  Each measurement is taken in triplicate, which is why I can't do something as simple as a paired t test of 24 v 4 hours.  The measurements are also log-normal, so it's not really correct to take an average of the triplicates and then the difference of the averages, although I suppose I could do that on the log scale.  I've been using mixed models, and I'm just stuck on whether to view this as (a study with 2 time points, an additional fixed effect, and no missing data) or (a study with 3 time points and missing data for whichever after-timepoint the subject was not randomized to).  I'm leaning towards the former since the "missing data" is not at random, but by design.  Also, it would allow a before/after comparison for each of 4h and 24h.  The second design would compare the after of each to the combination of all the before measurements, which might somewhat dilute any significant findings.
Accepted Solutions
sld
Rhodochrosite | Level 12

If the triplicate samples are just subsamples, then the simplest approach would be to combine the 3 values into a single value in some sensible way. For example, if the measurement Y follows the lognormal distribution, then the mean of the log-transformed measurements seems reasonable, as you've considered.
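A minimal sketch of that aggregation step, assuming a long data set named have (a hypothetical name) with one row per triplicate and variables subjectid, extent, period, and y. The output names logged, subjmeans, mean_logy, and geomean are likewise made up for illustration.

```sas
/* Assumed layout: one row per triplicate measurement with
   subjectid, extent (4 or 24), period ('before'/'after'), and y > 0 */
data logged;
   set have;
   logy = log(y);               /* natural log; undefined for y <= 0 */
run;

/* One row per subject and period: the mean of the log-transformed triplicates */
proc means data=logged noprint nway;
   class subjectid extent period;
   var logy;
   output out=subjmeans mean=mean_logy;
run;

/* Optional: the same summary expressed on the original scale */
data subjmeans;
   set subjmeans;
   geomean = exp(mean_logy);    /* geometric mean of the triplicates */
run;
```

Either mean_logy or log(geomean) could then serve as the single value per subject and period.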

 

I can think of two ways to set up a model, one as an ANCOVA and another as a split-plot ANOVA. They differ in the way that time (before and after) is incorporated. The split-plot is your "2 fixed effects" approach.

 

For an ANCOVA approach, SUBJECTID is a random effects factor that is randomly assigned to a level of a fixed effects factor EXTENT that has two levels: 4 h and 24 h. The BEFORE measurement value is a fixed effects covariate. The AFTER measurement value is the response. The BEFORE and AFTER values might be means of log-transformed triplicate values for each SUBJECTID. Class and model statements might look something like  

 

   class extent;
   model after = before extent before*extent;
   lsmeans extent / diff;

 

assuming a linear relationship between AFTER and BEFORE. By default, the lsmeans for EXTENT=4 and EXTENT=24 are estimated at the overall mean value for BEFORE, and the comparison of these two means assesses the effect of EXTENT conditional on this common value of BEFORE. A significant interaction of BEFORE and EXTENT would suggest that the regression of AFTER on BEFORE is not the same for subjects measured after 4h compared to subjects measured after 24h; comparing the main effect EXTENT means may not be sensible in the presence of significant interaction.
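As a fuller sketch of the ANCOVA, the statements above could sit in PROC MIXED once the data are in wide form. This assumes a hypothetical data set subjmeans with one row per subject and period, a character variable period whose values are exactly 'before' and 'after', and a variable mean_logy holding the mean of the log-transformed triplicates; none of these names come from the original study.

```sas
/* Reshape to one row per subject with columns BEFORE and AFTER */
proc sort data=subjmeans;
   by subjectid extent;
run;

proc transpose data=subjmeans out=wide(drop=_name_);
   by subjectid extent;
   id period;                   /* values 'before'/'after' become variable names */
   var mean_logy;
run;

/* ANCOVA: AFTER is the response, BEFORE the covariate, EXTENT the treatment */
proc mixed data=wide;
   class extent;
   model after = before extent before*extent / solution;
   lsmeans extent / diff cl;    /* evaluated at the overall mean of BEFORE by default */
run;
```

If the before*extent term turns out to be negligible, refitting without it gives the usual equal-slopes ANCOVA.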

 

For a split-plot approach, SUBJECTID is a random effects factor (the “whole plot unit”) that is randomly assigned to a level of a fixed effects factor EXTENT (the “whole plot factor”) that has two levels: 4 h and 24 h. Two repeated measures (the “subplot units”) are made on each SUBJECTID, and the “subplot factor” is PERIOD with two levels: before and after. The response is the measurement value Y, which could be on a transformed scale. Class, model and random statements might look like

 

   class extent period subjectid;
   model y = extent period extent*period;
   random intercept / subject=subjectid(extent);

 

The interaction of EXTENT and PERIOD assesses whether the mean difference (=after-before) is the same for subjects measured after 4 h compared to subjects measured after 24 h.
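A sketch of the complete split-plot call on aggregated data, assuming a hypothetical data set subjmeans with one row per subject and period and a response mean_logy (the mean of the log-transformed triplicates). The ddfm=kr option and the slice= request are additions for illustration, not part of the statements above.

```sas
proc mixed data=subjmeans;
   class extent period subjectid;
   model mean_logy = extent period extent*period / ddfm=kr;
   random intercept / subject=subjectid(extent);   /* whole-plot (subject) variance */
   /* Before vs. after within each extent group, plus all pairwise differences */
   lsmeans extent*period / slice=extent diff;
run;
```

The slice=extent tests give the separate before/after comparison for the 4 h group and for the 24 h group, while the extent*period test in the Type 3 table addresses whether those two changes differ.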

 

My sense is that general consensus prefers the ANCOVA over the split-plot, but that’s possibly arguable for any particular scenario.

 

If you don’t combine the triplicate values, then I don’t see how the ANCOVA approach can be applied unless there are 3 pairs of before and after values. But you could use the split-plot approach by adding

 

   random period / subject=subjectid(extent) type=cs;  /* or type=csh or type=un */

 

which is in the same spirit as the direction that Steve has taken in his response.
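One way the triplicate-level model could be written out in full, sketched in its plain variance-components form rather than with type=cs. It assumes a hypothetical data set logged with one row per triplicate and a variable logy = log(y); ddfm=kr is an added choice.

```sas
proc mixed data=logged;
   class extent period subjectid;
   model logy = extent period extent*period / ddfm=kr;
   /* Two variance components per subject:
      intercept = variance among subjects within extent,
      period    = variance among biopsies (period within subject).
      The residual then estimates variance among triplicates within a biopsy. */
   random intercept period / subject=subjectid(extent);
run;
```

With default variance components, this is the usual subsampling model; the type=cs, csh, or un alternatives change how the two period effects within a subject are allowed to covary.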

 

All of this is, of course, assuming that I understand your design correctly.

 

HTH,

Susan

 

6 REPLIES
SteveDenham
Jade | Level 19

I see two things going on here--you have repeated measures on all respondents, but the post-treatment measures are not at both time points for the subjects.  You probably ought to capture the repeated nature somehow.  Here is my suggestion:

 

proc glimmix data=have;
   class time rep subjectid;
   model response = time / dist=lognormal;
   random time / residual type=un subject=subjectid;
   random rep / residual subject=subjectid; /* To get this to work, this may have to drop the residual option */
   lsmeans time / diff;
run;

 

This accommodates the repeated nature of the observations within subject, both for time and for the triplicate measurements.  Note that the lsmeans will be on the log scale, but can be back transformed (not just exponentiated).
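To spell out the back-transformation point with the standard lognormal result (a general fact, not anything specific to the GLIMMIX output; the symbols are introduced here for illustration):

```latex
\log Y \sim N(\mu, \sigma^2)
\;\Longrightarrow\;
\operatorname{median}(Y) = e^{\mu},
\qquad
E[Y] = e^{\mu + \sigma^2/2}
```

So exponentiating an lsmean estimates a median (equivalently, a geometric mean) on the original scale, whereas an estimate of the arithmetic mean also needs a variance term. Which variance estimate to plug in for a mixed model is a judgment call.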

 

Steve Denham

SteveDenham
Jade | Level 19

Great point, Susan, on the pairs of pre-post observations.  That is critical to my assumption of "rep" in the model--that each pre-test value is associated with a unique post-test value.

 

If this association can't be assured, then only the aggregate value has meaning.  For lognormal data, the geometric mean of the triplicates ought to be the response variable.
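In symbols, for the three replicate values of one biopsy (written here as y_1, y_2, y_3, notation assumed for illustration), the back-transformed mean of the logs is the geometric mean:

```latex
\exp\!\Big( \tfrac{1}{3} \sum_{r=1}^{3} \log y_r \Big) = \left( y_1 \, y_2 \, y_3 \right)^{1/3}
```

Analyzing the mean of the logs and analyzing the log of the geometric mean are therefore the same thing.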

 

Steve Denham

sld
Rhodochrosite | Level 12

Steve, thanks for linking my "mean of the logs" to the geometric mean. Your link provides a nice intuitive reminder of what is being estimated when data are log transformed and re-transformed.

 

Susan

Kastchei
Pyrite | Level 9

Thanks Susan and Steve.  You both make great points.

 

1. I guess I have been hesitant to combine the triplicates using a mean of the logs, because the values have a lot of variation.  For background, each replicate is a piece of a biopsy that is being infected with a virus.  Some replicates come out with infection levels in the hundreds (essentially not detectable), while others come up with levels of several millions.  I figured that inputting the individual replicate measurements rather than a mean would allow the model to compensate for variation between replicates.  Do you think this is a valid concern?

 

2. For the split-plot model, Susan suggested:

 

class extent period subjectid;
model y = extent period extent*period;
random intercept / subject=subjectid(extent);
random period / subject=subjectid(extent) type=cs;  /* or type=csh or type=un */

 

 

I had already done something similar, just combining the two random statements.  Would they be similar?  Do I need the (extent) in the subject= since each subject can only have one extent?

random intercept period / subject=subjectid type = cs;

 

 

3. On another forum, two people suggested that I treat time as one effect (baseline, 4h and 24h).  Neither of you suggested this method.  I tend to agree with you, but I was wondering what your rationale against this approach would be.

 

Thanks,

Michael

sld
Rhodochrosite | Level 12

Hi Michael,

 

1. We might need more detail about the study. Meanwhile I’ll assume that the subjects are the replicating factor: that if you wanted more information about the effect of 4h versus 24h, you would add more subjects to each group. And I’ll assume that the triplicate samples are subsamples: that you have taken 3 measurements on each biopsy because the tissue is heterogeneous and you need multiple measurements to adequately characterize the biopsy as a whole. With heterogeneous tissue, the triplicates provide better quality information than would a single subsample, but in a design sense, the triplicates are experiencing the same application of the level of the extent factor because they are clustered within a biopsy (taken from a particular subject at a particular extent level). The assessment of the effect of extent depends upon the variance among subjects, not the variance among triplicates.

 

For (possibly transformed) data that follow a normal distribution and for a balanced design (e.g., 3 subsamples for each subject at each extent level), you get exactly the same test statistics (meaning tests for extent and period for the split-plot) for a model that uses data measured at the triplicate level and incorporates an additional random factor (that estimates variance among triplicates) as for a model that uses the mean of triplicates as data and has a simpler random structure (i.e., no term for variance among triplicates). If the number of subsamples is unbalanced, then the test results will not be exactly the same; in the unbalanced scenario, the more complex triplicate-level-data model would be preferable, although in practice I’ve found it usually doesn’t matter. Depends to some degree on how unbalanced the study is.

 

So if you were to run the two approaches (triplicate-level-data model and mean-level-data model), you should get the same answers. Notably, although you don’t get any advantage in denominator degrees of freedom from the triplicates for the test of extent, the quality of the estimates will be improved because the triplicates provide a better estimate of the “true” value of the biopsy compared to a single subsample.
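A sketch of why the two approaches agree in the balanced case, writing the triplicate-level model with hypothetical notation (i = extent, j = subject within extent, k = period, l = triplicate, n triplicates per biopsy):

```latex
% Triplicate-level model
y_{ijkl} = \mu + \alpha_i + s_{j(i)} + \beta_k + (\alpha\beta)_{ik} + b_{jk(i)} + e_{ijkl},
\qquad
s_{j(i)} \sim N(0, \sigma_s^2),\;
b_{jk(i)} \sim N(0, \sigma_b^2),\;
e_{ijkl} \sim N(0, \sigma_e^2)

% Averaging over the n triplicates of a biopsy
\bar{y}_{ijk\cdot} = \mu + \alpha_i + s_{j(i)} + \beta_k + (\alpha\beta)_{ik}
                   + \big( b_{jk(i)} + \bar{e}_{ijk\cdot} \big),
\qquad
\operatorname{Var}\big( b_{jk(i)} + \bar{e}_{ijk\cdot} \big) = \sigma_b^2 + \sigma_e^2 / n
```

With n = 3 for every biopsy, the mean-level model has the same fixed effects and the same subject term, and its residual variance simply absorbs the biopsy variance plus one third of the triplicate variance. More triplicates shrink that last piece, which is where the improved estimate quality comes from, but they add no subjects.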

 

As an aside, if you have tremendous variability among subsamples, as it seems you might, then you might consider even more subsamples and/or more subjects, or maybe there is some sensible criterion to select among subsamples (e.g., the subsample with the highest infection level). Or you could categorize each subsample as having detectable infection versus no detectable infection, which would be a binomial response: number of detectable infection subsamples / total number of subsamples, and you would be modeling probability of detectable infection. Clearly this sort of thing depends upon resources and biological context. The number of subsamples and subjects could be explored with power analysis.
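A rough sketch of the binomial alternative, assuming a triplicate-level data set have (hypothetical name) and a detection cutoff that is purely a placeholder here; the real threshold would have to come from the assay.

```sas
%let cutoff = 1000;   /* placeholder detection limit, NOT a value from the study */

/* Flag each subsample as detectable (1) or not (0) */
data flagged;
   set have;
   if not missing(y) then detect = (y > &cutoff);
run;

/* Events and trials per subject and period */
proc means data=flagged noprint nway;
   class subjectid extent period;
   var detect;
   output out=counts sum=n_detect n=n_total;
run;

/* Split-plot structure with a binomial response (events/trials syntax) */
proc glimmix data=counts;
   class extent period subjectid;
   model n_detect/n_total = extent period extent*period / dist=binomial link=logit;
   random intercept / subject=subjectid(extent);
run;
```

With only 3 subsamples per biopsy the counts carry limited information, so convergence or estimation problems would not be surprising in a small study.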

 

2. Yes, the two random statements can be combined, as long as both use the same type=<whatever>. If subjects are numbered uniquely—for example, if you have 3 subjects at 4h coded as 1,2,3 and 3 subjects at 24h coded 4,5,6—then you do not need subjectid(extent). I usually include it because it tends to produce the denominator degrees of freedom that I want, compared to just subjectid. And I’ve been told (and it probably is in the documentation) that the SAS mixed model procedures are more efficient if subjectid is coded 1,2,3 within each extent group; then you have to use subjectid(extent) to uniquely identify all 6 subjects.
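If the IDs restart within each group and you would rather not write the nesting, one option is to build a unique identifier yourself. A small sketch, where have and uid are hypothetical names:

```sas
/* Build an ID that is unique across extent groups, e.g. 4_1, 4_2, 24_1, ... */
data have_uid;
   set have;
   length uid $20;
   uid = catx('_', extent, subjectid);
run;

/* UID would then go in the CLASS statement and SUBJECT=UID in the RANDOM
   statement, identifying the same subjects as SUBJECT=SUBJECTID(EXTENT). */
```

The two forms describe the same subjects; as noted above, they may still differ in the denominator degrees of freedom that the default method assigns.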

 

3. They are misguided?  🙂  I saw those posts before I got around to responding to this one, but couldn’t find them again and I can’t remember what they said in any detail. 

 

If you were to use time as a 3-level fixed effects factor, then each subject would be an incomplete block for time. In a simple IBD where each block contains two units to which a level of time is randomly assigned, you would have some blocks with (base, 4h), some with (base, 24h), and some with (4h, 24h). That said, you could do an IBD variant that always includes base in the block, but now you don’t have any subjects with (4h, 24h) and your ability to compare 4h to 24h is impeded because those levels never occur together. And it’s all starting to get kind of manic. The split plot approach is so much more lovely. The two-level vision of time also permits an ANCOVA approach, if one were inclined.
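For comparison only, here is roughly what the 3-level version would look like, assuming a hypothetical aggregated data set subjmeans with variables subjectid (unique across groups), extent, period ('before'/'after'), and mean_logy. The variable time3 and the ddfm=kr option are made up for this sketch.

```sas
/* Collapse EXTENT and PERIOD into one 3-level factor: base, 4h, 24h */
data onefactor;
   set subjmeans;
   length time3 $8;
   if period = 'before' then time3 = 'base';
   else time3 = cats(extent, 'h');      /* '4h' or '24h' */
run;

proc mixed data=onefactor;
   class time3 subjectid;
   model mean_logy = time3 / ddfm=kr;
   random intercept / subject=subjectid;   /* each subject is an incomplete block */
   lsmeans time3 / diff;
run;
```

In this layout the 4h vs. 24h difference never occurs within a subject, and all baselines are pooled into a single level, which is the dilution concern raised in the original question.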

 

Hope this helps,

Susan
