Mixed model: 2 fixed effects vs. 1 combined


09-18-2015 10:29 AM - edited 09-18-2015 10:32 AM

Hello!

I have an experiment where triplicate* measurements are taken on each subject twice, either

- before, and after 4 hours, or
- before, and after 24 hours.

Should I model the time structure as:

1) two fixed effects, each with 2 levels

- (before) vs. (after)
- (4 h) vs. (24 h)

2) one fixed effect with 3 levels

- (before) vs. (after 4 h) vs. (after 24 h)?

Thanks!

*More detail: each measurement is taken in triplicate, which is why I can't do something as simple as a paired t test of 24 h vs. 4 h. The measurements are also log-normal, so it's not really correct to take an average of the triplicates and then the difference of the averages, although I suppose I could do that on the log scale. I've been using mixed models, and I'm just stuck on whether to view this as (a study with 2 time points, an additional fixed effect, and no missing data) or (a study with 3 time points and missing data for whichever after time point a subject was not randomized to). I'm leaning towards the former, since the "missing data" is missing not at random but by design. Also, it would allow a before/after comparison for each of 4 h and 24 h. The second design would compare the after measurements at each time point to the combination of all the before measurements, which might somewhat dilute any significant findings.

Accepted Solutions

Solution

09-25-2015 10:57 AM


09-24-2015 01:09 AM

If the triplicate samples are just subsamples, then the simplest approach would be to combine the 3 values into a single value in some sensible way. For example, if the measurement Y follows the lognormal distribution, then the mean of the log-transformed measurements seems reasonable, as you've considered.

I can think of two ways to set up a model, one as an ANCOVA and another as a split-plot ANOVA. They differ in the way that time (before and after) is incorporated. The split-plot is your "2 fixed effects" approach.

For an ANCOVA approach, SUBJECTID is a random effects factor that is randomly assigned to a level of a fixed effects factor EXTENT that has two levels: 4 h and 24 h. The BEFORE measurement value is a fixed effects covariate. The AFTER measurement value is the response. The BEFORE and AFTER values might be means of log-transformed triplicate values for each SUBJECTID. Class and model statements might look something like

proc mixed data=have;
   class extent;
   model after = before extent before*extent;
   lsmeans extent / diff;
run;

assuming a linear relationship between AFTER and BEFORE. By default, the lsmeans for EXTENT=4 and EXTENT=24 are estimated at the overall mean value for BEFORE, and the comparison of these two means assesses the effect of EXTENT conditional on this common value of BEFORE. A significant interaction of BEFORE and EXTENT would suggest that the regression of AFTER on BEFORE is not the same for subjects measured after 4h compared to subjects measured after 24h; comparing the main effect EXTENT means may not be sensible in the presence of significant interaction.
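To make the adjusted-means idea concrete, here is a small Python sketch with made-up coefficients and BEFORE values (none of these numbers come from the thread): each lsmean is the model prediction at the overall mean of BEFORE, and with a nonzero interaction the EXTENT difference changes as BEFORE changes.

```python
# Hypothetical fitted ANCOVA coefficients (illustration only, not real estimates):
b0, b_before, b_extent, b_inter = 1.0, 0.8, 0.5, 0.3
before_vals = [4.0, 5.0, 6.0, 4.5, 5.5, 6.5]  # made-up BEFORE values
mean_before = sum(before_vals) / len(before_vals)

def predicted(before, extent24):
    """Model prediction; extent24 = 1 for the 24 h group, 0 for the 4 h group."""
    return b0 + b_before * before + b_extent * extent24 + b_inter * before * extent24

# lsmeans: predictions for each EXTENT level at the common (overall mean) BEFORE value
lsmean_4h = predicted(mean_before, 0)
lsmean_24h = predicted(mean_before, 1)
diff_at_mean = lsmean_24h - lsmean_4h

# With b_inter != 0, the EXTENT difference depends on where BEFORE sits:
diff_low = predicted(4.0, 1) - predicted(4.0, 0)
diff_high = predicted(6.5, 1) - predicted(6.5, 0)
print(diff_at_mean, diff_low, diff_high)
```

When the interaction coefficient is zero, diff_low and diff_high collapse to the same number and the single lsmeans comparison summarizes the EXTENT effect.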

For a split-plot approach, SUBJECTID is a random effects factor (the "whole plot unit") that is randomly assigned to a level of a fixed effects factor EXTENT (the "whole plot factor") that has two levels: 4 h and 24 h. Two repeated measures (the "subplot units") are made on each SUBJECTID, and the "subplot factor" is PERIOD with two levels: before and after. The response is the measurement value Y, which could be on a transformed scale. The class, model, and random statements might look like

proc mixed data=have;
   class extent period subjectid;
   model y = extent period extent*period;
   random intercept / subject=subjectid(extent);
run;

The interaction of EXTENT and PERIOD assesses whether the mean difference (=after-before) is the same for subjects measured after 4 h compared to subjects measured after 24 h.
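That interaction is just a difference of differences, which can be seen with a quick Python sketch using hypothetical cell means (the numbers are made up, not from this study):

```python
# Hypothetical cell means on the log scale (made-up numbers, not the thread's data):
cell_mean = {
    ("4h", "before"): 5.0,  ("4h", "after"): 7.0,
    ("24h", "before"): 5.2, ("24h", "after"): 6.0,
}

# The EXTENT*PERIOD interaction contrast is a difference of differences:
change_4h = cell_mean[("4h", "after")] - cell_mean[("4h", "before")]     # 2.0
change_24h = cell_mean[("24h", "after")] - cell_mean[("24h", "before")]  # about 0.8
interaction = change_4h - change_24h  # about 1.2: the before-to-after change differs by extent
print(interaction)
```

A nonzero contrast here is exactly what the EXTENT*PERIOD F test is evaluating, against the appropriate within-subject error.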

My sense is that general consensus prefers the ANCOVA over the split-plot, but that’s possibly arguable for any particular scenario.

If you don’t combine the triplicate values, then I don’t see how the ANCOVA approach can be applied unless there are 3 *pairs* of before and after values. But you could use the split-plot approach by adding

random period / subject=subjectid(extent) type=cs; /* or type=csh or type=un */

which is in the same spirit as the direction that Steve has taken in his response.

All of this is, of course, assuming that I understand your design correctly.

HTH,

Susan

All Replies


09-22-2015 08:26 AM

I see two things going on here: you have repeated measures on all respondents, but each subject has post-treatment measures at only one of the two time points. You probably ought to capture the repeated nature somehow. Here is my suggestion:

proc glimmix data=have;
   class time rep subjectid;
   model response = time / dist=lognormal;
   random time / residual type=un subject=subjectid;
   random rep / residual subject=subjectid; /* you may have to drop the residual option to get this to work */
   lsmeans time / diff;
run;

This accommodates the repeated nature of the observations within subject, both for time and for the triplicate measurements. Note that the lsmeans will be on the log scale, but can be back transformed (not just exponentiated).
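On the back-transformation point: for a lognormal response, simply exponentiating a log-scale mean recovers the geometric mean (median), not the arithmetic mean; the lognormal mean is exp(mu + sigma^2/2). A Python simulation with assumed parameters (not the thread's data) illustrates the distinction:

```python
import math
import random

random.seed(1)
mu, sigma = 5.0, 1.0  # log-scale mean and SD (hypothetical values)

# Simulate lognormal responses: exponentiate normal draws.
sample = [math.exp(random.gauss(mu, sigma)) for _ in range(200_000)]
arith_mean = sum(sample) / len(sample)

naive = math.exp(mu)                     # exponentiating alone gives the median/geometric mean
corrected = math.exp(mu + sigma**2 / 2)  # lognormal arithmetic mean: exp(mu + sigma^2/2)

# The corrected back-transform tracks the arithmetic mean; plain exp() is biased low.
print(round(naive, 1), round(corrected, 1), round(arith_mean, 1))
```

The larger the log-scale variance, the bigger the gap between the naive and corrected back-transforms.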

Steve Denham



09-25-2015 10:43 AM

Great point, Susan, on the pairs of pre/post observations. That is critical to my assumption of "rep" in the model: that each pre-test value is associated with a unique post-test value.

If this association can't be assured, then only the aggregate value has meaning. For lognormal data, the geometric mean of the triplicates ought to be the response variable.

Steve Denham


09-26-2015 02:07 PM

Steve, thanks for linking my "mean of the logs" to the geometric mean. Your link provides a nice intuitive reminder of what is being estimated when data are log transformed and re-transformed.

Susan


09-25-2015 01:00 PM - edited 09-25-2015 02:26 PM

Thanks Susan and Steve. You both make great points.

1. I guess I have been hesitant to combine the triplicates using a mean of the logs, because the values have a lot of variation. For background, each replicate is a piece of a biopsy that is being infected with a virus. Some replicates come out with infection levels in the hundreds (essentially not detectable), while others come up with levels of several millions. I figured that inputting the individual replicate measurements rather than a mean would allow the model to compensate for variation between replicates. Do you think this is a valid concern?

2. For the split-plot model, Susan suggested:

class extent period subjectid;

model y = extent period extent*period;

random intercept / subject=subjectid(extent) ;

random period / subject=subjectid(extent) type=cs; /* or type=csh or type=un */

I had already done something similar, combining the two random statements into one. Would that be equivalent? And do I need the (extent) in subject=, since each subject can only have one extent?

random intercept period / subject=subjectid type=cs;

3. On another forum, two people suggested that I treat time as one effect (baseline, 4h and 24h). Neither of you suggested this method. I tend to agree with you, but I was wondering what your rationale against this approach would be.

Thanks,

Michael


09-26-2015 02:02 PM

Hi Michael,

1. We might need more detail about the study. Meanwhile I’ll assume that the subjects are the replicating factor: that if you wanted more information about the effect of 4h versus 24h, you would add more subjects to each group. And I’ll assume that the triplicate samples are subsamples: that you have taken 3 measurements on each biopsy because the tissue is heterogeneous and you need multiple measurements to adequately characterize the biopsy as a whole. With heterogeneous tissue, the triplicates provide better quality information than would a single subsample, but in a design sense, the triplicates are experiencing the same application of the level of the *extent* factor because they are clustered within a biopsy (taken from a particular subject at a particular *extent* level). The assessment of the effect of *extent* depends upon the variance among subjects, not the variance among triplicates.

For (possibly transformed) data that follow a normal distribution and for a balanced design (e.g., 3 subsamples for each subject at each *extent* level), you get exactly the same test statistics (meaning tests for *extent* and *period* for the split-plot) for a model that uses data measured at the triplicate level and incorporates an additional random factor (that estimates variance among triplicates) as for a model that uses the mean of triplicates as data and has a simpler random structure (i.e., no term for variance among triplicates). If the number of subsamples is unbalanced, then the test results will not be exactly the same; in the unbalanced scenario, the more complex triplicate-level-data model would be preferable, although in practice I’ve found it usually doesn’t matter. Depends to some degree on how unbalanced the study is.

So if you were to run the two approaches (triplicate-level-data model and mean-level-data model), you should get the same answers. Notably, although you don’t get any advantage in denominator degrees of freedom from the triplicates for the test of extent, the quality of the estimates will be improved because the triplicates provide a better estimate of the “true” value of the biopsy compared to a single subsample.
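This balanced-design equivalence can be checked numerically. Below is a Python simulation under assumed variance components (not SAS output, and the effect sizes are made up): the F statistic for the whole-plot factor is the same whether you analyze subsample-level data with a subject term or run a one-way ANOVA on the per-subject means.

```python
import random

random.seed(42)
a, b, n = 2, 4, 3  # extent levels, subjects per level, subsamples per subject

# Simulated log-scale data (hypothetical): extent effect + subject effect + subsample noise
data = {}  # (extent, subject) -> list of subsample values
for e in range(a):
    for s in range(b):
        subj_effect = random.gauss(0, 1)
        data[(e, s)] = [2.0 * e + subj_effect + random.gauss(0, 0.5) for _ in range(n)]

def mean(xs):
    return sum(xs) / len(xs)

subj_means = {k: mean(ys) for k, ys in data.items()}
ext_means = {e: mean([y for (g, s), ys in data.items() if g == e for y in ys]) for e in range(a)}

# Model 1: subsample-level data with a subject variance component.
# F for extent = MS(extent) / MS(subject within extent).
grand = mean([y for ys in data.values() for y in ys])
ss_ext = b * n * sum((m - grand) ** 2 for m in ext_means.values())
ss_subj = n * sum((subj_means[(e, s)] - ext_means[e]) ** 2 for (e, s) in data)
F_subsample = (ss_ext / (a - 1)) / (ss_subj / (a * (b - 1)))

# Model 2: one-way ANOVA on the subject means (triplicates combined first).
grand_m = mean(list(subj_means.values()))
ss_between = b * sum((ext_means[e] - grand_m) ** 2 for e in range(a))
ss_within = sum((subj_means[(e, s)] - ext_means[e]) ** 2 for (e, s) in data)
F_means = (ss_between / (a - 1)) / (ss_within / (a * (b - 1)))

# For a balanced design the two F statistics agree.
print(F_subsample, F_means)
```

Algebraically, every sum of squares in Model 1 is n times its counterpart in Model 2, so the factor of n cancels in the F ratio; with unequal numbers of subsamples per subject that cancellation no longer holds exactly.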

As an aside, if you have tremendous variability among subsamples, as it seems you might, then you might consider even more subsamples and/or more subjects, or maybe there is some sensible criterion to select among subsamples (e.g., the subsample with the highest infection level). Or you could categorize each subsample as having detectable infection versus no detectable infection, which would be a binomial response: number of detectable-infection subsamples / total number of subsamples, and you would be modeling probability of detectable infection. Clearly this sort of thing depends upon resources and biological context. The number of subsamples and subjects could be explored with power analysis.

2. Yes, the two random statements can be combined, as long as both use the same *type=*<whatever>. If subjects are numbered uniquely—for example, if you have 3 subjects at 4h coded as 1,2,3 and 3 subjects at 24h coded 4,5,6—then you do not need *subjectid(extent)*. I usually include it because it tends to produce the denominator degrees of freedom that I want, compared to just *subjectid*. And I’ve been told (and it probably is in the documentation) that the SAS mixed model procedures are more efficient if *subjectid* is coded 1,2,3 within each *extent* group; then you **have** to use *subjectid(extent)* to uniquely identify all 6 subjects.
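The coding point can be illustrated outside SAS (a Python sketch with hypothetical IDs): when subjects are numbered 1, 2, 3 within each extent group, subjectid alone no longer identifies a subject, but the nested pair, i.e. subjectid(extent), does.

```python
# Subjects coded 1, 2, 3 within each extent group (hypothetical coding):
rows = [("4h", 1), ("4h", 2), ("4h", 3), ("24h", 1), ("24h", 2), ("24h", 3)]

plain_ids = [sid for (_, sid) in rows]   # subjectid alone
nested_ids = rows                        # subjectid(extent): the (extent, subjectid) pair

print(len(set(plain_ids)))   # 3 distinct labels for 6 subjects: ambiguous
print(len(set(nested_ids)))  # 6 distinct labels: every subject uniquely identified
```

With globally unique subject codes (1 through 6), both versions would identify all six subjects, which is why the nesting is optional in that case.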

3. They are misguided? :-) I saw those posts before I got around to responding to this one, but couldn’t find them again and I can’t remember what they said in any detail.

If you were to use *time* as a 3-level fixed effects factor, then each subject would be an incomplete block for *time*. In a simple IBD where each block contains two units to which a level of *time* is randomly assigned, you would have some blocks with (base, 4h), some with (base, 24h), and some with (4h, 24h). That said, you could do an IBD variant that always includes base in the block, but now you don’t have any subjects with (4h, 24h) and your ability to compare 4h to 24h is impeded because those levels never occur together. And it’s all starting to get kind of manic. The split plot approach is so much more lovely. The two-level vision of time also permits an ANCOVA approach, if one were inclined.

Hope this helps,

Susan