a month ago
My use of a mixed effects model has been queried for a repeated measures analysis. I am working in SAS 9.4. In my experimental set up I have a randomised block design, with 5 blocks each containing 1 replicate of 3 levels of a treatment (5 * 3 , n= 15)- variable name "TREAT". Measurements of my reponse variable (LOG_N2O) were taken on 75 occasions (CYCLETIME), hence the need for a repeated measures analysis, but since there are missing data from nearly every measurement time, ANOVA is not suitable. The code I have used is below
PROC MIXED DATA= AUTOSTATS;
CLASS BLOCK TREAT CYCLETIME;
MODEL LOG_N2O= TREAT CYCLETIME BLOCK
REPEATED CYCLETIME / SUBJECT= CHANNEL TYPE= AR(1);
RANDOM CHANNEL BLOCK;
LSMEANS TREAT CYCLETIME BLOCK /* POST HOC TEST FOR IDENTIFYING THE DIFFERENCES */
/ ADJUST= TUKEY ;
ODS EXCLUDE CONVERGENCESTATUS; /*SUPPRESS THE PRINTING AS IT TAKES UP MEMORY*/
ODS EXCLUDE COVPARMS;
ODS EXCLUDE DIMENSIONS;
ODS EXCLUDE FITSTATISTICS;
ODS EXCLUDE ITERHISTORY;
ODS EXCLUDE LRT;
ODS EXCLUDE MODELINFO;
ODS EXCLUDE NOOBS;
ODS OUTPUT DIFFS= MEANTREAT TESTS3= N2O_FSTATS;
This gives me output that is understandable and makes sense to me...
|Class Level Information|
|BLOCK||5||1 2 3 4 5|
|TREAT||3||FER NH4 NO3|
|CYCLETIME||75||01APR2014:03:41:50 01APR2014:09:34:55 01APR2014:23:12:41 02APR2014:12:34:39 02APR2014:16:42:48 02APR2014:20:51:37 03APR2014:00:59:42 03APR2014:05:08:41 03APR2014:09:17:52 03APR2014:13:26:31 03APR2014:17:36:16 03APR2014:21:44:46 04APR2014:01:52:23 04APR2014:05:59:54 04APR2014:10:09:08 04APR2014:16:07:16 04APR2014:20:14:46 05APR2014:00:23:31 05APR2014:04:33:54 05APR2014:08:42:11 05APR2014:12:49:13 05APR2014:16:55:41 05APR2014:21:02:37 06APR2014:01:10:53 06APR2014:05:20:49 06APR2014:12:56:13 07APR2014:15:17:47 09APR2014:12:27:32 09APR2014:16:36:09 09APR2014:20:47:03 10APR2014:00:59:10 10APR2014:05:10:27 10APR2014:09:19:44 10APR2014:20:50:00 11APR2014:01:02:00 11APR2014:18:46:00 11APR2014:22:57:00 12APR2014:03:08:00 12APR2014:07:17:00 12APR2014:11:26:00 12APR2014:15:34:00 13APR2014:16:09:00 13APR2014:20:19:00 14APR2014:00:28:00 24MAR2014:17:41:36 24MAR2014:22:03:01 25MAR2014:02:38:52 25MAR2014:07:14:45 25MAR2014:15:52:05 25MAR2014:20:13:41 26MAR2014:00:49:28 26MAR2014:05:25:17 26MAR2014:10:01:16 26MAR2014:16:41:00 26MAR2014:21:16:41 27MAR2014:01:52:21 27MAR2014:06:27:58 27MAR2014:15:40:36 27MAR2014:20:02:04 28MAR2014:00:37:53 28MAR2014:05:13:38 28MAR2014:09:49:36 28MAR2014:14:25:30 29MAR2014:19:23:01 29MAR2014:23:57:56 30MAR2014:04:32:23 30MAR2014:09:07:49 30MAR2014:13:43:31 30MAR2014:18:18:59 30MAR2014:22:53:40 31MAR2014:03:28:38 31MAR2014:08:04:00 31MAR2014:13:32:48 31MAR2014:18:26:10 31MAR2014:23:04:18|
|Number of Observations|
|Number of Observations Read||1110|
|Number of Observations Used||794|
|Number of Observations Not Used||316|
|Type 3 Tests of Fixed Effects|
|Effect||Num DF||Den DF||F Value||Pr > F|
However, the degrees of freedom provided in the SAS output have been questioned and I need to be certain of my analysis as it is for publication. To solve the issue, I then went to SPSS to run the analysis there.... and opened another can of worms, since the output of the model is so different, in that the F value for the effect of variable "TREAT" is much much lower in SPSS, the degrees of freedom are also much much lower and hence the p value now non-sig. It's a long time since I used SPSS so I am less inclined to trust my coding in that platform. My question is whether the model I have coded above is a valid way to analyse my data, and whether the degrees of freedom are correct (why they are what they are, so that I can justify my analysis)
a month ago
a month ago - last edited a month ago
What is CHANNEL, and what role does it play in your study? The term is in your model but not in the description of your study (which otherwise is quite helpful).
What is it that you want to know about the effect of CYCLETIME? 75 levels is a lot to sort out in an ANOVA model. (Your current code specifies a mixed-model ANOVA.)
a month ago
CHANNEL is the unique id of each replicate. It is a numeric label.
My interest is primarily whether there is a treatment effect (TREAT), but also whether that changes with time (CYCLETIME) and if gthere is an interaction between the two.
a month ago
So you have 15 "replicates", 3 in each of 5 blocks?
If you had balanced data, then the interaction of TREAT and CYCLETIME would have 3 x 75 = 225 means. How would you go about interpreting interaction in this scenario? Would it be more parsimonious to regress the response on CYCLETIME and assess whether the 3 regressions are parallel across CYCLETIME?
a month ago
Good for your critic. You should definitely question the correctness of the model when it reports zero denominator degrees of freedom, no matter how much you like the results. Generally zero den df means that your model is over-specified.
And, in fact, your model is overspecified: you have BLOCK in both MODEL and RANDOM statements, which is not correct--you are asking the model to estimate the same parameters twice which makes the model unhappy. Theoretically, if BLOCK is a random effects factor, BLOCK and all interactions involving BLOCK are random effects factors. Consequently those terms generally should not be in the MODEL statement.
If you have at least one observation in each combination of TREAT and CYCLETIME, then you could consider this model:
proc glimmix data=autostats; class block treat cycletime; model log_N2O = treat|cycletime; random block; *random block*treat; /* might be included for type=ar(1) */ repeated cycletime / subject=block*treat type=ar(1);
lsmeans treat|cycletime; run;
Note that "channel" in your model is (I think) equivalent to BLOCK*TREAT. You don't need to use CHANNEL--BLOCK*TREAT works just fine--but if you do, it should be in the CLASS statement.
If the TREAT x CYCLETIME factorial is incomplete and if TREAT*CYCLETIME is included in the treatment structure of the model, then various lsmeans will be reported as "non-est" (not estimable) and the interpretation of the interaction may be (probably is) suspect.
A fixed effects factor with 75 levels (like CYCLETIME) is very unwieldy in an ANOVA model. If it is significant, how do you sort out which means are different from which other means? The number of pairwise comparisons is 2,775 (the combination of 75 things taken 2 at a time). 2,775! Consider the impact on power of controlling for family-wise Type I error among 2,775 comparisons using the Tukey method. Plus I can't imagine that you actually care about each of the 75 means. Plus the interaction is even more unwieldy.
So I suggest giving serious consideration to dealing with CYCLETIME in some other way that makes sense in the context of your data. You could, for example, compute weekly averages (if that was defensible in context) and continue with a mixed-model ANOVA approach. Another approach is a regression of the response on CYCLETIME, maybe using spline regression rather than linear or curvilinear regression if the linearity assumption is not met. A regression approach would also accommodate your missing data problem. These sorts of models are very complex and definitely should not be pursued naively. (I've added italics for emphasis. You really need to know what you are doing with these models.)
To use CYCLETIME as the predictor variable in regression, you would need to convert it from a text variable to a numeric variable.
A few resources related to a regression approach: