Hello everyone,
I have made a short pilot study with a poor research design... The overall scope of this study included 6 trials, where we tested 6 strains (from A to F) with two different doses (1 and 2) in chicken embryos. The response is the embryonic mortality observed.
In each trial the strains and number of subject used were different. A negative control group (concentration = 0, strain= 0) and positive control group (strain A, with 2 concentrations) per trial was always included.
The input is like:
Trial Dose Strain N Response
I 1 A 100 50
I 2 A 100 45
I 1 B 100 30
I 2 B 100 46
I 0 0 200 1
II 1 A 100 40
II 2 A 100 39
II 1 C 100 10
II 2 C 100 15
II 1 D 100 0
II 2 D 100 3
II 0 0 100 0
III 1 A 200 50
III 2 A 200 62
III 1 E 200 12
III 2 E 200 18
III 0 0 200 1
IV 1 A 100 39
IV 2 A 100 51
IV 1 B 100 49
IV 2 B 100 44
IV 1 F 100 1
IV 2 F 100 4
IV 0 0 200 1
V 1 A 100 66
V 1 C 100 18
V 1 E 100 22
V 0 0 200 2
VI 1 A 300 61
VI 2 A 300 58
VI 1 A 300 3
VI 1 C 300 10
VI 2 C 300 19
VI 0 0 300 0
Although I know that it is not met, my H0 is that there are not differences between the embryos infected with different doses and strains. I think PROC GLIMMIX is the best option, but I do not know how I can include the random effects and its interactions in the program.
I have created the program as following:
proc glimmix data=expc;
class trial strain dose;
model summe/n = strain*dose/dist=binomial htype=1,3 link=logit ddfm=satterth;
lsmeans strain*dose/ ilink diff lines;
random strain(trial) dose(trial) /solution;
run;
But I am not sure if the program is right…
In addition, I have read that COVTEST statement makes inferences about the covariance parameters, and the ESTIMATE statement compares the effect between groups. Should I use these statements? I do not understand how the values in these statements are allocated….
Thank you very much in advance!
Hi Ana,
I think you have identified why that comparison doesn't show up--it would require some sort of discontinuous line to be compatible with all of the other comparisons.
I would suggest adjust=simulate rather than adjust=tukey. For long reasons why, see Westfall, Tobias and Wolfinger, Multiple Comparisons and Multiple Tests Using SAS, 2nd edition. (2011). For short reasons, it does a better job of protecting the overall type I error rate.
Steve Denham
If I understand your design correctly, you might have good luck with the following approach:
proc glimmix data=expc;
class trial strain dose;
model summe/n = strain*dose/dist=binomial link=logit ddfm=satterth;
lsmeans strain*dose/ ilink diff lines;
random intercept /subject=trial solution;
run;
You could then compare this to:
proc glimmix data=expc;
class trial strain dose;
model summe/n = strain*dose/dist=binomial link=logit ddfm=satterth;
lsmeans strain*dose/ ilink diff lines;
random intercept /subject=trial group=strain*dose solution;
run;
where the latter fits separate variance components due to trial for each strain by dose combination. Note that with a Satterthwaite degrees of freedom, this latter model might result in some odd standard errors.
Steve Denham
Dear Steve,
thank you very much for your great help.
The first model shows a Gener. Chi-Square / DF =1.86, the covariance of the intercept is 0.7072, and the Type III Tests shows significant effect (P<0.0001). In addition, the LSM of the strain*dose are over-estimated.
The second model shows a Gener. Chi-Square / DF =0.97 and the Type III Tests shows also significant effect (P<0.0007). In this case, the LSM seem properly estimated.
Therefore, I would conclude that this latter model better suit to my data, it is not?
Regarding the Satterthwaite option, I do not observe any odd standard error...
One thing that I do not understand very well are the obtained differences between the strains for each dose. The following message is showed:
The LINES display does not reflect all significant comparisons. The following additional pairs are significantly different |
---|
However, in the Table "Conservative T Grouping for Dose*Strain Least Squares Means" appear some pairs significantly different from the message with the same letter (for example the control group (dose=0), which mortality is 12.34%, with a high pathogenic strain, whose pathogenic degree is 86,10 %). How can I interpret properly these results?
Thank you very much in advance.
Best regards,
Ana
Hi Ana,
Yes, it appears that the second model is superior to the first. As far as the messages, I am not familiar with any of them. Could you please post your output where these occur?
Steve Denham
Hi Steve,
Thanks for your very much for your answer and interest!
Please, find attached the output of the second model. The message appears at the end of the last table (at the end of the document).
If you would like also to look the output of the first model, I will be happy to post it.
I appreciate your assistance in this matter!
Regards,
Ana
Hi Ana,
Well, maybe this model isn't the best, as it appears that several of the variance components are going to zero with no associated standard error. This gives rise to the 'G matrix is not positive definite' statement. You just don't have enough data to adequately model all of those variance components. The safer course then is to not use the group= option.
Now as far as the LINES, I think you have just run into a presentation limitation of the GLIMMIX output. A possible way to capture all of the data would be to use an ODS OUTPUT statement. I would now suggest the following:
proc glimmix data=expc;
class trial strain dose;
model summe/n = strain*dose/dist=binomial link=logit ddfm=satterth;
lsmeans strain*dose/ ilink diff lines;
random intercept /subject=trial solution;
ODS OUTPUT LSMLINES=LSMLINES;
run;
From the LSMlines dataset, you should be able to extract all that you need.
Steve Denham
Dear Sven,
Thank you for the observation.
Your suggestion works, but the message about the lines appears again.
I have thought of the option ADJUST in the LSMEANS statement since determines the method for multiple comparison adjustment of LS-mean differences. And I tried the method ADJUST=TUKEY because my data are unbalanced.
proc glimmix data=expc;
class trial strain dose;
model summe/n = strain*dose/dist=binomial link=logit ddfm=satterth;
lsmeans strain*dose/ ilink diff lines adjust=TUKEY;
random intercept /subject=trial solution;
run;
I obtain four significant comparison levels (vs. six with your procedure) and, although the note appears again, only a pair is significantly different (strain A with the dose of 2.5 and the control group).
Is it possible that the note indicates that the comparison A vs. Control is not represented by the lines display but need to be suppressed in order to generate a lines display?
Please, find in attached the real data. Maybe this will give some idea...
Thanks a lot!!
Ana
Hi Ana,
I think you have identified why that comparison doesn't show up--it would require some sort of discontinuous line to be compatible with all of the other comparisons.
I would suggest adjust=simulate rather than adjust=tukey. For long reasons why, see Westfall, Tobias and Wolfinger, Multiple Comparisons and Multiple Tests Using SAS, 2nd edition. (2011). For short reasons, it does a better job of protecting the overall type I error rate.
Steve Denham
Thank you very much dear Steve, I will read it!
It has been a great pleasure to work with you.
Regards,
Ana
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.