Hello, I have a situation that I can not understand very well.
I have 2 treatments that are NC and PC. My study design is RCBD. For a specific gut bacteria species data, there is no read in NC group in all 18 samples, meaning that all 18 data are 0. However, in PC group which is 16 samples, 4 samples have read and other are 0. After running proc glimmix for ANOVA, the result turned to be a very small LSMEAN value for NC. My question is, is that value calculated by removing block effect and residue? In this case, I should report "0" (as it is actually no read in this species) or report this small value? Thank you very much for your help!
@kellychan84 wrote:
Thank you @PaigeMiller !! This comes to my next question, if the mean is not statistically significant from 0, in this case, what should we report? Report what SAS gives you and define/mention this in the methodology part? What is your suggestion in terms of publishing?Thanks!
I guess it depends where and why you are publishing this.
I would make it clear that the results indicate the "Non-Antibiotics" LS Mean is not statistically different than zero. You probably ought to state what I said above, which indicates that although all 18 subjects which were selected at random had zero bacteria, it is unlikely that in the long run, all subjects who get the "Non-Antibiotic" treatment will have zero bacteria, that is why a very small positive estimate makes more sense than a zero estimate. Lastly, you can talk about how the block variable is used in calculation of the LS Means, this is the math behind the non-zero estimate.
It would be helpful if you showed us the GLIMMIX code you used, and the LSMEANS output.
But yes, I think you are right, that the non-zero value is due to block effects.
Thinking about this some more ...
Since the samples in the study are almost certainly random (not fixed), they represent a random sample of a much larger population and so there's really no reason to think that because these samples had 0 bacteria, that the entire population had zero bacteria. Thus a non-zero estimate makes more sense than a zero estimate here.
Had these samples somehow been fixed (which doesn't make sense in a biological testing mode), then a zero estimate could make sense.
Hello @PaigeMiller Please the code and the output below.
Output :
proc glimmix data= fecal_species_taxonomy nobound;
class treatment block;
model Species300P = treatment;
random block;
lsmeans treatment/tdiff lines;
covtest "block=0" 0 .;
ods output LSMeans = myparmdataset;
output out=second predicted=pred residual=resid residual(noblup)=mresid student=sresid student(noblup)=smresid;
title "300-Taxonomy Data fecal Species 300 Percentage ANOVA Results";
run;
proc print data=myparmdataset;
format estimate D12.6
stderr D8.6;
run;
This SAS code does not make me want to change my explanation, which I think fits here.
The other point I would make is that the test to see if t-test to see if "Non-Antibiotics" has a value of zero. This is telling us that the mean for "Non-Antibiotics" is not statistically different than zero, which fits the explanation as well. The data is showing that
Thank you @PaigeMiller !! This comes to my next question, if the mean is not statistically significant from 0, in this case, what should we report? Report what SAS gives you and define/mention this in the methodology part? What is your suggestion in terms of publishing?Thanks!
You could report incidence rates, dichotomizing the results to "Absent" and "Present" (coded N and Y in the following):
input trt $ presence $ wt;
datalines;
NC N 18
NC Y 0
PC N 12
PC Y 4
;
proc freq data=one;
tables trt*presence/exact;
weight wt;
run;
This gives a significant (p=0.0392) difference, using Fisher's Exact Test. The incidence in the positive controls is greater than the incidence in the negative controls.
SteveDenham
Hello @SteveDenham ,
my data is looking like this, I attached here. I analyze the percentage. Do you still think I should consider counts?
Diet | Block | Species300 read | Species300 percentage |
NC | 1 | 0 | 0.000000 |
NC | 2 | 0 | 0.000000 |
NC | 2 | 0 | 0.000000 |
NC | 3 | 0 | 0.000000 |
NC | 3 | 0 | 0.000000 |
NC | 4 | 0 | 0.000000 |
NC | 4 | 0 | 0.000000 |
NC | 5 | 0 | 0.000000 |
NC | 5 | 0 | 0.000000 |
NC | 5 | 0 | 0.000000 |
NC | 5 | 0 | 0.000000 |
NC | 6 | 0 | 0.000000 |
NC | 7 | 0 | 0.000000 |
NC | 8 | 0 | 0.000000 |
NC | 9 | 0 | 0.000000 |
NC | 10 | 0 | 0.000000 |
NC | 11 | 0 | 0.000000 |
PC | 1 | 0 | 0.000000 |
PC | 1 | 0 | 0.000000 |
PC | 2 | 0 | 0.000000 |
PC | 2 | 0 | 0.000000 |
PC | 3 | 1 | 0.008666 |
PC | 3 | 0 | 0.000000 |
PC | 4 | 0 | 0.000000 |
PC | 4 | 1 | 0.009787 |
PC | 5 | 4 | 0.026057 |
PC | 5 | 0 | 0.000000 |
PC | 6 | 0 | 0.000000 |
PC | 7 | 1 | 0.005638 |
PC | 8 | 0 | 0.000000 |
PC | 9 | 0 | 0.000000 |
PC | 10 | 0 | 0.000000 |
PC | 11 | 0 | 0.000000 |
@kellychan84 I am getting confused by the branches in this thread. So in reply to your post with the values and percentages (which I assume are proportions), there are 3 possibilities in what I have posted. First is the incidence (present/absent) using PROC FREQ. Second is fitting a zero-inflated hurdle model. That should use the counts, and is something I haven't provided code for. You can look at the PROC GENMOD documentation for that. Third is to recognize that proportions are more likely to be binomially distributed than have normal errors. You could easily adapt your GLIMMIX code for that:
proc glimmix data= fecal_species_taxonomy nobound;
class treatment block;
nloptions tech=nrridg maxiter=1000;
model Species300P = treatment/dist=binomial;
random block;
lsmeans treatment/diff lines ilink;
covtest "block=0" 0 .;
ods output LSMeans = myparmdataset;
output out=second predicted=pred residual=resid residual(noblup)=mresid student=sresid student(noblup)=smresid;
title "300-Taxonomy Data fecal Species 300 Percentage ANOVA Results";
run;
Not sure if the ILINK option and LINES option are compatible. Any of these three would be defensible in the literature
SteveDenham
Hello @SteveDenham
You are correct. It is proportion. These data are abnormal distribution. In this case, I will report the LSMEAN by using the code I provided here. But for P value, for abnormal distributed data, I run nonparametric Kruskal–Wallis sum-rank test for their P value.
@kellychan84 wrote:
Thank you @PaigeMiller !! This comes to my next question, if the mean is not statistically significant from 0, in this case, what should we report? Report what SAS gives you and define/mention this in the methodology part? What is your suggestion in terms of publishing?Thanks!
I guess it depends where and why you are publishing this.
I would make it clear that the results indicate the "Non-Antibiotics" LS Mean is not statistically different than zero. You probably ought to state what I said above, which indicates that although all 18 subjects which were selected at random had zero bacteria, it is unlikely that in the long run, all subjects who get the "Non-Antibiotic" treatment will have zero bacteria, that is why a very small positive estimate makes more sense than a zero estimate. Lastly, you can talk about how the block variable is used in calculation of the LS Means, this is the math behind the non-zero estimate.
Really appreciate your answer, @PaigeMiller !
I will second @PaigeMiller on this - the GLIMMIX results accurately reflect what is happening The negative variance component also supports this..
If you want to explore this further, one thing that I would consider is that counts, bounded below by zero, very seldom have errors that are normally distributed, and as I said previously, this data is likely zero inflated - google "hurdle model" for ways to think about the processes involved in getting these results.
SteveDenham
Thank you @SteveDenham ! I have never heard and used this hurdle model before. But yes, I will definitely google and take a look at this.
Recall that LSmeans are found by taking the solution to the fixed effects and multiplying by a design dependent vector. There is no guarantee that the solution will result in zeroes even when all observations are zero. So, could you please follow @PaigeMiller 's request and post your GLIMMIX code. There are many things to consider when looking at counts of this sort. In particular, your data may be zero inflated, in which case shifting to PROC GENMOD would be a very good idea.
SteveDenham
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.