BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kellychan84
Fluorite | Level 6

Hello, I have a situation that I can not understand very well. 

I have 2 treatments that are NC and PC. My study design is RCBD. For a specific gut bacteria species data, there is no read in NC group in all 18 samples, meaning that all 18 data are 0. However, in PC group which is 16 samples, 4 samples have read and other are 0. After running proc glimmix for ANOVA, the result turned to be a very small LSMEAN value for NC. My question is, is that value calculated by removing block effect and residue? In this case, I should report "0" (as it is actually no read in this species) or report this small value? Thank you very much for your help!

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

@kellychan84 wrote:

Thank you @PaigeMiller !! This comes to my next question, if the mean is not statistically significant from 0, in this case, what should we report? Report what SAS gives you and define/mention this in the methodology part? What is your suggestion in terms of publishing?Thanks!


I guess it depends where and why you are publishing this. 

 

I would make it clear that the results indicate the "Non-Antibiotics" LS Mean is not statistically different than zero. You probably ought to state what I said above, which indicates that although all 18 subjects which were selected at random had zero bacteria, it is unlikely that in the long run, all subjects who get the "Non-Antibiotic" treatment will have zero bacteria, that is why a very small positive estimate makes more sense than a zero estimate. Lastly, you can talk about how the block variable is used in calculation of the LS Means, this is the math behind the non-zero estimate.

--
Paige Miller

View solution in original post

14 REPLIES 14
PaigeMiller
Diamond | Level 26

It would be helpful if you showed us the GLIMMIX code you used, and the LSMEANS output.

 

But yes, I think you are right, that the non-zero value is due to block effects.

--
Paige Miller
PaigeMiller
Diamond | Level 26

Thinking about this some more ...

 

Since the samples in the study are almost certainly random (not fixed), they represent a random sample of a much larger population and so there's really no reason to think that because these samples had 0 bacteria, that the entire population had zero bacteria. Thus a non-zero estimate makes more sense than a zero estimate here.

 

Had these samples somehow been fixed (which doesn't make sense in a biological testing mode), then a zero estimate could make sense.

--
Paige Miller
kellychan84
Fluorite | Level 6

Hello @PaigeMiller Please the code and the output below.

Output :

kellychan84_1-1651077930303.png

kellychan84_0-1651077908187.png

 

proc glimmix data= fecal_species_taxonomy nobound;  
  class treatment block;
  model Species300P = treatment; 
  random block; 
  lsmeans treatment/tdiff lines;
  covtest "block=0" 0 .;
  ods output LSMeans = myparmdataset;
  output out=second predicted=pred residual=resid residual(noblup)=mresid student=sresid student(noblup)=smresid;
  title "300-Taxonomy Data fecal Species 300 Percentage ANOVA Results";
run;
proc print data=myparmdataset;
format estimate D12.6
       stderr D8.6;
run;

 

PaigeMiller
Diamond | Level 26

This SAS code does not make me want to change my explanation, which I think fits here.

 

The other point I would make is that the test to see if t-test to see if "Non-Antibiotics" has a value of zero. This is telling us that the mean for "Non-Antibiotics" is not statistically different than zero, which fits the explanation as well. The data is showing that 

--
Paige Miller
kellychan84
Fluorite | Level 6

Thank you @PaigeMiller !! This comes to my next question, if the mean is not statistically significant from 0, in this case, what should we report? Report what SAS gives you and define/mention this in the methodology part? What is your suggestion in terms of publishing?Thanks!

SteveDenham
Jade | Level 19

You could report incidence rates, dichotomizing the results to "Absent" and "Present" (coded N and Y in the following):

 

input trt $ presence $ wt;
datalines;
NC N 18
NC Y 0
PC N 12
PC Y 4
;

proc freq data=one;
tables trt*presence/exact;
weight wt;
run;

This gives a significant (p=0.0392) difference, using Fisher's Exact Test.  The incidence in the positive controls is greater than the incidence in the negative controls.

 

SteveDenham

 

kellychan84
Fluorite | Level 6

Hello @SteveDenham ,

my data is looking like this, I attached here. I analyze the percentage. Do you still think I should consider counts?

DietBlockSpecies300 readSpecies300 percentage
NC100.000000
NC200.000000
NC200.000000
NC300.000000
NC300.000000
NC400.000000
NC400.000000
NC500.000000
NC500.000000
NC500.000000
NC500.000000
NC600.000000
NC700.000000
NC800.000000
NC900.000000
NC1000.000000
NC1100.000000
PC100.000000
PC100.000000
PC200.000000
PC200.000000
PC310.008666
PC300.000000
PC400.000000
PC410.009787
PC540.026057
PC500.000000
PC600.000000
PC710.005638
PC800.000000
PC900.000000
PC1000.000000
PC1100.000000
SteveDenham
Jade | Level 19

@kellychan84  I am getting confused by the branches in this thread.  So in reply to your post with the values and percentages (which I assume are proportions), there are 3 possibilities in what I have posted.  First is the incidence (present/absent) using PROC FREQ.  Second is fitting a zero-inflated hurdle model.  That should use the counts, and is something I haven't provided code for.  You can look at the PROC GENMOD documentation for that.  Third is to recognize that proportions are more likely to be binomially distributed than have normal errors.  You could easily adapt your GLIMMIX code for that:

 

proc glimmix data= fecal_species_taxonomy nobound;  
  class treatment block;
nloptions tech=nrridg maxiter=1000; model Species300P = treatment/dist=binomial; random block; lsmeans treatment/diff lines ilink; covtest "block=0" 0 .; ods output LSMeans = myparmdataset; output out=second predicted=pred residual=resid residual(noblup)=mresid student=sresid student(noblup)=smresid; title "300-Taxonomy Data fecal Species 300 Percentage ANOVA Results"; run;

Not sure if the ILINK option and LINES option are compatible.  Any of these three would be defensible in the literature

 

SteveDenham

kellychan84
Fluorite | Level 6

Hello @SteveDenham 

 

You are correct. It is proportion. These data are abnormal distribution. In this case, I will report the LSMEAN by using the code I provided here. But for P value, for abnormal distributed data, I run nonparametric Kruskal–Wallis sum-rank test for their P value.  

PaigeMiller
Diamond | Level 26

@kellychan84 wrote:

Thank you @PaigeMiller !! This comes to my next question, if the mean is not statistically significant from 0, in this case, what should we report? Report what SAS gives you and define/mention this in the methodology part? What is your suggestion in terms of publishing?Thanks!


I guess it depends where and why you are publishing this. 

 

I would make it clear that the results indicate the "Non-Antibiotics" LS Mean is not statistically different than zero. You probably ought to state what I said above, which indicates that although all 18 subjects which were selected at random had zero bacteria, it is unlikely that in the long run, all subjects who get the "Non-Antibiotic" treatment will have zero bacteria, that is why a very small positive estimate makes more sense than a zero estimate. Lastly, you can talk about how the block variable is used in calculation of the LS Means, this is the math behind the non-zero estimate.

--
Paige Miller
SteveDenham
Jade | Level 19

I will second @PaigeMiller  on this - the GLIMMIX results accurately reflect what is happening  The negative variance component also supports this.. 

 

If you want to explore this further, one thing that I would consider is that counts, bounded below by zero, very seldom have errors that are normally distributed, and as I said previously, this data is likely zero inflated - google "hurdle model" for ways to think about the processes involved in getting these results.

 

SteveDenham

kellychan84
Fluorite | Level 6

Thank you @SteveDenham ! I have never heard and used this hurdle model before. But yes, I will definitely google and take a look at this. 

SteveDenham
Jade | Level 19

Recall that LSmeans are found by taking the solution to the fixed effects and multiplying by a design dependent vector.  There is no guarantee that the solution will result in zeroes even when all observations are zero.  So, could you please follow @PaigeMiller 's request and post your GLIMMIX code.  There are many things to consider when looking at counts of this sort.  In particular, your data may be zero inflated, in which case shifting to PROC GENMOD would be a very good idea.

 

SteveDenham

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 14 replies
  • 3811 views
  • 2 likes
  • 3 in conversation