Solved: Raw data in NC group are 0 for 18 samples, why after ANOVA analysis, t...

kellychan84 · Posted 04-27-2022 11:23 AM

Hello, I have a situation that I can not understand very well.

I have 2 treatments that are NC and PC. My study design is RCBD. For a specific gut bacteria species data, there is no read in NC group in all 18 samples, meaning that all 18 data are 0. However, in PC group which is 16 samples, 4 samples have read and other are 0. After running proc glimmix for ANOVA, the result turned to be a very small LSMEAN value for NC. My question is, is that value calculated by removing block effect and residue? In this case, I should report "0" (as it is actually no read in this species) or report this small value? Thank you very much for your help!

PaigeMiller · Posted 04-27-2022 01:32 PM

@kellychan84 wrote:

Thank you @PaigeMiller !! This comes to my next question, if the mean is not statistically significant from 0, in this case, what should we report? Report what SAS gives you and define/mention this in the methodology part? What is your suggestion in terms of publishing?Thanks!

I guess it depends where and why you are publishing this.

I would make it clear that the results indicate the "Non-Antibiotics" LS Mean is not statistically different than zero. You probably ought to state what I said above, which indicates that although all 18 subjects which were selected at random had zero bacteria, it is unlikely that in the long run, all subjects who get the "Non-Antibiotic" treatment will have zero bacteria, that is why a very small positive estimate makes more sense than a zero estimate. Lastly, you can talk about how the block variable is used in calculation of the LS Means, this is the math behind the non-zero estimate.

--
Paige Miller

View solution in original post

PaigeMiller · Posted 04-27-2022 11:46 AM

It would be helpful if you showed us the GLIMMIX code you used, and the LSMEANS output.

But yes, I think you are right, that the non-zero value is due to block effects.

--
Paige Miller

PaigeMiller · Posted 04-27-2022 12:25 PM

Thinking about this some more ...

Since the samples in the study are almost certainly random (not fixed), they represent a random sample of a much larger population and so there's really no reason to think that because these samples had 0 bacteria, that the entire population had zero bacteria. Thus a non-zero estimate makes more sense than a zero estimate here.

Had these samples somehow been fixed (which doesn't make sense in a biological testing mode), then a zero estimate could make sense.

--
Paige Miller

kellychan84 · Posted 04-27-2022 12:46 PM

Hello @PaigeMiller Please the code and the output below.

Output :

proc glimmix data= fecal_species_taxonomy nobound;  
  class treatment block;
  model Species300P = treatment; 
  random block; 
  lsmeans treatment/tdiff lines;
  covtest "block=0" 0 .;
  ods output LSMeans = myparmdataset;
  output out=second predicted=pred residual=resid residual(noblup)=mresid student=sresid student(noblup)=smresid;
  title "300-Taxonomy Data fecal Species 300 Percentage ANOVA Results";
run;
proc print data=myparmdataset;
format estimate D12.6
       stderr D8.6;
run;

PaigeMiller · Posted 04-27-2022 01:03 PM

This SAS code does not make me want to change my explanation, which I think fits here.

The other point I would make is that the test to see if t-test to see if "Non-Antibiotics" has a value of zero. This is telling us that the mean for "Non-Antibiotics" is not statistically different than zero, which fits the explanation as well. The data is showing that

--
Paige Miller

kellychan84 · Posted 04-27-2022 01:08 PM

Thank you @PaigeMiller !! This comes to my next question, if the mean is not statistically significant from 0, in this case, what should we report? Report what SAS gives you and define/mention this in the methodology part? What is your suggestion in terms of publishing?Thanks!

SteveDenham · Posted 04-27-2022 01:17 PM

You could report incidence rates, dichotomizing the results to "Absent" and "Present" (coded N and Y in the following):

input trt $ presence $ wt;
datalines;
NC N 18
NC Y 0
PC N 12
PC Y 4
;

proc freq data=one;
tables trt*presence/exact;
weight wt;
run;

This gives a significant (p=0.0392) difference, using Fisher's Exact Test. The incidence in the positive controls is greater than the incidence in the negative controls.

SteveDenham

kellychan84 · Posted 04-27-2022 01:23 PM

Hello @SteveDenham ,

my data is looking like this, I attached here. I analyze the percentage. Do you still think I should consider counts?

Diet	Block	Species300 read	Species300 percentage
NC	1	0	0.000000
NC	2	0	0.000000
NC	2	0	0.000000
NC	3	0	0.000000
NC	3	0	0.000000
NC	4	0	0.000000
NC	4	0	0.000000
NC	5	0	0.000000
NC	5	0	0.000000
NC	5	0	0.000000
NC	5	0	0.000000
NC	6	0	0.000000
NC	7	0	0.000000
NC	8	0	0.000000
NC	9	0	0.000000
NC	10	0	0.000000
NC	11	0	0.000000
PC	1	0	0.000000
PC	1	0	0.000000
PC	2	0	0.000000
PC	2	0	0.000000
PC	3	1	0.008666
PC	3	0	0.000000
PC	4	0	0.000000
PC	4	1	0.009787
PC	5	4	0.026057
PC	5	0	0.000000
PC	6	0	0.000000
PC	7	1	0.005638
PC	8	0	0.000000
PC	9	0	0.000000
PC	10	0	0.000000
PC	11	0	0.000000

SteveDenham · Posted 04-27-2022 01:38 PM

@kellychan84 I am getting confused by the branches in this thread. So in reply to your post with the values and percentages (which I assume are proportions), there are 3 possibilities in what I have posted. First is the incidence (present/absent) using PROC FREQ. Second is fitting a zero-inflated hurdle model. That should use the counts, and is something I haven't provided code for. You can look at the PROC GENMOD documentation for that. Third is to recognize that proportions are more likely to be binomially distributed than have normal errors. You could easily adapt your GLIMMIX code for that:

proc glimmix data= fecal_species_taxonomy nobound;  
  class treatment block;
nloptions tech=nrridg maxiter=1000;
  model Species300P = treatment/dist=binomial; 
  random block; 
  lsmeans treatment/diff lines ilink;
  covtest "block=0" 0 .;
  ods output LSMeans = myparmdataset;
  output out=second predicted=pred residual=resid residual(noblup)=mresid student=sresid student(noblup)=smresid;
  title "300-Taxonomy Data fecal Species 300 Percentage ANOVA Results";
run;

Not sure if the ILINK option and LINES option are compatible. Any of these three would be defensible in the literature

SteveDenham

kellychan84 · Posted 04-27-2022 01:45 PM

Hello @SteveDenham

You are correct. It is proportion. These data are abnormal distribution. In this case, I will report the LSMEAN by using the code I provided here. But for P value, for abnormal distributed data, I run nonparametric Kruskal–Wallis sum-rank test for their P value.

PaigeMiller · Posted 04-27-2022 01:32 PM

@kellychan84 wrote:

Thank you @PaigeMiller !! This comes to my next question, if the mean is not statistically significant from 0, in this case, what should we report? Report what SAS gives you and define/mention this in the methodology part? What is your suggestion in terms of publishing?Thanks!

I guess it depends where and why you are publishing this.

I would make it clear that the results indicate the "Non-Antibiotics" LS Mean is not statistically different than zero. You probably ought to state what I said above, which indicates that although all 18 subjects which were selected at random had zero bacteria, it is unlikely that in the long run, all subjects who get the "Non-Antibiotic" treatment will have zero bacteria, that is why a very small positive estimate makes more sense than a zero estimate. Lastly, you can talk about how the block variable is used in calculation of the LS Means, this is the math behind the non-zero estimate.

--
Paige Miller

kellychan84 · Posted 04-27-2022 01:36 PM

Really appreciate your answer, @PaigeMiller !

SteveDenham · Posted 04-27-2022 01:09 PM

I will second @PaigeMiller on this - the GLIMMIX results accurately reflect what is happening The negative variance component also supports this..

If you want to explore this further, one thing that I would consider is that counts, bounded below by zero, very seldom have errors that are normally distributed, and as I said previously, this data is likely zero inflated - google "hurdle model" for ways to think about the processes involved in getting these results.

SteveDenham

kellychan84 · Posted 04-27-2022 01:18 PM

Thank you @SteveDenham ! I have never heard and used this hurdle model before. But yes, I will definitely google and take a look at this.

SteveDenham · Posted 04-27-2022 12:15 PM

Recall that LSmeans are found by taking the solution to the fixed effects and multiplying by a design dependent vector. There is no guarantee that the solution will result in zeroes even when all observations are zero. So, could you please follow @PaigeMiller 's request and post your GLIMMIX code. There are many things to consider when looking at counts of this sort. In particular, your data may be zero inflated, in which case shifting to PROC GENMOD would be a very good idea.

SteveDenham

Raw data in NC group are 0 for 18 samples, why after ANOVA analysis, the least square mean is not 0

Re: Raw data in NC group are 0 for 18 samples, why after ANOVA analysis, the least square mean is no

Re: Raw data in NC group are 0 for 18 samples, why after ANOVA analysis, the least square mean is no

Re: Raw data in NC group are 0 for 18 samples, why after ANOVA analysis, the least square mean is no

Re: Raw data in NC group are 0 for 18 samples, why after ANOVA analysis, the least square mean is no

Re: Raw data in NC group are 0 for 18 samples, why after ANOVA analysis, the least square mean is no

Re: Raw data in NC group are 0 for 18 samples, why after ANOVA analysis, the least square mean is no

Re: Raw data in NC group are 0 for 18 samples, why after ANOVA analysis, the least square mean is no

Re: Raw data in NC group are 0 for 18 samples, why after ANOVA analysis, the least square mean is no

Re: Raw data in NC group are 0 for 18 samples, why after ANOVA analysis, the least square mean is no

Re: Raw data in NC group are 0 for 18 samples, why after ANOVA analysis, the least square mean is no

Re: Raw data in NC group are 0 for 18 samples, why after ANOVA analysis, the least square mean is no

Re: Raw data in NC group are 0 for 18 samples, why after ANOVA analysis, the least square mean is no

Re: Raw data in NC group are 0 for 18 samples, why after ANOVA analysis, the least square mean is no

Re: Raw data in NC group are 0 for 18 samples, why after ANOVA analysis, the least square mean is no

Re: Raw data in NC group are 0 for 18 samples, why after ANOVA analysis, the least square mean is no

The 2025 SAS Hackathon has begun!