Hello!
I'm running a logistic regression on the occurrence of a fungal disease of oat under four soil treatments, with four replicates, using PROC GENMOD. The four soil treatments are "no till", "spring cultivation", "fall cultivation" and "till". Fgram is the response variable: the number of grains with the fungal disease out of the 100 sampled. Ntot is the total sample size per plot, 100 grains of yield.
For some reason the degrees of freedom for replicate equal zero and I get no results for it.
I have similar other datasets and the analysis goes just fine.
What could be the problem? Can I still use the probabilities from the treatments?
My dataset:
treatment | plot | replicate | fgram | Ntot |
till | 101 | 2 | 14 | 100 |
fall cult | 102 | 4 | 19 | 100 |
no till | 103 | 1 | 5 | 100 |
spring cult | 104 | 3 | 8 | 100 |
fall cult | 201 | 4 | 13 | 100 |
till | 202 | 2 | 6 | 100 |
spring cult | 203 | 3 | 9 | 100 |
no till | 204 | 1 | 9 | 100 |
spring cult | 301 | 3 | 8 | 100 |
no till | 302 | 1 | 6 | 100 |
till | 303 | 2 | 15 | 100 |
fall cult | 304 | 4 | 8 | 100 |
spring cult | 401 | 3 | 2 | 100 |
no till | 402 | 1 | 10 | 100 |
till | 403 | 2 | 8 | 100 |
fall cult | 404 | 4 | 8 | 100 |
My code:
proc GENMOD data=WORK.tillageyield;
Title "Logistic regression analysis for F. graminearum occurrence in yield, treatments are compared to till";
Class treatment (ref="till") replicate/param=ref;
model fgram/Ntot = treatment replicate/
dist=binomial
link=logit
waldci;
run;
Treatment and replicate are perfectly correlated. Replicate adds no new information, and so you cannot estimate its effect (or said another way, you should get 0 degrees of freedom for replicate).
Ok. I don't understand how this experiment is different to my other ones. In my other analyses the replicates get 1 degree of freedom.
Here's an example.
Effect of two fungicide treatments and a control treatment on Fusarium occurrence in cereal yield samples. Ignore the sample and plot columns.
Sample | fungicide | Cultivar | Replicate | Plot | Fgram | Ntot |
yield | control | Viviana | 1 | 111 | 10 | 100 |
yield | control | Viviana | 2 | 112 | 8 | 100 |
yield | control | Viviana | 3 | 113 | 9 | 100 |
yield | control | Viviana | 4 | 114 | 15 | 100 |
yield | control | Marika | 1 | 121 | 16 | 100 |
yield | control | Marika | 2 | 122 | 7 | 100 |
yield | control | Marika | 3 | 123 | 8 | 100 |
yield | control | Marika | 4 | 124 | 12 | 100 |
yield | control | Peppi | 1 | 131 | 7 | 100 |
yield | control | Peppi | 2 | 132 | 12 | 100 |
yield | control | Peppi | 3 | 133 | 8 | 100 |
yield | control | Peppi | 4 | 134 | 7 | 100 |
yield | control | Voitto | 1 | 141 | 29 | 100 |
yield | control | Voitto | 2 | 142 | 42 | 100 |
yield | control | Voitto | 3 | 143 | 32 | 100 |
yield | control | Voitto | 4 | 144 | 34 | 100 |
yield | control | Anniina | 1 | 151 | 17 | 100 |
yield | control | Anniina | 2 | 152 | 47 | 100 |
yield | control | Anniina | 3 | 153 | 30 | 100 |
yield | control | Anniina | 4 | 154 | 23 | 100 |
yield | Delaro | Viviana | 1 | 211 | 4 | 100 |
yield | Delaro | Viviana | 2 | 212 | 5 | 100 |
yield | Delaro | Viviana | 3 | 213 | 2 | 100 |
yield | Delaro | Viviana | 4 | 214 | 4 | 100 |
yield | Delaro | Marika | 1 | 221 | 9 | 100 |
yield | Delaro | Marika | 2 | 222 | 14 | 100 |
yield | Delaro | Marika | 3 | 223 | 10 | 100 |
yield | Delaro | Marika | 4 | 224 | 10 | 100 |
yield | Delaro | Peppi | 1 | 231 | 10 | 100 |
yield | Delaro | Peppi | 2 | 232 | 5 | 100 |
yield | Delaro | Peppi | 3 | 233 | 2 | 100 |
yield | Delaro | Peppi | 4 | 234 | 9 | 100 |
yield | Delaro | Voitto | 1 | 241 | 14 | 100 |
yield | Delaro | Voitto | 2 | 242 | 14 | 100 |
yield | Delaro | Voitto | 3 | 243 | 14 | 100 |
yield | Delaro | Voitto | 4 | 244 | 15 | 100 |
yield | Delaro | Anniina | 1 | 251 | 16 | 100 |
yield | Delaro | Anniina | 2 | 252 | 8 | 100 |
yield | Delaro | Anniina | 3 | 253 | 9 | 100 |
yield | Delaro | Anniina | 4 | 254 | 2 | 100 |
yield | Proline | Viviana | 1 | 311 | 3 | 100 |
yield | Proline | Viviana | 2 | 312 | 1 | 100 |
yield | Proline | Viviana | 3 | 313 | 7 | 100 |
yield | Proline | Viviana | 4 | 314 | 7 | 100 |
yield | Proline | Marika | 1 | 321 | 8 | 100 |
yield | Proline | Marika | 2 | 322 | 9 | 100 |
yield | Proline | Marika | 3 | 323 | 8 | 100 |
yield | Proline | Marika | 4 | 324 | 12 | 100 |
yield | Proline | Peppi | 1 | 331 | 4 | 100 |
yield | Proline | Peppi | 2 | 332 | 12 | 100 |
yield | Proline | Peppi | 3 | 333 | 11 | 100 |
yield | Proline | Peppi | 4 | 334 | 11 | 100 |
yield | Proline | Voitto | 1 | 341 | 15 | 100 |
yield | Proline | Voitto | 2 | 342 | 15 | 100 |
yield | Proline | Voitto | 3 | 343 | 18 | 100 |
yield | Proline | Voitto | 4 | 344 | 11 | 100 |
yield | Proline | Anniina | 1 | 351 | 12 | 100 |
yield | Proline | Anniina | 2 | 352 | 6 | 100 |
yield | Proline | Anniina | 3 | 353 | 11 | 100 |
yield | Proline | Anniina | 4 | 354 | 2 | 100 |
Proc GENMOD data=WORK.fungicide;
Title "Logistic regression analysis for Fusarium occurrence in fungicide treated cereal yield samples, treatments compared to control, cultivars compared to Anniina";
Class Fungicide (ref="control") Cultivar (ref="Anniina") Replicate/param=ref;
model fgram/Ntot = Fungicide Cultivar Replicate Fungicide*Cultivar/
dist=binomial
link=logit
waldci;
run;
Results table attached. Notice the 1 df in the replicates.
I ran this by my teacher, who was a SAS genius, and he confirmed it was OK. Sadly he is no longer with us, which is why I'm asking here.
-Addu
Sort your data by replicate. You will see that for replicate=1, you only have treatment='no till'. Compare that to the fungicide dataset. When sorted by replicate, each combination of fungicide and cultivar appears. This would be expected for a randomized block design.
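A cross-tabulation makes the same comparison without having to scan the sorted rows. This is only a sketch, and it assumes the dataset and variable names used in the code posted earlier in the thread:

```sas
/* Tillage data: each replicate column shows counts for one treatment only,
   which is the confounding described above. */
proc freq data=WORK.tillageyield;
  tables treatment*replicate / norow nocol nopercent;
run;

/* Fungicide data: every fungicide appears in every replicate
   (one plot per cultivar in each cell), as expected for a blocked design. */
proc freq data=WORK.fungicide;
  tables fungicide*replicate / norow nocol nopercent;
run;
```

Empty cells in the first table mean that the replicate effect cannot be separated from the treatment effect.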
I want to point out two things to consider. The test in the solution table is for the individual coefficient=0, not for the effect of cultivar, fungicide or replicate. Change up the genmod code to add type3 as an option in the MODEL statement to get these more global tests.
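As a sketch of that first point, the GENMOD call from the question with type3 added would look something like the following. Everything else is carried over unchanged from the code posted above:

```sas
proc genmod data=WORK.fungicide;
  class Fungicide (ref="control") Cultivar (ref="Anniina") Replicate / param=ref;
  /* type3 adds a table of tests for each effect as a whole,
     alongside the per-coefficient tests in the parameter estimates table */
  model fgram/Ntot = Fungicide Cultivar Replicate Fungicide*Cultivar /
        dist=binomial link=logit type3 waldci;
run;
```

By default these Type 3 tests are likelihood ratio tests. Adding the wald option to the MODEL statement should request Wald statistics instead.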
Second, if this is indeed a randomized block design, and you want to infer to a broader space than just the replicates in the study, you should change over to a mixed model approach with replicate as a random effect. For the fungicide data (which is currently the only one that can be analyzed properly) the code would look like:
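Title "Logistic regression analysis for Fusarium occurrence in fungicide treated cereal yield samples, treatments compared to control, cultivars compared to Anniina";
proc glimmix data=WORK.fungicide;
/* Replicate stays in CLASS but moves from the MODEL to the RANDOM statement */
Class Fungicide Cultivar Replicate;
/* events/trials syntax defaults to dist=binomial with link=logit */
model fgram/Ntot = Fungicide Cultivar Fungicide*Cultivar/
solution
cl; /* GLIMMIX uses CL for confidence limits; WALDCI is a GENMOD option */
random intercept/subject=replicate;
lsmeans fungicide cultivar fungicide*cultivar/cl;
run;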
Now the comparisons of interest should depend on the results of the F tests. If the interaction is significant, you would need to look at the simple effect using either the SLICE statement, or the slicediff option in the lsmeans statement. If it is not significant, using the diff=control option for the main effect lsmeans will yield the tests in the title. These would look like:
lsmeans fungicide*cultivar/slicediff=(fungicide cultivar) slicedifftype=control ('control' 'Anniina') ilink;
lsmeans fungicide cultivar/diff=control ('control' 'Anniina') ilink;
Edited to add ilink to the lsmeans to get the results back on the original scale. Also, recall the differences will not transform back to differences in probabilities using this method. For that you will need to invoke the NLMeans macro.
SteveDenham
Thank you very much for your input on the fungicide data! I'll try out your suggested modifications.
-Addu
@Addu wrote:
Ok. I don't understand how this experiment is different to my other ones. In my other analyses the replicates get 1 degree of freedom.
Did you actually look at the data in this experiment? Looking at the data is a highly recommended debugging technique. As I said, treatment is completely confounded with replicate. Replicate adds no additional information.
Thank you for your help!
I had mistaken the block markings for replicate numbers. Embarrassing, but someone else had to point it out. Datasheet blindness.
As you said a good debug is looking at the data - which I did, many times! Another good one is to ask someone else to also take a look.
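For anyone who lands here with the same problem, here is a rough sketch of how the tillage analysis could be rerun once the blocks are coded correctly. It rests on an assumption that was not confirmed in this thread: that the hundreds digit of the plot number (1xx to 4xx) identifies the block, and that plot is stored as a numeric variable. The names tillage_blocks and block are made up for the illustration.

```sas
/* ASSUMPTION: block = hundreds digit of plot (101-104 -> 1, 201-204 -> 2, ...).
   Check this against the field plan before trusting the results. */
data work.tillage_blocks;
  set WORK.tillageyield;
  block = floor(plot / 100);
run;

/* Same model as in the question, with block replacing the miscoded replicate */
proc genmod data=work.tillage_blocks;
  class treatment (ref="till") block / param=ref;
  model fgram/Ntot = treatment block /
        dist=binomial link=logit type3 waldci;
run;
```

With this coding every treatment occurs once in every block, so block should get 3 degrees of freedom instead of 0.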
-Addu
@Addu wrote:
As you said a good debug is looking at the data - which I did, many times! Another good one is to ask someone else to also take a look.
True, true!