Solved: Logistic regression and zero df for replicates - what's wrong?

Addu · Posted 09-03-2020 02:51 PM

Hello!

I'm doing logistic regression analysis for occurrence of a fungal disease of oat in four soil treatments, with four replicates. I'm using PROC GENMOD. The four soil treatments are "no till", "spring cultivation", "fall cultivation" and "till". Fgram is the response (dependent) variable: the occurrence of the fungal disease out of 100 samples. The total sample size (Ntot) per plot is 100 grains of yield.

For some reason the degrees of freedom for replicate equal zero, and I get no results for it.

I have other, similar datasets, and the analysis works fine for them.

What could be the problem? Can I still use the probabilities from the treatments?

My dataset:

treatment	plot	replicate	fgram	Ntot
till	101	2	14	100
fall cult	102	4	19	100
no till	103	1	5	100
spring cult	104	3	8	100
fall cult	201	4	13	100
till	202	2	6	100
spring cult	203	3	9	100
no till	204	1	9	100
spring cult	301	3	8	100
no till	302	1	6	100
till	303	2	15	100
fall cult	304	4	8	100
spring cult	401	3	2	100
no till	402	1	10	100
till	403	2	8	100
fall cult	404	4	8	100

My code:

proc GENMOD data=WORK.tillageyield;
Title "Logistic regression analysis for F. graminearum occurrence in yield, treatments are compared to till";
	Class treatment (ref="till") replicate/param=ref;
	model fgram/Ntot = treatment replicate/
	dist=binomial
	link=logit
	waldci;
run;

PaigeMiller · Posted 09-03-2020 03:33 PM

Treatment and replicate are perfectly correlated. Replicate adds no new information, and so you cannot estimate its effect (or said another way, you should get 0 degrees of freedom for replicate).

--
Paige Miller
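To see why replicate ends up with 0 degrees of freedom, here is a minimal sketch in Python (outside SAS, purely as an illustration) that builds a reference-coded design matrix for the tillage layout and counts how many extra linearly independent columns replicate contributes beyond treatment. The helper names `design_matrix`, `rank` and `effect_dfs` are hypothetical, and the `BALANCED` layout at the end is an invented comparison, not data from this thread.

```python
from fractions import Fraction

# (treatment, replicate) for the 16 tillage plots, in the order posted above
TILLAGE = [
    ("till", 2), ("fall cult", 4), ("no till", 1), ("spring cult", 3),
    ("fall cult", 4), ("till", 2), ("spring cult", 3), ("no till", 1),
    ("spring cult", 3), ("no till", 1), ("till", 2), ("fall cult", 4),
    ("spring cult", 3), ("no till", 1), ("till", 2), ("fall cult", 4),
]


def design_matrix(plots, use_replicate):
    """Intercept plus 0/1 reference-coded dummies (in the spirit of PARAM=REF).

    The first sorted level of each factor is the reference here; SAS uses a
    different default reference level, but that does not change the rank.
    """
    treatments = sorted({t for t, _ in plots})[1:]
    replicates = sorted({r for _, r in plots})[1:] if use_replicate else []
    return [
        [1]
        + [int(t == level) for level in treatments]
        + [int(r == level) for level in replicates]
        for t, r in plots
    ]


def rank(matrix):
    """Exact matrix rank via Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # this column is a linear combination of earlier columns
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def effect_dfs(plots):
    """Return (df for treatment, extra df contributed by replicate)."""
    rank_treat = rank(design_matrix(plots, use_replicate=False))
    rank_full = rank(design_matrix(plots, use_replicate=True))
    return rank_treat - 1, rank_full - rank_treat


# Hypothetical balanced layout: every treatment once in every replicate
BALANCED = [(t, r) for r in (1, 2, 3, 4)
            for t in ("till", "fall cult", "no till", "spring cult")]

print("tillage  (treatment df, replicate df):", effect_dfs(TILLAGE))   # (3, 0)
print("balanced (treatment df, replicate df):", effect_dfs(BALANCED))  # (3, 3)
```

In the tillage data each replicate column is identical to (or a combination of) the treatment columns, so no replicate column finds a new pivot; that is the aliasing GENMOD reports as 0 df.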

Addu · Posted 09-07-2020 02:38 PM

Ok. I don't understand how this experiment is different to my other ones. In my other analyses the replicates get 1 degree of freedom.

Here's an example.

Effect of two fungicide treatments and a control treatment on Fusarium occurrence in cereal yield samples. Ignore the Sample and Plot columns.

Sample	fungicide	Cultivar	Replicate	Plot	Fgram	Ntot
yield	control	Viviana	1	111	10	100
yield	control	Viviana	2	112	8	100
yield	control	Viviana	3	113	9	100
yield	control	Viviana	4	114	15	100
yield	control	Marika	1	121	16	100
yield	control	Marika	2	122	7	100
yield	control	Marika	3	123	8	100
yield	control	Marika	4	124	12	100
yield	control	Peppi	1	131	7	100
yield	control	Peppi	2	132	12	100
yield	control	Peppi	3	133	8	100
yield	control	Peppi	4	134	7	100
yield	control	Voitto	1	141	29	100
yield	control	Voitto	2	142	42	100
yield	control	Voitto	3	143	32	100
yield	control	Voitto	4	144	34	100
yield	control	Anniina	1	151	17	100
yield	control	Anniina	2	152	47	100
yield	control	Anniina	3	153	30	100
yield	control	Anniina	4	154	23	100
yield	Delaro	Viviana	1	211	4	100
yield	Delaro	Viviana	2	212	5	100
yield	Delaro	Viviana	3	213	2	100
yield	Delaro	Viviana	4	214	4	100
yield	Delaro	Marika	1	221	9	100
yield	Delaro	Marika	2	222	14	100
yield	Delaro	Marika	3	223	10	100
yield	Delaro	Marika	4	224	10	100
yield	Delaro	Peppi	1	231	10	100
yield	Delaro	Peppi	2	232	5	100
yield	Delaro	Peppi	3	233	2	100
yield	Delaro	Peppi	4	234	9	100
yield	Delaro	Voitto	1	241	14	100
yield	Delaro	Voitto	2	242	14	100
yield	Delaro	Voitto	3	243	14	100
yield	Delaro	Voitto	4	244	15	100
yield	Delaro	Anniina	1	251	16	100
yield	Delaro	Anniina	2	252	8	100
yield	Delaro	Anniina	3	253	9	100
yield	Delaro	Anniina	4	254	2	100
yield	Proline	Viviana	1	311	3	100
yield	Proline	Viviana	2	312	1	100
yield	Proline	Viviana	3	313	7	100
yield	Proline	Viviana	4	314	7	100
yield	Proline	Marika	1	321	8	100
yield	Proline	Marika	2	322	9	100
yield	Proline	Marika	3	323	8	100
yield	Proline	Marika	4	324	12	100
yield	Proline	Peppi	1	331	4	100
yield	Proline	Peppi	2	332	12	100
yield	Proline	Peppi	3	333	11	100
yield	Proline	Peppi	4	334	11	100
yield	Proline	Voitto	1	341	15	100
yield	Proline	Voitto	2	342	15	100
yield	Proline	Voitto	3	343	18	100
yield	Proline	Voitto	4	344	11	100
yield	Proline	Anniina	1	351	12	100
yield	Proline	Anniina	2	352	6	100
yield	Proline	Anniina	3	353	11	100
yield	Proline	Anniina	4	354	2	100

Proc GENMOD data=WORK.fungicide; 

Title "Logistic regression analysis for Fusarium occurrence in fungicide treated cereal yield samples, treatments compared to control, cultivars compared to Anniina";
	Class Fungicide (ref="control") Cultivar (ref="Anniina") Replicate/param=ref;
	model fgram/Ntot = Fungicide Cultivar Replicate Fungicide*Cultivar/
	dist=binomial
	link=logit
	waldci;
run;

Results table attached. Notice the 1 df in the replicates.

I ran this by my teacher and he checked that it was OK. He was a SAS genius. Sadly, he is no longer with us, which is why I'm asking here.

-Addu

SteveDenham · Posted 09-09-2020 08:24 AM

Sort your data by replicate. You will see that replicate=1 contains only treatment='no till'. Compare that to the fungicide dataset: when sorted by replicate, every combination of fungicide and cultivar appears in each replicate. This is what you would expect for a randomized block design.
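The sort-and-look check can also be scripted. Below is a minimal Python sketch (illustrative only; `labels_by_replicate` and `has_complete_blocks` are hypothetical helpers) that lists which treatment labels occur in each replicate, for the tillage plots and for the fungicide-by-cultivar layout of the second dataset. In SAS itself, a PROC FREQ crosstab with `tables treatment*replicate;` shows the same pattern.

```python
from collections import defaultdict

# (treatment, replicate) for the 16 tillage plots, in the order posted above
TILLAGE = [
    ("till", 2), ("fall cult", 4), ("no till", 1), ("spring cult", 3),
    ("fall cult", 4), ("till", 2), ("spring cult", 3), ("no till", 1),
    ("spring cult", 3), ("no till", 1), ("till", 2), ("fall cult", 4),
    ("spring cult", 3), ("no till", 1), ("till", 2), ("fall cult", 4),
]

# Fungicide data: each fungicide x cultivar combination in replicates 1-4,
# generated in the same order as the posted rows
FUNGICIDE = [
    (f"{fungicide}*{cultivar}", rep)
    for fungicide in ("control", "Delaro", "Proline")
    for cultivar in ("Viviana", "Marika", "Peppi", "Voitto", "Anniina")
    for rep in (1, 2, 3, 4)
]


def labels_by_replicate(plots):
    """Map each replicate to the set of treatment labels observed in it."""
    seen = defaultdict(set)
    for label, rep in plots:
        seen[rep].add(label)
    return dict(seen)


def has_complete_blocks(plots):
    """True if every replicate contains every treatment label (as in an RCBD)."""
    seen = labels_by_replicate(plots)
    all_labels = set().union(*seen.values())
    return all(labels == all_labels for labels in seen.values())


for rep, labels in sorted(labels_by_replicate(TILLAGE).items()):
    print(f"tillage replicate {rep}: {sorted(labels)}")
print("tillage complete blocks:  ", has_complete_blocks(TILLAGE))    # False
print("fungicide complete blocks:", has_complete_blocks(FUNGICIDE))  # True
```

Each tillage replicate holds exactly one treatment, which is why replicate cannot be separated from treatment there.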

I want to point out two things to consider. First, the test in the solution table is for an individual coefficient = 0, not for the overall effect of cultivar, fungicide or replicate. Add TYPE3 as an option in the GENMOD MODEL statement to get these more global tests.
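To illustrate the difference between a single-coefficient test and a test of a whole effect, here is a Python sketch of a 3-df likelihood-ratio test of treatment for the tillage counts, with replicate dropped from the model (it carries no information there). It assumes the likelihood-ratio form of the Type 3 test, which I believe is GENMOD's default for TYPE3; it is not a reproduction of SAS output and it ignores overdispersion. `lr_test_groups` and `chi2_sf_3df` are hypothetical helpers.

```python
import math

# fgram counts per plot from the tillage data, grouped by treatment; Ntot = 100
FGRAM = {
    "till": [14, 6, 15, 8],
    "fall cult": [19, 13, 8, 8],
    "no till": [5, 9, 6, 10],
    "spring cult": [8, 9, 8, 2],
}
NTOT = 100


def binom_loglik(events, trials, p):
    """Binomial log-likelihood, dropping the constant binomial coefficient."""
    return events * math.log(p) + (trials - events) * math.log(1 - p)


def lr_test_groups(groups, ntot):
    """LR statistic and df: one probability per group vs. one common probability."""
    totals = {g: (sum(ys), ntot * len(ys)) for g, ys in groups.items()}
    y_all = sum(y for y, _ in totals.values())
    n_all = sum(n for _, n in totals.values())
    ll_null = binom_loglik(y_all, n_all, y_all / n_all)
    ll_full = sum(binom_loglik(y, n, y / n) for y, n in totals.values())
    return 2 * (ll_full - ll_null), len(groups) - 1


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square with exactly 3 df (closed form)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


stat, df = lr_test_groups(FGRAM, NTOT)
assert df == 3, "chi2_sf_3df only covers 3 df"
print(f"LR chi-square = {stat:.3f} on {df} df, p = {chi2_sf_3df(stat):.4f}")
```

The single statistic on 3 df answers "does treatment matter at all?", whereas each row of the solution table only compares one treatment with the reference level.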

Second, if this is indeed a randomized block design, and you want to infer to a broader space than just the replicates in the study, you should switch to a mixed model approach with replicate as a random effect. For the fungicide data (which is currently the only one of the two that can be analyzed properly), the code would look like:

Title "Logistic regression analysis for Fusarium occurrence in fungicide treated cereal yield samples, treatments compared to control, cultivars compared to Anniina";
proc glimmix data=WORK.fungicide;
	class Fungicide Cultivar Replicate;
	/* events/trials response: GLIMMIX defaults to a binomial distribution with logit link */
	model fgram/Ntot = Fungicide Cultivar Fungicide*Cultivar /
		solution
		cl;
	/* replicate (block) as a random intercept */
	random intercept / subject=Replicate;
	lsmeans Fungicide Cultivar Fungicide*Cultivar / cl;
run;

Now the comparisons of interest should depend on the results of the F tests. If the interaction is significant, you would need to look at the simple effects using either the SLICE statement or the SLICEDIFF option in the LSMEANS statement. If it is not significant, using the DIFF=CONTROL option for the main-effect LSMEANS will yield the tests in the title. These would look like:

lsmeans fungicide*cultivar/slicediff=(fungicide cultivar) slicedifftype=control ('control' 'Anniina') ilink;

lsmeans fungicide cultivar/diff=control ('control' 'Anniina') ilink;

Edited to add ILINK to the LSMEANS statements to get the results back on the original (probability) scale. Also, recall that the differences will not transform back to differences in probabilities using this method. For that you will need to invoke the NLMeans macro.
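The point about ILINK and differences can be shown with a little arithmetic. In this Python sketch the two logit-scale LS means are made-up numbers, not results from either dataset, and `expit` is a hypothetical helper for the inverse of the logit link.

```python
import math


def expit(eta):
    """Inverse of the logit link: log-odds -> probability."""
    return 1 / (1 + math.exp(-eta))


# Hypothetical LS means on the logit (linear predictor) scale
lsmean_control = -2.0
lsmean_treated = -2.5

# What ILINK reports for each LS mean: a probability
p_control = expit(lsmean_control)
p_treated = expit(lsmean_treated)

# A difference of LS means on the logit scale is a log odds ratio
logit_diff = lsmean_control - lsmean_treated
odds_ratio = math.exp(logit_diff)

print(f"p(control) = {p_control:.4f}, p(treated) = {p_treated:.4f}")
print(f"difference in probabilities = {p_control - p_treated:.4f}")
print(f"odds ratio = {odds_ratio:.4f}")
# Back-transforming the logit difference does NOT give the probability difference
print(f"expit(logit difference) = {expit(logit_diff):.4f}")
```

Back-transforming each LS mean is fine, but a difference on the logit scale only converts to an odds ratio; the difference in probabilities (and its standard error) has to be computed separately, which is what the NLMeans macro is suggested for above.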

SteveDenham

Addu · Posted 09-09-2020 01:39 PM

Thank you very much for your input on the fungicide data! I'll try out your suggested modifications.

-Addu

PaigeMiller · Posted 09-09-2020 08:38 AM

@Addu wrote:

Ok. I don't understand how this experiment is different to my other ones. In my other analyses the replicates get 1 degree of freedom.

Did you actually look at the data in this experiment? Looking at the data is a highly recommended debugging technique. As I said, treatment is completely confounded with replicate. Replicate adds no additional information.

--
Paige Miller
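Because replicate adds nothing in the tillage data, dropping it from the MODEL statement gives an equivalent fit, and in a logistic model whose only factor is treatment, the fitted probability for each treatment is that treatment's pooled proportion (total fgram / total Ntot). That bears on the original question of whether the treatment probabilities are still usable. A minimal Python sketch, with hypothetical helper names `pooled_proportions` and `log_odds_ratios`:

```python
import math

# fgram counts per plot from the tillage data, grouped by treatment; Ntot = 100
FGRAM = {
    "till": [14, 6, 15, 8],
    "fall cult": [19, 13, 8, 8],
    "no till": [5, 9, 6, 10],
    "spring cult": [8, 9, 8, 2],
}
NTOT = 100


def pooled_proportions(groups, ntot):
    """Maximum-likelihood probability per group: total events / total trials."""
    return {g: sum(ys) / (ntot * len(ys)) for g, ys in groups.items()}


def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1 - p))


def log_odds_ratios(props, reference):
    """Log odds ratio of each group versus the reference group."""
    return {g: logit(p) - logit(props[reference])
            for g, p in props.items() if g != reference}


props = pooled_proportions(FGRAM, NTOT)
for treatment, p in props.items():
    print(f"{treatment:12s} p = {p:.4f}")
for treatment, lor in log_odds_ratios(props, "till").items():
    print(f"{treatment:12s} vs till: log OR = {lor:.4f}, OR = {math.exp(lor):.4f}")
```

Assuming GENMOD zeroes the aliased replicate parameters (they come after treatment in the MODEL statement), its treatment estimates should agree with these log odds ratios against till; this has not been checked against actual SAS output.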

Addu · Posted 09-09-2020 01:08 PM

Thank you for your help!

I had mistaken the block markings for replicate numbers. Embarrassing, but someone else had to point it out. Datasheet blindness.

As you said a good debug is looking at the data - which I did, many times! Another good one is to ask someone else to also take a look.

-Addu

PaigeMiller · Posted 09-09-2020 01:13 PM

@Addu wrote:

As you said a good debug is looking at the data - which I did, many times! Another good one is to ask someone else to also take a look.

True, true!

--
Paige Miller
