For one aspect of my research project, I counted the number of bull thistle plants along five transects within each of 24 plots. I am using the total number of bull thistles within each plot as my dependent variable, abundance. Each plot was seeded to one of four seeding treatments, so there are 6 replicates per treatment. The study is set up as a two-level factorial design, where I'm interested in the effects of diversity (low diversity vs. high diversity) and seeding rate (low seeding rate vs. high seeding rate) so the four treatments are: 1) low diversity, low rate; 2) low diversity, high rate; 3) high diversity, low rate; and 4) high diversity, high rate. Additionally, it is a randomized complete block design so there are six blocks in the field running west to east, with each block containing 4 plots (one of each treatment). The data are nonnormal, and the glimmix code below captures the effects I'm interested in.
Proc glimmix data = BullThistle;
Class Block Diversity Seeding Year;
Model Abundance = Diversity Seeding Diversity*Seeding Year Diversity*Year Seeding*Year Diversity*Seeding*Year/ddfm=kr dist=negbin;
Random Block;
Random Year / subject = Block*Diversity*Seeding type=ar(1) residual;
Lsmeans Diversity*Seeding / diff cl ilink;
Run;
This code works for my sweet clover data. However, as bull thistle is a biennial plant and for two out of the four years there are many individuals and the other two years there are very few, this model does not converge. I am trying to write the appropriate zero-inflated Poisson model but am not sure I am doing it correctly.
My data file is arranged as below, with 0 in diversity and seeding representing low and 1 representing high.
Plot Block Year Diversity Seeding Abundance
1 1 0 1 0 13
2 2 0 0 0 18
3 3 0 1 1 10
4 4 0 0 1 12
5 5 0 1 0 3
6 6 0 0 0 5
7 1 0 0 1 32
8 2 0 1 0 36
9 3 0 0 0 49
10 4 0 1 1 18
11 5 0 0 1 12
etc. through years 0-3.
The ZIP code I am trying to use, based on reading example 15.5 in SAS for Mixed Models, second edition is:
proc nlmixed data=BullThistle;
by diversity seeding;
parameters b0=0 b1=0 b2=0 b3=0 b4=0
a0=0 s2u=1;
linpinfl=a0;
infprob=1/(1+exp(-linpinfl));
lambda=exp(b0+b1*block+b2*year+b3*diversity+b4*seeding+u);
if abundance=0 then
ll = log(infprob+(1-infprob)*exp(-lambda));
else II=log((1-infprob))+abundance*log(lambda)-lgamma(abundance+1)-lambda;
model abundance ~ general(ll);
random u ~ normal(0,s2u) subject=plot;
estimate "inflation probability" infprob;
run;
But I am not sure I am modeling my problem correctly - am I using the correct parameters to correspond with the effects in my data table? For example, is a0 the proper parameter to correspond with abundance? Should seeding and diversity be designated with names such as low and high rather than numbers in my data table? Am I capturing my interest in the diversity and seeding effects correctly, and how do I model their interactions? Have I modeled the covariance structure correctly since it is a repeated measures design?