BookmarkSubscribeRSS Feed
basp
Calcite | Level 5

Hi,

This actually is a follow up of a previous question i asked (proc glimmix, insect counts), but i shall give the information again:

I am busy on a comparative study on different sowing types for field margins in their effect on abundancies of insects

I have 4 different types of sowing type: G(rass),I(ndiginous),T(übinger) and C(lover),

The experimental setup is as such that I have planted on each location these four different sowing types, so at every location there are four field margins, each containing one of the sowing types, but never two same sowing types on the same location.

I then have 3 locations where all four sowing types are represented and 1 location where only I and T are present.

So I have an unbalanced design and a very small sample size.

This actually is data obtained from repeated measurements, I sampled the field margins four times in the season.

I have already (with the help of this forum) done a repeated measurements analysis on the whole abundancies (all insects) and on one functional group of them , using a negative binomial generalized mixed linear model. So here I could fit a model count=treatment date treatment*date with random intercept for location and a repeated measurements cobvariance structure.

but for another  functional group of intrest in the insect-counts I cannot obtain a converging model, even when adjusting the pconv, maxiter...

there are a lot of zero counts for some locations/treatments/dates and i guess that is why it does not converge...

Now comes the problem I would like to solve:

Because I cannot obtain a model with the repeated measurements design, I thought I could look at it for the whole season then…

So now I counted up the number of bumble bees of this functional group at each location at each treatment. Instead of having now for location x, treatment G for example 4 measurements (one on each date), I now have 1 measurement (the 4 measurements of the dates counted up).

My first question is: is this reasonable to do so? Although it is a pilot study, I wouldn’t like to report p-values which are not very accurate because of huge confidence intervals and low power because I by doing this reduced the number of freedom degrees drastically…

Second question: Is the following proposed justified to do in this experimental setup, given that the first question is positively answered?

The counted up data-input  is as follows, below that the code I would want to use, but which I am not sure of…

data jaarkortetong;

input loc $ treatment $ count;

cards;

mb    g     2

md    g     33

pv    g     2

mb    i     1

md    i     44

pv    i     17

gb    i     38

mb    k     2

md    k     5

pv    k     1

mb    t     23

md    t     27

pv    t     27

gb    t     48

run;

ods graphics on;

proc glimmix data=jaarkortetong initglm method=mspl;

nloptions maxiter=2000 tech=congra;;

class treatment loc;

model count=treatment/dist=negbin link=log solution ddfm=sat;

random intercept/ subject=loc;

lsmeans treatment/cl ilink diff plot=meanplot (ilink);

run;

ods graphics off;

best regards,

Bas

6 REPLIES 6
SteveDenham
Jade | Level 19

This should work.  I would consider some very minor changes.  Since there are no R-side effects, better estimates of the confidence interval could probably be obtained by changing the method to QUAD.  This means dropping the ddfm=sat option.

Back to the original repeated measures--it appears that the substantial number of zeroes is exactly the problem, and a zero-inflated negative binomial is probably the appropriate distribution.  And, unfortunately, GLIMMIX does not have this distribution available.  There are a couple of answers.  PROC GENMOD definitely has an approach to zero inflated distributions.  You would have to give up the assumption of loc as a random variable, and be content with narrow inference space results.  The more correct (and of course more difficult) method would be to use PROC NLMIXED to fit a zero inflated negative binomial with random terms.  Search on the SAS-L listserv for this topic.  There should be several papers by Dale McLerran on the subject, with suitable code.

Steve Denham

basp
Calcite | Level 5

thanks again Steve,

i am going to try out the zero-inflated models,

it seems that i will have to choose between modelling also my random effects of the location which is possible in proc nlmixed, but no repeated measurements are possible in this procedure or the proc genmod, in which i could model the repeated measurments, but not the random effects...

if it would not work out, i can allways go back to what i was planning earlier by counting the dates all together, thus avoiding the zero's...

Bas

basp
Calcite | Level 5

The nlmixed is too complex for me to comprehend...

given that, i would still like to use what i previously posted by counting the dates alltogether...

but i see a possible flaw here...it is not an unbalanced design as i stated in my question,

but rather an incomplete design...

should i  drop the 3th location out of my analysis where only two levels of the factor treatment are present,

or can i let it be? I am not sure if leaving it in would not bias my answer...

SteveDenham
Jade | Level 19

Yuck.  I guess you could do that, but I don't think you have to.  It is one of the advantages of considering location as a random effect, rather than as a fixed effect.  You are calculating marginal means, with location adding variation.  If you are really concerned that the variance estimates are going to be affected, you could add a group=treatment option to the random statement, so that variance components for each treatment are estimated separately, and thus are more complete for the treatments measured at all locations.  Try:

proc glimmix data=jaarkortetong initglm method=mspl;

nloptions maxiter=2000 tech=congra;;

class treatment(ref='k') loc;

model count=treatment/dist=negbin link=log solution ddfm=sat;

random intercept/ subject=loc group=treatment;

covtest 'Common variance' homogeneity;

lsmeans treatment/cl ilink diff plot=meanplot (ilink);

run;

For the sample data, this indicates there may be heterogeneous variances (P=0.0484).  I set the reference level to the treatment that had the lowest counts only to make sure the sweep operator picked it up as having an estimate.

I hope this helps.

Steve Denham

basp
Calcite | Level 5

What has to be done with lsmeans which do not differ significantly from zero? It reports p-values for my lsmeans >0,05...Should i still report these lsmeans?

And should i then still look at my posthoc analysis? What should i do with such results?

bas

SteveDenham
Jade | Level 19

Yes.  Report them.  It just means that confidence intervals around the lsmean contain zero, which is not a problem.  And if the treatment effect is significant, the post hoc comparison of lsmeans is certainly in order.

Steve Denham

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 1756 views
  • 3 likes
  • 2 in conversation