
Hello All,

I am trying to construct and compare conditional and marginal versions of the same basic model for an analysis of plant reproductive data with repeated measurements within plants from a field experiment. I believe I have constructed a legitimate GEE-type marginal model with fixed and random effects, but am struggling to construct a corresponding conditional (GLMM) model. Briefly, the experimental design:

- 10 experimental plots, assigned randomly to 2 treatments (treatments are applied to whole plots, so this is not a typical form of blocking).

     - 10 randomly sub-sampled individual plants within each plot.

          - Each plant has multiple flowers on a single flowering stalk (ranging between 3 and 11); all flowers on each individual were sampled, and whether or not each flower developed a fruit is the binary response variable (1 = successful fruit production).

               - Based on previous work, I have reason to expect flowers on a single individual to be spatially correlated by their position on the flowering stalk, and have chosen to model this covariance structure using sp(pow)(flower_position).

My marginal model gives believable results using the following code:

PROC GLIMMIX data=fruit ;
    nloptions technique=trureg ;
    class treatment plot ind ;
    model fruit_bin = treatment|rflowpos|day_ff
        / dist=binomial link=logit ddfm=kenwardroger S;
    random intercept / subject=plot(treatment);
    random intercept / subject=ind(treatment plot) ;
    random _residual_ / subject=ind(treatment plot) type=sp(pow)(flowpos) ;
    lsmeans treatment / ilink pdiff;
    /* CONTRASTS */
    contrast 'Treatment effect' intercept 1 treatment 1 -1 ;
    /* ESTIMATES */
    estimate 'R mean' intercept 1 treatment 1 0;
    estimate 'U mean' intercept 1 treatment 0 1;
    estimate 'Treatment effect' intercept 1 treatment 1 -1 ;
RUN;

I think that the conditional model that best reflects the design of the experiment is as follows:

PROC GLIMMIX data=fruit method=laplace empirical=MNB ;
    nloptions technique=trureg ;
    class treatment plot ind ;
    model fruit_bin = treatment|rflowpos|day_ff
        / dist=binomial link=logit S;
    random intercept / subject=plot(treatment) ;
    random intercept / subject=ind(treatment plot) ;
    random flowpos / subject=ind(treatment plot) type=sp(pow)(flowpos) ;
RUN;

However, when I run this model I have a few issues. I can get it to converge if I use technique=nrridg, but then the variance estimates for ind(treatment plot) and sp(pow) are zeroed out, with the corresponding warning that the G matrix is not positive definite. I also get outrageous fixed-effect results (e.g., F = infinity for most effects). I have tried various permutations of the random effects: if I drop either the first or second RANDOM statement the model converges with results that are not dramatically different from the marginal model, but those models still give zeroed variance estimates for the covariance parameters and (I believe) do not accurately reflect the experimental design. Can you see any glaring errors in the way I have constructed these two models? If not, perhaps it is simply a problem of insufficient replication for my subjects (particularly within individuals, since different individuals had different numbers of flowers)? Any and all advice is much appreciated.
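For reference, the converging permutation that drops the first RANDOM statement (the other one I tried drops the second instead) is just the same call without the plot-level intercept:

PROC GLIMMIX data=fruit method=laplace empirical=MNB ;
    nloptions technique=trureg ;
    class treatment plot ind ;
    model fruit_bin = treatment|rflowpos|day_ff
        / dist=binomial link=logit S;
    /* plot-level intercept dropped in this permutation */
    random intercept / subject=ind(treatment plot) ;
    random flowpos / subject=ind(treatment plot) type=sp(pow)(flowpos) ;
RUN;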

cheers.

Colin

4 REPLIES
SteveDenham
Jade | Level 19

Hi Colin,

This looks logical, so the conditional model may be having problems due to the insufficient replication you mention.

I do have one question.  What is the variable day_ff?  It is entering both the marginal and conditional models as a continuous variable, and so is getting a strictly linear effect.  If it is some type of observation order variable, you will probably need to model it as a residual effect in the marginal model, and as a conditional effect in the other model.  Other than that, I think you have hit the nail on the head with the insufficient replication idea.  There just isn't enough variation in a binomial endpoint to fit all of the factors.  And that is too bad, because I think the conditional model is a better description of the experimental design.
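Purely as a sketch of what I mean, and only if day_ff really were an observation-order or repeated-measures index (which it may not be), the R-side statement of your marginal model would index the correlation by it rather than by flower position:

    /* hypothetical only: applies just if day_ff orders the repeated measurements within a plant */
    random _residual_ / subject=ind(treatment plot) type=sp(pow)(day_ff) ;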

Steve Denham

ColinGradstudent
Calcite | Level 5

Hi Steve,

     Thanks for having a look, and I agree that the conditional model makes more sense. As you suspect, I have less replication, particularly for small and large flower positions... I will see if excluding those poorly replicated small and large individuals helps with estimating the within-subject covariance parameters (sp(pow) and the variance)... but it probably won't, since I'll be sacrificing among-individual replication... which wasn't getting estimated either. Grr.

     The variable day_ff refers to the date that each individual first started flowering. While flower position is crucial for accounting for within-individual correlated responses, the timing of reproduction is the covariate of primary interest for this experiment.

     I have a follow-up question regarding estimation, and what exactly LSMEANS does relative to the fixed effects and ESTIMATE/CONTRAST statements to account for the modeled covariance structure. I noticed something odd in my marginal model results for all of the effects involving only classification variables: the LSMEANS results did not agree with the fixed effects. As an example, here are the various results from the fixed effects, ESTIMATE, CONTRAST, and LSMEANS for the Treatment effect:

- In the fixed effects results, the Treatment effect returned F=4.54, Pr>F = 0.0342.

- Including contrast 'Treat. Effect' intercept 1 treatment 1 -1 ; gave results that were identical to the fixed effects results.

- Estimating each Treatment in turn gave results that were consistent with the above contrast:

     estimate 'Treat 1' intercept 1 treatment 1 0 ; returned a logit of 0.9792, SE=3.07

     estimate 'Treat 2' intercept 1 treatment 0 1 ; returned a logit of 11.0872, SE=3.966 ... both of which don't really agree with the actual results (particularly for Treat. 2).

- LSMEANS, however gave very different results:

     Treat 1 estimate: 1.6775 , SE = 0.5715

     Treat 2 estimate: 1.3535 , SE = 0.5774

     Using pdiff, the LSMEANS treatment difference gave an estimate of 0.324, SE = 0.8095, Pr > |t| = 0.79.

I would have expected all of these results to be consistent with one another... so what exactly is LSMEANS doing differently? My suspicion is that LSMEANS automatically accounts for random effects and the covariance structure, and builds the L matrix accordingly... and that the simple CONTRAST or ESTIMATE statements don't (I would have to write things out explicitly for them to be equivalent, which seems like a non-trivial challenge). I think this was somewhat confirmed for me by looking at the L matrix LSMEANS was constructing for the Treatment pdiff comparison, which had many fractional coefficients. Does this mean that constructing CONTRAST or ESTIMATE statements equivalent to LSMEANS would require explicitly writing out these complicated coefficients? Perhaps I misunderstand what Type III fixed effects are supposed to do... but I expected the fixed effects result and the LSMEANS result to be consistent... even if my CONTRAST and ESTIMATE statements were over-simplified. Can you shed any light on what is going on? And on the practical side of things... how should one contend with random effects when constructing ESTIMATE and CONTRAST statements, particularly with more complicated random effects models?
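(For what it's worth, I pulled up the coefficient rows I compared with the E option, along the lines of:

    lsmeans treatment / pdiff e;                          /* shows the L rows LSMEANS constructs   */
    estimate 'Treat 1' intercept 1 treatment 1 0 / e;     /* shows the row my ESTIMATE constructs  */

and the fractional coefficients I mention above came from the LSMEANS rows.)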

cheers,

C.

SteveDenham
Jade | Level 19

Ah, the dangers of the ESTIMATE statement.  Those two estimate statements have 'hidden' parts, related to day_ff.  Since day_ff is a continuous covariate, the estimate statement is looking for a value for day_ff.  Since it doesn't appear in the statement, it plugs in day_ff=0 into the solution vector to get the estimate.  CONTRAST sort of avoids this, as it is really looking at the difference, and the covariate part is probably consistent across levels, so if it isn't included, the difference still looks right.  Personally, I avoid CONTRAST and ESTIMATE in PROC GLIMMIX, and depend on LSMESTIMATE for specific comparisons.  Oh--the LSMEANS statement plugs in the mean value of day_ff when constructing the value, and that is why it is different from your ESTIMATE statement values.
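To make that concrete, here is a rough sketch of what I would code for your treatment comparison (the labels are arbitrary, and the day_ff value in the last statement is only a placeholder to show where the 'hidden' coefficient goes):

    /* LSMESTIMATE works on the LS-means, so the continuous covariates are handled for you */
    lsmestimate treatment 'Treat 1 mean'          1  0,
                          'Treat 2 mean'          0  1,
                          'Treatment difference'  1 -1 / ilink cl e;

    /* an ESTIMATE only gets close if you supply the covariate value yourself, and even then
       every interaction term you leave out still defaults to a coefficient of zero */
    estimate 'Treat 1 at day_ff=10' intercept 1 treatment 1 0 day_ff 10 / ilink;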

About the consistency of the Fixed Effects and the LSMEANS--you can construct the lsmeans from the L matrix and the solution vector, so they are consistent.  This is where I quote the Wizard of Oz, and say "Pay no attention to that man behind the curtain," and apply that to the tests that show up regarding the components of the solution vector.  Those all depend on the parameterization that you specify, and if it is the default GLM parameterization, a lot of the values are set to zero, so the whole thing is interesting, but really only important for continuous effects.  The class effects do get correctly "collapsed" into Type 3 tests in a separate table.

So, LSMEANS automatically accounts for continuous covariates at the mean value.  The standard errors of the estimates and the differences automatically account for the random effects and the covariance structure, and if you have wonderfully balanced data, don't affect the estimates of the means.  But--imbalance changes everything.  The LSMEAN is the 'expected' value, given the data, averaged over stuff such that it represents the population value you would observe if everything were balanced.  And I get more confused when comparing marginal and conditional LSMEANS.  I then turn to Stroup's Generalized Linear Mixed Models for examples that make sense.  Take a look at some of the examples in there, if you can get a copy.
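One last practical note: if you ever want the LS-means evaluated at a particular covariate value instead of its data average, the AT option will do that (the value below is just an example):

    lsmeans treatment / at day_ff=10 ilink pdiff;   /* LS-means with day_ff fixed at 10 */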

Steve Denham

ColinGradstudent
Calcite | Level 5

Ah, ok. That makes perfect sense, and actually sheds some light on a few previous minor inconsistencies that have been swept under the rug. I will certainly take your advice and make a practice of using LSMESTIMATE for specific comparisons involving class variables. Stroup is quickly becoming my go-to reference for mixed model problems.

cheers,

Colin

