skr01
Calcite | Level 5

Hello

I am modeling a competition assay in which two competitors are inoculated in a “+” formation, and I score whether the vertical genotype wins as a binary response variable. I have two fixed predictors, Envir and Protein (2 levels each), recorded for both the vertical and the horizontal genotype, so each competition (“trial”) has a VertEnv, HorizEnv, VertProtein, and HorizProtein. Genotypes, which come from different environments, are random factors, and replication is at the level of the Vert_genotype*Horiz_genotype combination. The three replicates for each such combination are almost completely consistent. This is an all-by-all assay, so the Vert_genotypes and Horiz_genotypes appear multiple times in different combinations. Both fixed predictors are imbalanced: for a few levels of the 4-way fixed-effect interaction, only one combination of genotypes yields a particular outcome, but every combination of the fixed effects has at least a few trials with both outcomes. I am interested in a number of 2-way interactions among the fixed factors, but fear that there may be important 3-way interactions or that the 4-way interaction is important.

 

At this point my model is:

 

Proc glimmix data=xxx method=laplace;

Class Horiz_genotype Vert_genotype Venv Henv VProt HProt;

Model Win_V (event='1') = Venv|Henv|VProt|HProt

                                    / ddfm=bw link=logit dist=binary;

random Vert_genotype(Venv) Horiz_genotype(Henv) Vert_genotype*Horiz_genotype(Venv*Henv);

lsmeans Venv*Henv*VProt*HProt / pdiff OR adjust=tukey ilink CL;

nloptions tech=nrridg maxiter=500;

run;

 

*(Laplace based on Bolker et al 2008, Trends in Ecol and Evol, among others);

*(bw based on various message boards, where posts suggest bw ddfm for Laplace with binary);

 

My questions are:

  • Comparing this model to one using the default RSPL method with ddfm=kr yields big differences, both in the outcomes of significance testing and in the LSmeans and CIs. The Laplace/bw version has more significant tests of fixed effects, with CIs that are just enormous (e.g. OR CIs from almost 0 to almost 1000). The RSPL/kr model yields no significant fixed-effect tests and much more reasonable-looking CIs. I assume the results differ because I am changing two things at the same time, but I would love to better understand how. More practically, which model, if either, is correct (or more so), and/or should I be using a different approach?
  • Is this a reasonable way to approach the random terms?
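
One way to separate the two changes mentioned in the first question is to fit an intermediate model that switches only the estimation method. The following is a minimal sketch, not a recommendation: it assumes the same data set (xxx), the variable names used in the RANDOM statement above, and keeps ddfm=bw so that only the method differs from the Laplace fit.

```sas
/* Sketch only: same fixed and random structure as the Laplace model,  */
/* but with the default pseudo-likelihood estimation (METHOD=RSPL).    */
/* Keeping ddfm=bw means only the estimation method has changed.       */
proc glimmix data=xxx method=rspl;
  class Horiz_genotype Vert_genotype Venv Henv VProt HProt;
  model Win_V(event='1') = Venv|Henv|VProt|HProt
        / ddfm=bw link=logit dist=binary;
  random Vert_genotype(Venv) Horiz_genotype(Henv)
         Vert_genotype*Horiz_genotype(Venv*Henv);
  lsmeans Venv*Henv*VProt*HProt / ilink cl;
run;
```

Comparing this fit with the Laplace/bw fit shows the effect of the estimation method alone; comparing it with the RSPL/kr fit shows the effect of the degrees-of-freedom method alone.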

 

Thanks for your insights!

lvm
Rhodochrosite | Level 12

Be very careful with Laplace or quadrature with binary or binomial data. If any of the random-effect variances is estimated as 0, then everything blows up (without any warnings). Many of the standard errors of the LSMEANS are 0 or go to zero, which means you get meaningless results, and the Type III tests can be extremely inflated. You must remove any random effects with a 0 variance to get sensible results. This is not an issue with RSPL estimation.
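
A quick way to look for zero variances is to capture the covariance parameter table and flag suspicious rows. This is a minimal sketch under stated assumptions: the data set name (xxx) and variable names are taken from the original post, the output data set names (cp, zero_var) are hypothetical, and the 1e-8 cutoff is an arbitrary illustrative threshold.

```sas
/* Sketch: save the covariance parameter estimates from the Laplace fit */
ods output CovParms=cp;
proc glimmix data=xxx method=laplace;
  class Horiz_genotype Vert_genotype Venv Henv VProt HProt;
  model Win_V(event='1') = Venv|Henv|VProt|HProt
        / ddfm=bw link=logit dist=binary;
  random Vert_genotype(Venv) Horiz_genotype(Henv)
         Vert_genotype*Horiz_genotype(Venv*Henv);
run;

/* Keep only variance components that are (near) zero or have no      */
/* standard error; the 1e-8 cutoff is an assumption, not a SAS rule.  */
data zero_var;
  set cp;
  if Estimate <= 1e-8 or missing(StdErr);
run;

proc print data=zero_var;
run;
```

Any random effect listed in zero_var is a candidate for removal from the RANDOM statement before refitting.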

 

Early work suggested that RSPL could give biased results with a binary response, so many have suggested Laplace or quadrature. However, recent work by Stroup shows that this is not always the case. My view, based on the most recent evidence, is that there is no clear-cut best estimation method. But I do lean towards Laplace for purely binary responses. Just make sure you don't have any zero variances.

skr01
Calcite | Level 5

Thanks so much for the warning! I actually have a sort of "sister model" with a simpler layout that was showing similar symptoms. It did indeed have random variables (factorial design) with zero variances, and removing them made it look much more reasonable. But that wasn't the problem with the model I posted about. Your suggestion got me thinking again about my random statements, though. Because the three replicates for a particular vertical genotype-horizontal genotype combination were very frequently identical in outcome, I made a consensus call for each genotype combination and removed that interaction from the model, and the confidence intervals now look more reasonable too. Does this make sense?
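
For concreteness, the consensus-call approach described above might look like the sketch below. It assumes that Win_V is a numeric 0/1 variable, that xxx has one row per replicate, and that a majority of replicates defines the consensus; the names consensus, n_reps, and Win_V_consensus are hypothetical.

```sas
/* Sketch: collapse the replicates to one consensus outcome per        */
/* genotype combination (majority of replicates; assumes Win_V is 0/1) */
proc sql;
  create table consensus as
  select Vert_genotype, Horiz_genotype, Venv, Henv, VProt, HProt,
         count(Win_V) as n_reps,
         (mean(Win_V) > 0.5) as Win_V_consensus
  from xxx
  group by Vert_genotype, Horiz_genotype, Venv, Henv, VProt, HProt;
quit;

/* Refit without the genotype-by-genotype interaction random term */
proc glimmix data=consensus method=laplace;
  class Horiz_genotype Vert_genotype Venv Henv VProt HProt;
  model Win_V_consensus(event='1') = Venv|Henv|VProt|HProt
        / ddfm=bw link=logit dist=binary;
  random Vert_genotype(Venv) Horiz_genotype(Henv);
  lsmeans Venv*Henv*VProt*HProt / ilink cl;
run;
```

With three replicates per combination the majority is never tied, but the n_reps column makes it easy to check for combinations with an even number of rows.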

 

 
