OB_HNL
Calcite | Level 5

Hi all,

I would like help analyzing my data set. I am looking at the effect of storage conditions (humidity and temperature) on the germination of dormant seeds over time. I set up my experiment as a split-split plot (main plot: humidity, subplot: temperature, sub-subplot: time), and I did two runs (seasons) to see whether the treatment results are consistent. The results of PROC UNIVARIATE indicate that my data are not normal and are highly positively skewed (skewness = 2.2). Can I still run an ANOVA on these data? I would like to see whether the factors and their interactions are significant, and also whether the two runs differ significantly (which would indicate whether the runs can be combined). Attached are the data set and the PROC UNIVARIATE results. Any help, specifically in writing the analysis code, is greatly appreciated.

Thanks! 

3 REPLIES
SteveDenham
Jade | Level 19

Recall that the ANOVA assumption is that the residuals are approximately normally distributed, not necessarily the response variable itself.  It really, really looks like your response variable is a count, and it really looks like it is zero-inflated (the median is zero).  I suggest looking at the documentation for PROC GENMOD, especially for the zero-inflated models.  The hard part will be correctly specifying a split-split plot, which is relatively easy to do in MIXED with RANDOM and REPEATED statements, or in GLIMMIX with the RESIDUAL option in the RANDOM statement.  However, neither of those will accurately fit a zero-inflated model.
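For reference, a minimal sketch of the normal-theory split-split layout in PROC MIXED might look like the following. It assumes a data set named germ_combined with variables germ, season, rep, humidity, temp and month, that replicates are nested within season, and that an AR(1) structure is adequate for equally spaced months. None of this has been checked against the attached data, and it does nothing about the distributional problem described above.

```sas
/* Sketch only: data set and variable names are assumptions */
proc mixed data=germ_combined;
   class season rep humidity temp month;
   /* fixed effects: all factors and their interactions */
   model germ = season|humidity|temp|month / ddfm=kr;
   /* block and main-plot (humidity) error terms */
   random rep(season) humidity*rep(season);
   /* sub-subplot factor (time) as repeated measures on each subplot */
   repeated month / subject=humidity*temp*rep(season) type=ar(1);
run;
```

The season main effect and its interactions in the Type 3 tests are one way to judge whether the two runs behave alike.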

Check the SAS-L archives for many, many threads on fitting zero-inflated models.

Steve Denham

OB_HNL
Calcite | Level 5

Thanks for this!

Re: Data. Yes, the response variable is a count, specifically percent germination (i.e. number of germinated seeds/50 seeds).

I'll check the archives and see what I'll find. Thanks again for your help.

Orville

SteveDenham
Jade | Level 19

Umm.  That would be a proportion, bounded below by 0 and above by 1.  Neither the Poisson nor the negative binomial is really applicable, because you have a maximum count of 50 (out of 50), which would show up in your data as 100.  So that means no "zero inflation."

I thought about this a little bit, and really wanted to use a binomial distribution, but it has convergence problems.  The beta distribution needs values strictly between 0 and 1, so, since you do not have any 100% observations, I added a trivial amount to all values to move the zeros off the boundary, and used a beta distribution.

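/* Convert percent germination to a proportion and nudge zeros into (0, 1) */
data germ2;
   set germ_combined;
   value  = germ/100;
   value2 = value + 0.0001;
run;

proc glimmix data=germ2 method=rspl abspconv=1e-8;
   monthx = month;   /* numeric copy of month, used as the distance for sp(pow) */
   class humidity temp rep season month;
   nloptions tech=quanew maxiter=2000;
   model value2 = humidity|temp|season|month / dist=beta;
   /* R-side repeated measures over month within each humidity*temp*season*rep unit */
   random month / residual subject=humidity*temp*season*rep type=sp(pow)(monthx);
   lsmeans humidity|temp|season|month / cl ilink;
run;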

This ran for me.  The point estimates obtained with the ilink option should be adjusted for the 0.0001 added.
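One possible way to make that adjustment is sketched below. It assumes that the statement "ods output LSMeans=lsm;" has been added inside the PROC GLIMMIX step, that the resulting data set carries the inverse-linked columns under the names Mu, LowerMu and UpperMu (worth confirming with PROC CONTENTS), and that simply subtracting the constant is an acceptable correction. The names lsm and lsm_adj are made up for illustration.

```sas
/* Sketch only: assumes "ods output LSMeans=lsm;" was added to the GLIMMIX step */
data lsm_adj;
   set lsm;
   /* remove the 0.0001 that was added to every response, flooring at zero */
   mu_adj    = max(Mu      - 0.0001, 0);
   lower_adj = max(LowerMu - 0.0001, 0);
   upper_adj = max(UpperMu - 0.0001, 0);
   /* back to the original percent-germination scale */
   pct_adj   = 100*mu_adj;
run;

proc print data=lsm_adj;
run;
```

The shift is tiny relative to most estimates, so it matters mainly for treatment combinations whose germination is close to zero.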

I have some ideas about how to approach the zero inflation idea, using a fixed offset, but that is for a later post.

Steve Denham
