01-09-2013 10:15 PM
I would like help analyzing my data set. I am looking at the effect of storage conditions (humidity and temperature) on the germination of dormant seeds over time. I set up my experiment as a split-split plot (main plot: humidity; subplot: temperature; sub-subplot: time), and I did two runs (seasons) to see whether the treatment results are consistent. The PROC UNIVARIATE results indicate that my data are not normal and are highly positively skewed (skewness = 2.2). Can I still run an ANOVA on these data? I would like to test whether my factors (and their interactions) are significant, and also whether the two runs differ significantly (which would indicate whether the runs can be combined). Attached are the data set and the PROC UNIVARIATE results. Any help (specifically in writing the analysis code) is greatly appreciated.
01-10-2013 08:43 AM
Recall that ANOVA assumes the residuals are approximately normally distributed, not necessarily the response variable itself. It really, really looks like your response variable is a count, and it really looks like it is zero-inflated (the median is zero). I suggest looking at the documentation for PROC GENMOD, especially the zero-inflated models. The hard part will be correctly specifying a split-split plot, which is relatively easy to do in PROC MIXED with RANDOM and REPEATED statements, or in PROC GLIMMIX with the RESIDUAL option in the RANDOM statement. However, neither of those procedures will accurately fit a zero-inflated model.
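To make the point about residuals concrete, here is a minimal sketch of a Gaussian split-split plot fit in PROC MIXED, followed by a normality check on the residuals rather than on the raw response. It assumes a data set named GERM with variables SEASON, REP, HUMIDITY, TEMP, MONTH and a response PCTGERM; all of these names are hypothetical. It uses RANDOM statements only (a REPEATED statement over MONTH could be added to model correlation over time), and it ignores the count/zero-inflation issue entirely.

```sas
/* Sketch only: data set and variable names are hypothetical.          */
/* Split-split plot: HUMIDITY = main plot, TEMP = subplot,             */
/* MONTH = sub-subplot; reps are nested within SEASON (run).           */
proc mixed data=germ;
  class season rep humidity temp month;
  /* OUTP= writes predicted values and residuals (variable RESID)      */
  model pctgerm = season|humidity|temp|month / outp=resids;
  /* Random terms give the block, main-plot and subplot error strata;  */
  /* the residual is the sub-subplot error.                            */
  random rep(season) humidity*rep(season) humidity*temp*rep(season);
run;

/* Check normality of the residuals, not of the response itself. */
proc univariate data=resids normal;
  var resid;
  histogram resid / normal;
run;
```

The SEASON main effect and its interactions in this model are one way to ask whether the two runs differ.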
Check the SAS-L archives for many, many threads on fitting zero-inflated models.
01-10-2013 01:20 PM
Thanks for this!
Re: the data. Yes, the response variable is a count; specifically, it is percent germination (i.e., the number of germinated seeds out of 50 seeds).
I'll check the archives and see what I find. Thanks again for your help.
01-10-2013 03:35 PM
Umm. That would be a proportion, bounded below by 0 and above by 1. Neither the Poisson nor the negative binomial is really applicable, because your count has a maximum of 50 (out of 50), which would show up in your data as 100. So that also means no "zero-inflated" Poisson or negative binomial model.
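For reference, a binomial model works on the counts directly through events/trials syntax rather than on the percentages. A minimal sketch, assuming the percent variable is called PCTGERM in a data set GERM (hypothetical names) and that every experimental unit had 50 seeds; like the binomial fit described in the following paragraph, it may run into convergence problems, and it does not model correlation over months.

```sas
/* Sketch only: GERM, PCTGERM, GERM_BIN, NGERM, NTOTAL are hypothetical. */
data germ_bin;
  set germ;
  ngerm  = round(pctgerm * 50 / 100);  /* germinated seeds out of 50  */
  ntotal = 50;                         /* seeds per experimental unit */
run;

proc glimmix data=germ_bin;
  class season rep humidity temp month;
  /* Events/trials response with a binomial distribution, logit link */
  model ngerm/ntotal = season|humidity|temp|month / dist=binomial link=logit;
  random rep(season) humidity*rep(season) humidity*temp*rep(season);
run;
```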
I thought about this a bit and really wanted to use a binomial distribution, but it had convergence problems. So, since you do not have any 100% observations, I added a trivial amount (0.0001) to all values and used a beta distribution, which requires every response to lie strictly between 0 and 1.
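The GERM2 data set and the MONTHX variable used in the GLIMMIX code are not shown in the post. One way they could be built is sketched below, assuming the original data set is GERM, the percent variable is PCTGERM, the shifted proportion is PGERM2, and MONTH is numeric (all of these are hypothetical names and assumptions).

```sas
/* Sketch only: GERM, PCTGERM and PGERM2 are hypothetical names;      */
/* GERM2 and MONTHX match the names used in the GLIMMIX code.         */
data germ2;
  set germ;
  /* Convert percent to a proportion and shift it off zero so every   */
  /* value lies strictly inside (0, 1), as the beta distribution needs */
  pgerm2 = pctgerm / 100 + 0.0001;
  /* Numeric copy of MONTH for the TYPE=SP(POW)(MONTHX) covariance    */
  monthx = month;
run;
```

Because there are no 100% observations, the largest possible value is 49/50 + 0.0001 = 0.9801, so the shift never pushes a value up to 1.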
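proc glimmix data=germ2 method=rspl abspconv=1e-8;
  class humidity temp rep season month;
  /* Beta response; PGERM2 is a placeholder name for the shifted proportion */
  model pgerm2 = humidity|temp|season|month / dist=beta link=logit;
  nloptions tech=quanew maxiter=2000;
  /* R-side power covariance over months; MONTHX is a numeric copy of MONTH */
  random month / residual subject=humidity*temp*season*rep type=sp(pow)(monthx);
  lsmeans humidity|temp|season|month / cl ilink;
run;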
This ran for me. The point estimates obtained with the ILINK option should be adjusted by subtracting the 0.0001 that was added.
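The adjustment simply undoes the shift. Writing p for the germination proportion and y* = p + 0.0001 for the value actually modeled, a constant shift moves the mean and the confidence limits by the same constant and leaves the standard error unchanged, so the ILINK estimate and limits map back as follows:

```latex
\begin{aligned}
E(y^*) &= E(p) + 0.0001,\\
\hat{p} &= \hat{\mu} - 0.0001,\\
\text{germination (\%)} &= 100\,(\hat{\mu} - 0.0001),\\
[\,L_p,\; U_p\,] &= [\,L_\mu - 0.0001,\; U_\mu - 0.0001\,].
\end{aligned}
```

For example, an ILINK estimate of 0.2001 corresponds to 100 × (0.2001 − 0.0001) = 20% germination.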
I have some ideas about how to approach the zero inflation idea, using a fixed offset, but that is for a later post.