Re: Need help finding the right SAS procedure for non-normal seed germination data

OB_HNL · Posted 01-09-2013 10:15 PM

Hi all,

I would like help analyzing my data set. I am looking at the effect of storage conditions (humidity and temperature) on the germination of dormant seeds over time. I set up my experiment as a split-split plot (main plot: humidity, subplot: temperature, sub-subplot: time), and I did two runs (seasons) to see whether treatment results are consistent. Results of PROC UNIVARIATE indicate that my data are not normal and are highly positively skewed (skewness = 2.2). Can I still run an ANOVA with this? I would like to see whether the factors and their interactions are significant, and also whether the two runs differ significantly (which can indicate whether the runs can be combined). Attached are a data set and the PROC UNIVARIATE results. Any help (specifically in writing the analysis code) is greatly appreciated.

Thanks!

SteveDenham · Posted 01-10-2013 08:43 AM

Recall that ANOVA assumes the residuals are approximately normally distributed, not necessarily the response variable itself. It really, really looks like your response variable is a count, and it really looks like it is zero-inflated (the median is zero). I suggest looking at the documentation for PROC GENMOD, especially for the zero-inflated models. The hard part will be correctly specifying a split-split plot, which is relatively easy to do in MIXED with RANDOM and REPEATED, or in GLIMMIX with the RESIDUAL option in the RANDOM statement. However, neither of those will accurately fit a zero-inflated model.
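The difference between checking the response and checking the residuals can be tried directly. What follows is a rough sketch, not a recommended final model. It assumes the data set germ_combined and the variables germ, humidity, temp, rep, season, and month that appear in the code later in this thread, and it assumes rep identifies replicates within each season (the thread does not say). The data set name mixed_resid is made up. It has not been run against the attached data.

```sas
/* Split-split plot: humidity = main plot, temp = subplot, month = sub-subplot. */
/* Assumption: each rep-by-season combination is one replicate (block).         */
proc mixed data=germ_combined;
  class season rep humidity temp month;
  /* OUTP= saves predicted values and conditional residuals (variable Resid) */
  model germ = season|humidity|temp|month / outp=mixed_resid;
  /* block, main-plot, and subplot error terms */
  random intercept humidity humidity*temp / subject=rep*season;
run;

/* Check the residuals, not the raw response, for approximate normality */
proc univariate data=mixed_resid normal;
  var Resid;
  qqplot Resid / normal(mu=est sigma=est);
run;
```

If the residuals are still strongly skewed, that points toward a generalized model rather than a plain ANOVA.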

Check the SAS-L archives for many, many threads on fitting zero-inflated models.

Steve Denham

OB_HNL · Posted 01-10-2013 01:20 PM

Thanks for this!

Re: Data. Yes, the response variable is a count, specifically percent germination (i.e. number of germinated seeds/50 seeds).

I'll check the archives and see what I'll find. Thanks again for your help.

Orville

SteveDenham · Posted 01-10-2013 03:35 PM

Umm. That would be a proportion, bounded below by zero and above by 1. Neither the Poisson nor the negative binomial really is applicable, because you have a maximum count of 50 (out of 50), which would show up in your data as 100. So that means no "zero inflation."
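A count with a fixed upper limit is the case the binomial events/trials syntax is written for. The sketch below shows what that form might look like, assuming germ holds percent germination (0 to 100) out of 50 seeds per unit, and borrowing the variable names and subject effect from the code later in this thread. The names germ_counts, n_seeds, and n_germ are hypothetical. It is untested on the attached data and may well fail to converge.

```sas
/* Recover the count of germinated seeds from the percentage */
data germ_counts;
  set germ_combined;
  n_seeds = 50;                           /* seeds per unit, as stated in the thread */
  n_germ  = round(germ * n_seeds / 100);  /* percent -> number germinated            */
run;

proc glimmix data=germ_counts;
  class humidity temp rep season month;
  /* events/trials syntax: germinated seeds out of seeds tested */
  model n_germ/n_seeds = humidity|temp|season|month / dist=binomial link=logit;
  /* assumption: one random intercept per humidity*temp*season*rep unit */
  random intercept / subject=humidity*temp*season*rep;
run;
```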

I thought about this a little bit, and really wanted to use a binomial distribution, but it has convergence problems. So, since you do not have any 100% observations, I added a trivial amount to all values and used a beta distribution.

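data germ2;
  set germ_combined;
  value=(germ/100);      /* percent germination -> proportion */
  value2=value+0.0001;   /* shift off zero: the beta distribution needs values strictly between 0 and 1 */
run;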

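proc glimmix data=germ2 method=rspl abspconv=1e-8;
  /* numeric copy of month, used as the coordinate for the spatial power structure */
  monthx=month;
  class humidity temp rep season month;
  nloptions tech=quanew maxiter=2000;
  /* full factorial of fixed effects; beta distribution for a proportion in (0,1) */
  model value2=humidity|temp|season|month/dist=beta;
  /* R-side covariance: repeated measures over month within each humidity*temp*season*rep unit */
  random month/residual subject=humidity*temp*season*rep type=sp(pow)(monthx);
  /* least squares means with confidence limits, back-transformed to the data scale */
  lsmeans humidity|temp|season|month/cl ilink;
run;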

This ran for me. The point estimates obtained with the ILINK option should be adjusted for the 0.0001 added (that is, subtract 0.0001 to get back to the original proportion scale).
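One way the adjustment might be done is sketched below. It assumes the least squares means were saved by adding an ODS OUTPUT statement inside the PROC GLIMMIX step, and that the saved table has the usual inverse-linked columns Mu, LowerMu, and UpperMu. The data set names lsm_beta and lsm_adj and the pct_ variables are hypothetical. Shifting the confidence limits by the same constant is a simple approximation, not something stated in the post.

```sas
/* Assumes this line was added inside the PROC GLIMMIX step:
     ods output LSMeans=lsm_beta;                              */
data lsm_adj;
  set lsm_beta;
  /* remove the constant added before fitting, then convert back to percent */
  pct_germ  = 100 * (Mu      - 0.0001);
  pct_lower = 100 * (LowerMu - 0.0001);
  pct_upper = 100 * (UpperMu - 0.0001);
run;

proc print data=lsm_adj;
  var Effect humidity temp season month pct_germ pct_lower pct_upper;
run;
```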

I have some ideas about how to approach the zero inflation idea, using a fixed offset, but that is for a later post.

Steve Denham
