OB_HNL
Calcite | Level 5

Hi all,

I would like help analyzing my data set. I am looking at the effect of storage conditions (humidity and temperature) on the germination of dormant seeds over time. I set up the experiment as a split-split plot (main plot: humidity; subplot: temperature; sub-subplot: time) and did two runs (seasons) to see whether the treatment results are consistent. The PROC UNIVARIATE results indicate that my data are not normal and are highly positively skewed (skewness = 2.2). Can I still run an ANOVA on these data? I would like to test whether the factors and their interactions are significant, and also whether the two runs differ significantly (which would indicate whether the runs can be combined). Attached are the data set and the PROC UNIVARIATE results. Any help, especially with writing the analysis code, is greatly appreciated.

Thanks! 

3 REPLIES
SteveDenham
Jade | Level 19

Recall that the ANOVA assumption is that the residuals are approximately normally distributed, not necessarily the response variable itself.  It really looks like your response variable is a count, and that it is zero-inflated (the median is zero).  I suggest looking at the documentation for PROC GENMOD, especially the zero-inflated models.  The hard part will be correctly specifying the split-split plot, which is relatively easy to do in PROC MIXED with the RANDOM and REPEATED statements, or in PROC GLIMMIX with the RESIDUAL option in the RANDOM statement.  However, neither of those will accurately fit a zero-inflated model.

Check the SAS-L archives for many, many threads on fitting zero-inflated models.
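To make the point about residuals concrete, here is a minimal sketch of a split-split plot fit in PROC MIXED that saves the residuals and checks them for normality. The data set and variable names (germ_combined, germ, humidity, temp, rep, season, month) are taken from the code posted later in this thread. The random-effects structure assumes that rep identifies blocks within each season, which the thread does not confirm, and resid_out is a hypothetical output name.

```sas
/* Sketch only: assumes rep = blocks nested within season */
proc mixed data=germ_combined;
  class humidity temp rep season month;
  /* outp= writes predicted values and residuals (variable Resid) */
  model germ = humidity|temp|season|month / outp=resid_out;
  /* block, main-plot, and subplot error terms;
     the residual serves as the sub-subplot error */
  random rep(season) humidity*rep(season) humidity*temp*rep(season);
run;

/* Check the residuals, not the raw response, for normality */
proc univariate data=resid_out normal;
  var resid;
  qqplot resid / normal(mu=est sigma=est);
run;
```

If the residuals from a fit like this are still strongly skewed, that supports moving to a non-normal model rather than a plain ANOVA.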

Steve Denham

OB_HNL
Calcite | Level 5

Thanks for this!

Re: Data. Yes, the response variable is a count, specifically percent germination (i.e. number of germinated seeds/50 seeds).

I'll check the archives and see what I'll find. Thanks again for your help.

Orville

SteveDenham
Jade | Level 19

Umm.  That would be a proportion, bounded below by 0 and above by 1.  Neither the Poisson nor the negative binomial is really applicable, because the count has a maximum of 50 (out of 50), which would show up in your data as 100.  So that means no "zero inflation" model.

I thought about this a little bit, and really wanted to use a binomial distribution, but it has convergence problems.  The beta distribution needs values strictly between 0 and 1, so the zeros have to be shifted.  Since you do not have any 100% observations, I added a trivial amount to all values and used a beta distribution.

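data germ2;
  set germ_combined;
  value = germ/100;          /* percent -> proportion */
  value2 = value + 0.0001;   /* shift zeros into (0,1) for the beta distribution */
run;

proc glimmix data=germ2 method=rspl abspconv=1e-8;
  monthx = month;            /* numeric copy of month for sp(pow) */
  class humidity temp rep season month;
  nloptions tech=quanew maxiter=2000;
  model value2 = humidity|temp|season|month / dist=beta;
  /* repeated measures over month within each humidity*temp*season*rep unit */
  random month / residual subject=humidity*temp*season*rep type=sp(pow)(monthx);
  lsmeans humidity|temp|season|month / cl ilink;
run;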

This ran for me.  The point estimates obtained with the ILINK option are on the shifted scale, so they should be adjusted by subtracting the 0.0001 that was added.
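One possible way to apply that 0.0001 adjustment is sketched below. It assumes the statement `ods output lsmeans=lsm;` has been added inside the PROC GLIMMIX step; the data set names lsm and lsm_adj and the new variable names are hypothetical. Mu, LowerMu, and UpperMu are the columns that the ILINK and CL options add to the LSMEANS output table.

```sas
/* Sketch only: requires "ods output lsmeans=lsm;" in the PROC GLIMMIX step */
data lsm_adj;
  set lsm;
  /* remove the constant that was added before fitting */
  mu_adj    = Mu      - 0.0001;
  lower_adj = LowerMu - 0.0001;
  upper_adj = UpperMu - 0.0001;
  /* back to percent germination, truncated at zero */
  pct_germ  = 100 * max(mu_adj, 0);
run;
```

Because the constant is so small, the adjustment mostly matters for treatment combinations whose estimated germination is close to zero.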

I have some ideas about how to approach the zero inflation idea, using a fixed offset, but that is for a later post.

Steve Denham
