07-27-2015 03:33 AM
I have longitudinal data for three groups of subjects. The data are counts ranging from zero to millions. They are zero-inflated and show complete separation: certain groups have values of exactly zero from a particular time point onwards. The data look better when log-transformed, but then I have zero-inflated continuous data. I could fit these with the Tweedie distribution, but that is not implemented for longitudinal data (neither are the ZIP or ZINB models). The Firth correction is also not available in a longitudinal setting. Can somebody give me some advice on where to begin?
Thanks for any suggestions.
07-27-2015 08:26 AM
Can the longitudinal data be separated into "phases" of some kind, i.e., is there a natural reason for the counts to go to zero? If so, then split those off, and note that the groups differ due to whatever that reason is--no p value needed.
A second possibility is to fit a heteroscedastic model, assuming a normal distribution for the errors. The p values may be off some, but if the differences are striking, they should still be detectable. The write up would have to acknowledge that the assumptions may not be appropriate.
A third possibility is to consider each time point as a binomial response: either a count could be made or not. Although I generally dislike dichotomizing data, this opens a couple of avenues: a survival analysis of time to repeated no-count with right censoring, or a repeated measures binomial.
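A minimal sketch of the dichotomizing step, written in Python rather than SAS and meant only as an illustration. The function names, the "sustained zero" definition of the event (zero from some time point through the end of follow-up), and the example counts are all assumptions, not something taken from the original data.

```python
from typing import List, Sequence, Tuple


def to_binary(counts: Sequence[int]) -> List[int]:
    """1 if a count could be made at that time point, 0 if the count was zero."""
    return [1 if c > 0 else 0 for c in counts]


def time_to_sustained_zero(
    counts: Sequence[int], times: Sequence[float]
) -> Tuple[float, bool]:
    """Return (time, event) for one subject.

    event=True  : counts hit zero at `time` and stayed zero to the end.
    event=False : no sustained zero was seen; the subject is right-censored
                  at the last observed time.
    (Hypothetical event definition, chosen for illustration.)
    """
    if len(counts) != len(times) or len(counts) == 0:
        raise ValueError("counts and times must be non-empty and equal length")
    # Walk backwards to find where the trailing run of zeros starts.
    start = len(counts)
    while start > 0 and counts[start - 1] == 0:
        start -= 1
    if start == len(counts):
        # Last observation is non-zero: censored.
        return times[-1], False
    return times[start], True


# Made-up example subjects, visits at times 0..4.
visits = [0, 1, 2, 3, 4]
treated = [120000, 350, 0, 0, 0]
control = [90000, 0, 4100, 87000, 95000]

print(to_binary(treated), time_to_sustained_zero(treated, visits))
print(to_binary(control), time_to_sustained_zero(control, visits))
```

The binary vectors would feed a repeated measures binomial model, and the (time, event) pairs would feed a survival analysis. Note that the isolated zero in the made-up control subject is not counted as an event under this definition.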
And number four (not my choice, but it could be done): analyze each time point separately, collect all the p values, and apply some sort of false discovery rate adjustment. This totally misses the correlation structure, but it has been done for microarray data, where the measures aren't longitudinal but are certainly correlated.
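As one concrete example of "some sort of false discovery rate adjustment", here is a small Python sketch of the Benjamini-Hochberg step-up procedure. Choosing Benjamini-Hochberg is an assumption (the post does not name a method), and the p values are made up.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p values, in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Start at the largest p value and move down, so that the adjusted
    # values stay monotone in the raw p values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up per-time-point p values from separate analyses.
raw = [0.01, 0.04, 0.03, 0.20]
print(benjamini_hochberg(raw))
```

Time points whose adjusted value falls below the chosen FDR level would be flagged. As noted above, this treats the time points as separate tests and ignores the within-subject correlation.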
Good luck with this.
07-27-2015 08:53 AM
I think I'll stick to the first option. That is what I have done so far, but I wasn't sure whether it would be good enough. The counts drop to zero due to a particular treatment. I converted the zero values to 1 and fitted a GEE with a lognormal distribution. With this, it is feasible to do simulations as well.
07-28-2015 08:17 AM
lvm has posted some really good things regarding adding a constant to values, and then using lognormal and gamma distributions, and what might go wrong. I know I have been doing that for quite a lot of things. However, PROC GENMOD does provide zero-inflated techniques for Poisson and negative binomial distributions--I just don't know if it will accept GEE (REPEATED) syntax.
07-27-2015 01:08 PM
This note on zero-inflated models shows how to fit such models using PROC NLMIXED. If you fit the model in that procedure, you could include a RANDOM statement to deal with the longitudinal data.
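For reference, the likelihood one would have to write out by hand for a zero-inflated Poisson model can be sketched as follows. This is Python, not NLMIXED syntax, and it is deliberately simplified: a single Poisson mean `lam` and a single zero-inflation probability `pi` for all observations, with no covariates and no random effect. Those simplifications and the names are assumptions made for illustration.

```python
import math
from typing import Sequence


def zip_loglik(y: Sequence[int], lam: float, pi: float) -> float:
    """Log-likelihood of counts y under a zero-inflated Poisson model.

    P(Y = 0) = pi + (1 - pi) * exp(-lam)
    P(Y = k) = (1 - pi) * exp(-lam) * lam**k / k!    for k >= 1
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    if not 0 <= pi < 1:
        raise ValueError("pi must be in [0, 1)")
    total = 0.0
    for k in y:
        if k == 0:
            # A zero can come from the inflation part or from the Poisson part.
            total += math.log(pi + (1 - pi) * math.exp(-lam))
        else:
            # lgamma(k + 1) = log(k!), which stays stable for very large counts.
            total += math.log(1 - pi) - lam + k * math.log(lam) - math.lgamma(k + 1)
    return total


# Made-up counts with many zeros: allowing zero inflation fits better
# than a plain Poisson model (pi = 0) with the same mean.
counts = [0, 0, 0, 5]
print(zip_loglik(counts, lam=5.0, pi=0.5))
print(zip_loglik(counts, lam=5.0, pi=0.0))
```

In a longitudinal fit, `lam` (and possibly `pi`) would depend on covariates and on a subject-level random effect, and the procedure would integrate over that effect. None of that is shown here.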