vstorme
Obsidian | Level 7

Hi,

I have longitudinal count data for three groups of subjects, with values ranging from zero to millions. The data are zero-inflated and show complete separation: certain groups have values of exactly zero from a particular timepoint onwards. The data look better when log-transformed, but then I have zero-inflated continuous data. I could fit these with the Tweedie distribution, but that is not implemented for longitudinal data (neither are the ZIP or ZINB models). The Firth correction is also not available in a longitudinal setting. Can somebody give me some advice on where to begin?

Thanks for any suggestion,

Veron

SteveDenham
Jade | Level 19

Can the longitudinal data be separated into "phases" of some kind, i.e., is there a natural reason for the counts to go to zero?  If so, then split those off, and note that the groups differ due to whatever that reason is--no p value needed.

A second possibility is to fit a heteroscedastic model, assuming a normal distribution for the errors.  The p values may be off some, but if the differences are striking, they should still be detectable.  The write up would have to acknowledge that the assumptions may not be appropriate.

A third possibility is to consider each time point as a binomial response--either a count could be made or not.  Although I generally dislike dichotomizing data, this opens a couple of avenues--a survival analysis (time-to-repeated no count, with right censoring), or a repeated measures binomial.

And number four (not my choice, but it could be done): analyze each time point separately, collect all the p values, and apply some sort of false discovery rate adjustment.  This totally misses the correlation structure, but it has been done for microarray data, where the measures aren't longitudinal but are certainly correlated.
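As a rough illustration of the false discovery rate step in option four, here is a minimal Python sketch of the Benjamini-Hochberg adjustment. This is only one possible FDR procedure (the post does not say which one to use), and the function name and the p values are hypothetical, made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    # Indices of the p values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p values, one per time point (not from the original post).
p_by_timepoint = [0.001, 0.008, 0.039, 0.041, 0.6]
adjusted = benjamini_hochberg(p_by_timepoint)
```

The adjusted values can then be compared against the chosen FDR level. As the reply notes, this ignores the correlation between time points within a subject.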

Good luck with this.

Steve Denham

vstorme
Obsidian | Level 7

Thanks Steve,

I think I'll stick with the first option. That is what I have done so far, but I wasn't sure whether it would be good enough. The counts drop to zero due to a particular treatment. I converted the zero values to 1 and fitted a GEE with a lognormal distribution. With this, it is feasible to do simulations as well.

SteveDenham
Jade | Level 19

lvm has posted some really good things regarding adding a constant to values, and then using lognormal and gamma distributions, and what might go wrong.  I know I have been doing that for quite a lot of things.  However, PROC GENMOD does provide zero-inflated techniques for Poisson and negative binomial distributions--I just don't know if it will accept GEE (REPEATED) syntax.
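One well-known issue with replacing zeros by a constant before taking logs is that the summary on the log scale depends on the constant chosen. The Python sketch below shows that sensitivity; it is not taken from lvm's posts, and the helper name and the counts are hypothetical.

```python
import math


def mean_log(counts, zero_replacement):
    """Mean of log(count) after replacing zeros with a chosen constant."""
    shifted = [c if c > 0 else zero_replacement for c in counts]
    return sum(math.log(v) for v in shifted) / len(shifted)


# Hypothetical group: mostly zeros plus one very large count.
counts = [0, 0, 0, 1200000]

# The group mean on the log scale moves as the replacement constant changes.
for constant in (1.0, 0.5, 0.001):
    print(constant, round(mean_log(counts, constant), 3))
```

When a group is mostly zeros, its log-scale mean is driven largely by the arbitrary constant, so it may be worth rerunning the analysis with a few different constants.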

Steve Denham

StatDave
SAS Super FREQ

This note on zero-inflated models shows how to fit such models using PROC NLMIXED.  If you fit the model in that procedure, you could include a RANDOM statement to deal with longitudinal data.
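To make the idea concrete, here is a minimal Python sketch (not SAS code, and not the code from the note) of the zero-inflated Poisson log-likelihood that would be written out by hand in such a model, with a subject-level random intercept u entering the Poisson mean. The function names, the parameters b0, b1 and pi_zero, and the log-linear form of the mean are all assumptions made for illustration.

```python
import math


def zip_loglik(count, lam, pi_zero):
    """Log-likelihood of one count under a zero-inflated Poisson.

    lam is the Poisson mean; pi_zero is the probability of a structural zero.
    """
    if count == 0:
        # A zero is either structural or a Poisson zero.
        return math.log(pi_zero + (1.0 - pi_zero) * math.exp(-lam))
    # A positive count can only come from the Poisson part.
    return (math.log(1.0 - pi_zero) + count * math.log(lam)
            - lam - math.lgamma(count + 1))


def subject_loglik_given_u(times, counts, b0, b1, pi_zero, u):
    """Conditional log-likelihood of one subject's series given intercept u."""
    total = 0.0
    for t, y in zip(times, counts):
        lam = math.exp(b0 + b1 * t + u)  # assumed log-linear mean
        total += zip_loglik(y, lam, pi_zero)
    return total


# Hypothetical subject with counts dropping to zero over time.
ll = subject_loglik_given_u([0, 1, 2, 3], [12, 3, 0, 0],
                            b0=2.0, b1=-1.0, pi_zero=0.2, u=0.1)
```

This sketch only evaluates the likelihood conditional on u. A mixed-model fit still has to integrate u out of the likelihood and maximize over the parameters, which is the part the procedure handles once the random effect is declared.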
