New Contributor
Posts: 2

# Genmod - impact of data distribution

Hello everybody and SAS-friends,

I would like to ask about the effect of the choice of distribution in proc genmod. I'm analyzing rate data (rate = counts/exposure). Because of overdispersion, I didn't use the Poisson distribution but the negative binomial. I suppose that this choice of distribution changes my regression formula. If I used the Poisson distribution, it would be log(rate)=..., so I would model counts and add offset=log(exposure), just like in Getting Started: Poisson Distribution. Do I have to change the offset when I use the negative binomial distribution? In articles I have only found a regression formula for rates with the binomial distribution: log(rate)/(1 - log(rate))=..., so after some algebra the offset for the Counts variable would be log(Exposure - Counts). But what about the offset for the negative binomial distribution? Is it the same as for the binomial distribution?

Do you know where I could find the answer?

Thank you very much for any reply.

Annanomi.

Super User
Posts: 10,213

## Re: Genmod - impact of data distribution

My guess:

Since the negative binomial distribution takes the values x = 0, 1, 2, ..., like the Poisson distribution, you can use offset= in the same way as it is used in Poisson regression.

Xia Keshan

Super Contributor
Posts: 301

## Re: Genmod - impact of data distribution

I agree that an offset can be used in negative binomial regression, just as it is used in Poisson or any other regression model.
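To spell this out (a sketch in my own notation, not taken from the GENMOD documentation: $\mu_i$ is the expected count, $t_i$ the exposure, $x_i$ the covariates, $\beta$ the coefficients and $k$ the dispersion parameter): with a log link, the offset enters the mean model in exactly the same way for both distributions.

$$
\log \mu_i = \log t_i + x_i^\top \beta
\quad\Longleftrightarrow\quad
\log\frac{\mu_i}{t_i} = x_i^\top \beta
$$

Only the assumed variance differs, here written in a common parameterization of the negative binomial:

$$
\text{Poisson: } \operatorname{Var}(Y_i) = \mu_i,
\qquad
\text{negative binomial: } \operatorname{Var}(Y_i) = \mu_i + k\,\mu_i^2
$$

So offset=log(exposure) stays as it is. The log(Exposure - Counts) form would only come from a logit link, which is not what is used here.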

But I do not necessarily agree that negative binomial regression is a good model. If you estimate rates from survival data and assume piecewise constant hazard rates, then the likelihood function is the same as if you had observed Poisson-distributed counts. This is not the same as assuming Poisson-distributed counts. Using overdispersion as an argument for changing to the negative binomial distribution therefore relies on a distributional assumption that never needed to be true. Furthermore, the negative binomial likelihood cannot be derived from a likelihood based on the times to event. In fact, if you simulate data from an exponential distribution (that means a constant hazard rate), then it is not unlikely that you will observe under- or overdispersion in the aggregated count data, even though all assumptions are truly satisfied.

So, to conclude, be very careful when you change your model from Poisson to negative binomial: the confidence intervals will become nonsense, in the sense that you can determine them yourself by changing how much you aggregate your data. For example, if you aggregate your data on one additional binary covariate, then each cell is split into two. This will have a dramatic effect on the confidence intervals when negative binomial regression is used, even though the covariate is not used in the model. The Poisson regression will give unchanged estimates. This is because negative binomial regression uses the observed variance as its variance, and this is not meaningful if the original data were time-to-event data (which were aggregated to count data).
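The Poisson half of this claim can be checked with a small simulation. The sketch below is in Python rather than SAS and covers only the intercept-only case, where the Poisson fit with offset log(exposure) has a closed form (rate = total events / total exposure). All function names, the sample size, the hazard rate and the follow-up time are my own assumptions for illustration. It does not fit a negative binomial model, so it says nothing by itself about how those intervals move.

```python
import math
import random

def simulate_subjects(n, rate, follow_up, seed=0):
    """Simulate (event, time_at_risk, group) per subject under a constant hazard."""
    rng = random.Random(seed)
    subjects = []
    for _ in range(n):
        t = rng.expovariate(rate)            # exponential time to event
        event = 1 if t <= follow_up else 0   # censored at end of follow-up
        time_at_risk = min(t, follow_up)
        group = rng.randint(0, 1)            # binary covariate unrelated to the hazard
        subjects.append((event, time_at_risk, group))
    return subjects

def aggregate(subjects, by_group):
    """Collapse subjects into cells of (count, exposure)."""
    cells = {}
    for event, time_at_risk, group in subjects:
        key = group if by_group else 0
        d, t = cells.get(key, (0, 0.0))
        cells[key] = (d + event, t + time_at_risk)
    return list(cells.values())

def poisson_rate_fit(cells):
    """Intercept-only Poisson fit with offset log(exposure), in closed form."""
    total_count = sum(d for d, _ in cells)
    total_exposure = sum(t for _, t in cells)
    log_rate = math.log(total_count / total_exposure)
    std_err = 1.0 / math.sqrt(total_count)   # model-based SE of the log rate
    return log_rate, std_err

subjects = simulate_subjects(n=2000, rate=0.1, follow_up=5.0, seed=1)
one_cell = poisson_rate_fit(aggregate(subjects, by_group=False))
two_cells = poisson_rate_fit(aggregate(subjects, by_group=True))
print("one cell :", one_cell)
print("two cells:", two_cells)
```

Both fits give the same log rate and the same model-based standard error (up to floating-point rounding), because the Poisson estimate depends on the cells only through the total count and the total exposure.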
