Overdispersion in Poisson Model

01-31-2017 05:18 PM

I'm running the code below and I get a deviance/df value of 1.4. Is that considered to be serious enough that I need to correct for overdispersion?

proc genmod data = icu.final_exposure;

class exposure;

model mh = exposure / dist = poisson link = log offset = lnpt;

estimate 'logrr' exposure 1 / exp;

lsmeans exposure / ilink cl;

run;

Thanks,

Brian


02-01-2017 08:42 AM

I would vote no. I don't think you really have overdispersion until you start getting double-digit values for deviance/df. However, you may want to investigate a negative binomial distribution, just in case.

Steve Denham
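For readers who want to try the negative binomial check suggested above, here is a minimal sketch. It assumes the data set and variable names from the original question (`icu.final_exposure`, `mh`, `exposure`, `lnpt`). Variant (a), a Pearson-scaled Poisson fit, is not mentioned in the thread and is included only as a common alternative for comparison.

```sas
/* Sketch only: reuses the data set and variable names from the question. */

/* (a) Poisson fit with a Pearson scale adjustment (assumption, not from   */
/*     the thread): point estimates are unchanged, standard errors are     */
/*     inflated by sqrt(Pearson chi-square / df).                          */
proc genmod data = icu.final_exposure;
  class exposure;
  model mh = exposure / dist = poisson link = log offset = lnpt pscale;
  lsmeans exposure / ilink cl;
run;

/* (b) Negative binomial fit: adds a dispersion parameter. An estimated    */
/*     dispersion close to 0 means the fit is close to the Poisson fit.    */
proc genmod data = icu.final_exposure;
  class exposure;
  model mh = exposure / dist = negbin link = log offset = lnpt;
  lsmeans exposure / ilink cl;
run;
```

If the rate ratio and its confidence limits barely move between the plain Poisson fit and these two variants, a deviance/df of 1.4 is unlikely to matter for the conclusions.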


02-01-2017 10:46 AM

Steve,

Thanks again for your valuable insight!

Brian


02-03-2017 04:11 AM

I will also vote no, but for a different reason than the one @SteveDenham gives.

You can completely ignore overdispersion in a Poisson regression model like this one. The reason is that the data do not need to be Poisson distributed. The data behind your event counts are really time-to-event data. If the assumption of piecewise constant rates is fulfilled, the data can be analyzed by Poisson regression, because the Poisson likelihood function is, up to a factor that does not depend on the parameters, exactly the likelihood function you would maximize if you had the original time-to-event data. It is therefore wrong to use the Poisson regression here to validate the distribution of the data; it is only a trick to maximize the likelihood function and thereby obtain estimates and the relevant hypothesis tests for the covariates.
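The likelihood argument can be written out for the simplest case of a single constant rate. The notation is introduced here for illustration and is not from the post: subject $i$ has follow-up time $t_i$, event indicator $d_i \in \{0, 1\}$, and the event rate is $\lambda$.

```latex
% Log-likelihood of the time-to-event data under a constant rate \lambda:
\ell_{\mathrm{surv}}(\lambda) = \sum_i \bigl( d_i \log\lambda - \lambda t_i \bigr)

% Poisson log-likelihood for d_i with mean \mu_i = \lambda t_i,
% i.e. \log\mu_i = \log\lambda + \log t_i, where \log t_i is the offset
% (\log d_i! = 0 because d_i is 0 or 1):
\ell_{\mathrm{Pois}}(\lambda)
  = \sum_i \bigl( d_i \log(\lambda t_i) - \lambda t_i \bigr)
  = \ell_{\mathrm{surv}}(\lambda) + \sum_i d_i \log t_i
```

The extra term $\sum_i d_i \log t_i$ does not involve $\lambda$, so both log-likelihoods have the same maximizer and give the same likelihood ratio tests. With covariates, $\log\lambda$ is replaced by a linear predictor and the argument is unchanged.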

It is quite easy to verify: simulate n data points from an exponential distribution and cumulate the values. You can then estimate the rate using Poisson regression (`model n = / dist=poisson link=log offset=logcumtime`). In such a model it is obviously meaningless to talk about overdispersion, even though the dispersion index will still be shown. So just forget about dispersion in this kind of Poisson regression.

If the data were truly count data, it would be much more relevant to check the assumption of Poisson-distributed data, and the dispersion index would then be much more relevant.


02-03-2017 09:24 AM

In this example I illustrate my point by simulating data from exponential distributions. Using the similarity of the likelihood functions, I can estimate the rate by Poisson regression. Note that I don't fit random observations: what I have on the left side of the model is the number of observations. It is therefore meaningless to talk about Poisson *distributed* observations, and so it does not make sense to check whether the data are Poisson distributed.

```
data silly_data;
  do group=1 to 2;
    do i=1 to 1000;
      event=1;                     * every simulated subject has one event;
      time=rand('exponential',4);  * simulated time to event;
      logtime=log(time);           * offset for the unaggregated model;
      output;
    end;
  end;
run;
* total time at risk per group, _freq_ counts the events;
proc summary data=silly_data nway;
  var time;
  class group;
  output out=summary sum=sumtime;
run;
data summary;
  set summary;
  logtime=log(sumtime);
run;
* estimate the rate parameter using the aggregated form of the data;
* the exp option also reports exp(estimate), which is the rate itself;
proc genmod data=summary;
  model _freq_= / dist=poisson link=log offset=logtime;
  estimate 'log rate' intercept 1 / exp;
run;
* the same estimate can be obtained from the unaggregated form of the data;
proc genmod data=silly_data;
  model event= / dist=poisson link=log offset=logtime;
  estimate 'log rate' intercept 1 / exp;
run;
```