Proc genmod for IR, underdispersion

Posted 01-27-2020 11:46 AM
(1063 views)

Hi,

I am calculating an incidence rate ratio using PROC GENMOD (Poisson distribution) and, to my understanding, the model shows underdispersion.

But when I run the following code to assess goodness of fit, I get a non-significant p-value.

Title "goodness of fit_poisson";

data pvalue;

df = 28561; chisq = 12304.4364 ;

pvalue = 1 - probchi(chisq, df);

run;

proc print data = pvalue noobs;

run;

The result is as follows:

goodness of fit_poisson

df chisq pvalue

My question is how to interpret these results. I tried fitting negative binomial and zero-inflated models, and both gave me a convergence error. Are there any possible solutions for handling underdispersed data?

Thank you

5 REPLIES

Unless there is a sufficient number of observations in each of the covariate profiles, the chi-square test is not reliable.

Concerning dispersion, see the section on dispersion in this note, which offers several suggestions.
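For reference, the scaled statistic can be read off directly from the values in the original post. The sketch below divides the chi-square by its degrees of freedom and computes both tail probabilities. It only reuses the posted df and chisq; the names dispersion, ratio, p_upper, and p_lower are made up for illustration. The reliability caveat about covariate profiles applies to both p-values.

```sas
/* Values copied from the original post; no new results are assumed. */
data dispersion;
   df      = 28561;
   chisq   = 12304.4364;
   ratio   = chisq / df;              /* about 0.43; values well below 1 point to underdispersion */
   p_upper = 1 - probchi(chisq, df);  /* upper tail: sensitive to overdispersion / lack of fit */
   p_lower = probchi(chisq, df);      /* lower tail: sensitive to underdispersion */
run;

proc print data=dispersion noobs;
run;
```

A ratio near 1 is what a well-specified Poisson model would give. The upper-tail p-value computed in the original post looks for overdispersion or lack of fit, so a large value there does not contradict underdispersion.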

You might want to try adding the NOLOGNB option in the MODEL statement in PROC GENMOD when fitting the negative binomial model. You might also want to try fitting the negative binomial model with a different procedure such as HPGENSELECT, GLIMMIX, FMM, or NLMIXED, since differences in algorithms might allow one of them to succeed. If you have SAS/ETS, PROC COUNTREG can fit the Poisson and negative binomial models, and it also offers the Conway-Maxwell-Poisson model as another possible approach that can help with underdispersion. These procedures also have options that let you tweak the fitting algorithm, which might be helpful.

But before concluding that there is over- or underdispersion, you should first consider that the model might not be correctly specified. An alternative model specification might remove any evidence of dispersion problems.
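As a rough sketch of two of these suggestions, assuming a data set named mydata with variables events and exposure (a numeric 0/1 indicator) as in the thread, plus a hypothetical offset variable log_persondays holding the log of person-days:

```sas
/* Negative binomial in GENMOD with the NOLOGNB option.            */
/* log_persondays is an assumed variable name for log(person-days). */
proc genmod data=mydata;
   model events = exposure / dist=negbin link=log
                             offset=log_persondays nolognb;
run;

/* Conway-Maxwell-Poisson in COUNTREG (requires SAS/ETS).          */
/* exposure is used as a numeric 0/1 covariate here.               */
proc countreg data=mydata;
   model events = exposure / dist=compoisson offset=log_persondays;
run;
```

Neither fit is guaranteed to converge on a given data set; these are only starting points for the alternatives listed above.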


Thank you, StatDave.

With regard to the number of observations, my sample size is quite large, approximately 130,000 (1:3 cases to controls). Thanks for the possible solutions. I will go over them and update this post.


Hi,

I went over the note and tried specifying the REPEATED statement in PROC GENMOD, as described in the following passage from the note:

"GEE models for clustered or longitudinal data can be fit by specifying the REPEATED statement in PROC GENMOD and (beginning in SAS 9.4 TS1M2) in PROC GEE ... However, a comparative statistic similar to AIC, known as QIC, is provided in PROC GENMOD and PROC GEE ..."

Some background: I am using PROC GENMOD to calculate the incidence rate ratio for count data, and here is what my data look like:

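| ID | events | person-days | exposure | log person-days |
|----|--------|-------------|----------|-----------------|
| 1  | 2      | 80          | 0        | 4.38            |
| 2  | 0      | 11          | 1        | 2.39            |
| 3  | 11     | 60          | 1        | 4.09            |
| 4  | 19     | 30          | 0        | 3.40            |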

Following is my code and QIC (attached image) from the output.

proc genmod data= mydata descending;

class ID exposure (ref='0');

/* OFFSET= takes a single variable name; log_persondays is assumed here */

model events = exposure / offset=log_persondays dist=poisson

link=log type3;

repeated subject= ID;

run;

My questions are:

1. Being a new SAS user, I am not sure if I have specified the repeated statement correctly?

2. In order to use the REPEATED statement, does my data need to be in a repeated-ID format, with multiple observations for each ID? My current data has the number of events and the total person-days contributed collapsed into one row per ID. Does the REPEATED statement still apply, since the IDs are not repeated?

3. Looking at the QIC, I think the model is a good fit, but I am not too sure.

Please let me know if more clarification is needed.


Your code seems fine. The data does not have to contain multiple observations per ID in order to use the Generalized Estimating Equations method (and get its robust "sandwich" variance estimator) provided by the REPEATED statement. See this note that discusses modeling and estimating rates and rate ratios (but does not make use of the REPEATED statement).
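A minimal sketch of that setup with one row per ID, assuming the data set mydata from the question, exposure coded as a numeric 0/1 variable, and a hypothetical raw variable persondays from which the offset log_persondays is built:

```sas
/* Build the offset; persondays and log_persondays are assumed names. */
data mydata_offset;
   set mydata;
   log_persondays = log(persondays);
run;

proc genmod data=mydata_offset;
   class ID;                                   /* SUBJECT= variable must be a CLASS variable */
   model events = exposure / dist=poisson link=log
                             offset=log_persondays;
   repeated subject=ID;                        /* GEE fit with empirical (sandwich) standard errors */
   estimate 'Rate ratio, exposure 1 vs 0' exposure 1 / exp;  /* EXP reports the exponentiated estimate */
run;
```

Because exposure enters as a 0/1 covariate here, its coefficient is the log rate ratio, and the EXP option puts it and its confidence limits on the rate ratio scale.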

The QIC is used like the AIC and BIC statistics in generalized linear models. That is, it is used to compare competing models. Used this way, models with smaller QIC values are better. But a QIC value for one model by itself cannot really be used to indicate whether that model fits well.
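To illustrate the comparison, here is a sketch that fits two competing GEE models to the same data and leaves the QIC values to be read from each output. The second covariate agegroup is purely hypothetical and does not appear in the question, and log_persondays is again an assumed name for the offset variable:

```sas
/* Model 1: exposure only. */
proc genmod data=mydata;
   class ID;
   model events = exposure / dist=poisson link=log
                             offset=log_persondays;
   repeated subject=ID;
run;

/* Model 2: exposure plus a hypothetical covariate agegroup. */
proc genmod data=mydata;
   class ID agegroup;
   model events = exposure agegroup / dist=poisson link=log
                                      offset=log_persondays;
   repeated subject=ID;
run;
```

Both models should be fit to the same observations so that the QIC values are comparable; the one with the smaller QIC is preferred between the two, which still says nothing about absolute fit.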


It makes perfect sense that QIC is used to compare competing models. I am glad you mentioned it, because I kept wondering what the reference point was for "smaller is better".

I will rerun my analysis and see how it goes.
