Solved: Proc genmod for IR, underdispersion

GreenTree1 · Posted 01-27-2020 11:46 AM

Hi,

I am calculating an incidence rate ratio using PROC GENMOD (Poisson distribution) and, to my understanding, the fit shows underdispersion.

But when I run the following code to assess goodness of fit, I get a non-significant p-value.

Title "goodness of fit_poisson";
data pvalue;
  df = 28561; chisq = 12304.4364 ;
  pvalue = 1 - probchi(chisq, df);
run;
proc print data = pvalue noobs;
run;

The result is as follows:

goodness of fit_poisson

df chisq pvalue

My question is how to interpret these results. I tried running negative binomial (NB) and zero-inflated models, but got a convergence error. Are there any possible solutions for handling underdispersed data?

Thank you
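
To make the tail probabilities concrete, here is a minimal sketch that reuses the df and chi-square values from the post. The statistic divided by its degrees of freedom is about 0.43, well below 1, which is the usual informal sign of underdispersion. The upper-tail p-value computed above tests whether the statistic is too large (overdispersion or lack of fit), so it is expected to be non-significant here. The lower-tail probability is added only for illustration, and the dataset and variable names (dispersion, ratio, p_upper, p_lower) are made up for this example.

```sas
/* Sketch: dispersion ratio and both tail probabilities for the
   chi-square statistic quoted in the post (values copied as-is). */
data dispersion;
  df = 28561; chisq = 12304.4364;
  ratio   = chisq / df;                   /* well below 1 suggests underdispersion       */
  p_upper = sdf('CHISQUARE', chisq, df);  /* upper tail, same quantity as 1 - probchi()  */
  p_lower = cdf('CHISQUARE', chisq, df);  /* lower tail, small when chisq is far below df */
run;
proc print data=dispersion noobs;
run;
```

Either tail probability relies on the chi-square approximation, so both should be read with caution when the covariate profiles are sparse.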

StatDave · Posted 01-27-2020 02:54 PM

Unless there is a sufficient number of observations in each of the covariate profiles, the chi-square test is not reliable.

Concerning dispersion, see the section on this in this note which offers several suggestions.

You might want to try adding the NOLOGNB option in the MODEL statement in PROC GENMOD when fitting the negative binomial model. You might also want to try fitting the negative binomial model via a different procedure such as HPGENSELECT, GLIMMIX, FMM, or NLMIXED, since differences in algorithms might allow one of them to succeed. If you have SAS/ETS, COUNTREG can fit the Poisson and negative binomial models, and it also offers the Conway-Maxwell-Poisson model as another possible approach that can help with underdispersion. In these procedures, there are also options that allow you to tweak the fitting algorithm, which might be helpful.

But before concluding that there is over- or underdispersion, you should first consider that the model might not be correctly specified. An alternative model specification might remove any evidence of dispersion problems.
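
The first and third suggestions above might look roughly like the following sketch. The dataset and variable names (mydata, events, exposure) are borrowed from the code posted later in this thread, and log_persondays is an assumed name for the log person-days variable. Exposure is treated as a numeric 0/1 variable here, which is also an assumption. Note that the negative binomial variance can never be smaller than the Poisson variance, so a negative binomial fit may struggle on underdispersed data regardless of the options used.

```sas
/* Sketch only: log_persondays is an assumed variable name. */

/* Negative binomial in GENMOD with the NOLOGNB option, which changes how
   the dispersion parameter is handled during fitting. */
proc genmod data=mydata;
  model events = exposure / dist=negbin link=log
                            offset=log_persondays nolognb;
run;

/* Conway-Maxwell-Poisson in COUNTREG (requires SAS/ETS); this
   distribution can accommodate underdispersion. */
proc countreg data=mydata;
  model events = exposure / dist=compoisson offset=log_persondays;
run;
```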

GreenTree1 · Posted 01-27-2020 03:01 PM

Thank you, StatDave.

With regard to the number of observations, my sample size is quite large, approximately 130,000 (1:3 cases:controls). Thanks for the possible solutions; I will go over them and update this post.

GreenTree1 · Posted 01-29-2020 10:40 AM

Hi,

I went over the note and tried specifying the REPEATED statement in PROC GENMOD, as mentioned in the following passage from the note:

"GEE models for clustered or longitudinal data can be fit by specifying the REPEATED statement in PROC GENMOD and (beginning in SAS 9.4 TS1M2) in PROC GEE ... However, a comparative statistic similar to AIC, known as QIC, is provided in PROC GENMOD and PROC GEE ..."

Some background: I am using PROC GENMOD to calculate the incidence rate ratio for count data, and here is what my data look like:

ID  events  person-days  exposure  log person-days
1   2       80           0         4.38
2   0       11           1         2.39
3   11      60           1         4.09
4   19      30           0         3.40

Following is my code and QIC (attached image) from the output.

proc genmod data=mydata descending;
  class ID exposure (ref='0');
  /* OFFSET= takes a single variable name; log_persondays stands for the log person-days column */
  model events = exposure / offset=log_persondays dist=poisson
                            link=log type3;
  repeated subject=ID;
run;

My questions are:

1. Being a new SAS user, I am not sure if I have specified the repeated statement correctly?

2. In order to use the REPEATED statement, does my data need to be in repeated-ID format, with multiple observations for each ID? My current data has the number of events and total person-days collapsed into one row per ID. Does the REPEATED statement apply here, since the IDs are not repeated?

3. Looking at the QIC, I think the model is a good fit, but I am not too sure.

Please let me know if more clarification is needed.

StatDave · Posted 01-29-2020 03:05 PM

Your code seems fine. The data does not have to contain multiple observations per ID in order to use the Generalized Estimating Equations method (and get its robust "sandwich" variance estimator) provided by the REPEATED statement. See this note that discusses modeling and estimating rates and rate ratios (but does not make use of the REPEATED statement).

The QIC is used like the AIC and BIC statistics in generalized linear models. That is, it is used to compare competing models. Used this way, models with smaller QIC values are better. But a QIC value for one model by itself cannot really be used to indicate if that model fits well or not.
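
As a rough illustration of using QIC comparatively, the sketch below fits two candidate GEE Poisson models and stacks their fit criteria. It is not taken from the note: agegroup is a hypothetical second covariate, log_persondays is an assumed name for the log person-days variable, and exposure is assumed to be numeric 0/1 so that the ESTIMATE statement can exponentiate its coefficient into a rate ratio.

```sas
/* Sketch only: agegroup is hypothetical, log_persondays is an assumed name. */
ods output GEEFitCriteria=qic_base;
proc genmod data=mydata;
  class ID;
  model events = exposure / dist=poisson link=log offset=log_persondays;
  repeated subject=ID;                    /* robust (sandwich) standard errors */
  estimate 'IRR exposed vs unexposed' exposure 1 / exp;
run;

ods output GEEFitCriteria=qic_adj;
proc genmod data=mydata;
  class ID agegroup;
  model events = exposure agegroup / dist=poisson link=log offset=log_persondays;
  repeated subject=ID;
run;

/* Stack the two fit-criteria tables; the smaller QIC is preferred. */
data qic_compare;
  length fit $8;
  set qic_base(in=from_base) qic_adj;
  if from_base then fit = 'base';
  else fit = 'adjusted';
run;
proc print data=qic_compare noobs;
run;
```

The comparison only ranks the two candidates against each other; neither QIC value says whether its model fits well in absolute terms.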

GreenTree1 · Posted 01-30-2020 11:19 AM

It makes perfect sense that it is used to compare competing models. I am glad you mentioned it; I kept wondering about the reference for "smaller is better".

I will rerun my analysis and see how it goes.
