GreenTree1
Obsidian | Level 7

Hi,

 

I am calculating an incidence rate ratio using PROC GENMOD (Poisson distribution), and to my understanding the output shows underdispersion:

 

[Screenshot: PROC GENMOD goodness-of-fit output]

But when I run the following code to assess goodness of fit, I get a non-significant p-value.

Title "goodness of fit_poisson";
data pvalue;
   df = 28561; chisq = 12304.4364;   /* DF and chi-square taken from the GENMOD output */
   pvalue = 1 - probchi(chisq, df);  /* upper-tail chi-square p-value */
run;
proc print data = pvalue noobs;
run;
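Because the chi-square value is far below its degrees of freedom, the upper-tail p-value computed above is essentially 1, so it cannot flag underdispersion. A minimal sketch (not from the thread) that reuses the same DF and chi-square values to also compute the chi-square/DF ratio and the lower-tail probability; the data set name `dispersion` is hypothetical, and StatDave's caveat below about covariate profiles applies here too.

```sas
/* Dispersion check reusing the DF and chi-square values from the post */
data dispersion;
   df      = 28561;
   chisq   = 12304.4364;
   ratio   = chisq / df;              /* about 0.43 here; well below 1 suggests underdispersion */
   p_upper = 1 - probchi(chisq, df);  /* same as the original pvalue; large chisq = lack of fit */
   p_lower = probchi(chisq, df);      /* lower tail: small values point toward underdispersion */
run;

proc print data=dispersion noobs;
run;
```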


The result is as follows (PROC PRINT output titled "goodness of fit_poisson", with columns df, chisq, and pvalue):

[Screenshot: PROC PRINT output of df, chisq, and pvalue]

 

 

My question is: how do I interpret these results? I tried running negative binomial (NB) and zero-inflated models, and they gave me a convergence error. Are there any possible solutions for handling underdispersed data?

 

Thank you


StatDave
SAS Super FREQ

Unless there is a sufficient number of observations in each of the covariate profiles, the chi-square goodness-of-fit test is not reliable.

 

Concerning dispersion, see the section on it in this note, which offers several suggestions.

 

You might want to try adding the NOLOGNB option in the MODEL statement in PROC GENMOD when fitting the negative binomial model. You might also want to try fitting the negative binomial model with a different procedure, such as HPGENSELECT, GLIMMIX, FMM, or NLMIXED, since differences in algorithms might allow one of them to succeed. If you have SAS/ETS, PROC COUNTREG can fit the Poisson and negative binomial models, and it also offers the Conway-Maxwell-Poisson model as another approach that can help with underdispersion. These procedures also have options for tweaking the fitting algorithm, which might help. But before concluding that there is over- or underdispersion, first consider whether the model is correctly specified: an alternative model specification might remove any evidence of dispersion problems.
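A sketch of some of these suggestions, run on hypothetical simulated data (the data set `sim`, the seed, and the rates are assumptions, not from the thread). Binomial counts are used because their variance is smaller than their mean, so the data are underdispersed by construction; the negative binomial fits may still struggle on such data, while the Conway-Maxwell-Poisson fit in COUNTREG is designed to accommodate it. Option spellings should be checked against the documentation for your SAS release.

```sas
/* Hypothetical simulated data in the thread's layout: one record per ID */
data sim;
   call streaminit(2020);
   do ID = 1 to 5000;
      exposure      = rand('bernoulli', 0.25);
      persondays    = 10 + floor(81 * rand('uniform'));  /* 10 to 90 days */
      logpersondays = log(persondays);
      /* Binomial counts: mean = persondays*p, variance = mean*(1 - p) < mean, */
      /* so the data are underdispersed; assumed true rate ratio is 1.5       */
      p             = 0.3 * 1.5**exposure;
      events        = rand('binomial', p, persondays);
      output;
   end;
   drop p;
run;

/* 1. Negative binomial in GENMOD with the NOLOGNB option suggested above */
proc genmod data=sim;
   model events = exposure / dist=negbin link=log offset=logpersondays nolognb;
run;

/* 2. Same model in a procedure with a different fitting algorithm */
proc hpgenselect data=sim;
   model events = exposure / dist=nb link=log offset=logpersondays;
run;

/* 3. Conway-Maxwell-Poisson, which allows underdispersion (requires SAS/ETS) */
proc countreg data=sim;
   model events = exposure / dist=cmpoisson offset=logpersondays;
run;
```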

GreenTree1
Obsidian | Level 7

Thank you, StatDave.

 

With regard to the number of observations, my sample size is quite large: approximately 130,000 (1:3 cases:controls). Thanks for the possible solutions; I will go over them and update this post.

 

GreenTree1
Obsidian | Level 7

Hi,

 

I went over the note and tried specifying the REPEATED statement in PROC GENMOD, as mentioned in the following passage from the note:

"GEE models for clustered or longitudinal data can be fit by specifying the REPEATED statement in PROC GENMOD and (beginning in SAS 9.4 TS1M2) in PROC GEE... However, a comparative statistic similar to AIC, known as QIC, is provided in PROC GENMOD and PROC GEE..."

 

Some background: I am using PROC GENMOD to calculate the incidence rate ratio for count data, and here is what my data looks like:

 

ID   events   person-days   exposure   log person-days
1       2          80           0            4.38
2       0          11           1            2.39
3      11          60           1            4.09
4      19          30           0            3.40
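As a sanity check on the rate ratio, a minimal sketch that reads the four example rows shown above and computes the crude incidence rate ratio (total events divided by total person-days in each exposure group). For a Poisson model whose only covariate is exposure, with log person-days as the offset, exp of the exposure estimate equals this crude ratio. The data set names `example` and `crude` and the variable name `logpersondays` are assumptions.

```sas
/* The four example rows from the post; the log offset is computed here */
data example;
   input ID events persondays exposure;
   logpersondays = log(persondays);   /* offset variable for PROC GENMOD */
   datalines;
1  2 80 0
2  0 11 1
3 11 60 1
4 19 30 0
;
run;

/* Crude rate per exposure group and their ratio: (11/71)/(21/110) = 1210/1491, about 0.81 */
proc sql;
   create table crude as
   select sum(events * (exposure = 1)) / sum(persondays * (exposure = 1)) as rate_exposed,
          sum(events * (exposure = 0)) / sum(persondays * (exposure = 0)) as rate_unexposed,
          calculated rate_exposed / calculated rate_unexposed as irr
   from example;
quit;

proc print data=crude noobs;
run;
```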

 

 

 

Below are my code and the QIC from the output (attached image).

 

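proc genmod data=mydata descending;
   class ID exposure (ref='0');
   /* the offset must be a single variable holding log(person-days) */
   model events = exposure / offset=logpersondays dist=poisson
                             link=log type3;
   /* one record per ID: GEE gives robust (sandwich) standard errors */
   repeated subject=ID;
run;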

 

 

 

[Screenshot: QIC from the PROC GENMOD output]

 

My questions are: 

 

1. Being a new SAS user, I am not sure whether I have specified the REPEATED statement correctly.

2. In order to use the REPEATED statement, does my data need to be in repeated-ID format, with multiple observations per ID? My current data has the number of events and the total person-days collapsed into one record per ID. Does the REPEATED statement apply here, since the IDs are not repeated?

3. Looking at the QIC, I think the model is a good fit, but I am not entirely sure.

 

Please let me know if more clarification is needed.

 

 

StatDave
SAS Super FREQ

Your code seems fine. The data does not have to contain multiple observations per ID in order to use the Generalized Estimating Equations method (and get its robust "sandwich" variance estimator) provided by the REPEATED statement. See this note, which discusses modeling and estimating rates and rate ratios (but does not make use of the REPEATED statement).

 

The QIC is used like the AIC and BIC statistics in generalized linear models. That is, it is used to compare competing models, and models with smaller QIC values are better. But a QIC value for one model by itself cannot really be used to indicate whether that model fits well.
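A minimal sketch of such a comparison between two competing GEE models, fit to the four example rows from the thread (far too few for real inference; they only make the code runnable). The intercept-only competitor is an assumption chosen for illustration. The LSMEANS statement with the DIFF, EXP, and CL options reports the exponentiated difference in log rates, which is the incidence rate ratio with confidence limits; check the output labels to see which exposure level is subtracted from which.

```sas
/* The four example rows from the post; logpersondays is computed here */
data example;
   input ID events persondays exposure;
   logpersondays = log(persondays);
   datalines;
1  2 80 0
2  0 11 1
3 11 60 1
4 19 30 0
;
run;

/* Model A (assumed competitor): intercept only */
proc genmod data=example;
   class ID;
   model events = / dist=poisson link=log offset=logpersondays;
   repeated subject=ID;
run;

/* Model B: adds exposure; compare its QIC with Model A's (smaller is better) */
proc genmod data=example;
   class ID exposure (ref='0');
   model events = exposure / dist=poisson link=log offset=logpersondays;
   repeated subject=ID;
   lsmeans exposure / diff exp cl;   /* exp(difference) = rate ratio */
run;
```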

GreenTree1
Obsidian | Level 7

Makes perfect sense that it is used to compare competing models. I am glad you mentioned it; I kept wondering what "smaller is better" was relative to.

 

I will rerun my analysis and see how it goes.
