Hi,
I have a data set with no missing data or small cell counts, at least not overall; when it is broken down by age, the cells get small or contain numerous zeros, and I then get Hessian-type errors (perhaps a separate question for this board). I have worked out how to sum n and the population for the various age groups and genders (categorized by 3 different areas over a certain time period), but now I need to understand what PROC GENMOD is doing and why it returns certain results. I was told the data may not be entered correctly because the goodness-of-fit table returns zeros in the first half and the LR statistics give p=1.000. For this data set, I am interested in whether the interaction between area and period is a significant predictor of n, not so much area or period on their own.
I created a mock data set (attached) that returns similar output (zeros in goodness of fit, p=1.000) and the code that I used is:
PROC IMPORT OUT= WORK.EXAMPLE
DATAFILE= "C:\example.csv"
DBMS=CSV REPLACE;
GETNAMES=YES;
DATAROW=2;
RUN;
data Ex;
set Example;
logpop=log(pop);
run;
proc genmod data=Ex;
class period area;
model n= area period area*period /dist=NB link=log offset=logpop type1;
make 'ParameterEstimates' out=parm;
run;
Any help is appreciated. Thank you!
For the data you attached, it is correct that the p-value for goodness of fit is one. That is because the model is saturated.
A saturated model is one that is so flexible that it reproduces the observations exactly: each fitted value equals the observed value (rather than a calculated predicted value that differs from it).
The deviance is twice the difference in log-likelihood between the saturated model and the specified model. Since your model is already saturated, this difference is zero, and hence the p-value is 1.
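To make that concrete, here is the standard definition written out, with the Poisson case as an example. The symbols are my own notation, not GENMOD output names: $y_i$ is the observed count in row $i$, $\hat\mu_i$ the fitted mean, and $\ell$ the log-likelihood.

```latex
D = 2\left(\ell_{\text{saturated}} - \ell_{\text{model}}\right),
\qquad
D_{\text{Poisson}} = 2\sum_i \left[\, y_i \log\frac{y_i}{\hat\mu_i} - \left(y_i - \hat\mu_i\right) \right]
```

In a saturated model $\hat\mu_i = y_i$ for every row, so each term in the sum is zero (taking $y \log y = 0$ when $y = 0$) and $D = 0$.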
Thanks for your answer. So is there anything I can do in this case? Or do I report anything in a certain way (or is it fairly useless)?
I don't think it is meaningful to use a negative binomial distribution when testing for an interaction effect in a saturated model. That is because the negative binomial distribution has a dispersion parameter, which cannot be estimated in a saturated model (for a non-saturated model it should be no problem).
If you fit the data with a Poisson distribution instead, you can test the interaction with a type3 test, like this:
proc genmod data=Ex; /* Ex, not Example: logpop is created in the Ex data step */
class period area;
model n= area period area*period /dist=poisson link=log offset=logpop type3 ;
run;
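As a cross-check, you could also fit the Poisson model without the interaction. This is only a sketch, and it assumes your data have exactly one row per area-by-period combination (which is what makes the full model saturated) and that logpop lives in the Ex data set from your data step.

```sas
/* Main-effects-only Poisson model: no longer saturated,        */
/* so the goodness-of-fit table has DF > 0.                     */
/* Assumes one row per area*period cell in WORK.EX.             */
proc genmod data=Ex;
   class period area;
   model n = area period / dist=poisson link=log offset=logpop;
run;
```

Under that assumption, the Deviance row in the goodness-of-fit table compares this model with the saturated one, so it should match the type3 likelihood ratio statistic for area*period from the full model, with (number of areas - 1) x (number of periods - 1) degrees of freedom.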
Thanks again for your help. I ran the Poisson model and used the type3 test as suggested. I now have a p-value for the interaction (rather than p=1.000), but I still get zeros in the goodness-of-fit output. Is this still going to be the case, based on what you described earlier, because it is a saturated model? I'm thinking that it is.
In deciding between negative binomial and Poisson, I read that the scaled deviance value should be examined to determine model accuracy (value close to 1). In this case it is zero, but I am using Poisson because the model is saturated. For the other models that I run, should I examine the scaled deviance value and then switch from Poisson to NB if the scaled deviance is far from 1? Also, how can I tell from the output whether the model is saturated (thinking ahead to other models where I know I had p=1.00)?
SAS writes a note if the model is saturated, so just look in the log.
It should say something like "fitting saturated model". Whether you use Poisson or negative binomial on a saturated model doesn't matter. Both give a perfect fit in a saturated model, so the goodness-of-fit p-value will be 1.0 in both cases.
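Besides the log, you can check the data directly. The sketch below assumes the Ex data set from your data step; it simply counts how many rows fall in each area-by-period cell.

```sas
/* Count the data rows in each area*period cell of WORK.EX.     */
/* If no cell holds more than one row, a model with area,       */
/* period and area*period has one parameter per observation.    */
proc freq data=Ex;
   tables area*period / norow nocol nopercent;
run;
```

If every cell shows a frequency of 1, the model with both main effects and the interaction is saturated. In the GENMOD output itself, the sign to look for is a DF of 0 on the Deviance row of the goodness-of-fit table.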
I am not sure the type 3 test gives you what you want when you use the negative binomial distribution. I think it does not compare the best fit of the full model with the best fit of the reduced model; rather, I think it tests against a reduced model while holding the dispersion fixed at the same level. But I may be wrong here, as it is not my field of expertise.
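If you want to avoid that question entirely, one option is to fit the full and reduced negative binomial models separately and compare them yourself. This is a sketch, not a tested recipe: ExAge is a hypothetical data set name standing in for one of your non-saturated data sets (more than one row per area-by-period cell, with logpop already computed), since the full model cannot be estimated sensibly on the saturated mock data.

```sas
/* ExAge is a hypothetical, non-saturated data set with logpop. */

/* Full model: main effects plus interaction.                   */
proc genmod data=ExAge;
   class period area;
   model n = area period area*period / dist=NB link=log offset=logpop;
run;

/* Reduced model: interaction dropped; the dispersion is        */
/* re-estimated because this is a separate fit.                 */
proc genmod data=ExAge;
   class period area;
   model n = area period / dist=NB link=log offset=logpop;
run;
```

Take the Log Likelihood row from each goodness-of-fit table. Twice the difference (full minus reduced) is the likelihood ratio statistic, which you compare with a chi-square distribution on (number of areas - 1) x (number of periods - 1) degrees of freedom.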