I have count data for each NJ county and for each year (2007-2013). I am trying to run a genmod procedure for Poisson regression, using the log of the population as the offset. The algorithm converges, but I end up with 2 reference groups for county (Warren & Sussex). The attached contains the data and the model output. I considered the possibility that a repeated statement might be necessary, but I'm not sure the data supports its use. Any assistance would be appreciated.
Using SAS 9.3
proc genmod data=model2;
class county year;
model countsd=nonwhite county year/dist=poisson link=log offset=logpop type3;
run;
In the Parameter Estimates table, County = "Union" may look like a reference level, but it isn't. It is removed from the independent factors because there is a linear dependency between Nonwhite and County. To convince yourself, try running:
proc glm data=model2;
class county;
model nonwhite = county / noint;
run;
You will get a perfect fit, i.e. nonwhite can be expressed as a linear combination of the county indicators. This is because nonwhite has a single value for each county in your data.
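If you want a more direct look at the same thing, a quick check along these lines should show it. This is only a sketch: it assumes the data set and variable names from the original post (model2, county, nonwhite), and n_values is just a made-up column name.

```sas
/* Count how many distinct nonwhite values each county has.           */
/* If every county shows n_values = 1, nonwhite is constant within    */
/* county and is therefore fully determined by the county effect.     */
proc sql;
  select county,
         count(distinct nonwhite) as n_values  /* hypothetical column name */
  from model2
  group by county;
quit;
```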
Thanks.
Also, it looks like you have overdispersion, which isn't uncommon with Poisson models: the deviance value/df in your output is about 60, and this value should be around 1. This could be due to a number of things: important predictor variables missing from the model, outliers in the data, or positive correlation between responses if you are working with clustered data. Whatever the cause, it is throwing off your standard errors and Type 3 test results, ultimately increasing the type I error rate.
Try using negative binomial. Change dist=negbin, keep the rest of your code the same, and re-run.
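For reference, that would look something like the following. It is the code from the original post with only the distribution changed, so it assumes the same data set and variables.

```sas
/* Same model as the original post, but with a negative binomial      */
/* distribution instead of Poisson. The offset and link are unchanged. */
proc genmod data=model2;
  class county year;
  model countsd = nonwhite county year / dist=negbin link=log offset=logpop type3;
run;
```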
At the bottom of the Analysis of ML Parameter Estimates output table there will be an estimated dispersion parameter. If this value is significantly greater than 0, it confirms overdispersion in your original Poisson model.
Re-check the deviance value/df to see if the dispersion parameter from the negbin distribution helped accommodate the excess variability. If not, consider researching the PSCALE and DSCALE options in the MODEL statement to adjust the standard errors and/or revisit your data to determine potential root cause(s).
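As a rough sketch of the scale adjustment, assuming you stay with the Poisson model from the original post, the option simply goes on the MODEL statement:

```sas
/* Poisson model with standard errors adjusted for overdispersion.    */
/* DSCALE estimates the scale from deviance/df; swap in PSCALE to use */
/* the Pearson chi-square/df instead.                                 */
proc genmod data=model2;
  class county year;
  model countsd = nonwhite county year / dist=poisson link=log offset=logpop type3 dscale;
run;
```

The parameter estimates stay the same as in the unadjusted Poisson fit; only the standard errors and the tests based on them change.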