LAM13
Calcite | Level 5

I have count data for each NJ county and for each year (2007-2013). I am trying to run a GENMOD procedure for Poisson regression, using the log of the population as the offset. The algorithm converges, but I end up with 2 reference groups for county (Warren & Sussex). The attached contains the data and the model output. I considered the possibility that a REPEATED statement might be necessary, but I am not sure the data supports its use. Any assistance would be appreciated.

 

 

Using SAS 9.3

 

proc genmod data=model2;
class county year;
model countsd=nonwhite county year/dist=poisson link=log offset=logpop type3;
run;

PGStats
Opal | Level 21

In the Parameter Estimates table, County = "Union" may look like a reference level, but it isn't. It is removed from the independent factors because there is a linear dependency between Nonwhite and County. To convince yourself, try running:

 

proc glm data=model2;
class county;
model nonwhite = county / noint;
run;

You get a perfect fit, i.e. nonwhite can be expressed as a linear combination of the county indicators. This is because nonwhite has a single value for each county in your data.
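A more direct way to see the same thing is to count the distinct values of nonwhite within each county. This is only a sketch, assuming the data set and variable names from the original post (model2, county, nonwhite); the column name n_values is made up for illustration.

```sas
/* Count distinct nonwhite values per county.             */
/* n_values is a hypothetical column name for this check. */
proc sql;
  select county,
         count(distinct nonwhite) as n_values
  from model2
  group by county;
quit;
```

If n_values is 1 for every county, nonwhite carries no information beyond county, so the model cannot estimate both a nonwhite slope and a full set of county effects.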

 

PG
JustABitOutside
Fluorite | Level 6

Also, it looks like you have overdispersion, which isn't uncommon with Poisson models (the deviance value/df in your output is about 60). This value should be around 1. This could be due to a number of things: important predictor variables missing from the model, outliers in the data, or positive correlation between responses if you are working with clustered data. Whatever the cause, it's throwing off your standard errors and Type III test results, ultimately inflating the Type I error rate.

 

Try using negative binomial. Change to dist=negbin, keep the rest of your code the same, and re-run.
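With that one change, the call from the original post would look roughly like this. Everything except dist=negbin is copied from the question as posted, so treat it as a sketch rather than a tuned model.

```sas
/* Same model as in the question, with the distribution switched */
/* from Poisson to negative binomial. The offset stays as logpop. */
proc genmod data=model2;
  class county year;
  model countsd = nonwhite county year
        / dist=negbin link=log offset=logpop type3;
run;
```

Note that switching the distribution does not remove the linear dependency between nonwhite and county, so one county parameter will still be set to zero.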

 

At the bottom of the Analysis of ML Parameter Estimates output table there will be an estimated dispersion parameter. If this value is significantly greater than 0, it confirms overdispersion in your original Poisson model. 

 

Re-check the deviance value/df to see if the dispersion parameter from the negbin distribution helped accommodate the excess variability. If not, consider researching the PSCALE and DSCALE options in the MODEL statement to adjust the standard errors and/or revisit your data to determine potential root cause(s). 
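As a starting point for the PSCALE route, a minimal sketch is shown below, assuming the same data set and variables as the original post. PSCALE keeps the Poisson model but rescales the standard errors and test statistics using the Pearson chi-square divided by its degrees of freedom; DSCALE does the same using the deviance.

```sas
/* Poisson model from the question with a Pearson-based scale adjustment. */
/* Replace pscale with dscale to base the adjustment on the deviance.     */
proc genmod data=model2;
  class county year;
  model countsd = nonwhite county year
        / dist=poisson link=log offset=logpop type3 pscale;
run;
```

The parameter estimates should match the unscaled Poisson fit; only the standard errors, confidence limits, and tests change.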
