07-17-2014 01:25 PM
I would like to first thank you all for providing your expert advice and helping us understand SAS better. I am new to statistical modeling, especially with PROC GENMOD. My variables are region (0-9), race (0-3), rates, cases, population, and year (1996-2012).
What I want to compare is rates among the racial groups across regions (i.e., pairwise comparisons 1-2, 1-3, 2-3, etc.). I was told in a SAS class that PROC GENMOD with a Poisson distribution would be ideal, since we are dealing with surveillance data that are not normally distributed, and rates are not well modeled by other distributions. We want to control for overdispersion and get relative risks and confidence intervals. We would also like to see the change over time if possible, but I know that may be difficult. Does the model automatically take the log, or should I first compute the log and use it as an offset? For example:
data race;
  set race;
  Y = cases/population*100000;
  ln = log(Y);
run;
My model goes like this:
proc genmod data=race;
  class region race;
  model Y=region race / dist=p link=log scale=pearson;
  /* or, 2nd model: model Y=region race / dist=p link=log offset=ln scale=pearson; */
  repeated subject=region / type=unstr;
  lsmeans region sex / exp cl tdiff e om;
run;
07-18-2014 10:36 AM
Change Y to cases, i.e., the integer count of cases. The offset would then be log(population/100000). I see 'sex' in the lsmeans statement, but it is not in the model statement, so the procedure will not run.
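Here is why the offset is log(population/100000) rather than a log-transformed rate. This is a small Python sketch of the arithmetic, not SAS, and the counts are made up. For an intercept-only Poisson model with that offset, the maximum-likelihood estimate of exp(intercept) has a closed form: total cases divided by total population in units of 100,000. So the fitted coefficients are on the scale of log rate per 100,000. The function names are illustrative only.

```python
import math

def log_offset(population, per=100_000):
    """Offset on the log scale for rates per `per` persons: log(population / per)."""
    return math.log(population / per)

def pooled_rate(cases, populations, per=100_000):
    """MLE of exp(intercept) for an intercept-only Poisson model with
    offset log(population / per): total cases / total population units.
    (Setting the score sum(y_i - exp(b0 + offset_i)) to zero gives this.)"""
    return sum(cases) / sum(p / per for p in populations)

# Hypothetical counts and populations for one region/race cell over three years
cases = [12, 15, 9]
populations = [250_000, 260_000, 255_000]

offsets = [log_offset(p) for p in populations]  # what goes in offset=
rate = pooled_rate(cases, populations)          # about 4.706 per 100,000
print(offsets, rate)
```

The response stays an integer count, and the population enters only through the offset. That is what makes the Poisson likelihood appropriate.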
Also, with region having 10 levels, you are estimating 45 parameters with an unstructured covariance matrix. Using a rule of thumb of at least 10 observations per parameter, you will need a moderately sized dataset. You may wish to consider a more restrictive covariance matrix, in particular type=ind, which assumes separate variances for each region and that the regions are independent (well, uncorrelated).
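The figure of 45 is the number of distinct off-diagonal correlations in a 10 x 10 unstructured matrix, n(n-1)/2. If the variances on the diagonal were counted too, it would be n(n+1)/2 = 55. A short Python sketch of that count and the 10-observations-per-parameter rule of thumb follows. The function names are hypothetical.

```python
def unstructured_corr_params(n_levels):
    """Distinct off-diagonal correlations in an n x n unstructured
    matrix: n * (n - 1) / 2."""
    return n_levels * (n_levels - 1) // 2

def min_observations(n_params, obs_per_param=10):
    """Rule of thumb from the reply: at least 10 observations per parameter."""
    return n_params * obs_per_param

k = unstructured_corr_params(10)   # 10 regions
print(k, min_observations(k))
```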
As for overdispersion, the first thing I would try is switching to a negative binomial distribution. If there is still an overdispersion problem, you will likely have to switch to PROC GLIMMIX.
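A negative binomial helps because it adds a dispersion parameter k that lets the variance exceed the mean. Under a Poisson distribution the variance is forced to equal the mean. The sketch below assumes the quadratic form Var(Y) = mu + k*mu^2, which I believe is the parameterization GENMOD documents for DIST=NEGBIN (worth confirming). It is plain Python with illustrative values.

```python
def poisson_variance(mu):
    """Poisson: variance equals the mean."""
    return mu

def negbin_variance(mu, k):
    """Negative binomial with dispersion k (assumed form): mu + k * mu**2.
    k = 0 reduces to the Poisson variance."""
    return mu + k * mu ** 2

# Illustrative means and dispersion; extra variance grows with mu squared
for mu in (5.0, 50.0):
    print(mu, poisson_variance(mu), negbin_variance(mu, k=0.2))
```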
07-21-2014 02:46 PM
Thank you for your help. I greatly appreciate it.
I have modified the model, and it seems to be working:
proc genmod data=ctrace96;
  class region race;
  model cases=region race / dist=p offset=lnn scale=pearson type3;
  lsmeans region race / exp cl diff e om;
run;
quit;
This performs the comparisons and produces confidence intervals and odds ratios. What I am interested in is the risk ratio and differences between those ratios. Is there an option I can include, or some other way to get the RR?
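On the RR question: with dist=p and the default log link, differences of LS-means are on the log scale. Exponentiating such a difference and its Wald limits (which, as I understand it, is what the EXP option does) gives a rate ratio with a confidence interval, not an odds ratio. The Python sketch below only reproduces that arithmetic under a normal approximation. The estimate and standard error are hypothetical.

```python
import math
from statistics import NormalDist

def rate_ratio_ci(log_diff, se, level=0.95):
    """Exponentiate a log-scale LS-mean difference and its Wald CI.
    Under a log link the result is a rate ratio."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # 1.96 for 95%
    return (math.exp(log_diff),
            math.exp(log_diff - z * se),
            math.exp(log_diff + z * se))

# Hypothetical log-scale difference between two race groups and its SE
rr, lo, hi = rate_ratio_ci(log_diff=0.405, se=0.12)
print(f"RR={rr:.2f} (95% CI {lo:.2f}, {hi:.2f})")
```

Because the interval is built on the log scale, it is asymmetric around the RR. The product of its limits equals RR squared.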
How do we test whether this model effectively controls for overdispersion? I have also tried GLIMMIX but have not been able to produce a comparison table like the one PROC GENMOD gives. As you suggested, overdispersion can be handled with a negative binomial distribution. Is it OK to just use DIST=NEGBIN in the model above and keep the rest the same? The output for the above model doesn't include a goodness-of-fit table, so I can't figure out whether overdispersion is an issue.
07-22-2014 09:56 AM
As far as changing the distribution, you are on the right track.
The deviance or Pearson chi-square divided by its degrees of freedom gives a measure of overdispersion; values well above 1 suggest overdispersion. In GLIMMIX you can check the Fit Statistics table for this value. In GENMOD, you may have to compute it by hand and then refit using scale=<the overdispersion value you find>.
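Here is a Python sketch of the "by hand" calculation, given observed counts and a Poisson model's fitted means. Both the data and the function names are hypothetical. One caution: as I recall, GENMOD's scale parameter is on the square-root scale (its SCALE=PEARSON note says the scale is the square root of Pearson chi-square/DF). So a value passed to scale= would be the square root of the ratio. Treat that as an assumption to verify against the GENMOD documentation.

```python
import math

def pearson_chi2(observed, fitted):
    """Sum of (y - mu)^2 / mu over observations."""
    return sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))

def poisson_deviance(observed, fitted):
    """2 * sum(y*log(y/mu) - (y - mu)); the y*log term is 0 when y == 0."""
    return 2 * sum((y * math.log(y / mu) if y > 0 else 0.0) - (y - mu)
                   for y, mu in zip(observed, fitted))

def dispersion(observed, fitted, n_params):
    df = len(observed) - n_params
    chi2 = pearson_chi2(observed, fitted)
    return {
        "pearson_df": chi2 / df,
        "deviance_df": poisson_deviance(observed, fitted) / df,
        # Assumed: GENMOD's scale = sqrt(Pearson chi-square / DF)
        "scale": math.sqrt(chi2 / df),
    }

# Hypothetical observed counts and fitted means from a 1-parameter model
print(dispersion([2, 4, 6], [4, 4, 4], n_params=1))
```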