Hi all,
I would first like to thank you all for providing your expert advice and helping us understand SAS better. I am new to this kind of statistical modeling, especially with proc genmod. My variables are region (0-9), race (0-3), rates, cases, population and year (1996-2012).
What I want to do is compare rates among the racial groups across the regions (so compare 1-2, 1-3, 2-3, etc.). I was told in a SAS class that proc genmod with a Poisson distribution would be ideal. We are dealing with non-normal surveillance data, and rates are not handled well by other distributions. We want to control for overdispersion and get relative risks and confidence intervals. We would also like to see the difference over time if possible, but I know that may be difficult. Does the model automatically take the log, or should I first take the log myself and use that as the offset? (data race; set race; Y=cases/population*100000; ln=log(Y); run;)
My model goes like this
proc genmod data=race;
class region race;
model Y=region race / dist=p link=log scale=pearson; (or 2nd model Y=region race / dist=p link=log offset=ln scale=pearson;)
repeated subject=region / type=unstr;
lsmeans region sex / exp cl tdiff e om;
run;
quit;
Thanks CP
My thoughts:
Change Y to cases, the integer count of cases. The offset would then be log(population/100000). I see 'sex' in the lsmeans statement, but it is not in the model statement, so the step will not execute.
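A minimal sketch of that change, assuming the dataset and variable names from the original post. The names race2 and lnpop are hypothetical, introduced here only for illustration, and race replaces sex in the lsmeans statement on the assumption that race is the effect of interest.

```sas
/* Sketch only: race2 and lnpop are hypothetical names */
data race2;
  set race;
  /* offset: log of population in units of 100,000 */
  lnpop = log(population / 100000);
run;

proc genmod data=race2;
  class region race;
  /* model the integer counts; the offset turns this into a model for rates */
  model cases = region race / dist=poisson link=log offset=lnpop type3;
  /* exp of each pairwise difference is a ratio of rates between two races */
  lsmeans race / diff exp cl;
run;
```

With the offset written this way, the exponentiated least-squares means are on the scale of cases per 100,000 population.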
Also, with region having 10 levels, you are estimating 45 parameters with an unstructured covariance matrix. Using a rule of thumb of at least 10 observations per parameter, you will need a moderately sized dataset. You may wish to consider a more restrictive covariance matrix--in particular type=ind, which assumes separate variances for each region, and that the regions are independent (well, uncorrelated).
As far as overdispersion, the first thing I would think of would be shifting to a negative binomial distribution. If there is still an overdispersion problem, you will likely have to switch procedures to GLIMMIX.
Steve Denham
Hi,
Thank you for your help. I greatly appreciate it.
I have modified the model, and it seems to be working.
Proc genmod data=ctrace96;
class region race;
model cases=region sex/dist=p offset=lnn scale=pearson type3;
repeated subject=region/type=ind;
lsmeans region sex/exp cl diff e om; run; quit;
This does the comparisons and produces confidence intervals and odds ratios. What I am interested in is the risk ratio and the differences between those ratios. Is there an option I can include, or some other way to get the RR?
How do we test whether this model controls effectively for overdispersion? I have also tried GLIMMIX but have not been able to produce a comparison table like the one proc genmod gives. As you suggested, overdispersion can be handled with a negative binomial distribution. Is it OK to just use DIST=NEGBIN in the above model and keep the rest the same? The output for the above model doesn't include a goodness-of-fit table, so I can't tell whether overdispersion is an issue or not.
Thanks
CP
As far as changing the distribution, you are on the right track.
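A minimal sketch of the negative binomial version, assuming the ctrace96 dataset and the lnn offset variable from the post above, and assuming race (rather than sex) is the effect being compared. The REPEATED statement and scale=pearson are left out here so that the fit reports the negative binomial dispersion estimate directly; that is a simplification for illustration, not a recommendation.

```sas
/* Sketch only: negative binomial fit with the same mean model */
proc genmod data=ctrace96;
  class region race;
  /* dist=negbin estimates its own dispersion parameter by maximum likelihood */
  model cases = region race / dist=negbin link=log offset=lnn type3;
  /* pairwise comparisons of race, exponentiated to the ratio scale */
  lsmeans race / diff exp cl;
run;
```

A dispersion estimate close to zero would suggest the Poisson model was already adequate.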
Deviance or Pearson chi-square divided by degrees of freedom gives a measure of overdispersion. In GLIMMIX you could check the Fit Statistics table for this parameter. In GENMOD, you may have to do this by hand, and then refit using scale=<the overdispersion value you find>.
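One way to get those ratios in GENMOD, as a sketch: fit the same mean model without the REPEATED statement and capture the goodness-of-fit table through ODS. The ODS table name ModelFit is given from memory of the GENMOD documentation and should be checked against your SAS release; fit is a hypothetical output dataset name, and race is assumed to be the intended effect.

```sas
/* Sketch only: capture the "Criteria For Assessing Goodness Of Fit" table */
ods output ModelFit=fit;   /* assumed ODS table name; fit is a hypothetical dataset */

proc genmod data=ctrace96;
  class region race;
  /* plain Poisson fit, no REPEATED statement, no scale adjustment */
  model cases = region race / dist=poisson link=log offset=lnn;
run;

/* the Value/DF entries for Deviance and Pearson Chi-Square are the ratios to inspect */
proc print data=fit;
run;
```

Values of Value/DF well above 1 point toward overdispersion relative to the Poisson assumption.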
As far as relative risk estimates, again it is outside my usual field. Check for posts by that may address this.
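As a general point about the log link, and not something specific to this dataset: with an offset of log(population/100000), exponentiated differences between levels are usually read as rate ratios rather than odds ratios. A short derivation, with the notation (mu, beta, a, b) introduced here only for illustration:

```latex
\begin{align*}
\log \mu_i &= \log\!\left(\frac{\text{population}_i}{100000}\right) + \mathbf{x}_i^{\top}\boldsymbol{\beta},
  \qquad \text{rate}_i = \frac{\mu_i}{\text{population}_i / 100000} = \exp\!\left(\mathbf{x}_i^{\top}\boldsymbol{\beta}\right) \\
\log(\text{rate}_a) - \log(\text{rate}_b) &= \beta_a - \beta_b
  \qquad \text{(races } a \text{ and } b \text{ in the same region)} \\
\text{RR}_{a:b} &= \frac{\text{rate}_a}{\text{rate}_b} = \exp(\beta_a - \beta_b)
\end{align*}
```

Under that reading, the exponentiated column from lsmeans with the diff and exp options is a ratio of rates, and the exponentiated confidence limits bound that ratio.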
Steve Denham
 
