I am trying to estimate a risk difference using GEE (proc genmod). The outcome is clustered count data modeled as a rate (deaths over births), so a Poisson distribution is preferred, but I can only get this to work with the normal distribution.
Here's what works:
proc genmod data= a.pmss;
class geoid1;
model prmr= RS_pctNOHS/ link=identity dist=normal;
repeated subject=geoid1;
weight births;
run;
* RD: 1.6358 (1.1015 to 2.1701);
Any ideas for code that gives similar output but is based on the Poisson or binomial distribution? So far nothing will converge.
If the response data are counts or proportions, then the response is not normally distributed. Use the NLMeans macro as shown in this note. It works the same way with a procedure like GENMOD, but you should use the newer PROC GEE to fit GEE models. It has essentially the same syntax as PROC GENMOD. When fitting the GEE model, specify the variable containing the counts in the numerator of your risks as the response variable, and then specify a variable containing the log of the counts in the denominator of your risks in the OFFSET= option. Specify DIST=BIN. Then use LSMEANS, STORE, and ODS OUTPUT statements as shown followed by the NLMeans macro call.
Thank you - I have tried that but can't get the code right. The exposure is the percent of people in the county without a high school education (this shouldn't be in the class statement but SAS is telling me the exposure needs to be). Further assistance is greatly appreciated!
proc gee data= dataset;
class geoid1;
model deaths= RS_pctNOHS/ dist=bin offset=ln_n;
repeated subject=geoid1;
lsmeans deaths / diff cl;
store out = diffmodel;
ods output coef=coeffs;
run;
%NLMeans(instore=diffmodel, coef=Coeffs, title=Differences of Means)
I have to assume that when you say "exposure" you mean RS_pctNOHS. If that is the case and that variable has continuous values, then you can't get a "risk difference" since a difference implies a comparison of two levels, not a continuous range. The variable in the LSMEANS statement should be a categorical variable specified in the CLASS statement. That is what the message you are seeing means. Additionally, the ODS OUTPUT statement will fail unless you include the E option as shown in the note I referred to.
And, sorry, but for the code you showed using a count response, you should use DIST=POISSON. But it would actually make more sense to specify "model deaths/n = RS_pctNOHS / dist=bin;", where n is your variable of denominator counts (NOT logged). This models the proportions as binomial.
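As a rough sketch of the two model specifications described above: this assumes the data set a.pmss from the first post holds one row per county with the variables deaths and births, and that births is the denominator count (the "n" above) and is always positive. The data set name work.pmss_off and the variable ln_n are only illustrative names.

```sas
/* Sketch only: assumes a.pmss has county-level deaths and births (births > 0). */
data work.pmss_off;
   set a.pmss;
   ln_n = log(births);   /* log of the denominator, used as the offset */
run;

/* Option 1: Poisson model for the death counts with a log(births) offset */
proc gee data=work.pmss_off;
   class geoid1;
   model deaths = RS_pctNOHS / dist=poisson offset=ln_n;
   repeated subject=geoid1;
run;

/* Option 2: binomial model with events/trials syntax (denominator NOT logged) */
proc gee data=work.pmss_off;
   class geoid1;
   model deaths/births = RS_pctNOHS / dist=bin;
   repeated subject=geoid1;
run;
```

Both fits use the default link for their distribution (log for Poisson, logit for binomial), so the RS_pctNOHS coefficient is on the log or logit scale and is not a risk difference by itself.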
Yes - the exposure is RS_pctNOHS, and you're right that it is not a categorical variable. What I'm trying to estimate is the absolute difference in mortality between individuals living in counties at the median percent with no high school education and individuals living in counties 1 standard deviation above the median. The exposure has been robustly standardized to the median. I am able to do this with the initial code I posted, which used the normal distribution, but as you've mentioned, Poisson or binomial would be more appropriate, so I am trying to figure that out.