EpiMoby
Calcite | Level 5

I am trying to estimate a risk difference using GEE (PROC GENMOD). The outcome is clustered count data modeled as a rate (deaths over births), so a Poisson distribution is preferred, but I can only get the model to work with the normal distribution.

Here's what works:

proc genmod data= a.pmss;
class geoid1;
model prmr= RS_pctNOHS/ link=identity dist=normal;
repeated subject=geoid1;
weight births;
run;
* RD: 1.6358 (1.1015 to 2.1701);

Any idea for code to get a similar output but based on the Poisson or binomial distribution? So far nothing will converge.

5 REPLIES
StatDave
SAS Super FREQ

If the response data are counts or proportions, then the response is not normally distributed. Use the NLMeans macro as shown in this note. It works the same way with a procedure like GENMOD, but you should use the newer PROC GEE to fit GEE models. It has essentially the same syntax as PROC GENMOD. When fitting the GEE model, specify the variable containing the counts in the numerator of your risks as the response variable, and then specify a variable containing the log of the counts in the denominator of your risks in the OFFSET= option. Specify DIST=BIN. Then use LSMEANS, STORE, and ODS OUTPUT statements as shown followed by the NLMeans macro call.

EpiMoby
Calcite | Level 5

Thank you - I have tried that but can't get the code right. The exposure is the percent of people in the county without a high school education. It shouldn't be in the CLASS statement, but SAS is telling me it needs to be. Further assistance is greatly appreciated!


proc gee data= dataset;
class geoid1;
model deaths= RS_pctNOHS/ dist=bin offset=ln_n;
repeated subject=geoid1;
lsmeans deaths / diff cl;
store out = diffmodel;
ods output coef=coeffs;
run;

%NLMeans(instore=diffmodel, coef=Coeffs, title=Differences of Means)

StatDave
SAS Super FREQ

I have to assume that when you say "exposure" you mean RS_pctNOHS. If that is the case and that variable has continuous values, then you can't get a "risk difference", since a difference implies a comparison of two levels, not a continuous range. The variable in the LSMEANS statement should be a categorical variable specified in the CLASS statement. That is what the message you are seeing means. Additionally, the ODS OUTPUT statement will fail unless you include the E option in the LSMEANS statement, as shown in the note I referred to.

And, sorry, but for the code you showed, which uses a count response with an offset, you should use DIST=POISSON rather than DIST=BIN. But it would actually make more sense to specify "model deaths/n = RS_pctNOHS/ dist=bin;", where n is your variable of denominator counts (NOT logged). This models the proportions as binomial.
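The two formulations described above can be sketched roughly as follows. This is only an illustration and has not been run against the poster's data. It assumes the data set a.pmss holds one row per observation with the variables deaths, births (standing in for the denominator n), geoid1 and RS_pctNOHS, as named earlier in the thread. The data set name work.pmss2 and the TYPE=IND working correlation are assumptions made for the example.

```sas
/* Sketch only: variable names come from the thread; work.pmss2 is hypothetical. */
data work.pmss2;
   set a.pmss;
   ln_n = log(births);   /* log of the denominator, used as the Poisson offset */
run;

/* Formulation 1: count response with a log offset, so DIST=POISSON. */
proc gee data=work.pmss2;
   class geoid1;
   model deaths = RS_pctNOHS / dist=poisson link=log offset=ln_n;
   repeated subject=geoid1 / type=ind;   /* working correlation is an assumption */
run;

/* Formulation 2: events/trials syntax, so DIST=BIN and no offset. */
proc gee data=work.pmss2;
   class geoid1;
   model deaths/births = RS_pctNOHS / dist=bin link=logit;
   repeated subject=geoid1 / type=ind;
run;
```

Neither model reports a risk difference directly: with the log link the coefficient is a log rate ratio, and with the logit link it is a log odds ratio, so a further step on the fitted model is still needed to get a difference on the risk scale.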

EpiMoby
Calcite | Level 5

Yes - the exposure is RS_pctNOHS, which you're right, it's not a categorical variable. What I'm trying to estimate is the absolute difference in mortality between individuals living in counties with the median percent no high school compared to individuals living in counties 1 standard deviation above the median. The exposure has been robustly standardized to the median. I am able to do this with the initial code I posted that utilized the normal distribution but as you've mentioned Poisson or binomial would be more appropriate, so I am trying to figure it out. 

StatDave
SAS Super FREQ
Then you would need a binary predictor variable that takes just those two values, and it would appear in the CLASS and LSMEANS statements. That would presumably limit the data used considerably. The alternative would be to use a marginal effect for your continuous predictor, which would give you the change in risk for a unit change in the predictor. You could use the Margins macro for that (shown earlier in the note), or the MARGINS statement in PROC GENMOD if you have a recent release of Viya 4.
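To make the marginal-effect idea concrete, here is a short worked derivation, assuming a binomial model with a logit link and a single continuous predictor x (here RS_pctNOHS). The symbols p(x), β0 and β1 are introduced only for this illustration. It also assumes, from the description of the standardized exposure above, that x = 0 corresponds to the median and x = 1 to one standard deviation above it.

```latex
% Fitted risk under a logit link
\[
p(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}
\]
% Marginal effect: change in risk per unit change in x, which depends on x
\[
\frac{\partial p}{\partial x} = \beta_1 \, p(x)\,\bigl(1 - p(x)\bigr)
\]
% Risk difference between x = 1 and x = 0
\[
\mathrm{RD} = p(1) - p(0)
            = \frac{1}{1 + e^{-(\beta_0 + \beta_1)}} - \frac{1}{1 + e^{-\beta_0}}
\]
% Under an identity link the risk is linear in x, so the slope is the difference
\[
p(x) = \beta_0 + \beta_1 x \quad\Longrightarrow\quad \mathrm{RD} = \beta_1
\]
```

This is why the identity-link model in the original post reports the difference directly as the slope, whereas with a logit or log link the difference has to be computed from the fitted risks and varies with where on the curve it is evaluated.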
