Estimate risk differences with clustered data

EpiMoby — Wed, 26 Apr 2023 20:47:36 GMT

I am trying to estimate a risk difference using GEE (PROC GENMOD). The outcome is clustered count data modeled as a rate (deaths over births), so a Poisson distribution would be preferred, but I can only get the model to work with the normal distribution.

Here's what works:

proc genmod data=a.pmss;
  class geoid1;
  model prmr = RS_pctNOHS / link=identity dist=normal;
  repeated subject=geoid1;
  weight births;
run;
* RD: 1.6358 (1.1015 to 2.1701);

Any ideas for code that gives similar output but is based on the Poisson or binomial distribution? So far nothing converges.

Re: Estimate risk differences with clustered data

StatDave — Wed, 26 Apr 2023 20:59:24 GMT

If the response data are counts or proportions, then the response is not normally distributed. Use the NLMeans macro as shown in this note. It works the same way with a procedure like GENMOD, but you should use the newer PROC GEE to fit GEE models. It has essentially the same syntax as PROC GENMOD. When fitting the GEE model, specify the variable containing the counts in the numerator of your risks as the response variable, and then specify a variable containing the log of the counts in the denominator of your risks in the OFFSET= option. Specify DIST=BIN. Then use LSMEANS, STORE, and ODS OUTPUT statements as shown followed by the NLMeans macro call.

Re: Estimate risk differences with clustered data

EpiMoby — Wed, 26 Apr 2023 21:16:22 GMT

Thank you - I have tried that but can't get the code right. The exposure is the percent of people in the county without a high school education (this shouldn't be in the class statement but SAS is telling me the exposure needs to be). Further assistance is greatly appreciated!

proc gee data= dataset;
class geoid1;
model deaths= RS_pctNOHS/ dist=bin offset=ln_n;
repeated subject=geoid1;
lsmeans deaths / diff cl;
store out = diffmodel;
ods output coef=coeffs;
run;

%NLMeans(instore=diffmodel, coef=Coeffs, title=Differences of Means)

Re: Estimate risk differences with clustered data

StatDave — Wed, 26 Apr 2023 21:31:46 GMT

I have to assume that when you say "exposure" you mean RS_pctNOHS. If that is the case and that variable has continuous values, then you can't get a "risk difference" since a difference implies a comparison of two levels, not a continuous range. The variable in the LSMEANS statement should be a categorical variable specified in the CLASS statement. That is what the message you are seeing means. Additionally, the ODS OUTPUT statement will fail unless you include the E option as shown in the note I referred to.

Sorry, a correction: for the code you showed, which uses a count response with an offset, you should use DIST=POISSON rather than DIST=BIN. But it would actually make more sense to specify "model deaths/n = RS_pctNOHS / dist=bin;", where n is your variable of denominator counts (NOT logged). This models the proportions as binomial.
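A minimal sketch of those two specifications, side by side. The variable names deaths, births, geoid1 and RS_pctNOHS and the data set a.pmss are taken from the posts above; work.pmss2 and ln_births are hypothetical names introduced here, and it is assumed that deaths and births are in the same data set.

```sas
/* Sketch only: work.pmss2 and ln_births are hypothetical names. */
data work.pmss2;
  set a.pmss;
  ln_births = log(births);   /* logged denominator, used only as the Poisson offset */
run;

/* Option 1: count response, Poisson distribution, log link, logged denominator as offset */
proc gee data=work.pmss2;
  class geoid1;
  model deaths = RS_pctNOHS / dist=poisson link=log offset=ln_births;
  repeated subject=geoid1;
run;

/* Option 2: events/trials response, binomial distribution, logit link;
   the denominator is NOT logged and no offset is used */
proc gee data=work.pmss2;
  class geoid1;
  model deaths/births = RS_pctNOHS / dist=bin link=logit;
  repeated subject=geoid1;
run;
```

Neither model reports a risk difference directly. Both estimate on the log or logit scale, so the difference still has to be computed from the fitted model afterwards.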

Re: Estimate risk differences with clustered data

EpiMoby — Wed, 26 Apr 2023 21:33:13 GMT

Yes, the exposure is RS_pctNOHS, and you're right that it is not a categorical variable. What I'm trying to estimate is the absolute difference in mortality between individuals living in counties at the median percent with no high school education and individuals living in counties 1 standard deviation above the median. The exposure has been robustly standardized to the median. I am able to do this with the initial code I posted, which used the normal distribution, but as you've mentioned, Poisson or binomial would be more appropriate, so I am trying to figure it out.
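Written out under an assumed logit-link binomial model, the contrast described here is a difference of two predicted risks. The symbols below are introduced for illustration and do not appear in the thread: x_0 is the exposure value at the median, x_1 the value 1 standard deviation above it, and beta_0, beta_1 the model intercept and slope.

```latex
\[
p(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}, \qquad
\mathrm{RD}(x_0, x_1) = p(x_1) - p(x_0)
\]
```

With a log-link Poisson model, the same contrast would instead use the rate e^{beta_0 + beta_1 x} in place of p(x).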

Re: Estimate risk differences with clustered data

StatDave — Wed, 26 Apr 2023 21:43:12 GMT

Then you would need a binary predictor variable that has just those two values, and it would appear in the CLASS and LSMEANS statements. That would presumably limit the data used considerably. The alternative would be to use a marginal effect for your continuous predictor, which would give you the change in risk for a unit change in the predictor. You could use the Margins macro for that (shown earlier in the note), or the MARGINS statement in PROC GENMOD if you have a recent release of Viya 4.
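As a rough first check, short of the macros, the predicted risk at the two exposure values can be requested with an ESTIMATE statement. This is only a sketch under assumptions: it assumes the binomial events/trials model, that deaths and births are both in a.pmss, and that the values 0 and 1 of RS_pctNOHS correspond to the median and to 1 standard deviation above it.

```sas
/* Sketch only: the exposure values 0 and 1 are assumptions about how
   RS_pctNOHS was standardized. */
proc gee data=a.pmss;
  class geoid1;
  model deaths/births = RS_pctNOHS / dist=bin link=logit;
  repeated subject=geoid1;
  /* ILINK back-transforms each linear predictor to the risk scale */
  estimate 'risk at median'     intercept 1 RS_pctNOHS 0,
           'risk at 1 SD above' intercept 1 RS_pctNOHS 1 / ilink cl;
run;
```

This gives the two risks with their own confidence limits, but not a confidence interval for the difference between them. That is the gap the NLMeans and Margins macros are meant to fill.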