BookmarkSubscribeRSS Feed
jackie4
Calcite | Level 5

Hi, I am working on an epidemiologic analysis using SAS 9.4 and would appreciate guidance on how to correctly estimate a Relative Excess Risk due to Interaction (RERI) when the exposure variable has three categories.

Context of the analysis:

  • I am using multiple imputation (PROC MI → PROC MIANALYZE) to address missingness in both the exposure and outcome variables.
  • My exposure variable has three levels representing online social interaction:
    1 = more interaction
    2 = less interaction
    3 = about the same
  • My effect measure modifier (poverty) and outcome (depression) are both binary.
  • I would like to compute a RERI for the interaction between the exposure and poverty without collapsing the exposure into a binary variable, if possible.

My question:
Is there a valid way in SAS to compute a RERI when the exposure has three categories after running PROC MI and PROC MIANALYZE?

Specifically, I am trying to understand whether:

  1. SAS supports RERI estimation for multinomial exposures using GENMOD or LOGISTIC after MI,
  2. Or whether collapsing the exposure into a binary variable is the only feasible approach for RERI in SAS.

If there is an example of code or recommended workflow for handling RERI with a three‑level exposure after multiple imputation, I would be grateful for any guidance.

Thank you very much for your help.

Best regards,

4 REPLIES 4
jackie4
Calcite | Level 5

Here is my code so far: 

 

TABLE 2;
 
/* crude risk ratio */ *3= about the same interaction, 1=more and 2= less*;
 
proc genmod data=table;
class RLB056C(ref='3') / param=ref;
model depression(event='1')=RLB056C / dist=poisson link=log;
estimate 'RR: 1 vs 3' RLB056C 1 0 / exp;
estimate 'RR: 2 vs 3' RLB056C 0 1 / exp;
run;
 
* *2. Multiple Imputation - explicit proc mic;
 
proc mi data=table nimpute=20 seed=121372 out=testmi;
 
    /* Class statement: categorical variables including exposure/outcome */
    class depression RLB056C RAHISPAN H15INPOV RAGENDER 
          RARACEM R15PSYCHE;
 
    /* Variables to include in imputation */
    var depression RLB056C RAHISPAN H15INPOV RAGENDER 
        RARACEM R15AGEY_B R15PSYCHE;
 
    /* Outcome (binary) */
    fcs logistic (depression = RLB056C RAHISPAN H15INPOV RAGENDER 
                               RARACEM R15AGEY_B R15PSYCHE);
 
    /* Exposure (multinomial) */
    fcs logistic (RLB056C = depression RAHISPAN H15INPOV RAGENDER 
                             RARACEM R15AGEY_B R15PSYCHE / link=glogit);
 
    /* Binary covariates */
    fcs logistic (RAHISPAN = depression RLB056C H15INPOV RAGENDER 
                              RARACEM R15AGEY_B R15PSYCHE);
 
    fcs logistic (H15INPOV = depression RLB056C RAHISPAN RAGENDER 
                              RARACEM R15AGEY_B R15PSYCHE);
 
    fcs logistic (RAGENDER = depression RLB056C RAHISPAN H15INPOV 
                              RARACEM R15AGEY_B R15PSYCHE);
 
    fcs logistic (R15PSYCHE = depression RLB056C RAHISPAN H15INPOV 
                                RAGENDER RARACEM R15AGEY_B);
 
    /* Multinomial covariates */
    fcs logistic (RARACEM = depression RLB056C RAHISPAN H15INPOV 
                             RAGENDER R15AGEY_B R15PSYCHE / link=glogit);
 
    /* Continuous confounder */
    fcs reg (R15AGEY_B = depression RLB056C RAHISPAN H15INPOV 
                              RAGENDER RARACEM R15PSYCHE);
 
run;
-------------------------------------------------------------------------------------
 
/* *this is proc mi with all the covariate- the one above is the simplify version */
/* because sas studio does not support all these covariates- need to run it on sas destop; */
/*  */
 
 
/* proc mi data=HRS2022_Covar nimpute=5 seed=121372 out=testmi; */
/*     class depression RLB056C RAHISPAN H15INPOV RAGENDER  */
/*           RARACEM R15PSYCHE RAEDUC R15MSTAT H15HHRES; */
/*     var depression RLB056C RAHISPAN H15INPOV RAGENDER  */
/*         RARACEM R15PSYCHE RAEDUC R15MSTAT H15HHRES R15AGEY_B; */
/*  */
/*     Outcome */
/*     fcs logistic (depression = RLB056C RAHISPAN H15INPOV RAGENDER  */
/*                                RARACEM R15PSYCHE RAEDUC R15MSTAT H15HHRES R15AGEY_B); */
/*  */
/*     Exposure */
/*     fcs logistic (RLB056C = depression RAHISPAN H15INPOV RAGENDER  */
/*                              RARACEM R15PSYCHE RAEDUC R15MSTAT H15HHRES R15AGEY_B / link=glogit); */
/*  */
/*     Binary covariates */
/*     fcs logistic (RAHISPAN = depression RLB056C H15INPOV RAGENDER  */
/*                               RARACEM R15PSYCHE RAEDUC R15MSTAT H15HHRES R15AGEY_B); */
/*     fcs logistic (H15INPOV = depression RLB056C RAHISPAN RAGENDER  */
/*                               RARACEM R15PSYCHE RAEDUC R15MSTAT H15HHRES R15AGEY_B); */
/*     fcs logistic (RAGENDER = depression RLB056C RAHISPAN H15INPOV  */
/*                               RARACEM R15PSYCHE RAEDUC R15MSTAT H15HHRES R15AGEY_B); */
/*     fcs logistic (R15PSYCHE = depression RLB056C RAHISPAN H15INPOV  */
/*                                 RAGENDER RARACEM RAEDUC R15MSTAT H15HHRES R15AGEY_B); */
/*  */
/*     Multinomial covariates */
/*     fcs logistic (RARACEM = depression RLB056C RAHISPAN H15INPOV  */
/*                              RAGENDER R15PSYCHE RAEDUC R15MSTAT H15HHRES R15AGEY_B / link=glogit); */
/*     fcs logistic (RAEDUC = depression RLB056C RAHISPAN H15INPOV  */
/*                              RAGENDER RARACEM R15PSYCHE R15MSTAT H15HHRES R15AGEY_B / link=glogit); */
/*     fcs logistic (R15MSTAT = depression RLB056C RAHISPAN H15INPOV  */
/*                              RAGENDER RARACEM R15PSYCHE RAEDUC H15HHRES R15AGEY_B / link=glogit); */
/*     fcs logistic (H15HHRES = depression RLB056C RAHISPAN H15INPOV  */
/*                              RAGENDER RARACEM R15PSYCHE RAEDUC R15MSTAT R15AGEY_B / link=glogit); */
/*  */
/*     Continuous */
/*     fcs reg (R15AGEY_B = depression RLB056C RAHISPAN H15INPOV  */
/*                               RAGENDER RARACEM R15PSYCHE RAEDUC R15MSTAT H15HHRES); */
/* run;
-------------------------------------------------------------------------------------------------------------
/* *crude with MI*; */;
 
proc genmod data=testmi;
class RLB056C(ref='3') / param=ref;
model depression(event='1')=RLB056C / dist=poisson link=log;
by _Imputation_;
estimate 'RR: 1 vs 3' RLB056C 1 0 / exp;
estimate 'RR: 2 vs 3' RLB056C 0 1 / exp;
ods output Estimates=genmod_out;
run;
 
/* 3. Adjusted Risk Ratios - poison   */;
proc genmod data=testmi;
    class RLB056C(ref='3') RAHISPAN H15INPOV RAGENDER RARACEM R15PSYCHE / param=ref;
    model depression(event='1') = RLB056C RAHISPAN H15INPOV RAGENDER
                                   RARACEM R15PSYCHE R15AGEY_B
                                   / dist=poisson link=log;
    by _Imputation_;
    estimate 'RR: 1 vs 3' RLB056C 1 0;
    estimate 'RR: 2 vs 3' RLB056C 0 1;
    ods output Estimates=genmod_out;
run;
 
/* Keep only log estimates */
data genmod_out_log;
    set genmod_out;
    if index(Label,'Exp(')=0;
run;
 
/* Sort by Label and Imputation before MIANALYZE */
proc sort data=genmod_out_log;
    by Label _Imputation_;
run;
 
/* Pool estimates with MIANALYZE (no class statement needed) */
proc mianalyze data=genmod_out_log;
    modeleffects LBetaEstimate;
    stderr StdErr;
    by Label;                 /* Pool separately for each contrast */
    ods output ParameterEstimates=pooled_RR;
run;
 
 
/* Exponentiate to get final RRs */
data pooled_RR_final;
    set pooled_RR;
    RR  = exp(Estimate);
    LCL = exp(Estimate - 1.96*StdErr);
    UCL = exp(Estimate + 1.96*StdErr);
run;
 
 
/* RERI */
 
data testmi_binary;
    set testmi;
 
    /* More/same interaction vs Less */
    if RLB056C = 1 then interaction_bin = 0;   /* Less = reference */
    else interaction_bin = 1;   
run;
 
 
 
proc genmod data=testmi_binary;
    class interaction_bin(ref='0') 
          H15INPOV(ref='0') / param=ref;
 
    model depression(event='1') =
          interaction_bin
          H15INPOV
          interaction_bin*H15INPOV
          / dist=poisson link=log;
 
    by _Imputation_;
 
    ods output ParameterEstimates=genmod_out;
run;
 
proc sort data=genmod_out;
    by Parameter;
run;
 
proc mianalyze data=genmod_out;
    modeleffects Estimate;
    stderr StdErr;
    by Parameter;
 
    ods output ParameterEstimates=pooled_estimates;
run;
 
data pooled_rr;
    set pooled_estimates;
 
    RR  = exp(Estimate);
    LCL = exp(Estimate - 1.96*StdErr);
    UCL = exp(Estimate + 1.96*StdErr);
run;
 
proc contents data= pooled_estimates ;
run;
 
data reri_calc;
    retain betaE betaM betaEM;
 
    set pooled_estimates;
 
    if Parameter = "interaction_bin" then betaE = Estimate;
    if Parameter = "H15INPOV" then betaM = Estimate;
    if Parameter = "interaction_bin*H15INPOV" then betaEM = Estimate;
 
    if _N_ = 3 then do;
        RR10 = exp(betaE);
        RR01 = exp(betaM);
        RR11 = exp(betaE + betaM + betaEM);
 
        RERI = RR11 - RR10 - RR01 + 1;
 
        output;
    end;
run;
 
StatDave
SAS Super FREQ

RERI appears to be the same idea as the "difference in difference of means" but looking at the difference in difference of relative risks instead of probabilities (means). See this note on estimating the difference in difference of means. It is essentially the interaction between the predictors but using the mean, or relative risk, in each term of the RERI formula. The "Generalized Linear Models with a Non-Identity Link" section of the note uses a logistic model on binary predictors and response. It then computes the difference in difference of probabilities which is often thought to be more easily understood than log odds, odds ratios, or relative risks. Nevertheless, the NLMeans macro could be used compute the RERI for this model. You might need to download the latest version of the macro to obtain the functionality shown below.

 

For example, these statements fit the model, save the necessary components for NLMeans, and then NLMeans computes RERI by specifying its formula in f= as shown in flabel=.

 

      proc logistic;
        class a b / param=glm ref=first;
        model y(event="1") = a b a*b;
        lsmeans a*b / e ilink;
        ods output coef=coeffs;
        store log;
        run;
      %nlmeans(instore=log, coef=coeffs, link=logit, f=mu1/mu4-mu2/mu4-mu3/mu4+1, 
               title=RERI, flabel=RR(A,B)-RR(A-,B)-RR(A,B-)+1)

Note, however, that some would use a different model which defines the additivity of the predictors differently. For example, see Richardson DB, Kaufman JS. Estimation of the relative excess risk due to interaction and associated confidence bounds. Am J Epidemiol. 2009 Mar 15;169(6):756-60.

In the case of a multinomial predictor, as in your case, you could either compute a separate RERI for each non-reference level, or combine the non-reference levels to make the predictor binary.

 

Using the example in the note, suppose that A has three levels (2, 1, 0). The following computes the RERI for A=1 and then for A=2, considering A=0 as the reference level.

 

      %nlmeans(instore=log, coef=coeffs, link=logit, f=mu1/mu6-mu2/mu6-mu5/mu6+1, 
               title=RERI(A1), flabel=RR(A1,B)-RR(A1-,B)-RR(A1,B-)+1)
      %nlmeans(instore=log, coef=coeffs, link=logit, f=mu3/mu6-mu4/mu6-mu5/mu6+1, 
               title=RERI(A2), flabel=RR(A2,B)-RR(A2-,B)-RR(A2,B-)+1)
StatDave
SAS Super FREQ

On further review and considering VanderWeele & Knol (2014), my approach with NLMeans might not be correct. As they show for a logistic model on two binary predictors (logit(p)=b0+b1*A+b2*B+b3*AB), RERI is defined as a function of odds ratios and computed as

 

    RERI = OR(A1B1) - OR(A1B0) - OR(A0B1) + 1 = exp(b1+b2+b3) - exp(b1) - exp(b2) + 1

 

For example, using the logistic model and comparing the odds of A1B1 to A0B0:

 

logit(p_A1B1) = logodds(A1B1) = b0+b1+b2+b3
and
logit(p_A0B0) = logodds(A0B0) = b0

 

The difference is

 

logodds(A1B1)-logodds(A0B0) = log(OR(A1B1) = b1+b2+b3. So, OR(A1B1) = exp(b1+b2+b3). Similarly for OR(A1B0) and OR(A0B1).

 

Since the RERI is just a function of the logistic model parameters, it can be easily computed using the NLEST macro. For the binary predictors example in the note on difference in difference that I referred to earlier, these statements fit the model and compute RERI.

 

proc logistic;
  class a b / param=glm ref=first;
  model y(event="1") = a b a*b;
  lsmeans a*b / e ilink;
  ods output coef=coeffs;
  store log;
  run;
%nlest(instore=log, label=RERI_or, f=exp(b_p2+b_p4+b_p6)-exp(b_p2)-exp(b_p4)+1)

Or you could use NLMeans, but by defining RERI as a function of odds ratios rather than relative risks as I did earlier:

 

%nlmeans(instore=log, coef=coeffs, link=logit, flabel=RERI_or,
  f=(mu1/(1-mu1))/(mu4/(1-mu4)) - (mu2/(1-mu2))/(mu4/(1-mu4)) - (mu3/(1-mu3))/(mu4/(1-mu4)) + 1)

This method reproduces the RERI computed for the second example in the Richardson & Kaufman paper that uses a different model form.

 

Again, for the multi-level predictor situation like yours, you can do the RERI computation for each of the non-reference levels. For the example of A with 3 levels (0, 1, 2) and B with 2 levels (0, 1):

 

 

proc logistic;
  class a b / param=glm ref=first;
  model y(event="1") = a b a*b;
  store log;
  run;
%nlest(instore=log, label=RERI_or_A2, f=exp(b_p3+b_p5+b_p9)-exp(b_p3)-exp(b_p5)+1)
%nlest(instore=log, label=RERI_or_A1, f=exp(b_p2+b_p5+b_p7)-exp(b_p2)-exp(b_p5)+1)

---

VanderWeele, T. J., & Knol, M. J. (2014). A tutorial on interaction. Epidemiologic methods, 3(1), 33-72.

 

Catch up on SAS Innovate 2026

Dive into keynotes, announcements and breakthroughs on demand.

Explore Now →
What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 566 views
  • 2 likes
  • 3 in conversation