Hello all,
I am conducting an interaction analysis in a logistic model. I believe I have written the code correctly and have produced odds ratios for the marginal effect of race (Black vs. White) at each level of the moderating variable (Male).
Here is my code:
proc logistic data=Dataset descending;
   /* PROC LOGISTIC allows only one CLASS statement; list both variables there */
   class Black (ref='0') Male (ref='0') / param=ref;
   /* &COVARIATES is assumed to include the Black and Male main effects */
   model &DEPVAR = &COVARIATES Black*Male
         / clodds=wald orpvalue;
   /* odds ratios for Black vs. the reference level, at each level of Male */
   oddsratio Black / diff=ref;
run;
Here is the relevant portion of my output:
The output shows that the marginal effect of race on the outcome is significant for both sexes: it decreases the odds by about 11% (OR = 0.893) for females and increases the odds by about 19% (OR = 1.186) for males. I believe these marginal effects are usually referred to as "first differences."
Based on some papers I have read, I think I should also determine whether the difference between the marginal effects is significant. I believe this is usually referred to as a test of "second differences."
I think I understand this is calculated as the difference in odds ratios (1.186 - 0.893) divided by the standard error of the difference.
Is there a method I can use in SAS to output the "second differences", i.e., the significance of the difference between marginal effects?
Thanks for reading! Any help is appreciated.
StatDave: you are correct - the single DV is binary. The covariates are either continuous or binary dummies. The two interaction variables (Black and Male) are binary. My apologies that was not clear from the original post. Thanks!
Okay. It must be noted that odds ratios are not marginal effects. The marginal effect of Black would be the difference in the mean predicted values when first setting Black in all observations to one level and then setting Black in all observations to the other level. That would result in a single difference value, which is the marginal effect of the binary Black variable. Note that all of the other variables in the model use their observed values when computing each predicted value going into each average.

If that is what you want, then you can do that with the Margins macro. Alternatively, you could compute the marginal effect of Black for each level of Male by also fixing the Male variable to each level in turn and then differencing the resulting mean predicted values.
For example, to get the overall marginal effect of Black:
%Margins(data=Dataset, response=&depvar, model=&covariates black*male,
class=Black Male, dist=binomial,
margins=Black, options=desc diff)
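For the alternative described above, computing the marginal effect of Black separately at each level of Male, a sketch along the same lines might look like the following. Treat the at= parameter as an assumption about the Margins macro's interface and check it against the macro documentation before relying on it:

```
/* Sketch, not verified: assumes the %Margins macro accepts an at= parameter
   that fixes Male at each of its levels in turn before differencing */
%Margins(data=Dataset, response=&depvar, model=&covariates black*male,
         class=Black Male, dist=binomial,
         margins=Black, at=Male, options=desc diff)
```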
If what you really want is a difference of odds ratios, then that would be done very differently. Note that an odds ratio is just a linear combination of model parameters which effectively fixes all of the variables in the model to specific values, unlike as done for margins as described above.
Thank you for your detailed reply, Dave.
I apologize if I am confusing the issue by using incorrect terminology. The reason that I referred to the difference in odds ratios as marginal effects is because - if I have understood correctly - this is what they are called in the attached paper by Buis (2010). Note, however, that he does point out that this is "a slight abuse of terminology."
With that said, I am indeed trying to calculate the difference in odds ratios. In the attached paper, Buis discusses why and how to do this for "first differences" (but he doesn't call them that). I have already got this far, as shown in the output in my original post.
What I am trying to calculate is the value of "second differences" as described on page 87 of the second attached paper by Mize (2019), under the subheading "Testing the equality of marginal effects: second differences." Specifically, in the example I posted in the OP, I am trying to find out if the odds ratio for the marginal effect of being Black and male compared to White and male (1.186) is significantly different from the marginal effect of being Black and female compared to White and female (0.893).
I should point out that I have no expectation that you (or anyone) would take your time reading the attached papers, but I want to be as thorough as possible in explaining my problem, and also help others out in the future that may have the same issue that I am currently struggling with.
Thanks again for any advice you can give me!
This just goes back to the frequent request to estimate and test the "difference in difference" (DID) of means in a model involving interaction, which I discuss in this note. The slight wrinkle in your case is that you don't seem to want the DID of means; you want the difference of individually exponentiated differences of log odds, which is the difference of odds ratios.

As discussed in the second part of the above note using a logistic model, the LSMESTIMATE statement estimates and tests the DID of log odds, which is the log of the ratio of odds ratios. So, exponentiating that estimate yields an estimate of the ratio of odds ratios, and the test that the log of the ratio equals 0 from the LSMESTIMATE statement is a test that the ratio of odds ratios is 1. A significant test would therefore indicate that the ratio of odds ratios differs from 1, which is the same as testing that the difference is zero.

But if you only need a test of the difference of odds ratios, you don't even need to use the LSMESTIMATE statement because, as you can see in the example, the estimate and test from the LSMESTIMATE statement simply reproduce the interaction estimate and test in the model. So, all you need is the test of your interaction term.
You can prove all this to yourself by working through all the logging, differencing, and exponentiating based on the example in the note. But again, I would not refer to this as a "marginal effect."
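As a quick illustration of that arithmetic, using the two odds ratios from the original post (1.186 for males, 0.893 for females; the variable names below are just for illustration), a DATA step shows that the difference of the log odds ratios equals the log of their ratio, which is what the interaction parameter estimates:

```
/* Illustration with the numbers from this thread */
data _null_;
   or_male   = 1.186;   /* OR for Black vs. White among males   */
   or_female = 0.893;   /* OR for Black vs. White among females */
   /* difference of log odds ratios = log of the ratio of odds ratios */
   log_diff = log(or_male) - log(or_female);   /* about 0.284, the interaction estimate */
   ratio    = exp(log_diff);                   /* about 1.328 = 1.186/0.893 */
   put log_diff= ratio=;
run;
```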
Thank you very much for the detailed explanation. It was very helpful that you walked through how to arrive at the solution instead of simply stating "all you need is the test of your interaction term."
I have used the code from your example and arrived at the same solution.
One thing I'm still a bit confused about: in a two-way interaction, alternative combinations of the interacting variables are possible that change the interpretation. In the example I have been working through, Black*Male, there are four possible differences, each having a distinct odds ratio:
proc logistic data=Dataset descending;
   /* one CLASS statement for both variables */
   class Black Male / param=glm ref=first;
   model DepVar = &Covariates Black*Male
         / clodds=wald orpvalue;
   /* all pairwise odds ratios for each variable at each level of the other */
   oddsratio Black / diff=all;
   oddsratio Male / diff=all;
   /* difference in differences on the log-odds scale */
   estimate "Diff in Diff" Black*Male 1 -1 -1 1;
   lsmeans Black*Male / e ilink;
   ods output coef=coeffs;
   lsmestimate Black*Male "Diff in Diff LogOdds" 1 -1 -1 1 / elsm;
run;
I think I understand that even though the odds ratio for each of the differences is distinct, the difference of differences is the same for each possible comparison, and is also equivalent to the estimate and p-value for the interaction term itself. For example, the odds of the outcome are 1.186 times higher for Black males compared to White males, and 0.893 times as high (about 11% lower) for Black females compared to White females. The difference in these differences (on the log-odds scale) is 0.2839, and is statistically significant. Likewise, the odds are 1.223 times higher for Black males compared to Black females, and 0.921 times as high for White males compared to White females. As above, the difference in these differences is *also* 0.2839, and is statistically significant. Have I got this correct?
Building upon the above, I would also like to analyze a three-way interaction and compare the difference in odds ratios for specific combinations of the interaction. Does the same logic as above hold true for a three-way interaction? That is, the odds ratios for each of the differences are distinct in interpretation, but the ratio for each possible comparison (the difference in differences) is the same, and is also equivalent to the estimate and p-value for the interaction term itself. Is that correct?
What I'd like to do for the three-way interaction is compare the group with the highest odds ratio to the group with the next-highest odds ratio. I'll have to figure out how to do that with the LSMESTIMATE statement - it's a bit confusing.
For a three-way interaction, because I would be using LSMESTIMATE to compare only a portion of the possible comparisons, am I correct that the results from LSMESTIMATE will NOT be equivalent to the estimate and significance of the interaction term itself, as they were for the two-way interaction example discussed above?
Whether for the four groups (predictor level combinations) in a two-way interaction of binary predictors, or the eight groups in a three-way interaction, each group has an odds or log odds. It does not have an odds ratio. An odds ratio or log odds ratio compares the odds (or log odds) of exactly two of the groups. If you are saying that you want to compare pairs of the groups, then this is simply done using the LSMEANS statement like the following which will provide the odds ratio and log odds ratio for each pairwise comparison of the groups defined by the a*b*c interaction.
lsmeans a*b*c / ilink diff oddsratio cl;
Since there will be several odds ratio estimates (28 pairwise comparisons for 8 groups) it is obviously not possible for these to all match the three-way interaction parameter estimate. In the case of the difference in difference, we used the LSMESTIMATE statement to estimate a single function of the four groups which is equivalent to the definition of the interaction, so the estimate and p-value of that single estimate is the same as the parameter estimate and p-value of the two-way interaction parameter.
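If, after inspecting the pairwise results, a single test comparing two specific cells of the three-way interaction is wanted, the LSMESTIMATE statement can supply it as well. A sketch, assuming binary a, b, and c with GLM parameterization; the positions of the 1 and -1 coefficients depend on the ordering of the eight a*b*c cells, so verify them against the elsm output before interpreting the result:

```
proc logistic data=Dataset descending;
   class a b c / param=glm ref=first;
   /* a|b|c expands to all main effects and interactions */
   model DepVar = a|b|c / clodds=wald;
   /* pairwise odds ratios for all 28 comparisons of the 8 cells */
   lsmeans a*b*c / ilink diff oddsratio cl;
   /* compare two specific cells: here the 1st vs. 2nd in the cell ordering;
      the elsm option shows which cells the 1 and -1 actually pick out */
   lsmestimate a*b*c "cell 1 vs cell 2" 1 -1 0 0 0 0 0 0 / elsm exp cl;
run;
```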