Hello all,
I am conducting a logistic regression with an interaction between three dummy variables. Here is my code:
proc logistic data=Multivariate descending;
class Black (ref='0') / param=ref;
class Young (ref='0') / param=ref;
class Male (ref='0') / param=ref;
model &OUTCOME = &KEYIVS
Black*Young*Male
/ clodds=wald ORPVALUE;
oddsratio Black / diff=ref;
oddsratio Young / diff=ref;
oddsratio Male / diff=ref;
run;
I've attached the pertinent output as a PDF because I'm not sure how else to display a table.
My confusion is that there are three groups of odds ratios that are the same value across three separate combinations of the predictor values (1.336, 0.802, and 1.167). When I run two-way interactions there are no repeated values like this.
Am I misunderstanding the displayed data, or have I written the three-way interaction code incorrectly? Thanks for your advice!
Accepted Solutions
Hello @samp945,
The results suggest that macro variable KEYIVS contains the individual variable names Black, Young and Male (and possibly more), but not their two-way interactions. I think you should add Black*Young, Black*Male and Young*Male to the MODEL statement in order to obtain reasonably interpretable results. Without the two-way interactions you have too few degrees of freedom (see column DF in table "Analysis of Maximum Likelihood Estimates") to estimate, e.g., the odds ratios "Black 1 vs 0" at the four possible combinations of Young and Male individually.
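A sketch of what the corrected call might look like, assuming that &KEYIVS already lists Black, Young and Male as main effects (the output suggests this, but the post does not show the macro variable). Everything except the three added two-way terms is taken from the original code; the three CLASS statements are merged into one only for brevity.

```sas
proc logistic data=Multivariate descending;
   /* reference-cell coding with level '0' as reference, as in the original code */
   class Black (ref='0') Young (ref='0') Male (ref='0') / param=ref;
   /* assumption: &KEYIVS contains the main effects Black, Young and Male */
   model &OUTCOME = &KEYIVS
         Black*Young Black*Male Young*Male   /* two-way interactions that were missing */
         Black*Young*Male                    /* three-way interaction from the original code */
         / clodds=wald orpvalue;
   oddsratio Black / diff=ref;
   oddsratio Young / diff=ref;
   oddsratio Male  / diff=ref;
run;
```

With the main effects, all three two-way interactions and the three-way interaction in the model, the three dummy variables account for seven degrees of freedom, so each ODDSRATIO statement can report a separate estimate for each of the four combinations of the other two variables.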
You are, of course, precisely correct! Thanks for pointing that out to me. I know better, but I couldn't see my mistake. Thank you!
Hello @K331 and welcome to the SAS Support Communities!
In the correctly specified model (i.e., including the two-way interaction terms) the odds ratio estimate "Black 1 vs 0 at Young=0 Male=0" estimates the odds ratio of Black=1 vs. Black=0 in the subpopulation "Young=0 & Male=0". So it compares black, non-young, non-males to non-black, non-young, non-males.
In general, the wording "... times more likely than ..." describes a relative risk, not an odds ratio. Estimates of the relative risks can be computed from the parameter estimates in the "Analysis of Maximum Likelihood Estimates" table of the PROC LOGISTIC output. Example: Let i denote the intercept and b the parameter estimate for "Black" in that table. Then the relative risk of Black=1 vs. Black=0 in the subpopulation "Young=0 & Male=0" can be estimated as
logistic(i+b)/logistic(i) = exp(b)*(1+exp(i))/(1+exp(i+b))
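The formula above can be checked in a short DATA step. This is only a sketch: the values of i and b are hypothetical placeholders, not numbers from the attached output, and it assumes reference coding with level 0 as the reference, so that the intercept corresponds to Black=0, Young=0, Male=0 (and to the reference or zero level of any other predictor in &KEYIVS).

```sas
data _null_;
   /* hypothetical values; replace them with the estimates from the
      "Analysis of Maximum Likelihood Estimates" table */
   i = -1.5;   /* intercept */
   b =  0.4;   /* parameter estimate for Black */

   p0 = logistic(i);       /* predicted probability for Black=0, Young=0, Male=0 */
   p1 = logistic(i + b);   /* predicted probability for Black=1, Young=0, Male=0 */

   rr         = p1 / p0;                                   /* relative risk */
   rr_closed  = exp(b) * (1 + exp(i)) / (1 + exp(i + b));  /* closed form from the post */
   odds_ratio = exp(b);                                    /* odds ratio, for comparison */

   put rr= rr_closed= odds_ratio=;
run;
```

The two relative-risk expressions should agree up to rounding, and comparing them with odds_ratio shows how far the odds ratio is from the relative risk for the chosen intercept.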
non-technical audience in which I couldn't use the relative risk language,
would I be able to use the language "have an odds of..." instead?
By setting the predictor variables (like Black, Young, Male) to values of your choice and using the parameter estimates from the PROC LOGISTIC output, you can compute the log odds (i.e., log(p/(1-p))) that an individual with those characteristics experiences the event of interest according to the binary logistic regression model. The odds (i.e., p/(1-p)) are the exponentiated log odds. So, yes, you can say that, according to the model, one subgroup (e.g., black, young, male persons) has odds of, say, 1.2, whereas another subgroup (e.g., non-black, young, male persons) has odds of, say, 2.3, "... which is almost twice ..." (thus mentioning the odds ratio without explicitly using that term).
Note, however, that the odds computed from the model don't necessarily approximate the odds in the population the model is applied to: The analysis dataset could have been artificially constructed rather than being a representative random sample from the population. For example, oversampling could have been used in order to increase the proportion of subjects having experienced the event of interest. But the odds ratios would still be valid in this situation. In the example above, the realistic odds might be, e.g., 0.024 and 0.046, respectively.
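A small DATA step can illustrate both points. It is a sketch under assumptions: the log odds are back-calculated from the example odds quoted above (1.2 and 2.3), and the oversampling effect is modeled as a hypothetical constant shift of log(50) in the intercept, chosen only because it reproduces the example values 0.024 and 0.046.

```sas
data _null_;
   /* log odds of the two subgroups, back-calculated from the example odds */
   logodds_a = log(1.2);
   logodds_b = log(2.3);

   /* hypothetical intercept shift caused by oversampling the events */
   shift = log(50);

   /* odds according to the model fitted on the oversampled data */
   odds_a = exp(logodds_a);
   odds_b = exp(logodds_b);

   /* odds after removing the shift from the linear predictor */
   pop_odds_a = exp(logodds_a - shift);
   pop_odds_b = exp(logodds_b - shift);

   /* the ratio of the odds is unaffected by the common shift */
   ratio_model = odds_b / odds_a;
   ratio_pop   = pop_odds_b / pop_odds_a;

   put odds_a= odds_b= ratio_model=;
   put pop_odds_a= pop_odds_b= ratio_pop=;
run;
```

Because the shift is subtracted from both log odds, it cancels in the ratio, which is why the odds ratios remain valid while the individual odds do not.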