samp945
Obsidian | Level 7

Hello all,

 

I am conducting a logistic regression with an interaction between three dummy variables. Here is my code:

 

 

proc logistic data=Multivariate descending;
	class Black  (ref='0') / param=ref;
	class Young  (ref='0') / param=ref;  
	class Male  (ref='0') / param=ref;  

	model &OUTCOME = 	&KEYIVS
						Black*Young*Male
						/ clodds=wald ORPVALUE;
	oddsratio Black / diff=ref;
	oddsratio Young / diff=ref;
	oddsratio Male / diff=ref;
run;

I've attached the pertinent output as a PDF because I'm not sure how else to display a table.

 

My confusion is that there are three groups of odds ratios that are the same value across three separate combinations of the predictor values (1.336, 0.802, and 1.167). When I run two-way interactions there are no repeated values like this.

 

Am I misunderstanding the displayed data, or have I written the three-way interaction code incorrectly? Thanks for your advice!

 

 

1 ACCEPTED SOLUTION

FreelanceReinh
Jade | Level 19

Hello @samp945,

 

The results suggest that macro variable KEYIVS contains the individual variable names Black, Young and Male (and possibly more), but not their two-way interactions. I think you should add Black*Young, Black*Male and Young*Male to the MODEL statement in order to obtain reasonably interpretable results. Without the two-way interactions you have too few degrees of freedom (see column DF in table "Analysis of Maximum Likelihood Estimates") to estimate, e.g., the odds ratios "Black 1 vs 0" at the four possible combinations of Young and Male individually.
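To see why this specification yields repeated odds ratios, here is a minimal Python sketch (not SAS output). It assumes 0/1 reference coding, so the three-way term is simply the product Black*Young*Male. The coefficient values are made up for illustration, except that `b_black` is chosen so that exp(b_black) matches the 1.336 quoted in the question.

```python
import math
from itertools import product

# Hypothetical coefficients for a model with main effects plus ONLY the
# three-way term (no two-way interactions). Values are illustrative.
b_black = math.log(1.336)  # chosen so exp(b_black) = 1.336, as in the question
b_three_way = 0.5          # assumed value, not taken from the thread


def odds_ratio_black(young, male):
    """Odds ratio of Black=1 vs Black=0 at fixed Young and Male.

    With 0/1 reference coding, the Young and Male main effects cancel in the
    difference of log odds, leaving b_black + b_three_way * young * male.
    """
    return math.exp(b_black + b_three_way * young * male)


results = {(y, m): odds_ratio_black(y, m) for y, m in product((0, 1), repeat=2)}
for (y, m), value in results.items():
    print(f"Black 1 vs 0 at Young={y} Male={m}: {value:.3f}")
```

The product Young*Male is zero unless both are 1, so three of the four odds ratios collapse to exp(b_black). With the two-way terms in the model, the exponent would also contain a Black*Young and a Black*Male contribution, and all four odds ratios could differ.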

samp945
Obsidian | Level 7

You are, of course, precisely correct! Thanks for pointing that out to me. I know better, but I couldn't see my mistake. Thank you!

K331
Calcite | Level 5
Thank you for this. I am wondering how to interpret the odds ratios for the interaction terms. For example, in the output attached by the original poster, the first odds ratio is for Black where Young=0 and Male=0, with a value of 1.336. So we can say that Black, non-young, non-male persons are 1.336 times more likely to [whatever the dependent variable is]. But my question is: who are we comparing the Black, non-young, non-male persons to? Is this compared to White, non-young, non-male persons? Or to Black, young, male persons?
FreelanceReinh
Jade | Level 19

Hello @K331 and welcome to the SAS Support Communities!

 

In the correctly specified model (i.e., including the two-way interaction terms) the odds ratio estimate "Black 1 vs 0 at Young=0 Male=0" estimates the odds ratio of Black=1 vs. Black=0 in the subpopulation "Young=0 & Male=0". So it compares black, non-young, non-males to non-black, non-young, non-males.

 

In general, the wording "... times more likely than ..." describes a relative risk, not an odds ratio. Estimates of the relative risks can be computed from the parameter estimates in the "Analysis of Maximum Likelihood Estimates" table of the PROC LOGISTIC output. Example: Let i denote the intercept and b the parameter estimate for "Black" in that table, and let logistic(x) = exp(x)/(1+exp(x)). Then the relative risk of Black=1 vs. Black=0 in the subpopulation "Young=0 & Male=0" can be estimated as

logistic(i+b)/logistic(i) = exp(b)*(1+exp(i))/(1+exp(i+b))
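
As a quick numerical check of the formula above, here is a small Python sketch. The values of `i` and `b` are hypothetical placeholders, not estimates from the attached output; substitute the values from your own "Analysis of Maximum Likelihood Estimates" table.

```python
import math


def logistic(x):
    """Inverse logit: maps log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def relative_risk(i, b):
    """Relative risk of Black=1 vs Black=0 at Young=0, Male=0."""
    return logistic(i + b) / logistic(i)


# Hypothetical estimates (assumptions for illustration only).
i = -1.0  # intercept
b = 0.3   # parameter estimate for Black

rr = relative_risk(i, b)
closed_form = math.exp(b) * (1 + math.exp(i)) / (1 + math.exp(i + b))
odds_ratio = math.exp(b)

print(f"relative risk = {rr:.4f}, odds ratio = {odds_ratio:.4f}")
```

For a positive b the relative risk is closer to 1 than the odds ratio, which is why "times more likely" overstates the effect when it is read off an odds ratio.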
K331
Calcite | Level 5
Thank you! This is very helpful. If I were presenting results to a non-technical audience in which I couldn't use the relative risk language, would I be able to use the language "have an odds of..." instead?
FreelanceReinh
Jade | Level 19

By setting the predictor variables (like Black, Young, Male) to values of your choice and using the parameter estimates from PROC LOGISTIC output you can compute the log odds (i.e., log(p/(1-p))) for an individual with those characteristics to experience the event of interest according to the binary logistic regression model. The odds (i.e., p/(1-p)) are the exponentiated log odds. So, yes, you can say that, according to the model, one subgroup (e.g., black, young, male persons) has an odds of, say, 1.2, whereas another subgroup (e.g., non-black, young, male persons) has an odds of, say, 2.3, "... which is almost twice ..." (thus mentioning the odds ratio without explicitly using that term).
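
The computation described above might look like the following Python sketch. All parameter values are hypothetical, and the sketch assumes 0/1 reference coding and no covariates other than Black, Young, Male and their interactions (or that any others are held at their reference level).

```python
import math

# Hypothetical parameter estimates for a model with main effects, all
# two-way interactions and the three-way interaction (illustrative only).
params = {
    "Intercept": -0.4,
    "Black": 0.3,
    "Young": 0.5,
    "Male": 0.2,
    "Black*Young": -0.1,
    "Black*Male": 0.15,
    "Young*Male": 0.1,
    "Black*Young*Male": -0.25,
}


def log_odds(black, young, male):
    """Log odds of the event for one combination of the 0/1 dummies."""
    design = {
        "Intercept": 1,
        "Black": black,
        "Young": young,
        "Male": male,
        "Black*Young": black * young,
        "Black*Male": black * male,
        "Young*Male": young * male,
        "Black*Young*Male": black * young * male,
    }
    return sum(params[name] * design[name] for name in params)


# Odds are the exponentiated log odds.
odds_black_young_male = math.exp(log_odds(1, 1, 1))
odds_nonblack_young_male = math.exp(log_odds(0, 1, 1))
ratio = odds_black_young_male / odds_nonblack_young_male

print(f"odds (black, young, male):     {odds_black_young_male:.3f}")
print(f"odds (non-black, young, male): {odds_nonblack_young_male:.3f}")
print(f"ratio of the two odds:         {ratio:.3f}")
```

The ratio of the two odds is the odds ratio "Black 1 vs 0 at Young=1 Male=1", so quoting both odds conveys the same information without using the term.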

 

Note, however, that the odds computed from the model don't necessarily approximate the odds in the population the model is applied to: The analysis dataset could have been artificially constructed rather than being a representative random sample from the population. For example, oversampling could have been used in order to increase the proportion of subjects having experienced the event of interest. But the odds ratios would still be valid in this situation. In the example above, the realistic odds might be, e.g., 0.024 and 0.046, respectively.
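
A short Python sketch of this point, using the example numbers above. It assumes the oversampling depends only on the outcome and not on the predictors, so that it multiplies every subgroup's odds by the same factor; the factor of about 50 is simply what the example odds imply, not a value from any real dataset.

```python
import math

# Odds from the example: model-based (oversampled data) vs. realistic.
model_odds = (1.2, 2.3)
population_odds = (0.024, 0.046)

# Common inflation factor implied by the example numbers (assumption:
# outcome-dependent sampling scales all odds by the same constant).
inflation = model_odds[0] / population_odds[0]

# Dividing the model odds by that factor recovers the realistic odds.
rescaled = tuple(o / inflation for o in model_odds)

# The odds ratio is unaffected because the factor cancels.
or_model = model_odds[1] / model_odds[0]
or_population = population_odds[1] / population_odds[0]

print(f"inflation factor: {inflation:.1f}")
print(f"odds ratio (model): {or_model:.3f}, (population): {or_population:.3f}")
```

In other words, under this kind of sampling only the intercept is shifted (by the log of the inflation factor), while the slope parameters, and hence the odds ratios, stay the same.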

K331
Calcite | Level 5
Thank you so much, yes that makes total sense.
