From SAS I get the following output (also attached). The dependent variable is binary: Y = 1 for pass, Y = 0 for fail.
Odds Ratio Estimates

Effect          Point Estimate    95% Wald Confidence Limits
SEX  M vs F     1.162             1.004     1.345
RACE 2 vs 6     1.521             1.143     2.024
My interpretation is that an odds ratio point estimate greater than 1 implies that males are more likely to pass than females; with 95% confidence, between 1.004 and 1.345 times as 'likely'. (I'm not sure exactly what I mean by 'likely'; I'm just repeating the wording I've seen in the literature.)
The same goes for race: based on the odds ratios, RACE 2 is more likely to pass than RACE 6.
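As far as I can tell, the ratio applies to odds, p / (1 - p), not to probabilities, so '1.162 times as likely' overstates the effect on the pass rate. Here is a minimal sketch of the conversion. The 0.80 baseline pass probability for females is a made-up number for illustration, not something from my data, and the function name is my own.

```python
def odds_ratio_to_probability(baseline_prob: float, odds_ratio: float) -> float:
    """Apply an odds ratio to a baseline probability and return the new probability."""
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# ASSUMPTION: hypothetical female pass probability, not taken from my data.
female_prob = 0.80
# Reported point estimate for SEX M vs F.
male_prob = odds_ratio_to_probability(female_prob, 1.162)
# Ratio of the two probabilities (a risk ratio), which is not the odds ratio.
risk_ratio = male_prob / female_prob

print(f"implied male pass probability: {male_prob:.4f}")
print(f"ratio of probabilities:        {risk_ratio:.4f}")
```

With a baseline this high, the ratio of probabilities comes out much closer to 1 than the odds ratio does, so the two should not be read interchangeably.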
I believe these results are significant because 1) the Wald chi-square in the Type 3 analysis of effects (not shown above) is significant for RACE and SEX, implying differences exist, and 2) the 95% Wald confidence intervals above do not contain 1. (If this is correct, I don't understand why.)
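My best understanding of point 2: the odds ratio is exp(beta), and the Wald limits are exp(beta ± 1.96 × SE). An odds ratio of 1 corresponds to beta = 0, so the interval excludes 1 exactly when the interval for beta excludes 0, which is the same as |beta / SE| > 1.96. The sketch below backs beta, SE, and z out of the reported (rounded) numbers, so the results are only approximate. The function name is my own, and the p-value assumes the usual normal approximation.

```python
import math


def wald_from_odds_ratio(point, lower, upper, z_crit=1.96):
    """Recover the log-odds coefficient, its standard error, the Wald z,
    and a two-sided normal p-value from an odds ratio and its Wald limits."""
    beta = math.log(point)
    # The limits are exp(beta - z_crit*se) and exp(beta + z_crit*se).
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = beta / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p_value


# Values copied from the Odds Ratio Estimates table (rounded to 3 decimals).
reported = {
    "SEX M vs F": (1.162, 1.004, 1.345),
    "RACE 2 vs 6": (1.521, 1.143, 2.024),
}
results = {name: wald_from_odds_ratio(*vals) for name, vals in reported.items()}

for name, (beta, se, z, p_value) in results.items():
    print(f"{name}: beta={beta:.4f} se={se:.4f} z={z:.2f} p={p_value:.4f}")
```

Both z values come out above 1.96 in absolute value, which matches both intervals excluding 1.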
If I'm correct about all of the above, I still have a problem. Both theory and the raw data say that females pass at a higher rate than males and that RACE 6 passes at a higher rate than RACE 2. My results contradict theory and the raw data on both counts.
I believe that in linear regression, a wrong sign on a coefficient can be caused by an omitted variable. (Example: if the sign on the coefficient for X1 is unexpectedly negative, you could be omitting a variable X2 that is positively correlated with X1 but negatively related to Y in the model Y = B1X1 + B2X2.) You can fix the problem by identifying the omitted variable and adding it to the model.
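To convince myself of the linear regression case, here is a small simulation sketch. Everything in it is invented for illustration: the true coefficients (B1 = 1, B2 = -3), the noise levels, and the sample size. It fits Y on X1 alone, then on X1 and X2 together, using plain covariance formulas.

```python
import random

random.seed(0)
n = 5000

# X2 is the variable that will be omitted; X1 is positively correlated with it.
x2 = [random.gauss(0, 1) for _ in range(n)]
x1 = [v + random.gauss(0, 1) for v in x2]
# ASSUMED true model: Y = 1*X1 - 3*X2 + noise.
y = [1.0 * a - 3.0 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]


def mean(values):
    return sum(values) / len(values)


def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


# Short regression: Y on X1 only (X2 omitted).
b1_short = cov(x1, y) / cov(x1, x1)

# Long regression: Y on X1 and X2, solving the 2x2 normal equations.
s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
s1y, s2y = cov(x1, y), cov(x2, y)
det = s11 * s22 - s12 ** 2
b1_long = (s1y * s22 - s2y * s12) / det
b2_long = (s2y * s11 - s1y * s12) / det

print(f"X1 coefficient with X2 omitted:  {b1_short:.3f}")
print(f"X1 coefficient with X2 included: {b1_long:.3f}")
print(f"X2 coefficient:                  {b2_long:.3f}")
```

In this setup the coefficient on X1 is negative when X2 is left out and returns to roughly its true positive value once X2 is included.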
Could an omitted variable be responsible for my odds ratio results? I'm not sure whether it works the same way in logistic regression. Can someone suggest what I'm doing or interpreting incorrectly?
I read the link- and should look into it more in depth, but
This could be it! The overall model gives me 88% correct predictions on validation data. If I take the mean predicted probability for RACE 6, it is higher than the mean predicted probability for RACE 2, which is what I would expect but the opposite of what the odds ratios and contrasts tell me. It is only when I look at the contrasts and odds ratios for the individual races that I get weird results. The same thing occurs when I look at the results by sex.
So in this case, in the aggregate my model seems to predict well, but I can't rely on the interpretation of the individual contrast statements because of 'Simpson's Paradox'?
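To check that I understand the paradox, here is a minimal sketch with invented counts (not my data). It assumes a hypothetical third variable with two strata. Within each stratum the M vs F odds ratio is above 1, yet females have the higher pooled pass rate because the sexes are spread very differently across the strata.

```python
def odds(passed, total):
    """Odds of passing: passes divided by fails."""
    return passed / (total - passed)


# INVENTED counts as (passed, total), split by a HYPOTHETICAL omitted variable.
strata = {
    "stratum_A": {"M": (95, 100), "F": (810, 900)},
    "stratum_B": {"M": (270, 900), "F": (20, 100)},
}

# Odds ratio M vs F inside each stratum (what an adjusted model reports).
within_or = {
    name: odds(*groups["M"]) / odds(*groups["F"])
    for name, groups in strata.items()
}

# Pool the strata to mimic looking only at the raw data by sex.
totals = {
    sex: tuple(sum(groups[sex][i] for groups in strata.values()) for i in range(2))
    for sex in ("M", "F")
}
pooled_or = odds(*totals["M"]) / odds(*totals["F"])
pass_rate = {sex: passed / total for sex, (passed, total) in totals.items()}

for name, value in within_or.items():
    print(f"{name}: OR M vs F = {value:.3f}")
print(f"pooled:    OR M vs F = {pooled_or:.3f}")
print(f"pooled pass rates: M = {pass_rate['M']:.3f}, F = {pass_rate['F']:.3f}")
```

If something like this is going on in my data, the adjusted odds ratios and the raw pass rates can both be right. They would just be answering different questions.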