deleted_user
Not applicable
From SAS I get the following output (also attached). The dependent variable is binary: Y = 1 for pass, Y = 0 for fail.


Odds Ratio Estimates

                   Point        95% Wald
Effect          Estimate    Confidence Limits

SEX  M vs F        1.162     1.004     1.345
RACE 2 vs 6        1.521     1.143     2.024


My interpretation is that an odds ratio point estimate greater than 1 implies that males are more likely to pass than females: with 95% confidence, between 1.004 and 1.345 times as 'likely'. (I'm not sure exactly what 'likely' means here; I'm just repeating the wording I've seen in the literature.)
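One way to pin down 'likely': an odds ratio compares odds, p/(1-p), rather than probabilities. The sketch below assumes a purely hypothetical female pass probability of 0.80 (this number is not from the data above) and shows what the point estimate of 1.162 would imply for males with the other model covariates held fixed. The helper names are illustrative only.

```python
def prob_to_odds(p):
    """Convert a probability to odds, p / (1 - p)."""
    return p / (1.0 - p)


def odds_to_prob(odds):
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)


def apply_odds_ratio(p_reference, odds_ratio):
    """Probability in the comparison group implied by an odds ratio."""
    return odds_to_prob(prob_to_odds(p_reference) * odds_ratio)


p_female = 0.80      # hypothetical baseline, NOT taken from the SAS output
or_m_vs_f = 1.162    # point estimate for SEX M vs F from the table above

p_male = apply_odds_ratio(p_female, or_m_vs_f)
prob_ratio = p_male / p_female

print(f"female odds = {prob_to_odds(p_female):.3f}")
print(f"male odds   = {prob_to_odds(p_male):.3f}")
print(f"male pass probability = {p_male:.4f}")
print(f"ratio of probabilities = {prob_ratio:.4f}")
```

Under this assumed baseline the odds are 1.162 times larger for males, but the pass probability itself is only about 3% larger in relative terms, so "1.162 times as likely" overstates the effect when the baseline pass rate is high.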


The same goes for race: based on the odds ratios, RACE 2 is more likely to pass than RACE 6.

I know these results are significant because 1) the Wald Chi-square in the Type 3 analysis of effects (not given above) is significant for RACE and SEX, implying differences exist, and 2) the 95% Wald confidence intervals above do not contain 1. (If this is correct, I don't understand why.)
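On why an interval that excludes 1 signals significance: the Wald limits for an odds ratio are exp(beta ± z·SE), so the interval excludes 1 exactly when the interval for the log-odds coefficient beta excludes 0, which is the same condition as |beta/SE| exceeding the critical value. A minimal sketch that backs out beta, SE, and the Wald statistic from the rounded numbers in the table (so the results are approximate; the function name is illustrative):

```python
import math


def wald_from_or_ci(or_hat, lower, upper, z_crit=1.959964):
    """Recover beta, SE, Wald z and a two-sided p-value from an odds
    ratio and its Wald limits, assuming limits = exp(beta +/- z_crit*SE)."""
    beta = math.log(or_hat)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, se, z, p_value


for label, row in {"SEX M vs F": (1.162, 1.004, 1.345),
                   "RACE 2 vs 6": (1.521, 1.143, 2.024)}.items():
    beta, se, z, p = wald_from_or_ci(*row)
    print(f"{label}: beta={beta:.4f} SE={se:.4f} z={z:.3f} "
          f"chi-square={z * z:.3f} p={p:.4f}")
```

For a one-degree-of-freedom effect such as a two-level SEX variable, z squared should be close to the Type 3 Wald chi-square. For a RACE effect with more than two levels, the Type 3 test has several degrees of freedom and will not match a single pairwise contrast.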

If I'm correct about all of the above, I still have a problem. Both the raw data and theory say that females pass at a higher rate than males, and that RACE 6 passes at a higher rate than RACE 2. My results contradict theory and the raw data on both counts.

I believe that in linear regression, a wrong sign on a coefficient can be caused by an omitted variable. (Example: in a model Y = B1X1 + B2X2, if the estimated Beta on X1 is unexpectedly negative, you could be omitting a variable X2 that is positively correlated with X1 but negatively related to Y.) You can fix the problem by identifying the omitted variable and adding it to the model.
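For the linear case, the sign argument can be written out with the standard omitted-variable bias formula. Here $\beta_1, \beta_2$ correspond to B1, B2 in the example, $b_1$ is the coefficient from the regression that leaves out $X_2$, and "negatively related to Y" is taken to mean $\beta_2 < 0$ (an assumption about what the example intends):

```latex
% True model (intercept omitted, as in the example)
Y = \beta_1 X_1 + \beta_2 X_2 + \varepsilon

% Regressing Y on X_1 alone gives
\mathbb{E}[b_1] = \beta_1 + \beta_2\,\delta,
\qquad
\delta = \frac{\operatorname{Cov}(X_1, X_2)}{\operatorname{Var}(X_1)}

% With \delta > 0 (X_2 positively correlated with X_1) and \beta_2 < 0,
% the bias term \beta_2\,\delta is negative, so b_1 can turn negative
% even when \beta_1 > 0.
```

This is the linear-regression result only; in logistic regression the adjusted and unadjusted odds ratios can also differ in direction, but the bias does not follow this exact formula.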

Could an omitted variable be responsible for my odds ratio results? I'm not sure whether it works the same way here. Can someone suggest what I'm doing or interpreting incorrectly?
3 REPLIES
deleted_user
Not applicable
Just a note: I am using proc logistic with the descending option.
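Why this option matters for reading the table: by default PROC LOGISTIC models the probability of the lower ordered response value (here Y = 0, fail), and DESCENDING switches the modeled event to Y = 1 (pass). Modeling the opposite event negates every coefficient, so each odds ratio becomes its reciprocal and the confidence limits swap places. A small sketch of that arithmetic using the SEX row from the output (the function name is illustrative):

```python
def flip_event(or_hat, lower, upper):
    """Odds ratio and limits if the opposite response level were modeled.

    Negating the log-odds coefficient turns OR into 1/OR; the old upper
    limit becomes the new lower limit and vice versa.
    """
    return 1.0 / or_hat, 1.0 / upper, 1.0 / lower


# SEX M vs F for the event "pass" (with DESCENDING), from the table
or_pass = (1.162, 1.004, 1.345)

or_fail = flip_event(*or_pass)
print("event = fail: OR=%.3f, 95%% limits=(%.3f, %.3f)" % or_fail)
```

With DESCENDING in place, an odds ratio above 1 for M vs F does refer to higher odds of passing, so the surprising direction is not explained by the model predicting the wrong response level.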
Doc_Duke
Rhodochrosite | Level 12
There is a good chance that what you are observing is Simpson's paradox.

http://en.wikipedia.org/wiki/Simpson's_paradox
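To make the paradox concrete, here is a minimal sketch with entirely hypothetical pass/fail counts (not the poster's data) split by a hypothetical two-level stratifying variable, standing in for whatever other covariates are in the model. Within each stratum males have higher odds of passing, yet the pooled data show females passing at a higher rate, because most females fall in the high-pass stratum.

```python
def odds_ratio(pass_a, fail_a, pass_b, fail_b):
    """Odds ratio of group a versus group b from pass/fail counts."""
    return (pass_a / fail_a) / (pass_b / fail_b)


# Hypothetical (pass, fail) counts by stratum and sex
strata = {
    "stratum_1": {"M": (90, 10), "F": (800, 100)},
    "stratum_2": {"M": (600, 400), "F": (50, 50)},
}

# Odds ratio M vs F within each stratum (the "adjusted" view)
stratum_or = {name: odds_ratio(*c["M"], *c["F"]) for name, c in strata.items()}

# Pool the counts over strata (the "raw data" view)
total = {sex: tuple(sum(c[sex][i] for c in strata.values()) for i in (0, 1))
         for sex in ("M", "F")}
crude_or = odds_ratio(*total["M"], *total["F"])
pass_rate = {sex: total[sex][0] / sum(total[sex]) for sex in total}

for name, value in stratum_or.items():
    print(f"{name}: OR M vs F = {value:.3f}")
print(f"pooled:    OR M vs F = {crude_or:.3f}")
print(f"raw pass rates: M = {pass_rate['M']:.3f}, F = {pass_rate['F']:.3f}")
```

The odds ratios from a logistic model with several effects are adjusted for the other effects, like the within-stratum values here, while raw pass percentages correspond to the pooled comparison, so the two can legitimately point in opposite directions.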
deleted_user
Not applicable
I read the link and should look into it in more depth, but this could be it!

The overall model is giving me 88% correct predictions on validation data! If I take the mean predicted probability for race 6, it is higher than the mean predicted probability for race 2 (which is what I would expect, but the opposite of what the odds ratios/contrasts tell me). It is only when I look at the contrasts and odds ratios for individual races that I get weird results. The same thing occurs when I look at the results by sex.

So in this case, the model seems to predict well in the aggregate, but I can't rely on the interpretation of the individual contrast statements because of Simpson's paradox?

Is there a way to correct for this?
