Trying to figure out why I'm getting an absurd OR for a continuous variable log_X that does not have any missing values. Any ideas?
proc logistic data=WORK.DISEASE_MM;
class Gender1 (ref="0") / param=glm;
model Strata1 (event='1')= gender1 log_X log_Y age/ link=logit clparm=both clodds=both alpha=0.05
technique=fisher scale=none aggregate lackfit;
run;
Odds Ratio Estimates and Profile-Likelihood Confidence Intervals
Effect Estimate 95% CI
gender1 1 vs 0 7.321 3.514 16.152
log_Y 0.617 0.075 5.004
log_X >999.999 >999.999 >999.999
Age 0.926 0.894 0.955
Welcome to the Community!
This looks like it could be a problem with the data, such as an outlier. Can you provide more information?
Thanks for the welcome message! 😄 Long time visitor, first time poster.
That's what I was thinking too, about an outlier, but the data range for log_X is 3.390 to 4.421 (compared to log_Y, 1.853 to 3.346).
Happy to provide any other information needed!
The logistic regression coefficient is the change in the log odds for a one-unit change in log_X, with the other variables in the model held constant. In your case, a one-unit change would go from 3.390 to 4.390, almost the entire range. What is the estimate for log_X? Is it a large number?
Can you provide some of the other output?
Sorry for the late reply.
The estimate for log_X is 10.3647, with a 95% CI of 7.3216 to 13.4078.
Included more output below
I tried KSharp's suggestion of including a UNITS statement, which yields actual values (79.685, 95% CI 24.050-318.803). As you said, do you think it's so high because of the narrow spread of the data for log_X (i.e. the whole range covers only about a 1-unit change), compared with log_Y, or could something else be driving it?
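As a rough check on the scale issue raised above, the reported coefficient can be exponentiated directly. The DATA step below is only a sketch: the value 10.3647 is copied from this thread, and the variable names (beta, or_per_1, or_per_tenth) are made up for illustration.

```sas
data _null_;
  beta = 10.3647;                  /* log_X estimate reported in this thread */
  or_per_1     = exp(beta);        /* odds ratio for a 1-unit change in log_X */
  or_per_tenth = exp(0.1 * beta);  /* odds ratio for a 0.1-unit change in log_X */
  put or_per_1= or_per_tenth=;     /* write both values to the SAS log */
run;
```

The one-unit odds ratio comes out in the tens of thousands, which is presumably why the table prints >999.999, while the 0.1-unit odds ratio is about 2.8.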
Interesting!
There is a paper that describes a problem similar to yours. They suggest the solution is to use Penalised Logistic Regression using the Firth option. Perhaps this is worth a try.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4944325/
Yeah, it seemed odd to me. I've never seen this issue with a continuous variable like this in a dataset with a decent sample size.
Thanks so much for that suggestion, I'll take a look now and will try to run it!
Just ran
proc logistic data = WORK.disease;
model strata1(event='1') = log_X / firth;
run;
proc logistic data = WORK.disease;
model strata1(event='1') = log_X;
run;
Still getting the ">999.999" issue. I can include a UNITS statement, but the Firth model (unless my code is wrong) didn't seem to do much.
Thanks for trying. It looks like KSharp's solution is best:
units log_X=0.1 ;
units log_X=2*SD ;
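Placed in the original model, the UNITS statement would look something like the sketch below. The dataset, variables, and CLASS/MODEL settings are taken from the first post; trimming the MODEL options down to clodds=pl is a simplification made here, not something from the thread.

```sas
proc logistic data=WORK.DISEASE_MM;
  class Gender1 (ref="0") / param=glm;
  /* clodds=pl requests profile-likelihood confidence limits for the odds ratios */
  model Strata1 (event='1') = Gender1 log_X log_Y age / clodds=pl;
  /* report the log_X odds ratio per 0.1-unit change instead of per 1 unit */
  /* alternative suggested in the thread: units log_X=2*SD; */
  units log_X=0.1;
run;
```

The fitted coefficient for log_X is the same either way; only the unit used to report the odds ratio changes.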
Thanks for the suggestion. When I include a UNITS statement, I do get actual values for the log_X OR: 79.685, with a 95% CI of 24.050-318.803.
Basically it is a combination of your data and the model. See the article "Formats for p-values and odds ratios in SAS." On that page, search for the phrase "let's try to understand why the odds ratio is so extreme," which will take you near the end of the article.
Thanks for the heads up! In the case presented at that link, there was a severely limited number of data points (3 observations). But the continuous variable log_X has no missing values (n=236) and no clear outliers. Despite having a similar profile, log_Y doesn't yield such an extreme OR.