baseballyanks1
Calcite | Level 5

Trying to figure out why I'm getting an absurd OR for a continuous variable log_X that does not have any missing values. Any ideas?

 

proc logistic data=WORK.DISEASE_MM;
class Gender1 (ref="0") / param=glm;
model Strata1 (event='1')= gender1 log_X log_Y age/ link=logit clparm=both clodds=both alpha=0.05
technique=fisher scale=none aggregate lackfit;
run;

 

Odds Ratio Estimates and Profile-Likelihood Confidence Intervals

 

Effect            Estimate    95% CI lower    95% CI upper
gender1 1 vs 0    7.321       3.514           16.152
log_Y             0.617       0.075           5.004
log_X             >999.999    >999.999        >999.999
Age               0.926       0.894           0.955

17 REPLIES
Norman21
Lapis Lazuli | Level 10

Welcome to the Community!

 

This looks like it could be a problem with the data, such as an outlier. Can you provide more information?

Norman.
SAS 9.4 (TS1M6) X64_10PRO WIN 10.0.17763 Workstation

baseballyanks1
Calcite | Level 5

Thanks for the welcome message! 😄 Long time visitor, first time poster.

 

That's what I was thinking too, about an outlier, but the data range for log_X is 3.390 to 4.421 (compared to log_Y, 1.853 to 3.346). 

 

Happy to provide any other information needed! 

Norman21
Lapis Lazuli | Level 10

The estimate of the logistic regression coefficient is for a one-unit change in log_X, with the other variables in the model held constant. In your case, a one-unit change would go from 3.390 to 4.390, which spans almost the entire range of the data. What is the estimate for log_X? Is it a large number?

 

Can you provide some of the other output?

Norman.

baseballyanks1
Calcite | Level 5

Sorry for the late reply. 

 

The estimate for log_X is 10.3647 with a 95% CI of 7.3216-13.4078

 

Included more output below

 

 

SAS output.png
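
As a quick check on the numbers above, the reported estimate can be converted to an odds ratio by hand, since the odds ratio for a change of u units is exp(beta*u). The sketch below only assumes the estimate and limits quoted in this post; the variable names are made up for illustration. A coefficient of 10.3647 gives an odds ratio of roughly 3.2E4 per one-unit change, far beyond what the default odds ratio format displays, and about 2.82 per 0.1-unit change.

```sas
/* Sketch: convert the reported log-odds estimate to odds ratios by hand. */
/* The three input values are the ones quoted in the thread.              */
data _null_;
   beta  = 10.3647;                 /* estimate for log_X              */
   lower = 7.3216;                  /* reported 95% limits for beta    */
   upper = 13.4078;

   or_1unit    = exp(beta);         /* OR per 1-unit change in log_X   */
   or_1unit_lo = exp(lower);
   or_1unit_hi = exp(upper);
   or_tenth    = exp(0.1 * beta);   /* OR per 0.1-unit change          */

   put or_1unit= best12. or_1unit_lo= best12. or_1unit_hi= best12.;
   put or_tenth= 8.3;
run;
```

Exponentiating the limits this way gives Wald-type limits, so they will not match profile-likelihood limits exactly.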

baseballyanks1
Calcite | Level 5

I tried Ksharp's suggestion of including a UNITS statement, which yields actual values (79.685, 95% CI 24.050-318.803). As you said, do you think it's so high because of the spread of the data for log_X (i.e. the range only covers about a one-unit change) as opposed to log_Y, or could something else be driving it?

Norman21
Lapis Lazuli | Level 10

Interesting!

 

There is a paper that describes a problem similar to yours. The authors suggest penalised logistic regression, available through the FIRTH option, as a solution. Perhaps this is worth a try.

 

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4944325/

 

https://documentation.sas.com/?docsetId=statug&docsetTarget=statug_logistic_examples15.htm&docsetVer...

Norman.

baseballyanks1
Calcite | Level 5

Yeah, it seemed odd to me. I've never seen this issue with a continuous variable in a dataset with a decent sample size like this one.

 

Thanks so much for that suggestion, I'll take a look now and will try to run it! 

baseballyanks1
Calcite | Level 5

Just ran 

 

proc logistic data = WORK.disease;
model strata1(event='1') = log_X / firth;
run;

proc logistic data = WORK.disease;
model strata1(event='1') = log_X;
run;

Still getting the ">999.999" issue. I can include a UNITS statement, but the Firth model (unless my code is wrong) didn't seem to change much.

 

[Attached screenshots: Screen Shot 2020-08-15 at 4.34.10 PM.png, Screen Shot 2020-08-15 at 4.33.48 PM.png]

Norman21
Lapis Lazuli | Level 10

Thanks for trying. It looks like Ksharp's solution is best:

 

units log_X=0.1 ;
units log_X=2*SD ;
Norman.

Ksharp
Super User
What is the logistic regression coefficient (beta) for log_X?
I guess it is very large, since the odds ratio = e^beta.
Ksharp
Super User
Also try the UNITS statement to change the unit used for the odds ratio:

units log_X=0.1 ;
units log_X=2*SD ;
baseballyanks1
Calcite | Level 5

Thanks for the suggestion. When I include a UNITS statement, I do get actual values for the log_X OR: 79.685 with 95% CI 24.050-318.803.
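
For reference, the UNITS suggestion applied to the original model might look like the sketch below. It reuses the data set and variable names from the first post and puts both suggested units in one list. CLODDS=PL is used here on the assumption that the profile-likelihood table is the one of interest. The thread does not say which unit produced the 79.685 figure, so no particular output is implied.

```sas
/* Sketch: report odds ratios for log_X per 0.1 units and per 2 standard */
/* deviations instead of per 1 unit. Names follow the original post.     */
proc logistic data=WORK.DISEASE_MM;
   class Gender1 (ref="0") / param=glm;
   model Strata1 (event='1') = Gender1 log_X log_Y Age / clodds=pl;
   units log_X = 0.1 2*SD;   /* one customized odds ratio per listed unit */
run;
```

Changing the unit only rescales the odds ratio. The fitted coefficient and the model fit stay the same.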

Rick_SAS
SAS Super FREQ

Basically it is a combination of your data and the model.  See the article "Formats for p-values and odds ratios in SAS." On that page, search for the phrase "let's try to understand why the odds ratio is so extreme," which will take you near the end of the article.
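
One way to see the value hidden behind ">999.999" is to save the odds ratio table with ODS OUTPUT and print it with a wider format. The sketch below assumes the default Wald odds ratio table (ODS name OddsRatios) is wanted, and WORK.OR_RAW is a hypothetical output data set name. CLODDS= is omitted so that the default table is produced.

```sas
/* Sketch: save the default odds ratio table and print the stored values */
/* without the default odds ratio display format.                        */
proc logistic data=WORK.DISEASE_MM;
   class Gender1 (ref="0") / param=glm;
   model Strata1 (event='1') = Gender1 log_X log_Y Age;
   ods output OddsRatios=WORK.OR_RAW;   /* hypothetical data set name */
run;

proc print data=WORK.OR_RAW noobs;
   format _numeric_ best12.;            /* override the stored display format */
run;
```

The printed estimate for log_X should be close to the exponentiated coefficient, which shows that the model did produce a finite value and that ">999.999" is a display limit.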

baseballyanks1
Calcite | Level 5

Thanks for the heads up! In the case presented at that link, there was a severely limited number of data points (3 observations). But the continuous variable log_X has no missing data points (n=236) and no clear outliers. Despite having a similar profile, log_Y doesn't yield such an extreme OR.
