Statistical Procedures

baseballyanks1 · Posted 08-13-2020 11:09 PM

Trying to figure out why I'm getting an absurd OR for a continuous variable log_X that does not have any missing values. Any ideas?

proc logistic data=WORK.DISEASE_MM;
    class Gender1 (ref="0") / param=glm;
    /* Model the probability that Strata1 = 1; request Wald and profile-likelihood limits */
    model Strata1 (event='1') = Gender1 log_X log_Y Age / link=logit clparm=both clodds=both alpha=0.05
        technique=fisher scale=none aggregate lackfit;
run;

Odds Ratio Estimates and Profile-Likelihood Confidence Intervals

Effect            Estimate    95% Confidence Limits
gender1 1 vs 0       7.321       3.514      16.152
log_Y                0.617       0.075       5.004
log_X             >999.999    >999.999    >999.999
Age                  0.926       0.894       0.955

Norman21 · Posted 08-14-2020 01:51 AM

Welcome to the Community!

This looks like it could be a problem with the data, such as an outlier. Can you provide more information?

Norman.
SAS 9.4 (TS1M6) X64_10PRO WIN 10.0.17763 Workstation

baseballyanks1 · Posted 08-14-2020 07:14 AM

Thanks for the welcome message! 😄 Long time visitor, first time poster.

That's what I was thinking too, about an outlier, but the data range for log_X is 3.390 to 4.421 (compared to log_Y, 1.853 to 3.346).

Happy to provide any other information needed!
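
A quick way to share the kind of information asked for here is a summary of the two log-transformed predictors. The following is only a sketch: it assumes the data set and variable names from the original PROC LOGISTIC call, and the CLASS statement (to split the summary by outcome) is an optional addition, not something posted in the thread.

```sas
/* Summarize log_X and log_Y overall range, spread, and missingness, by outcome group */
proc means data=WORK.DISEASE_MM n nmiss min max mean std;
    class Strata1;        /* optional: one row of statistics per outcome level */
    var log_X log_Y;
run;
```

The STD column is also the value that a UNITS specification written in terms of SD would scale by.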

Norman21 · Posted 08-14-2020 07:31 AM

The logistic regression coefficient is the change in the log odds for a one-unit change in log_X, with the other variables in the model held constant. In your case, a one-unit change would go from 3.390 to 4.390, almost the entire range. What is the estimate for log_X? Is it a large number?

Can you provide some of the other output?

Norman.
SAS 9.4 (TS1M6) X64_10PRO WIN 10.0.17763 Workstation

baseballyanks1 · Posted 08-15-2020 10:41 AM

Sorry for the late reply.

The estimate for log_X is 10.3647, with a 95% CI of 7.3216 to 13.4078.

I've included more output below.

[Attachment: SAS output.png]
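
To see why the odds ratio table shows ">999.999", it may help to exponentiate the reported coefficient and its limits by hand. This is a minimal sketch using only the numbers posted above; the variable names are made up for illustration. The three results come out at roughly 1,500, 31,700, and 665,000, all far beyond what the default display format can show.

```sas
/* Exponentiate the reported log_X coefficient and its 95% limits */
data _null_;
    beta     = 10.3647;   /* estimate for log_X reported in the post above */
    beta_lcl = 7.3216;    /* lower 95% limit of the coefficient */
    beta_ucl = 13.4078;   /* upper 95% limit of the coefficient */

    odds_ratio = exp(beta);       /* odds ratio for a one-unit change in log_X */
    or_lcl     = exp(beta_lcl);
    or_ucl     = exp(beta_ucl);

    /* Write the three values to the SAS log */
    put odds_ratio= comma12.1 or_lcl= comma12.1 or_ucl= comma12.1;
run;
```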

baseballyanks1 · Posted 08-15-2020 10:48 AM

I tried Ksharp's suggestion of including a UNITS statement, which yields displayable values (79.685, 95% CI 24.050-318.803). As you said, do you think it's so high because of the spread of the data for log_X (i.e., only encompassing a 1-unit change), as opposed to log_Y, or could something else be driving it?

Norman21 · Posted 08-15-2020 01:29 PM

Interesting!

There is a paper that describes a problem similar to yours. They suggest the solution is to use Penalised Logistic Regression using the Firth option. Perhaps this is worth a try.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4944325/

https://documentation.sas.com/?docsetId=statug&docsetTarget=statug_logistic_examples15.htm&docsetVer...

Norman.
SAS 9.4 (TS1M6) X64_10PRO WIN 10.0.17763 Workstation

baseballyanks1 · Posted 08-15-2020 03:29 PM

Yeah, it seemed odd to me. I've never seen this issue with a continuous variable like this in a dataset with a decent sample size.

Thanks so much for that suggestion, I'll take a look now and will try to run it!

baseballyanks1 · Posted 08-15-2020 04:36 PM

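Just ran:

/* Firth penalized likelihood, log_X only */
proc logistic data = WORK.disease;
    model strata1(event='1') = log_X / firth;
run;

/* Ordinary maximum likelihood, log_X only, for comparison */
proc logistic data = WORK.disease;
    model strata1(event='1') = log_X;
run;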

Still getting the ">999.999" issue. I can include a UNITS statement, but the Firth model (unless my code is wrong) didn't seem to do much.

[Attachments: Screen Shot 2020-08-15 at 4.34.10 PM.png, Screen Shot 2020-08-15 at 4.33.48 PM.png]

Norman21 · Posted 08-16-2020 04:14 AM

Thanks for trying. It looks like Ksharp's solution is best:

units log_X=0.1 ;
units log_X=2*SD ;

Norman.
SAS 9.4 (TS1M6) X64_10PRO WIN 10.0.17763 Workstation

Ksharp · Posted 08-14-2020 08:02 AM

What is your logistic regression coefficient for log_X (beta)?
I guess it is very large, since odds ratio = e^beta.

Ksharp · Posted 08-14-2020 08:13 AM

Also try the UNITS statement to change the unit of change used for the odds ratio:

units log_X=0.1 ;
units log_X=2*SD ;
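
For context, here is a sketch of where the UNITS statement sits inside the procedure. It assumes the data set, variables, and model from the original post, and it combines the two suggested units into a single list. With a unit of c, the reported odds ratio is e^(c*beta) rather than e^beta, so with the coefficient of 10.3647 reported elsewhere in the thread, a 0.1-unit change corresponds to an odds ratio of about 2.8.

```sas
/* Original model, with customized odds ratios for log_X */
proc logistic data=WORK.DISEASE_MM;
    class Gender1 (ref="0") / param=glm;
    model Strata1 (event='1') = Gender1 log_X log_Y Age / clodds=both;
    /* Report the odds ratio per 0.1-unit change and per 2-standard-deviation change in log_X */
    units log_X = 0.1 2*SD;
run;
```

The fitted model is unchanged; only the scale on which the odds ratio for log_X is reported differs.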

baseballyanks1 · Posted 08-15-2020 10:45 AM

Thanks for the suggestion. When I include a UNITS statement, I do get displayable values for the log_X OR: 79.685, with 95% CI 24.050-318.803.

Rick_SAS · Posted 08-15-2020 11:19 AM

Basically it is a combination of your data and the model. See the article "Formats for p-values and odds ratios in SAS." On that page, search for the phrase "let's try to understand why the odds ratio is so extreme," which will take you near the end of the article.

baseballyanks1 · Posted 08-15-2020 11:27 AM

Thanks for the heads up! In the case presented at that link, there was a severely limited number of data points (3 observations). But the continuous variable log_X has no missing values (n=236) and no clear outliers. Despite having a similar profile, log_Y doesn't yield such an extreme OR.
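
One way to look at how the data and the model combine is to compare the distribution of log_X in the two outcome groups. This is only a sketch, assuming the data set and variable names from the original post. If the two boxes barely overlap, a steep coefficient over a range of about one unit would not be surprising.

```sas
/* Side-by-side box plots of log_X for each level of the outcome */
proc sgplot data=WORK.DISEASE_MM;
    vbox log_X / category=Strata1;
run;
```

Swapping log_X for log_Y in the VBOX statement gives the same view for the predictor that did not produce an extreme odds ratio.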

re: ">999.999" odds ratio in logistic regression model