Johndberglund
Fluorite | Level 6

PROC LOGISTIC sometimes lacks accuracy.

I tried checking the intercept and coefficients given by PROC LOGISTIC to verify that we have maximized the log-likelihood. Often I find that the PROC LOGISTIC results are not accurate in the last digit or two. It seems to me that you could just give us fewer digits if you are unsure of the last digits.

To see what I mean run this:

data tiny;
input x y;
lines;
0 0
0 0
1 1
2 0
4 0
5 1
7 0
8 1
9 1
10 1
;
run;

proc logistic data = tiny;
model y(event='1')=x;
effectplot;
run;

/*
Gives:
-2 Log L 13.863 10.447

Intercept 1 -1.7308 1.3194 1.7209 0.1896
x 1 0.3790 0.2384 2.5276 0.1119

So we grab the intercept and coefficient for the next part.

*/

data check1;
set tiny end=lastOne;
* show me max precision +1 on Windows 10 machine;
format pi_x LL sumLL pred 18.15;
int = -1.7308;
coeff = .3790;
pi_x = exp(int+coeff*x)/(1+exp(int+coeff*x));
LL = y*log(pi_x)+ (1-y)*log(1-pi_x);
sumLL = sum(sumLL,LL);
if lastOne then do;
pred=-2*sumLL;
*output;
end;
retain sumLL;
*keep pred int coeff;
run;

/* gives
final sumLL =-5.223346937613210
and pred = 10.446693875226400.
Here pred is the prediction of -2 Log L.
It matches the 10.447 given by PROC LOGISTIC
*/

* run the identical thing, but change the intercept;

data check2;
set tiny end=lastOne;
* show me max precision +1 on Windows 10 machine;
format pi_x LL sumLL pred 18.15;
int = -1.7306;
coeff = .3790;
pi_x = exp(int+coeff*x)/(1+exp(int+coeff*x));
LL = y*log(pi_x)+ (1-y)*log(1-pi_x);
sumLL = sum(sumLL,LL);
if lastOne then do;
pred=-2*sumLL;
*output;
end;
retain sumLL;
*keep pred int coeff;
run;

/*
Gives
final sumLL =-5.223346904989050
and pred = 10.446693809978100

This final sumLL is larger than the results given by PROC LOGISTIC.
The predicted -2 Log L is smaller - meaning a better fit.

So we should have gotten intercept -1.7306 instead of -1.7308, as given by PROC LOGISTIC.

I asked to see what precision we are measuring the numbers with. My machine gives 14 digits - so differences in the 8th place shouldn't be due to roundoff error.

Could someone explain to me why SAS does this?
*/
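The two DATA steps above can be cross-checked outside SAS. This is a minimal Python sketch, not part of the original post; the function name `log_likelihood` is an illustrative choice. It evaluates the same Bernoulli log likelihood at the two four-decimal parameter pairs. Note that both pairs are four-decimal values: the estimates PROC LOGISTIC prints are presumably rounded for display, so this compares two nearby rounded points rather than testing the procedure's internal estimate.

```python
import math

# (x, y) pairs from the `tiny` data set in the post.
DATA = [(0, 0), (0, 0), (1, 1), (2, 0), (4, 0),
        (5, 1), (7, 0), (8, 1), (9, 1), (10, 1)]

def log_likelihood(intercept, slope, data=DATA):
    """Bernoulli log likelihood of a logistic model, as in check1/check2."""
    total = 0.0
    for x, y in data:
        eta = intercept + slope * x
        p = math.exp(eta) / (1.0 + math.exp(eta))
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

ll_printed = log_likelihood(-1.7308, 0.3790)  # estimates as printed by PROC LOGISTIC
ll_shifted = log_likelihood(-1.7306, 0.3790)  # intercept nudged by 0.0002

print(f"printed estimates: LL = {ll_printed:.15f}, -2 Log L = {-2 * ll_printed:.15f}")
print(f"shifted intercept: LL = {ll_shifted:.15f}, -2 Log L = {-2 * ll_shifted:.15f}")
print(f"LL gain from the shift: {ll_shifted - ll_printed:.3e}")
```

The gain is a few units in the eighth decimal place, far too small to affect the 10.447 shown in the output table.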

 

 

 

Reeza
Super User

You're concerned about differences at the 8th+ decimal place?

 

Is your data measured to that level of accuracy?

 

This page documents the Numerical Precision issue in SAS/computers.

https://documentation.sas.com/?docsetId=lrcon&docsetTarget=p0ji1unv6thm0dn1gp4t01a1u0g6.htm&docsetVe....

 

PS. It really helps if you take a few minutes to format your code and post to make it more legible.

 


Johndberglund
Fluorite | Level 6

Ah. There I see where I can format. Copy and paste wasn't doing it.

Thanks for your quick responses.

I don't really care about accuracy to the 8th place. I wanted the four decimal digits given to be accurate. And they are at least close...

Reeza
Super User
Have you tried changing the convergence criteria?
If none of the criteria is specified, the default is GCONV=1E-8.
https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=statug&docsetTarget=statu...
Johndberglund
Fluorite | Level 6
Both these give the same answer:

proc logistic data = tiny;
model y(event='1')=x/xconv= 1E-8 ;
effectplot;
run;

proc logistic data = tiny;
model y(event='1')=x/gconv=1E-12;
effectplot;
run;

which is
int = -1.7310;
coeff = 0.3791;

This is an improvement on the original numbers.

However this can be further improved by
int = -1.7311;
coeff = 0.3791;

I haven't tried all the numbers in this general area. It seems like this
would be what I'd expect the procedure to do for me.
SAS_Rob
SAS Employee

Maybe I am not grasping what you are trying to show, but when I take the difference between the LL of the fitted model and the LL from your modified parameter values, I get a slightly higher LL for the fitted model, suggesting that it is the MLE.

 

data tiny;
input x y;
lines;
0 0
0 0
1 1
2 0
4 0
5 1
7 0
8 1
9 1
10 1
;
run;

proc logistic data = tiny outest=out1;
model y(event='1')=x;
effectplot;
run;

data _null_;
set out1;
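* store the log likelihood at full precision (plain CALL SYMPUT converts numbers with BEST12.);
call symputx('fitll', put(_LNLIKE_, best32.));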
run;

data check2;
set tiny end=lastOne;
format pi_x LL sumLL pred 18.15 diff 18.15;
int = -1.7306;
coeff = .3790;
pi_x = exp(int+coeff*x)/(1+exp(int+coeff*x));
LL = y*log(pi_x)+ (1-y)*log(1-pi_x);
sumLL = sum(sumLL,LL);
if lastOne then do;
pred=-2*sumLL;
*output;
end;
retain sumLL;
diff=&fitll-sumll;*there is a positive difference here;
*keep pred int coeff;
run;

proc print;
run;

 

StatDave
SAS Super FREQ

This might make it easier to see that the solution from PROC LOGISTIC is better.

proc logistic data = tiny outest=oe;
model y(event='1')=x;
run;
data ine;
set oe;
intercept=-1.7306;
run;
proc logistic data = tiny inest=ine outest=oe2;
model y(event='1')=x / maxiter=0 itprint;
run;
proc print data=oe;
format _lnlike_ 20.16;
run;
proc print data=oe2;
format _lnlike_ 20.16;
run;
Johndberglund
Fluorite | Level 6

I like how you showed a nice way to compare the two answers. I think that perhaps what I want is not possible.

When I see the coefficients given with four digits, I'm expecting this to be the closest four digit numbers to the "best" solution. Since we are probably iterating some system of equations, we are never 100% certain. I would like to know more about what range they are sure that the MLE is in.

 

If we copy your program, but change it to this, we again get an improvement on the first answer SAS gives.

data ine;
set oe;
intercept=-1.7311;
x=0.3791;
run;
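The question of which four-decimal pair is "closest to the best solution" can be explored numerically. The following is a Python sketch outside SAS, not PROC LOGISTIC's own algorithm: `newton_mle`, its start at (0, 0), and its step tolerance are illustrative assumptions. It locates the maximum with a plain Newton-Raphson iteration and then reports how far each four-decimal candidate falls below that maximum. The intercept and slope estimates are negatively correlated for these data, so the four-decimal pair with the highest likelihood need not be the one obtained by rounding each coordinate separately.

```python
import math

# (x, y) pairs from the `tiny` data set in the thread.
DATA = [(0, 0), (0, 0), (1, 1), (2, 0), (4, 0),
        (5, 1), (7, 0), (8, 1), (9, 1), (10, 1)]

def log_likelihood(b0, b1):
    """Bernoulli log likelihood of the logistic model at (b0, b1)."""
    total = 0.0
    for x, y in DATA:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        total += math.log(p) if y else math.log(1.0 - p)
    return total

def newton_mle(tol=1e-12, max_iter=50):
    """Newton-Raphson for the two-parameter model, started at (0, 0)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in DATA:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p              # gradient w.r.t. intercept
            g1 += (y - p) * x        # gradient w.r.t. slope
            h00 += w                 # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det   # Newton step = H^-1 g
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1

b0_hat, b1_hat = newton_mle()
ll_hat = log_likelihood(b0_hat, b1_hat)

candidates = {
    "printed by default": (-1.7308, 0.3790),
    "rounded coordinate-wise": (round(b0_hat, 4), round(b1_hat, 4)),
    "poster's alternative": (-1.7311, 0.3791),
}
deficits = {name: ll_hat - log_likelihood(*b) for name, b in candidates.items()}

print(f"maximum: intercept={b0_hat:.10f} slope={b1_hat:.10f} LL={ll_hat:.13f}")
for name, (c0, c1) in candidates.items():
    print(f"{name:24s} ({c0:.4f}, {c1:.4f})  LL deficit = {deficits[name]:.3e}")
```

All three deficits are tiny, on the order of 1e-7 or smaller, so any of the candidates reproduces the reported -2 Log L to the digits shown. Which one is "best" to four decimals depends on both coordinates jointly rather than on each digit separately.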

StatDave
SAS Super FREQ
Accepted Solution

If you want to focus on the log likelihood, then you should use a convergence criterion that also focuses on the log likelihood. The default convergence criterion, GCONV, focuses on the gradients. If you simply change to the FCONV criterion, which focuses on the log likelihood, then the PROC LOGISTIC solution again has the better log likelihood (a smaller -2 Log L) than the case you mention.

proc logistic data = tiny outest=oe;
model y(event='1')=x / gconv=0 fconv=1e-8;
run;
data ine;
set oe;
intercept=-1.7311; x=.3791;
run;
proc logistic data = tiny inest=ine outest=oe2;
model y(event='1')=x / maxiter=0 itprint;
run;
proc print data=oe;
format _lnlike_ 20.16;
run;
proc print data=oe2;
format _lnlike_ 20.16;
run;
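To illustrate the difference between a gradient-based and a likelihood-based stopping rule, here is a Python sketch outside SAS. The two criteria are written as I read them from the PROC LOGISTIC documentation (relative gradient g'H⁻¹g / (|LL| + 1E-6) for GCONV, relative change in LL for FCONV); treat those formulas, the function name `fit`, and the (0, 0) start as assumptions. The sketch is not claimed to reproduce PROC LOGISTIC's iteration history.

```python
import math

# (x, y) pairs from the `tiny` data set in the thread.
DATA = [(0, 0), (0, 0), (1, 1), (2, 0), (4, 0),
        (5, 1), (7, 0), (8, 1), (9, 1), (10, 1)]

def fit(rule, tol, max_iter=25):
    """Newton-Raphson with one of two stopping rules (assumed forms).

    rule="gconv": stop when g' H^-1 g / (|LL| + 1e-6) < tol
    rule="fconv": stop when |LL_i - LL_(i-1)| / (|LL_(i-1)| + 1e-6) < tol
    Returns (intercept, slope, LL, number of Newton steps taken).
    """
    b0 = b1 = 0.0
    prev_ll = None
    for iteration in range(max_iter + 1):
        ll = g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in DATA:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            ll += math.log(p) if y else math.log(1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det   # Newton step = H^-1 g
        d1 = (h00 * g1 - h01 * g0) / det
        if rule == "gconv":
            crit = (g0 * d0 + g1 * d1) / (abs(ll) + 1e-6)
        elif prev_ll is None:
            crit = math.inf                # no previous LL to compare with yet
        else:
            crit = abs(ll - prev_ll) / (abs(prev_ll) + 1e-6)
        if crit < tol:
            return b0, b1, ll, iteration
        prev_ll = ll
        b0 += d0
        b1 += d1
    raise RuntimeError("no convergence within max_iter steps")

results = {
    "gconv=1e-8": fit("gconv", 1e-8),
    "gconv=1e-12": fit("gconv", 1e-12),
    "fconv=1e-8": fit("fconv", 1e-8),
}
for name, (b0, b1, ll, steps) in results.items():
    print(f"{name:12s} steps={steps} intercept={b0:.10f} "
          f"slope={b1:.10f} LL={ll:.13f}")
```

A looser tolerance can stop the iteration one step earlier, which is enough to move the fourth decimal of the estimates while changing the log likelihood only around the seventh or eighth decimal place.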
Johndberglund
Fluorite | Level 6

Thank you all for your help.  I see that my error was unconsciously expecting them to use absfconv=1e-6 (or whatever) when computing the coefficients. This came up with the same answer as fconv=1e-8 that you gave which is:

 

intercept = -1.7310180741539600

x = 0.3790914824860400

_LNLIKE_ = -5.2233468313521700

 

Then I would have expected them to round off to whatever length they wanted - like

-1.7310 and 0.3791

But it's all an approximation, so we'll live with it.

 

 

 
