Solved: PROC LOGISTIC error

Johndberglund · Posted 02-09-2021 03:25 PM

PROC LOGISTIC sometimes lacks accuracy.

I tried checking the intercept and coefficients given by PROC LOGISTIC to verify that they maximize the log likelihood. Often I find that the PROC LOGISTIC results are off in the last digit or two. It seems to me that you could just give us fewer digits if you are unsure of the last ones.

To see what I mean, run this:

data tiny;
input x y;
lines;
0 0
0 0
1 1
2 0
4 0
5 1
7 0
8 1
9 1
10 1
;
run;

proc logistic data = tiny;
model y(event='1')=x;
effectplot;
run;

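/*
Gives:

Model Fit Statistics
Criterion   Intercept Only   Intercept and Covariates
-2 Log L    13.863           10.447

Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept   1    -1.7308    1.3194           1.7209            0.1896
x           1     0.3790    0.2384           2.5276            0.1119

So we grab the intercept and coefficient for the next part.

*/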

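data check1;
set tiny end=lastOne;
* show me max precision +1 on Windows 10 machine;
format pi_x LL sumLL pred 18.15;
int = -1.7308;  /* intercept reported by PROC LOGISTIC */
coeff = .3790;  /* slope reported by PROC LOGISTIC */
pi_x = exp(int+coeff*x)/(1+exp(int+coeff*x));  /* fitted P(y=1) */
LL = y*log(pi_x)+ (1-y)*log(1-pi_x);  /* log likelihood contribution of this row */
sumLL = sum(sumLL,LL);  /* running total, kept across rows by RETAIN */
if lastOne then do;
pred=-2*sumLL;  /* -2 Log L after the last row */
*output;
end;
retain sumLL;
*keep pred int coeff;
run;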

/* gives
final sumLL =-5.223346937613210
and pred = 10.446693875226400.
Here pred is the predicted -2 Log L.
It matches the 10.447 given by PROC LOGISTIC.
*/

* run the identical thing, but change the intercept;

data check2;
set tiny end=lastOne;
* show me max precision +1 on Windows 10 machine;
format pi_x LL sumLL pred 18.15;
int = -1.7306;
coeff = .3790;
pi_x = exp(int+coeff*x)/(1+exp(int+coeff*x));
LL = y*log(pi_x)+ (1-y)*log(1-pi_x);
sumLL = sum(sumLL,LL);
if lastOne then do;
pred=-2*sumLL;
*output;
end;
retain sumLL;
*keep pred int coeff;
run;

/*
Gives
final sumLL =-5.223346904989050
and pred = 10.446693809978100

This final sumLL is larger than the one computed from the PROC LOGISTIC estimates.
The predicted -2 Log L is smaller, which means a better fit.

So we should have gotten an intercept of -1.7306 instead of the -1.7308 that PROC LOGISTIC gave.

I checked what precision these numbers carry. My machine gives 14 digits, so differences in the 8th decimal place shouldn't be due to round-off error.

Could someone explain to me why SAS does this?
*/

Reeza · Posted 02-09-2021 03:36 PM

You're concerned about differences at the 8th+ decimal place?

Is your data measured to that level of accuracy?

This page documents numerical precision issues in SAS and on computers in general.

https://documentation.sas.com/?docsetId=lrcon&docsetTarget=p0ji1unv6thm0dn1gp4t01a1u0g6.htm&docsetVe....

P.S. It really helps if you take a few minutes to format your code and post so they are more legible.

Johndberglund · Posted 02-10-2021 08:29 PM

Ah, now I see where I can format code. Copy and paste wasn't doing it.

Thanks for your quick responses.

I don't really care about accuracy to the 8th place. I wanted the four decimal digits given to be accurate. And they are at least close...

Reeza · Posted 02-09-2021 03:48 PM

Have you tried changing the convergence criteria?
If none of the criteria is specified, the default is GCONV=1E-8.
https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=statug&docsetTarget=statu...

Johndberglund · Posted 02-09-2021 08:30 PM

Both of these give the same answer:

proc logistic data = tiny;
model y(event='1')=x/xconv= 1E-8 ;
effectplot;
run;

proc logistic data = tiny;
model y(event='1')=x/gconv=1E-12;
effectplot;
run;

which is
int = -1.7310;
coeff = 0.3791;

This is an improvement on the original numbers.

However, this can be improved further with
int = -1.7311;
coeff = 0.3791;

I haven't tried all the numbers in this general area. It seems like this is what I'd expect the procedure to do for me.
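Trying every four-decimal pair near the reported estimates can be automated. Below is a minimal sketch, assuming the `tiny` data set from the original post. The data set names `grid` and `gridLL` and the +/-0.0005 window are illustrative choices, not something PROC LOGISTIC provides. It evaluates -2 Log L with the same formula as the check1/check2 DATA steps for all 121 (intercept, slope) pairs and sorts them so the best pair on the grid comes first.

```sas
/* Hypothetical grid: every 4-decimal (intercept, slope) pair within
   +/-0.0005 of the estimates PROC LOGISTIC displayed */
data grid;
  do i = -5 to 5;
    do j = -5 to 5;
      int   = round(-1.7308 + i*0.0001, 0.0001); /* candidate intercept */
      coeff = round( 0.3790 + j*0.0001, 0.0001); /* candidate slope */
      output;
    end;
  end;
  keep int coeff;
run;

/* -2 Log L for each pair (same formula as check1 and check2),
   sorted so that the smallest -2 Log L on the grid comes first */
proc sql;
  create table gridLL as
  select g.int, g.coeff,
         -2*sum( t.y*log(exp(g.int+g.coeff*t.x)/(1+exp(g.int+g.coeff*t.x)))
               + (1-t.y)*log(1-exp(g.int+g.coeff*t.x)/(1+exp(g.int+g.coeff*t.x))) )
           as m2LL format=18.15
  from grid as g, tiny as t
  group by g.int, g.coeff
  order by m2LL;
quit;

proc print data=gridLL(obs=5);
  format int coeff 8.4;
run;
```

Because the intercept and slope estimates are correlated when x is not centered, the best four-decimal pair on such a grid need not be the MLE rounded coordinate by coordinate.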

SAS_Rob · Posted 02-09-2021 04:39 PM

Maybe I am not grasping what you are trying to show, but when I take the difference between the LL of the fitted model and the LL at your modified parameters, I do get a slightly higher LL for the fitted model, suggesting it is the MLE.

data tiny;
input x y;
lines;
0 0
0 0
1 1
2 0
4 0
5 1
7 0
8 1
9 1
10 1
;
run;

proc logistic data = tiny outest=out1;
model y(event='1')=x;
effectplot;
run;

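data _null_;
set out1;
/* CALL SYMPUT on a numeric value converts it with BEST12. and drops digits,
   so write the value with BEST32. and let SYMPUTX strip the blanks */
call symputx('fitll', put(_LNLIKE_, best32.));
run;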

data check2;
set tiny end=lastOne;
format pi_x LL sumLL pred 18.15 diff 18.15;
int = -1.7306;
coeff = .3790;
pi_x = exp(int+coeff*x)/(1+exp(int+coeff*x));
LL = y*log(pi_x)+ (1-y)*log(1-pi_x);
sumLL = sum(sumLL,LL);
if lastOne then do;
pred=-2*sumLL;
diff=&fitll-sumLL; /* fitted LL minus LL at the modified estimates: positive here */
*output;
end;
retain sumLL;
*keep pred int coeff;
run;

proc print;
run;

StatDave · Posted 02-09-2021 10:56 PM

This might make it easier to see that the solution from PROC LOGISTIC is better.

proc logistic data = tiny outest=oe;
model y(event='1')=x;
run;
data ine;
set oe;
intercept=-1.7306;
run;
proc logistic data = tiny inest=ine outest=oe2;
model y(event='1')=x / maxiter=0 itprint;
run;
proc print data=oe;
format _lnlike_ 20.16;
run;
proc print data=oe2;
format _lnlike_ 20.16;
run;

Johndberglund · Posted 02-10-2021 10:25 PM

I like how you showed a nice way to compare the two answers. I think that perhaps what I want is not possible.

When I see the coefficients given with four digits, I expect them to be the closest four-digit numbers to the "best" solution. Since we are probably iterating some system of equations, we are never 100% certain. I would like to know more about the range within which they are sure the MLE lies.

If we copy your program but change it to this, we again get an improvement on the first answer SAS gives:

data ine;
set oe;
intercept=-1.7311;
x=0.3791;
run;

StatDave · Posted 02-10-2021 10:58 PM

If you want to focus on the log likelihood, then you should use a convergence criterion that also focuses on the log likelihood. The default convergence criterion, GCONV, focuses on the gradients. If you simply change to the FCONV criterion, which focuses on the log likelihood, then the fitted model again has a larger log likelihood (a smaller -2 Log L) than the case you mention.

proc logistic data = tiny outest=oe;
model y(event='1')=x / gconv=0 fconv=1e-8;
run;
data ine;
set oe;
intercept=-1.7311; x=.3791;
run;
proc logistic data = tiny inest=ine outest=oe2;
model y(event='1')=x / maxiter=0 itprint;
run;
proc print data=oe;
format _lnlike_ 20.16;
run;
proc print data=oe2;
format _lnlike_ 20.16;
run;
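To see what "focuses on the gradients" means here, the sketch below computes the raw gradient (score) of the log likelihood, sum(y - p) for the intercept and sum(x*(y - p)) for the slope, at the candidate points discussed in this thread. It assumes the `tiny` data set from the original post; `cand` and `score` are hypothetical names. Both components are zero at the exact MLE. GCONV itself normalizes this gradient using the Hessian and the log likelihood, so these numbers are not the statistic PROC LOGISTIC compares with its GCONV threshold.

```sas
/* Hypothetical candidate points: the rounded default estimates and
   the two 4-decimal pairs discussed in this thread */
data cand;
  input int coeff;
  datalines;
-1.7308 0.3790
-1.7310 0.3791
-1.7311 0.3791
;
run;

/* Gradient of the log likelihood at each candidate:
   dLL/d(intercept) = sum(y - p), dLL/d(slope) = sum(x*(y - p)) */
proc sql;
  create table score as
  select c.int, c.coeff,
         sum(t.y - exp(c.int+c.coeff*t.x)/(1+exp(c.int+c.coeff*t.x)))
           as g_int format=best16.,
         sum(t.x*(t.y - exp(c.int+c.coeff*t.x)/(1+exp(c.int+c.coeff*t.x))))
           as g_x format=best16.
  from cand as c, tiny as t
  group by c.int, c.coeff;
quit;

proc print data=score;
run;
```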

Johndberglund · Posted 02-11-2021 08:29 PM

Thank you all for your help. I see that my error was unconsciously expecting them to use absfconv=1e-6 (or something similar) when computing the coefficients. That gives the same answer as the fconv=1e-8 you suggested, which is:

intercept = -1.7310180741539600

x = 0.3790914824860400

_LNLIKE_ = -5.2233468313521700

Then I would have expected them to round to whatever length they wanted, like

-1.7310 and 0.3791

But it's all an approximation, so we'll live with it.
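The ABSFCONV criterion described in this post can be requested directly on the MODEL statement. Below is a minimal sketch, assuming the `tiny` data set from the original post; the data set names `oe_abs`, `ine_round` and `oe_round` are hypothetical. It fits with ABSFCONV (turning off GCONV the same way the accepted solution does), rounds the estimates to four decimals, and re-evaluates the log likelihood at the rounded values with MAXITER=0, as in the accepted solution. As the earlier posts in this thread suggest, the rounded pair is not necessarily the four-decimal pair with the largest log likelihood.

```sas
/* Absolute function convergence: stop when the change in the
   log likelihood between iterations is below 1e-6 */
proc logistic data=tiny outest=oe_abs;
  model y(event='1')=x / gconv=0 absfconv=1e-6;
run;

/* Round the converged estimates to 4 decimals */
data ine_round;
  set oe_abs;
  Intercept = round(Intercept, 0.0001);
  x         = round(x, 0.0001);
run;

/* Evaluate the log likelihood at the rounded values without iterating */
proc logistic data=tiny inest=ine_round outest=oe_round;
  model y(event='1')=x / maxiter=0;
run;

proc print data=oe_abs;
  var Intercept x _LNLIKE_;
  format Intercept x _LNLIKE_ 20.16;
run;

proc print data=oe_round;
  var Intercept x _LNLIKE_;
  format Intercept x _LNLIKE_ 20.16;
run;
```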
