Kinga
Calcite | Level 5

Hi, I am trying to figure out why, in PROC GLIMMIX, using the same ML method produces different estimates and variances depending on the optimization technique. The model uses two fixed effects, two random effects, and one dummy (class) variable.

In both scenarios I used the Laplace method but different optimization techniques (Newton-Raphson vs. quasi-Newton).

proc glimmix data=data method=laplace;
   class dummy random1 random2;
   model flag(event='1') = fixed1 fixed2 dummy / link=probit dist=binomial solution or;
   output out=glimmixout pred(blup ilink)=PredProb
                         pred(noblup ilink)=PredProb_PA;
   random int / subject=random1 solution;
   random int / subject=random2 solution;
   nloptions technique=newrap;
run;

proc glimmix data=data method=laplace;
   class dummy random1 random2;
   model flag(event='1') = fixed1 fixed2 dummy / link=probit dist=binomial solution or;
   output out=glimmixout pred(blup ilink)=PredProb
                         pred(noblup ilink)=PredProb_PA;
   random int / subject=random1 solution;
   random int / subject=random2 solution;
   nloptions technique=quanew;
run;

The results are slightly different:

Newton-Raphson:

     Effect      Dummy     Estimate   Std Error       DF    t Value        Pr > |t|
     Intercept       _     -2.65210      0.1795        8     -14.77    0.0000004336
     fixed1          _    -0.631697     0.01484    201E3     -42.56    <.0000000001
     fixed2          _    -0.097351     0.01102    201E3      -8.83    <.0000000001
     Dummy           1    -0.775746     0.09144    201E3      -8.48    <.0000000001
     Dummy           2    -0.677288     0.05578    201E3     -12.14    <.0000000001
     Dummy           3    -0.422331     0.03932    201E3     -10.74    <.0000000001
     Dummy           4     0.071574     0.02895    201E3       2.47    0.0134390943
     Dummy           5     0.800703     0.03233    201E3      24.77    <.0000000001
     Dummy           6     1.728605     0.07865    201E3      21.98    <.0000000001
     Dummy           7     0.000000           .        .        .                 .

Quasi-Newton:

     Effect      Dummy     Estimate   Std Error       DF    t Value        Pr > |t|
     Intercept       _     -2.65845      0.1816        8     -14.64    0.0000004659
     fixed1          _    -0.632236     0.01485    201E3     -42.58    <.0000000001
     fixed2          _    -0.097346     0.01102    201E3      -8.83    <.0000000001
     Dummy           1    -0.774482     0.09140    201E3      -8.47    <.0000000001
     Dummy           2    -0.677044     0.05579    201E3     -12.13    <.0000000001
     Dummy           3    -0.421267     0.03932    201E3     -10.72    <.0000000001
     Dummy           4     0.071657     0.02896    201E3       2.47    0.0133518470
     Dummy           5     0.800642     0.03233    201E3      24.76    <.0000000001
     Dummy           6     1.728683     0.07866    201E3      21.98    <.0000000001
     Dummy           7     0.000000           .        .        .                 .

Is there any reason why I don’t get the same results for the two scenarios?

Thanks


5 REPLIES
SteveDenham
Jade | Level 19

The optimization methods (Newton-Raphson and quasi-Newton) have some fundamental differences, most of which I think involve inverting the Hessian (NEWRAP) versus iteratively updating an approximation to it without inversion (QUANEW). Differences are more likely to show up in the standard errors of the estimates than in the point estimates themselves, but what you are seeing is, at least to me, not unexpected. Because maximum likelihood optimization is iterative, with a stopping rule based on whatever preset criteria are specified, you can end up in a neighborhood where convergence is declared but with slightly different estimates. Check the results from other optimization methods, and you should see that they all give different results, including the -2 log likelihood. Saying that one or the other is correct, or more correct, is like saying your location is more or less correct depending on which GPS device you use.
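Here is a minimal sketch, using the data set and variable names from your post, of how you might refit the model under several TECHNIQUE= values and collect the fit statistics and parameter estimates for a side-by-side look (the macro name and the output data set names are made up):

%macro fit_glimmix(tech);
   proc glimmix data=data method=laplace;
      class dummy random1 random2;
      model flag(event='1') = fixed1 fixed2 dummy
            / link=probit dist=binomial solution or;
      random int / subject=random1 solution;
      random int / subject=random2 solution;
      nloptions technique=&tech;
      /* Capture the fit statistics and fixed-effect solutions per technique */
      ods output FitStatistics=fit_&tech ParameterEstimates=pe_&tech;
   run;
%mend fit_glimmix;

%fit_glimmix(newrap);   /* Newton-Raphson */
%fit_glimmix(quanew);   /* quasi-Newton */
%fit_glimmix(nrridg);   /* Newton-Raphson with ridging, another TECHNIQUE= value */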

Steve Denham

Kinga
Calcite | Level 5

Hi Steve,

Thanks for the answer. Indeed, I forgot to copy the most important part. The strange bit is that the log likelihood looks the same. It could be a coincidence, although it seems unlikely that you would get the same likelihood but different estimates. That is why I'm curious what the reason could be.

Newton-Raphson:

Fit Statistics
-2 Log Likelihood           25015.43
AIC  (smaller is better)    25037.43
AICC (smaller is better)    25037.43
BIC  (smaller is better)    25015.43
CAIC (smaller is better)    25026.43

Quasi-Newton:

Fit Statistics
-2 Log Likelihood           25015.43
AIC  (smaller is better)    25037.43
AICC (smaller is better)    25037.43
BIC  (smaller is better)    25015.43
CAIC (smaller is better)    25026.43
HQIC (smaller is better)    25015.43
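(To check whether the agreement survives at full precision, a minimal sketch along these lines prints the -2 log likelihood with more decimals than the default display; it assumes the fit-statistics tables were captured with ODS OUTPUT as in the sketch above, so fit_newrap and fit_quanew are hypothetical names.)

data fit_compare;
   /* Stack the two fit-statistics tables and tag each row by technique */
   set fit_newrap(in=nr) fit_quanew;
   length technique $8;
   technique = ifc(nr, 'NEWRAP', 'QUANEW');
run;

proc print data=fit_compare noobs;
   where Descr contains 'Log Likelihood';
   var technique Descr Value;
   format Value best16.;   /* show decimals hidden by the two-decimal default */
run;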

Thanks,

Kinga

SteveDenham
Jade | Level 19

Yes, rounded to two decimal places the -2LL values look the same, but then so do the parameter estimates. If you output the iteration history to a data set (ODS OUTPUT IterHistory=iterhistory) and look at the objective function values and gradients, it should show what is going on. I am guessing that out in the third or fourth decimal place you will start to see differences. Also, take a look at the Hessian matrix; there should be small differences there as well.
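A minimal sketch of that, again assuming the model from the original post (the output data set name is arbitrary):

proc glimmix data=data method=laplace;
   class dummy random1 random2;
   model flag(event='1') = fixed1 fixed2 dummy
         / link=probit dist=binomial solution;
   random int / subject=random1;
   random int / subject=random2;
   nloptions technique=newrap;
   /* The iteration history holds the objective function and gradient per iteration */
   ods output IterHistory=iterhist_newrap;
run;

proc print data=iterhist_newrap noobs;
   format Objective best16.;   /* expose the later decimal places */
run;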

One way of getting the parameter values closer to one another might be to use the PCONV= option in the PROC GLIMMIX statement. You would probably also have to override the other convergence criteria, making them all stricter in an NLOPTIONS statement, and increase the maximum number of iterations.
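For example, a minimal sketch with illustrative (not recommended) values; note that with METHOD=LAPLACE the fit is a single optimization, so the NLOPTIONS criteria are the ones most likely to bind:

proc glimmix data=data method=laplace pconv=1e-10;
   class dummy random1 random2;
   model flag(event='1') = fixed1 fixed2 dummy
         / link=probit dist=binomial solution;
   random int / subject=random1;
   random int / subject=random2;
   /* Tighter gradient criteria and a higher iteration cap; values are illustrative */
   nloptions technique=quanew gconv=1e-10 absgconv=1e-8 maxiter=500;
run;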

Steve Denham

Kinga
Calcite | Level 5

Thanks for the answer; indeed, the later decimal places were different.

As far as I understand, if I want to bring them closer I should tighten the convergence criteria. But as far as I can see, it is not possible to define a criterion based on the change in the likelihood value. Am I right, or is there a way to set a convergence criterion on the change in the likelihood value (for example, stop when the change is less than 1e-6)?

Thanks again.

SteveDenham
Jade | Level 19

I am not absolutely sure, but I think the objective function convergence is controlled with the FCONV= or FCONV2= option in an NLOPTIONS statement. Note that if convergence is reached on one of the other criteria (GCONV= or PCONV=) first, then changing the FCONV= level would have no effect.

I did some quick checking, and perhaps ABSFCONV= would be in order, since it stops the optimization when the absolute change in the objective function between successive iterations falls below a given threshold.
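A minimal sketch of the ABSFCONV= idea (illustrative values); here the objective function is the -2 log likelihood:

proc glimmix data=data method=laplace;
   class dummy random1 random2;
   model flag(event='1') = fixed1 fixed2 dummy
         / link=probit dist=binomial solution;
   random int / subject=random1;
   random int / subject=random2;
   /* Stop when the between-iteration change in the objective is below 1e-6;
      GCONV= is tightened so the gradient criterion does not fire first */
   nloptions technique=quanew absfconv=1e-6 gconv=1e-12;
run;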

Steve Denham

