Kinga
Calcite | Level 5

Hi, I am trying to figure out why, in PROC GLIMMIX, using the same ML method produces different estimates and variances depending on the optimization technique. The model uses two fixed effects, two random effects, and one dummy (class) variable.

In both scenarios I used the Laplace method but different optimization techniques (Newton-Raphson vs. quasi-Newton).

proc glimmix data=data method=laplace;
   class dummy random1 random2;
   model flag(event='1') = fixed1 fixed2 dummy / link=probit dist=binomial solution or;
   output out=glimmixout pred(blup ilink)=PredProb
                         pred(noblup ilink)=PredProb_PA;
   random int / subject=random1 solution;
   random int / subject=random2 solution;
   nloptions technique=newrap;
run;

proc glimmix data=data method=laplace;
   class dummy random1 random2;
   model flag(event='1') = fixed1 fixed2 dummy / link=probit dist=binomial solution or;
   output out=glimmixout pred(blup ilink)=PredProb
                         pred(noblup ilink)=PredProb_PA;
   random int / subject=random1 solution;
   random int / subject=random2 solution;
   nloptions technique=quanew;
run;

The results are slightly different:

Newton-Raphson:

     Effect      Dummy     Estimate   Std Error       DF    t Value        Pr > |t|
     Intercept       _     -2.65210      0.1795        8     -14.77    0.0000004336
     fixed1          _    -0.631697     0.01484    201E3     -42.56    <.0000000001
     fixed2          _    -0.097351     0.01102    201E3      -8.83    <.0000000001
     Dummy           1    -0.775746     0.09144    201E3      -8.48    <.0000000001
     Dummy           2    -0.677288     0.05578    201E3     -12.14    <.0000000001
     Dummy           3    -0.422331     0.03932    201E3     -10.74    <.0000000001
     Dummy           4     0.071574     0.02895    201E3       2.47    0.0134390943
     Dummy           5     0.800703     0.03233    201E3      24.77    <.0000000001
     Dummy           6     1.728605     0.07865    201E3      21.98    <.0000000001
     Dummy           7     0.000000           .        .        .                 .

Quasi-Newton:

     Effect      Dummy     Estimate   Std Error       DF    t Value        Pr > |t|
     Intercept       _     -2.65845      0.1816        8     -14.64    0.0000004659
     fixed1          _    -0.632236     0.01485    201E3     -42.58    <.0000000001
     fixed2          _    -0.097346     0.01102    201E3      -8.83    <.0000000001
     Dummy           1    -0.774482     0.09140    201E3      -8.47    <.0000000001
     Dummy           2    -0.677044     0.05579    201E3     -12.13    <.0000000001
     Dummy           3    -0.421267     0.03932    201E3     -10.72    <.0000000001
     Dummy           4     0.071657     0.02896    201E3       2.47    0.0133518470
     Dummy           5     0.800642     0.03233    201E3      24.76    <.0000000001
     Dummy           6     1.728683     0.07866    201E3      21.98    <.0000000001
     Dummy           7     0.000000           .        .        .                 .

Is there any reason why I don’t get the same results for the two scenarios?

Thanks


5 REPLIES
SteveDenham
Jade | Level 19

The optimization methods (Newton-Raphson and quasi-Newton) have some fundamental differences, most of which I think involve inverting the Hessian (NEWRAP) versus iteratively updating an approximation to it without inversion (QUANEW). Differences are more likely to show up in the standard errors of the estimates than in the point estimates themselves, but what you are seeing is, at least to me, not unexpected. Because maximum likelihood optimization is iterative, with a stopping rule based on whatever preset criteria are specified, you can end up in a neighborhood where convergence is declared but with slightly different estimates. Check the results from other optimization methods, and you should see that they all give different results, including the -2 log likelihood. Saying that one or the other is correct, or more correct, is like saying your location is more or less correct depending on which GPS device you use.
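Here is a minimal sketch, using the data set and variable names from your post, of how you might refit the model under several TECHNIQUE= values and collect the fit statistics and parameter estimates for a side-by-side look (the macro name and the output data set names are made up):

%macro fit_glimmix(tech);
   proc glimmix data=data method=laplace;
      class dummy random1 random2;
      model flag(event='1') = fixed1 fixed2 dummy
            / link=probit dist=binomial solution or;
      random int / subject=random1 solution;
      random int / subject=random2 solution;
      nloptions technique=&tech;
      /* Capture the fit statistics and fixed-effect solutions per technique */
      ods output FitStatistics=fit_&tech ParameterEstimates=pe_&tech;
   run;
%mend fit_glimmix;

%fit_glimmix(newrap);   /* Newton-Raphson */
%fit_glimmix(quanew);   /* quasi-Newton */
%fit_glimmix(nrridg);   /* Newton-Raphson with ridging, another TECHNIQUE= value */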

Steve Denham

Kinga
Calcite | Level 5

Hi Steve,

Thanks for the answer. Indeed, I forgot to copy the most important part. The strange bit is that the log likelihood looks the same. It could be a coincidence, although it seems unlikely that you would get the same likelihood but different estimates. That is why I'm curious what the reason could be.

Newton-Raphson:

Fit Statistics
-2 Log Likelihood           25015.43
AIC  (smaller is better)    25037.43
AICC (smaller is better)    25037.43
BIC  (smaller is better)    25015.43
CAIC (smaller is better)    25026.43

Quasi-Newton:

Fit Statistics
-2 Log Likelihood           25015.43
AIC  (smaller is better)    25037.43
AICC (smaller is better)    25037.43
BIC  (smaller is better)    25015.43
CAIC (smaller is better)    25026.43
HQIC (smaller is better)    25015.43
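(To check whether the agreement survives at full precision, a minimal sketch along these lines prints the -2 log likelihood with more decimals than the default display; it assumes the fit-statistics tables were captured with ODS OUTPUT as in the sketch above, so fit_newrap and fit_quanew are hypothetical names.)

data fit_compare;
   /* Stack the two fit-statistics tables and tag each row by technique */
   set fit_newrap(in=nr) fit_quanew;
   length technique $8;
   technique = ifc(nr, 'NEWRAP', 'QUANEW');
run;

proc print data=fit_compare noobs;
   where Descr contains 'Log Likelihood';
   var technique Descr Value;
   format Value best16.;   /* show decimals hidden by the two-decimal default */
run;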

Thanks,

Kinga

SteveDenham
Jade | Level 19

Yes, rounded to two decimal places the -2LL values look the same, but then so do the parameter estimates. If you output the iteration history to a data set (ODS OUTPUT IterHistory=iterhistory) and look at the objective function values and gradients, it should show what is going on. I am guessing that out in the third or fourth decimal place you will start to see differences. Also, take a look at the Hessian matrix; there should be small differences there as well.
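A minimal sketch of that, again assuming the model from the original post (the output data set name is arbitrary):

proc glimmix data=data method=laplace;
   class dummy random1 random2;
   model flag(event='1') = fixed1 fixed2 dummy
         / link=probit dist=binomial solution;
   random int / subject=random1;
   random int / subject=random2;
   nloptions technique=newrap;
   /* The iteration history holds the objective function and gradient per iteration */
   ods output IterHistory=iterhist_newrap;
run;

proc print data=iterhist_newrap noobs;
   format Objective best16.;   /* expose the later decimal places */
run;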

One way of getting the parameter values closer to one another might be to use the PCONV= option in the PROC GLIMMIX statement. You would probably also have to override the other convergence criteria, making them all stricter in an NLOPTIONS statement, and increase the maximum number of iterations.
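For example, a minimal sketch with illustrative (not recommended) values; note that with METHOD=LAPLACE the fit is a single optimization, so the NLOPTIONS criteria are the ones most likely to bind:

proc glimmix data=data method=laplace pconv=1e-10;
   class dummy random1 random2;
   model flag(event='1') = fixed1 fixed2 dummy
         / link=probit dist=binomial solution;
   random int / subject=random1;
   random int / subject=random2;
   /* Tighter gradient criteria and a higher iteration cap; values are illustrative */
   nloptions technique=quanew gconv=1e-10 absgconv=1e-8 maxiter=500;
run;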

Steve Denham

Kinga
Calcite | Level 5

Thanks for the answer; indeed, the later decimal places were different.

As far as I understand, if I want to bring them closer I should tighten the convergence criteria. But as far as I can see, it is not possible to define a criterion based on the change in the likelihood value. Am I right, or is there a way to set a convergence criterion on the change in the likelihood value (for example, stop when the change is less than 1e-6)?

Thanks again.

SteveDenham
Jade | Level 19

I am not absolutely sure, but I think the objective function convergence is controlled with the FCONV= or FCONV2= option in an NLOPTIONS statement. Note that if convergence is reached on one of the other criteria (GCONV= or PCONV=) first, then changing the FCONV= level would have no effect.

I did some quick checking, and perhaps ABSFCONV= would be in order, since it stops the optimization when the absolute change in the objective function between successive iterations falls below a given threshold.
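A minimal sketch of the ABSFCONV= idea (illustrative values); here the objective function is the -2 log likelihood:

proc glimmix data=data method=laplace;
   class dummy random1 random2;
   model flag(event='1') = fixed1 fixed2 dummy
         / link=probit dist=binomial solution;
   random int / subject=random1;
   random int / subject=random2;
   /* Stop when the between-iteration change in the objective is below 1e-6;
      GCONV= is tightened so the gradient criterion does not fire first */
   nloptions technique=quanew absfconv=1e-6 gconv=1e-12;
run;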

Steve Denham

