Damien_Mather
Lapis Lazuli | Level 10

Has anybody noticed anything like this happening before?

The 'before' specification was:

proc glimmix
    data=perm.yogurt3
      (where=(PermMissAlts in (78,15,47,79,37,14,101) and
              brand2 in (1,2,3,4,5,6,7,8,9)))
    method=laplace
    nobound;
  nloptions technique=dbldog maxfunc=4E3 maxiter=2E3;
  class householdid PermMissAlts;
  model chosen = a1 a2 a3 a4 a5 a6 price feature
        / noint ddfm=none dist=negbin link=log solution;
  random a1 a2 a3 a4 a5 a6 price feature / type=chol;
  random householdid PermMissAlts / type=vc;
run;


-2 Log Likelihood           91517.00
AIC  (smaller is better)    91611.00
AICC (smaller is better)    91611.09
BIC  (smaller is better)    91517.00
CAIC (smaller is better)    91564.00
HQIC (smaller is better)    91517.00

-2 log L(chosen | r. effects)   90383.47
Pearson Chi-Square              49367.18
Pearson Chi-Square / DF             0.96

fixed effect estimates:

Effect    Estimate   Standard Error   DF      t Value   Pr > |t|
a1          4.2590       0.9252      Infty      4.60     <.0001
a2          3.4609       1.3053      Infty      2.65     0.0080
a3          2.3826       1.5971      Infty      1.49     0.1357
a4          1.4175       1.8434      Infty      0.77     0.4419
a5          2.3552       2.0610      Infty      1.14     0.2531
a6          3.1032       2.2580      Infty      1.37     0.1693
price      -6.7818       2.4405      Infty     -2.78     0.0055
feature     0.06700      2.6047      Infty      0.03     0.9795

followed by a second PROC GLIMMIX step on the same data, with the same (apparent) model specification and the same (apparent) estimation methods and tolerances, just adding a couple of output options (G and GC) to the first RANDOM statement. I.e., the 'after' specification looks like:

proc glimmix
    data=perm.yogurt3
      (where=(PermMissAlts in (78,15,47,79,37,14,101) and
              brand2 in (1,2,3,4,5,6,7,8,9)))
    method=laplace
    nobound;
  nloptions technique=dbldog maxfunc=4E3 maxiter=2E3;
  class householdid PermMissAlts;
  model choice = a1 a2 a3 a4 a5 a6 price feature
        / noint ddfm=none dist=negbin link=log solution;
  random a1 a2 a3 a4 a5 a6 price feature / type=chol G GC;
  random householdid PermMissAlts / type=vc;
run;

and I get vastly different estimates (most fixed effect estimates shrank by a binary order of magnitude):

Effect    Estimate   Standard Error   DF      t Value   Pr > |t|
a1          1.3506      0.06449      Infty     20.94     <.0001
a2          0.8115      0.06447      Infty     12.59     <.0001
a3          0.1329      0.05951      Infty      2.23     0.0255
a4         -0.2750      0.05952      Infty     -4.62     <.0001
a5          0.1914      0.07046      Infty      2.72     0.0066
a6          0.1474      0.08645      Infty      1.71     0.0882
price      -3.9330      0.1151       Infty    -34.17     <.0001
feature     0.3103      0.03701      Infty      8.39     <.0001

greatly improved model fit:

-2 Log Likelihood           55164.03
AIC  (smaller is better)    55258.03
AICC (smaller is better)    55258.12
BIC  (smaller is better)    55164.03
CAIC (smaller is better)    55211.03
HQIC (smaller is better)    55164.03

-2 log L(choice | r. effects)   55108.87
Pearson Chi-Square              29640.88
Pearson Chi-Square / DF             0.58

the negative binomial scale parameter estimate drops from 4.23 to 1.61,

the McCullagh and Nelder scale factor (Pearson chi-square / DF) drops from nearly unity (0.96) to 0.58,

and a strange mixture of happiness and confusion!

Could this be due to the G or GC output options invoking a different estimation option (such as an empirical or sandwich estimator)?
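One way to test this directly (a sketch, reusing the 'after' code with only the display options removed) would be:

/* Sketch: rerun the 'after' step without the G and GC display options.
   If the estimates revert to the 'before' values, the output options
   really are changing the estimation path. */
proc glimmix
    data=perm.yogurt3
      (where=(PermMissAlts in (78,15,47,79,37,14,101) and
              brand2 in (1,2,3,4,5,6,7,8,9)))
    method=laplace
    nobound;
  nloptions technique=dbldog maxfunc=4E3 maxiter=2E3;
  class householdid PermMissAlts;
  model choice = a1 a2 a3 a4 a5 a6 price feature
        / noint ddfm=none dist=negbin link=log solution;
  random a1 a2 a3 a4 a5 a6 price feature / type=chol;  /* no G, no GC */
  random householdid PermMissAlts / type=vc;
run;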

Any guesses, hunches, alternatives, confirmations welcome.

4 REPLIES
SteveDenham
Jade | Level 19

Wow. Based on everything I have done with this procedure, I would NEVER expect this to happen. Because the change in the magnitude of the fixed-effect solution values is almost identical to the change in the log likelihood, the first thing I would suspect is that, somehow, there was a change in the data, especially in the number of records used. However unlikely, make sure that this was not the cause. Just asking for the G matrix and its Cholesky root to be output shouldn't affect the estimation. Curiosity aroused indeed!
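A quick record-count check along these lines would rule that out (a sketch; the WHERE clause just mirrors the one in your two steps):

proc sql;
  /* Count the records that satisfy the subsetting WHERE clause;
     the count should match between the two GLIMMIX runs. */
  select count(*) as n_subset
  from perm.yogurt3
  where PermMissAlts in (78,15,47,79,37,14,101)
    and brand2 in (1,2,3,4,5,6,7,8,9);
quit;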

What, if any, changes occurred in the covariance parameter estimates?

Steve Denham

Damien_Mather
Lapis Lazuli | Level 10

Thanks Steve.

Nope, absolutely sure the data did not change. For a start, nothing was done to perm.yogurt3 between the proc glimmix steps.

Secondly, step 1 reported:

Number of Observations Read   53027
Number of Observations Used   51441

and step 2 reported:

Number of Observations Read   53027
Number of Observations Used   51441

We would generally consider the first set of covariance estimates a disappointment:

Cov Parm       Estimate   Standard Error
CHOL(1,1)      0.9208     .
CHOL(2,1)      0.9208     .
CHOL(2,2)      0.9208     .
CHOL(3,1)      0.9208     .
CHOL(3,2)      0.9208     .
CHOL(3,3)      0.9208     .
CHOL(4,1)      0.9208     .
CHOL(4,2)      0.9208     .
CHOL(4,3)      0.9208     .
CHOL(4,4)      0.9208     .
CHOL(5,1)      0.9208     .
CHOL(5,2)      0.9208     .
CHOL(5,3)      0.9208     .
CHOL(5,4)      0.9208     .
CHOL(5,5)      0.9208     .
CHOL(6,1)      0.9208     .
CHOL(6,2)      0.9208     .
CHOL(6,3)      0.9208     .
CHOL(6,4)      0.9208     .
CHOL(6,5)      0.9208     .
CHOL(6,6)      0.9208     .
CHOL(7,1)      0.9208     .
CHOL(7,2)      0.9208     .
CHOL(7,3)      0.9208     .
CHOL(7,4)      0.9208     .
CHOL(7,5)      0.9208     .
CHOL(7,6)      0.9208     .
CHOL(7,7)      0.9208     .
CHOL(8,1)      0.9208     .
CHOL(8,2)      0.9208     .
CHOL(8,3)      0.9208     .
CHOL(8,4)      0.9208     .
CHOL(8,5)      0.9208     .
CHOL(8,6)      0.9208     .
CHOL(8,7)      0.9208     .
CHOL(8,8)      0.9208     .
HouseHoldID    0.1690     0.01440
PermMissAlts   0.9367     0.6027
Scale          4.2298     0.06749

Whereas the second step's estimates are a lot more interesting (and credible!), and the transformations into the G matrix and thence into the GCORR matrix yield quite useful and actionable inferences for brand managers:
Cov Parm       Estimate    Standard Error
CHOL(1,1)      -0.00559    .
CHOL(2,1)      -0.00493    .
CHOL(2,2)      -0.00481    .
CHOL(3,1)      -0.00568    .
CHOL(3,2)      -0.00525    .
CHOL(3,3)      -0.00950    .
CHOL(4,1)      -0.00170    .
CHOL(4,2)      -0.00172    .
CHOL(4,3)      -0.00136    .
CHOL(4,4)      -0.00307    .
CHOL(5,1)      -0.00523    .
CHOL(5,2)      -0.00481    .
CHOL(5,3)      -0.00552    .
CHOL(5,4)      -0.00160    .
CHOL(5,5)      -0.00525    .
CHOL(6,1)      -0.00639    .
CHOL(6,2)      -0.00528    .
CHOL(6,3)      -0.00478    .
CHOL(6,4)      -0.00097    .
CHOL(6,5)      -0.00590    .
CHOL(6,6)      -0.01573    .
CHOL(7,1)      0.007528    .
CHOL(7,2)      0.006518    .
CHOL(7,3)      0.006129    .
CHOL(7,4)      -0.00018    .
CHOL(7,5)      0.007475    .
CHOL(7,6)      0.009907    .
CHOL(7,7)      -0.02040    .
CHOL(8,1)      0.001548    .
CHOL(8,2)      0.001323    .
CHOL(8,3)      0.002397    .
CHOL(8,4)      -0.00020    .
CHOL(8,5)      0.001542    .
CHOL(8,6)      0.002046    .
CHOL(8,7)      -0.00379    .
CHOL(8,8)      -0.00168    .
HouseHoldID    9.676E-9    0.006443
PermMissAlts   0.02020     0.005175
Scale          1.6093      .

I was thinking that the first step may have stalled at a stationary point that is NOT the global optimum, despite both steps specifying the same optimizing algorithm (double dogleg). I noted that the double dogleg method uses the gradient to update an approximate Hessian matrix. I wonder if one or more of the (apparent) RANDOM statement output options somehow influenced the gradient calculation (maybe just in terms of accuracy), thereby influencing both the approximate Hessian update and the endpoint?
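If so, rerunning step 1 with a different optimizer and no display options ought to land on the better solution (a sketch; NEWRAP is just one alternative NLOPTIONS technique worth trying):

/* Sketch: same 'before' model, different optimization technique.
   If NEWRAP reaches the step-2 likelihood without G/GC present,
   that points to step 1 stalling at a local stationary point. */
proc glimmix
    data=perm.yogurt3
      (where=(PermMissAlts in (78,15,47,79,37,14,101) and
              brand2 in (1,2,3,4,5,6,7,8,9)))
    method=laplace
    nobound;
  nloptions technique=newrap maxfunc=4E3 maxiter=2E3;
  class householdid PermMissAlts;
  model chosen = a1 a2 a3 a4 a5 a6 price feature
        / noint ddfm=none dist=negbin link=log solution;
  random a1 a2 a3 a4 a5 a6 price feature / type=chol;
  random householdid PermMissAlts / type=vc;
run;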

Thoughts?

SteveDenham
Jade | Level 19

That's all I can figure would be happening, because the covariance matrix in the first example is one where the optimization has "gotten stuck", probably at a local minimum. Is the problem "repeatable" in the sense of still existing after changing the order of terms in the model, or after sorting the data?
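Something as simple as this before rerunning both steps would test for sort-order sensitivity (a sketch; the BY variables are just one plausible choice):

proc sort data=perm.yogurt3 out=work.yogurt3_sorted;
  /* A genuinely converged solution should not depend on record order. */
  by householdid PermMissAlts;
run;

Both GLIMMIX steps would then point at work.yogurt3_sorted instead of perm.yogurt3.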

Steve Denham

lvm
Rhodochrosite | Level 12

It is very unusual to define a G matrix based on continuous variables and no subject. This would not be your usual random-coefficient model. Your model is likely overparameterized with that many random effects. Without a subject= specification for the a's, the fixed and random terms are competing to explain the same variability. These comments apply to either version of your code. One of the warning signs is that you are getting missing (.) estimated standard errors for your variance-covariance parameters. What warnings or notes are you getting in your log? This does not preclude using these parameters, but I would worry about this.
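For comparison, a more conventional random-coefficient specification would look something like the stripped-down sketch below (subject=householdid is my guess at the intended clustering unit; the rest follows your code):

/* Sketch: random coefficients that vary by household, so G is
   defined per subject rather than from unclustered continuous columns. */
proc glimmix
    data=perm.yogurt3
      (where=(PermMissAlts in (78,15,47,79,37,14,101) and
              brand2 in (1,2,3,4,5,6,7,8,9)))
    method=laplace;
  class householdid PermMissAlts;
  model chosen = a1 a2 a3 a4 a5 a6 price feature
        / noint ddfm=none dist=negbin link=log solution;
  random a1 a2 a3 a4 a5 a6 price feature / subject=householdid type=chol;
run;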

It is also true that your second model is identical to the first; you are simply asking for additional output. This should not affect your model fit, since the G option only kicks in when deciding what to display. I am sure the problems go back to my initial comments.
