Damien_Mather
Lapis Lazuli | Level 10

Has anybody noticed anything like this happening before?

The 'before' specification was:

proc glimmix
    data=perm.yogurt3
      (where=(PermMissAlts in (78,15,47,79,37,14,101) and
              brand2 in (1,2,3,4,5,6,7,8,9)))
    method=laplace
    nobound;
  nloptions technique=dbldog maxfunc=4E3 maxiter=2E3;
  class householdid PermMissAlts;
  model chosen = a1 a2 a3 a4 a5 a6 price feature
        / noint ddfm=none dist=negbin link=log solution;
  random a1 a2 a3 a4 a5 a6 price feature / type=chol;
  random householdid PermMissAlts / type=vc;
run;


-2 Log Likelihood           91517.00
AIC  (smaller is better)    91611.00
AICC (smaller is better)    91611.09
BIC  (smaller is better)    91517.00
CAIC (smaller is better)    91564.00
HQIC (smaller is better)    91517.00

-2 log L(chosen | r. effects)   90383.47
Pearson Chi-Square              49367.18
Pearson Chi-Square / DF             0.96

fixed effect estimates:

Effect    Estimate   Standard Error   DF      t Value   Pr > |t|
a1          4.2590       0.9252      Infty      4.60     <.0001
a2          3.4609       1.3053      Infty      2.65     0.0080
a3          2.3826       1.5971      Infty      1.49     0.1357
a4          1.4175       1.8434      Infty      0.77     0.4419
a5          2.3552       2.0610      Infty      1.14     0.2531
a6          3.1032       2.2580      Infty      1.37     0.1693
price      -6.7818       2.4405      Infty     -2.78     0.0055
feature     0.06700      2.6047      Infty      0.03     0.9795

followed by a second PROC GLIMMIX step on the same data, with the same (apparent) model specification and the same (apparent) estimation methods and tolerances, just adding a couple of output options (G and GC) to the first RANDOM statement. I.e., the 'after' specification looks like:

proc glimmix
    data=perm.yogurt3
      (where=(PermMissAlts in (78,15,47,79,37,14,101) and
              brand2 in (1,2,3,4,5,6,7,8,9)))
    method=laplace
    nobound;
  nloptions technique=dbldog maxfunc=4E3 maxiter=2E3;
  class householdid PermMissAlts;
  model choice = a1 a2 a3 a4 a5 a6 price feature
        / noint ddfm=none dist=negbin link=log solution;
  random a1 a2 a3 a4 a5 a6 price feature / type=chol G GC;
  random householdid PermMissAlts / type=vc;
run;

and I get vastly different estimates (most fixed effect estimates shrank by a binary order of magnitude):

Effect    Estimate   Standard Error   DF      t Value   Pr > |t|
a1          1.3506      0.06449      Infty     20.94     <.0001
a2          0.8115      0.06447      Infty     12.59     <.0001
a3          0.1329      0.05951      Infty      2.23     0.0255
a4         -0.2750      0.05952      Infty     -4.62     <.0001
a5          0.1914      0.07046      Infty      2.72     0.0066
a6          0.1474      0.08645      Infty      1.71     0.0882
price      -3.9330      0.1151       Infty    -34.17     <.0001
feature     0.3103      0.03701      Infty      8.39     <.0001

greatly improved model fit:

-2 Log Likelihood           55164.03
AIC  (smaller is better)    55258.03
AICC (smaller is better)    55258.12
BIC  (smaller is better)    55164.03
CAIC (smaller is better)    55211.03
HQIC (smaller is better)    55164.03

-2 log L(choice | r. effects)   55108.87
Pearson Chi-Square              29640.88
Pearson Chi-Square / DF             0.58

the negative binomial scale parameter estimate drops from 4.23 to 1.61,

the McCullagh and Nelder scale factor (Pearson chi-square / DF) drops from nearly unity (0.96) to 0.58,

and a strange mixture of happiness and confusion!

Could this be due to the G or GC output options invoking a different estimation option (such as an empirical or sandwich estimator)?
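One way to test this directly (a sketch, reusing the 'after' code with only the display options removed) would be:

/* Sketch: rerun the 'after' step without the G and GC display options.
   If the estimates revert to the 'before' values, the output options
   really are changing the estimation path. */
proc glimmix
    data=perm.yogurt3
      (where=(PermMissAlts in (78,15,47,79,37,14,101) and
              brand2 in (1,2,3,4,5,6,7,8,9)))
    method=laplace
    nobound;
  nloptions technique=dbldog maxfunc=4E3 maxiter=2E3;
  class householdid PermMissAlts;
  model choice = a1 a2 a3 a4 a5 a6 price feature
        / noint ddfm=none dist=negbin link=log solution;
  random a1 a2 a3 a4 a5 a6 price feature / type=chol;  /* no G, no GC */
  random householdid PermMissAlts / type=vc;
run;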

Any guesses, hunches, alternatives, confirmations welcome.

4 REPLIES
SteveDenham
Jade | Level 19

Wow. Based on everything I have done with this procedure, I would NEVER expect this to happen. Because the change in the magnitude of the fixed-effect solution values is almost identical to the change in the log likelihood, the first thing I would suspect is that, somehow, there was a change in the data, especially in the number of records used. However unlikely, make sure that this was not the cause. Just asking for the G matrix and its Cholesky root to be output shouldn't affect the estimation. Curiosity aroused indeed!
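A quick record-count check along these lines would rule that out (a sketch; the WHERE clause just mirrors the one in your two steps):

proc sql;
  /* Count the records that satisfy the subsetting WHERE clause;
     the count should match between the two GLIMMIX runs. */
  select count(*) as n_subset
  from perm.yogurt3
  where PermMissAlts in (78,15,47,79,37,14,101)
    and brand2 in (1,2,3,4,5,6,7,8,9);
quit;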

What, if any, changes occurred in the covariance parameter estimates?

Steve Denham

Damien_Mather
Lapis Lazuli | Level 10

Thanks Steve.

Nope, absolutely sure the data did not change. For a start, nothing was done to perm.yogurt3 between the proc glimmix steps.

Secondly, step 1 reported:

Number of Observations Read   53027
Number of Observations Used   51441

and step 2 reported:

Number of Observations Read   53027
Number of Observations Used   51441

We would generally consider the first set of covariance estimates a disappointment:

Cov Parm       Estimate   Standard Error
CHOL(1,1)      0.9208     .
CHOL(2,1)      0.9208     .
CHOL(2,2)      0.9208     .
CHOL(3,1)      0.9208     .
CHOL(3,2)      0.9208     .
CHOL(3,3)      0.9208     .
CHOL(4,1)      0.9208     .
CHOL(4,2)      0.9208     .
CHOL(4,3)      0.9208     .
CHOL(4,4)      0.9208     .
CHOL(5,1)      0.9208     .
CHOL(5,2)      0.9208     .
CHOL(5,3)      0.9208     .
CHOL(5,4)      0.9208     .
CHOL(5,5)      0.9208     .
CHOL(6,1)      0.9208     .
CHOL(6,2)      0.9208     .
CHOL(6,3)      0.9208     .
CHOL(6,4)      0.9208     .
CHOL(6,5)      0.9208     .
CHOL(6,6)      0.9208     .
CHOL(7,1)      0.9208     .
CHOL(7,2)      0.9208     .
CHOL(7,3)      0.9208     .
CHOL(7,4)      0.9208     .
CHOL(7,5)      0.9208     .
CHOL(7,6)      0.9208     .
CHOL(7,7)      0.9208     .
CHOL(8,1)      0.9208     .
CHOL(8,2)      0.9208     .
CHOL(8,3)      0.9208     .
CHOL(8,4)      0.9208     .
CHOL(8,5)      0.9208     .
CHOL(8,6)      0.9208     .
CHOL(8,7)      0.9208     .
CHOL(8,8)      0.9208     .
HouseHoldID    0.1690     0.01440
PermMissAlts   0.9367     0.6027
Scale          4.2298     0.06749

Whereas the second step's estimates are a lot more interesting (and credible!), and the transformations into the G matrix and thence into the GCORR matrix yield quite useful and actionable inferences for brand managers:
Cov Parm       Estimate    Standard Error
CHOL(1,1)      -0.00559    .
CHOL(2,1)      -0.00493    .
CHOL(2,2)      -0.00481    .
CHOL(3,1)      -0.00568    .
CHOL(3,2)      -0.00525    .
CHOL(3,3)      -0.00950    .
CHOL(4,1)      -0.00170    .
CHOL(4,2)      -0.00172    .
CHOL(4,3)      -0.00136    .
CHOL(4,4)      -0.00307    .
CHOL(5,1)      -0.00523    .
CHOL(5,2)      -0.00481    .
CHOL(5,3)      -0.00552    .
CHOL(5,4)      -0.00160    .
CHOL(5,5)      -0.00525    .
CHOL(6,1)      -0.00639    .
CHOL(6,2)      -0.00528    .
CHOL(6,3)      -0.00478    .
CHOL(6,4)      -0.00097    .
CHOL(6,5)      -0.00590    .
CHOL(6,6)      -0.01573    .
CHOL(7,1)      0.007528    .
CHOL(7,2)      0.006518    .
CHOL(7,3)      0.006129    .
CHOL(7,4)      -0.00018    .
CHOL(7,5)      0.007475    .
CHOL(7,6)      0.009907    .
CHOL(7,7)      -0.02040    .
CHOL(8,1)      0.001548    .
CHOL(8,2)      0.001323    .
CHOL(8,3)      0.002397    .
CHOL(8,4)      -0.00020    .
CHOL(8,5)      0.001542    .
CHOL(8,6)      0.002046    .
CHOL(8,7)      -0.00379    .
CHOL(8,8)      -0.00168    .
HouseHoldID    9.676E-9    0.006443
PermMissAlts   0.02020     0.005175
Scale          1.6093      .

I was thinking that the first step may have stalled at a stationary point that is NOT the global optimum, despite both steps specifying the same optimizing algorithm (double dogleg). I noted that the double dogleg method uses the gradient to update an approximate Hessian matrix. I wonder if one or more of the (apparent) RANDOM statement output options somehow influenced the gradient calculation (maybe just in terms of accuracy), thereby influencing both the approximate Hessian update and the endpoint?
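If so, rerunning step 1 with a different optimizer and no display options ought to land on the better solution (a sketch; NEWRAP is just one alternative NLOPTIONS technique worth trying):

/* Sketch: same 'before' model, different optimization technique.
   If NEWRAP reaches the step-2 likelihood without G/GC present,
   that points to step 1 stalling at a local stationary point. */
proc glimmix
    data=perm.yogurt3
      (where=(PermMissAlts in (78,15,47,79,37,14,101) and
              brand2 in (1,2,3,4,5,6,7,8,9)))
    method=laplace
    nobound;
  nloptions technique=newrap maxfunc=4E3 maxiter=2E3;
  class householdid PermMissAlts;
  model chosen = a1 a2 a3 a4 a5 a6 price feature
        / noint ddfm=none dist=negbin link=log solution;
  random a1 a2 a3 a4 a5 a6 price feature / type=chol;
  random householdid PermMissAlts / type=vc;
run;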

Thoughts?

SteveDenham
Jade | Level 19

That's all I can figure would be happening, because the covariance matrix in the first example is one where the optimization has "gotten stuck", probably at a local minimum. Is the problem "repeatable" in the sense of still existing after changing the order of terms in the model, or after sorting the data?
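Something as simple as this before rerunning both steps would test for sort-order sensitivity (a sketch; the BY variables are just one plausible choice):

proc sort data=perm.yogurt3 out=work.yogurt3_sorted;
  /* A genuinely converged solution should not depend on record order. */
  by householdid PermMissAlts;
run;

Both GLIMMIX steps would then point at work.yogurt3_sorted instead of perm.yogurt3.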

Steve Denham

lvm
Rhodochrosite | Level 12

It is very unusual to define a G matrix based on continuous variables and no subject. This would not be your usual random-coefficient model. Your model is likely overparameterized with that many random effects. Without a subject= specification for the a's, the fixed and random terms are competing to explain the same variability. These comments apply to either version of your code. One of the warning signs is that you are getting missing (.) estimated standard errors for your variance-covariance parameters. What warnings or notes are you getting in your log? This does not preclude using these parameters, but I would worry about this.
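For comparison, a more conventional random-coefficient specification would look something like the stripped-down sketch below (subject=householdid is my guess at the intended clustering unit; the rest follows your code):

/* Sketch: random coefficients that vary by household, so G is
   defined per subject rather than from unclustered continuous columns. */
proc glimmix
    data=perm.yogurt3
      (where=(PermMissAlts in (78,15,47,79,37,14,101) and
              brand2 in (1,2,3,4,5,6,7,8,9)))
    method=laplace;
  class householdid PermMissAlts;
  model chosen = a1 a2 a3 a4 a5 a6 price feature
        / noint ddfm=none dist=negbin link=log solution;
  random a1 a2 a3 a4 a5 a6 price feature / subject=householdid type=chol;
run;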

It is also true that your second model is identical to the first; you are simply asking for additional output. This should not affect your model fit, since the G option only kicks in when deciding what to display. I am sure the problems go back to my initial comments.
