Question re: Consistency of GLIMMIX results betwee...


09-30-2014 01:29 PM

I have run into an interesting situation with a negative binomial likelihood in GLIMMIX, where I get different inferential results between version 9.3 (TS1M0) and later 9.3 releases or version 9.4. The estimates and likelihood values are the same, but the standard errors, test statistics, and p-values are not (much smaller in later versions). See code below; output attached. I'm not concerned with the correctness of the model (unless that is related to the problem), but rather with what the difference between versions is. Also, removing the repeated-measures RANDOM statement eliminates the inconsistency. Anyone have some ideas on this, or seen such behavior before?

Thanks,

Bill Price

University of Idaho

Code---------------------------------

```sas
Data worms;
input site$ AEZ year biomass density;
cards;
TROY.1 1 2011 34.06 86.74
PULL.1 1 2011 51.30 123.92
FERD.1 2 2011 75.11 136.31
FERD.2 2 2011 113.68 111.52
FAIR.1 1 2011 116.65 111.52
ASTN.1 2 2011 204.38 247.83
COLF.1 2 2011 209.47 161.09
MAUP.1 2 2011 344.20 259.38
PEND.3N 3 2011 669.95 299.52
MALD.1 2 2011 766.91 346.96
GENE.1 1 2011 849.58 458.49
MOSC.1 1 2012 0.40 2
MAUP.1 2 2012 8.22 14.15
FERD.2 2 2012 8.25 28.29
COLF.3 2 2012 8.57 40.74
TROY.1 1 2012 16.75 40.17
STJN.1 2 2012 24.65 40.74
FAIR.1 1 2012 30.38 61.09
COLF.2 2 2012 32.39 132.40
PULL.1 1 2012 37.82 58.45
LAPW.1 2 2012 44.91 84.89
FERD.1 2 2012 48.26 81.49
COLF.1 2 2012 57.89 99.03
MILT.1 3 2012 85.45 28.29
GENE.1 1 2012 86.26 122.23
JULI.1 1 2012 86.27 142.61
MALD.1 2 2012 93.56 244.47
ASTN.1 2 2012 96.69 290.03
UNTN.1 1 2012 154.34 295.40
PEND.2 3 2012 159.24 155.62
PEND.3N 3 2012 210.34 240.51
KEND.1 1 2012 . 8.10
LAPW.1 2 2013 1.4 7.570018412
JULI.1 1 2013 5.9 9.786910814
KEND.1 1 2013 6.5 9.417730007
TROY.1 1 2013 9.2 8.721076006
PULL.1 1 2013 11.9 11.05275258
MAUP.1 2 2013 13.2 60.48822313
GENE.1 1 2013 15.4 20.33988973
MOSC.1 1 2013 19.2 120.7715404
STJN.1 2 2013 21.5 34.88430403
PEND.3N 3 2013 29.5 50.93108388
FERD.2 2 2013 31.3 91.95890144
FERD.1 2 2013 38.2 30.24411157
MILT.1 3 2013 40.5 137.0167514
PEND.2 3 2013 44.2 46.66867796
ASTN.1 2 2013 48.4 75.23055226
COLF.1 2 2013 49.5 68.88785015
FAIR.1 1 2013 55.0 70.51996229
COLF.5 2 2013 56.2 99.68717485
MALD.1 2 2013 65.0 190.3118793
COLF.2 2 2013 82.5 131.8482201
UNTN.1 1 2013 125.1 109.6977191
;

Proc GLIMMIX data=worms method=laplace;
   Class site AEZ year;
   Model density = AEZ year AEZ*year /Dist=negbin;
   Random intercept /subject=site(AEZ);
   random year/type=ar(1) subject=site(AEZ);
   LSmeans AEZ /PDiff ilink CL;
   title1 'Density Analysis AR(1) (neg bin)';
run;
```

Accepted Solutions

Solution

10-08-2014 06:03 PM

Disclaimer: not a statistician; went to school for numerical analysis and haven't actively practiced rigorous mathematics in a couple of years.

My guess is that the source of the difference isn't a change in the algorithm, or even the version of SAS, but a combination of a change from a 32-bit SAS (your 9.3 results show an environment of W32_7PRO) to a 64-bit one (9.4 results show X64_S08R2) and some precision hijinks caused by the zero-variance component noted above.

To do a little bit of confirmation, I ran the code above (slightly modified to output the g matrix and a detailed iteration history of GLIMMIX) on a single machine with both 32-bit 9.3 TS1M2 and 64-bit 9.3 TS1M2 running on the same Win7 workstation. Same thing: results don't match, and they don't match what was posted above, either.

The scary part, to me, is that I see statisticians advocate for leaving zero-variance components in a model, as 'the model takes care of it' - see http://support.sas.com/resources/papers/proceedings12/332-2012.pdf, page 10 for an example. I don't know if I can agree with that sentiment based on what I've seen here. But again, not a statistician. I just know that if you fill a matrix with a number of zero or near-zero entries and then try to work with it from there, you're going to run into issues.
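The near-zero-entry point can be illustrated outside GLIMMIX. A minimal sketch in SAS/IML, using a hypothetical 2x2 "G-like" matrix (the numbers echo the intercept variance and the 8.51E-18 component from the tech-support output, but this is an illustration, not the actual G matrix from the model):

```sas
proc iml;
   /* A matrix with one near-zero variance on the diagonal */
   G = {0.1249 0,
        0      8.51e-18};
   /* Inverting sends the tiny entry to roughly 1/8.51e-18, i.e. ~1e17.
      With a condition number near the limit of double precision,
      small platform-dependent rounding differences in upstream
      computations can be amplified into visibly different answers. */
   Ginv = inv(G);
   print Ginv;
quit;
```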

All Replies


09-30-2014 01:59 PM

I'd be talking to tech support...and posting the answer back here


09-30-2014 02:33 PM

Yeah, it's on my to-do list today; just thought I'd throw it out to the group-think here too.

Bill


10-01-2014 08:44 PM

Have you learned anything from SAS technical support? I confirmed on my machine that you do get different results with the two versions. Interestingly, my two different results are both different from your two results. I don't know your experimental design, but I do think your model may be overparameterized. With a negative binomial distribution, you get a scale parameter for the lowest level in the hierarchy. If this is site(AEZ), then your model is definitely overparameterized: the random variance would be confounded with the NB scale. Of course, since I don't know anything about your design, this is (educated) speculation based on looking at your data. This still should not produce different results on different versions, but it can result in certain covariances going to 0, making other things blow up.


10-02-2014 11:44 AM

lvm,

Thank you for your reply. Yes, I would agree that the model is probably fubar, but I also agree that even a wrong answer should be consistently wrong. That is what troubles me about the problem. If, in fact, you clean up the model to, say, a split plot in time without the repeated-measures business, you get more consistency across platforms. My main rationale here is to figure out why this is happening so I can avoid it, or at least be aware of it, for future analyses.
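For concreteness, the cleaned-up model alluded to here (dropping the AR(1) repeated-measures RANDOM statement from the original code) might look something like this; a sketch, not code posted in the thread:

```sas
Proc GLIMMIX data=worms method=laplace;
   Class site AEZ year;
   Model density = AEZ year AEZ*year /Dist=negbin;
   /* site-within-zone random effect only; no AR(1) repeated structure */
   Random intercept /subject=site(AEZ);
   LSmeans AEZ /PDiff ilink CL;
   title1 'Density Analysis, no repeated-measures structure (neg bin)';
run;
```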

As for the design, this is from a landscape survey where sites were chosen at random from Agro-Ecological Zones (AEZ). Unfortunately for the researchers, it was difficult to get consistency of sites across years due to agricultural practices on the sites. I suspect this unbalanced nature through time is the root of the problem.

Thanks again for taking the time to look at this. I have contacted SAS support, and they told me it will take some time to dig into the problem. It is apparently not obvious to them either. I will report their assessment back here when I get it.


10-03-2014 02:45 PM

Since I don't do things in R, this feels like heresy to me, but I am going to suggest it: Try coding this up in one of the R packages (glmer I think) and see what results you get. And if that looks odd, post to the R-mixed-models group about it--there are some very sharp people there who can spot what is going on pretty fast.

Steve Denham


10-03-2014 05:56 PM

Yes, I had considered that initially, although I am hampered by my lack of skill in R. In looking that way, however, I did note right away potential problems for this particular issue. For one, W. Stroup, in his R examples for "count, split plot" in his Agronomy Journal article (https://www.agronomy.org/publications/aj/articles/0/0/agronj2013.0342), makes several comments suggesting that things will be different than SAS no matter what data/model is used, such as: "*####Note df does not match SAS*" and "*###NOTE glmmADMB interacts with the lsmeans package, that is, lsmeans will not work for lmer or glmer models after glmmADMB has been installed.*" I am not sure why he chose to use the glmmADMB package for the negbin (there is a glmer.nb routine in lme4). Also, the documentation for lme4 notes: "*lme4 does not currently implement nlme's features for modeling heteroscedasticity and correlation of residuals.*" At that point I decided it was easier to seek a SAS solution/comparison, given my limited abilities and the time needed to figure things out there. That said, if anyone here has the bi-lingual skills necessary and is willing to take a look, I'd welcome it. I just know it's probably best not done by me.

Bill


10-04-2014 04:48 PM

I doubt that R coding will help with your problem. It is unlikely that you will exactly duplicate in R the model you are trying to fit in GLIMMIX. Stroup has made it very clear that there are a lot of GLMMs that still cannot be fitted in R. It is likely that the model you are trying to fit is not appropriate, based on previous comments, so the results would not be useful in either case. You are trying to figure out why SAS gives different answers with different versions.


10-06-2014 10:23 AM

Bill,

I agree completely--my hope was that someone on the R-mixed-models mailing list might have seen something similar and we could work from that. I'll back Larry--I was making a type 3 error of asking the wrong question. Why 9.3 and 9.4 give different results (and really only for dist=nb) is a matter for Tech Support.

Steve Denham


10-07-2014 12:50 PM

I heard back, and... well, it was pretty much a non-answer beyond what we already knew: the data cannot support the model well, causing precision issues with the G matrix. I was more interested in knowing what changed in the negbin algorithm(s) to give such drastic differences. But enough is enough for me, I guess. Caution is the word in all modeling.

As they used to say in Hill Street Blues, "Be careful out there."

------------------- Response below ----------------------------

*For this analysis, you are specifying covariance structure through 2 RANDOM statements. This may not be a suitable model for the data.*

*At convergence, the gradient for the AR(1) variance is much larger than 0, indicating a problem.*

*The estimated value of the AR(1) variance is very close to zero, leading to an NPD G matrix.*

*The components in the Hessian matrix are of very different scales, and the covariance matrix (-inv(Hessian)) has zeros on the diagonal for the cov parms AR(1) and Scale.*

```
        Covariance Parameter Estimates

                                      Standard
  Cov Parm    Subject      Estimate      Error
  Intercept   site(AEZ)      0.1249     0.3547
  Variance    site(AEZ)    8.51E-18     0.7887
  AR(1)       site(AEZ)      0.1313          .
  Scale                      0.5465          .
```

*We tend to caution users that an NPD G matrix together with missing standard errors for covariance parameters tends to indicate an over-specified or incorrectly specified model. When the G matrix is NPD, the model is incorrectly specified, or there is a numerical precision issue, different results can occur with the same data set and model. This type of behavior does not happen often, but it can happen.*

*I hope the above information is helpful to you.*

--------------------- end response ---------------------------



10-08-2014 07:29 PM

The more I think on this, the more that explanation makes sense, and it's an obvious one I had not considered. My understanding on zero variance components is to use an option such as NOBOUND to allow them to become negative. This gains some advantages in Type I error rates and can be interpreted in terms of intra-class correlations. Thanks for your thoughts on this... and for the reminder that I need to bug IT about updating all our SAS venues!
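For reference, NOBOUND is specified on the PROC statement; a sketch applying it to the original model (same code as the opening post, with only the option added):

```sas
/* NOBOUND removes the lower bound of zero on covariance parameters,
   allowing a boundary variance component to go (slightly) negative
   instead of pinning at zero */
Proc GLIMMIX data=worms method=laplace nobound;
   Class site AEZ year;
   Model density = AEZ year AEZ*year /Dist=negbin;
   Random intercept /subject=site(AEZ);
   random year/type=ar(1) subject=site(AEZ);
   LSmeans AEZ /PDiff ilink CL;
run;
```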

Bill


10-08-2014 09:06 PM

SAS developers apparently were not interested in determining why you got different results with different versions, because all the versions were giving meaningless results based on the chosen model and data. Machine precision can make a difference when things blow up. I know from SAS developers that 32-bit and 64-bit machines can give different (incorrect) answers when there are numerical problems with a given data set. I think you have an example of this. Results should always be the same when things are "working correctly".

The issue of G not being positive definite is a tricky one. It does not mean that things blew up or that one cannot use the results. The safe answer is to change the model whenever you get the warning that this happens, but based on experience I think that is a bit of an overreaction. SAS puts the warning in the log to cover themselves, because theory does break down when G is NPD. However, in some cases it does not matter, especially with variance-component models (which is not the type of model the OP is using). For instance, I have explored this a lot with variance-component models when one of the components is 0. I have compared the numerical results (test statistics, expected values, SEs, etc.) for data sets when the 0 variance term was kept in the model and when it was removed. In typical experimental design scenarios, the results are identical. I won't get into the always controversial issue of negative variances (this would be a different model).

For more complex random effects, it is trickier to evaluate. You should always think about your results when G is NPD. If you are getting missing values for the estimated SEs of variance-covariance parameters, then you should definitely consider another model. But things can work well numerically with a positive semidefinite G. For instance, MIXED/GLIMMIX gives a nice way to restrict an (almost) unstructured G to be at least positive semidefinite: the type=CHOL or type=FA0(#) option. This can be numerically more stable than type=UN. You are not dealing with this situation, but I bring it up to give some background.
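As an illustration of the TYPE=CHOL idea (a hypothetical RANDOM statement, not the OP's model), the only change from an unstructured fit is the TYPE= value:

```sas
/* type=chol parameterizes G via its Cholesky root L, so the fitted
   G = L*L` is guaranteed at least positive semidefinite; often more
   numerically stable than the equivalent type=un specification */
random year / type=chol subject=site(AEZ);
```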

I support the statements in that 2012 Global Forum paper that was cited. All zeros in G are not equally bad. It depends.