pdiff
Obsidian | Level 7

I have run into an interesting situation with a negative binomial likelihood in GLIMMIX, where I get different inferential results in version 9.3 (TS1M0) than in later 9.3 releases or in version 9.4. The estimates and likelihood values are the same, but the standard errors, test statistics, and p-values are not (they are much smaller in the later versions). See the code below; output is attached. I'm not concerned with the correctness of the model (unless that is related to the problem), but rather with what differs between versions. Also, removing the repeated-measures RANDOM statement eliminates the inconsistency. Has anyone seen this behavior before, or have some ideas on it?

Thanks,

Bill Price

University of Idaho

Code ---------------------------------

Data worms;
input site$ AEZ year biomass density;
cards;
TROY.1    1  2011   34.06    86.74
PULL.1    1  2011   51.30   123.92
FERD.1    2  2011   75.11   136.31
FERD.2    2  2011  113.68   111.52
FAIR.1    1  2011  116.65   111.52
ASTN.1    2  2011  204.38   247.83
COLF.1    2  2011  209.47   161.09
MAUP.1    2  2011  344.20   259.38
PEND.3N   3  2011  669.95   299.52
MALD.1    2  2011  766.91   346.96
GENE.1    1  2011  849.58   458.49
MOSC.1    1  2012    0.40     2
MAUP.1    2  2012    8.22    14.15
FERD.2    2  2012    8.25    28.29
COLF.3    2  2012    8.57    40.74
TROY.1    1  2012   16.75    40.17
STJN.1    2  2012   24.65    40.74
FAIR.1    1  2012   30.38    61.09
COLF.2    2  2012   32.39   132.40
PULL.1    1  2012   37.82    58.45
LAPW.1    2  2012   44.91    84.89
FERD.1    2  2012   48.26    81.49
COLF.1    2  2012   57.89    99.03
MILT.1    3  2012   85.45    28.29
GENE.1    1  2012   86.26   122.23
JULI.1    1  2012   86.27   142.61
MALD.1    2  2012   93.56   244.47
ASTN.1    2  2012   96.69   290.03
UNTN.1    1  2012  154.34   295.40
PEND.2    3  2012  159.24   155.62
PEND.3N   3  2012  210.34   240.51
KEND.1    1  2012    .        8.10
LAPW.1    2  2013    1.4      7.570018412
JULI.1    1  2013    5.9      9.786910814
KEND.1    1  2013    6.5      9.417730007
TROY.1    1  2013    9.2      8.721076006
PULL.1    1  2013   11.9     11.05275258
MAUP.1    2  2013   13.2     60.48822313
GENE.1    1  2013   15.4     20.33988973
MOSC.1    1  2013   19.2    120.7715404
STJN.1    2  2013   21.5     34.88430403
PEND.3N   3  2013   29.5     50.93108388
FERD.2    2  2013   31.3     91.95890144
FERD.1    2  2013   38.2     30.24411157
MILT.1    3  2013   40.5    137.0167514
PEND.2    3  2013   44.2     46.66867796
ASTN.1    2  2013   48.4     75.23055226
COLF.1    2  2013   49.5     68.88785015
FAIR.1    1  2013   55.0     70.51996229
COLF.5    2  2013   56.2     99.68717485
MALD.1    2  2013   65.0    190.3118793
COLF.2    2  2013   82.5    131.8482201
UNTN.1    1  2013  125.1    109.6977191
;

Proc GLIMMIX data=worms method=laplace;
   Class site AEZ year;
   Model density = AEZ year AEZ*year / Dist=negbin;
   Random intercept / subject=site(AEZ);
   random year / type=ar(1) subject=site(AEZ);
   LSmeans AEZ / PDiff ilink CL;
   title1 'Density Analysis AR(1) (neg bin)';
run;
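For comparison, the version without the repeated-measures RANDOM statement, which does not show the inconsistency, is simply the same program with the second RANDOM statement dropped:

Proc GLIMMIX data=worms method=laplace;
   Class site AEZ year;
   Model density = AEZ year AEZ*year / Dist=negbin;
   Random intercept / subject=site(AEZ);   /* repeated-measures AR(1) statement removed */
   LSmeans AEZ / PDiff ilink CL;
   title1 'Density Analysis (neg bin, no AR(1))';
run;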


12 REPLIES
Reeza
Super User

I'd be talking to tech support... and posting the answer back here 🙂

pdiff
Obsidian | Level 7

Yeah, it's on my to-do list today; just thought I'd throw it out to the group-think here too 🙂.

Bill

lvm
Rhodochrosite | Level 12

Have you learned anything from SAS technical support? I confirmed on my machine that you do get different results with the two versions. Interestingly, my two different results are both different from your two results. I don't know your experimental design, but I do think your model may be overparameterized. With a negative binomial distribution, you get a scale parameter for the lowest level in the hierarchy. If this is site(AEZ), then your model is definitely overparameterized: the random variance would be confounded with the NB scale. Of course, since I don't know anything about your design, this is (educated) speculation based on looking at your data. This still should not produce different results on different versions, but it can result in certain covariances going to 0, making other things blow up.
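One way to probe the suspect component directly is a likelihood-ratio test with a COVTEST statement, which GLIMMIX supports under METHOD=LAPLACE. A minimal sketch, assuming the covariance parameters appear in the order GLIMMIX reports them here (Intercept variance, AR(1) variance, AR(1) correlation, Scale); a period in the value list leaves a parameter unconstrained:

Proc GLIMMIX data=worms method=laplace;
   Class site AEZ year;
   Model density = AEZ year AEZ*year / Dist=negbin;
   Random intercept / subject=site(AEZ);
   random year / type=ar(1) subject=site(AEZ);
   /* H0: the AR(1) variance component is 0 */
   covtest 'AR(1) variance = 0' . 0 . .;
run;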

pdiff
Obsidian | Level 7

lvm,

     Thank you for your reply. Yes, I would agree that the model is probably fubar, but I also agree that even a wrong answer should be consistently wrong. That is what troubles me about this problem. If, in fact, you clean up the model to, say, a split plot in time, without the repeated-measures business, you get more consistency across platforms. My main rationale here is to figure out why this is happening so I can avoid it, or at least be aware of it, in future analyses.

As for the design, this is from a landscape survey where sites were chosen at random from Agro-Ecological Zones (AEZ). Unfortunately for the researchers, it was difficult to get consistency of sites across years due to agricultural practices on the sites.  I suspect this unbalanced nature through time is the root of the problem.

Thanks again for taking the time to look at this. I have contacted SAS support, and they told me it will take some time to dig into the problem. It is apparently not obvious to them either. I will report their assessment back here when I get it.

SteveDenham
Jade | Level 19

Since I don't do things in R, this feels like heresy to me, but I am going to suggest it: try coding this up in one of the R packages (glmer, I think) and see what results you get. And if that looks odd, post to the R-mixed-models group about it--there are some very sharp people there who can spot what is going on pretty fast.

Steve Denham

pdiff
Obsidian | Level 7

Yes, I had considered that initially, although I am hampered by my lack of skill in R. In looking that way, however, I did note right away potential problems for this particular issue. For one, W. Stroup, in his R examples for "count, split plot" in his Agronomy Journal article (https://www.agronomy.org/publications/aj/articles/0/0/agronj2013.0342), makes several comments suggesting that things will be different from SAS no matter what data/model is used, such as: "####Note df does not match SAS" and "###NOTE glmmADMB interacts with the lsmeans package, that is, lsmeans will not work for lmer or glmer models after glmmADMB has been installed." I am not sure why he chose the glmmADMB package for the negbin (there is a glmer.nb routine in lme4). Also, the lme4 documentation notes: "lme4 does not currently implement nlme's features for modeling heteroscedasticity and correlation of residuals." At that point I decided it was easier to seek a SAS solution/comparison, given my limited abilities and time to figure things out there 🙂. That said, if anyone here has the bilingual skills necessary and is willing to take a look, I'd welcome it. I just know it's probably best not done by me 🙂.

Bill

lvm
Rhodochrosite | Level 12

I doubt that R coding will help with your problem. It is unlikely that you will exactly duplicate in R the model you are trying to fit in GLIMMIX. Stroup has made it very clear that there are a lot of GLMMs that still cannot be fitted in R. It is likely that the model you are trying to fit is not appropriate, based on the previous comments, so either way the results would not be useful. What you are really trying to figure out is why SAS gives different answers with different versions.

SteveDenham
Jade | Level 19

Bill,

I agree completely--my hope was that someone on the R-mixed-models mailing list might have seen something similar and we could work from that. I'll back Larry here--I was making a type 3 error, asking the wrong question. Why 9.3 and 9.4 give different results (and really only for dist=negbin) is a matter for Tech Support.

Steve Denham

pdiff
Obsidian | Level 7

I heard back, and... well, it was pretty much a non-answer, other than what we would already know: the data cannot support the model well, causing precision issues with the G matrix. I was more interested in knowing what changed in the negbin algorithm(s) to give such drastic differences. But enough is enough for me, I guess 🙂. Caution is the word in all modeling.

As they used to say in Hill Street Blues, "Be careful out there."

------------------- Response below  ----------------------------

For this analysis, you are specifying the covariance structure through two RANDOM statements. This may not be a suitable model for the data.

At convergence, the gradient for the AR(1) variance is much larger than 0, indicating a problem.

The estimated value of the AR(1) variance is very close to zero, leading to a non-positive-definite (NPD) G matrix.

The components of the Hessian matrix are on very different scales, and the covariance matrix (-inv(Hessian)) has zeros on the diagonal for the covariance parameters AR(1) and Scale.

                 Covariance Parameter Estimates

                                               Standard
      Cov Parm     Subject      Estimate       Error
      Intercept    site(AEZ)    0.1249         0.3547
      Variance     site(AEZ)    8.51E-18       0.7887
      AR(1)        site(AEZ)    0.1313         .
      Scale                     0.5465         .

We tend to caution users that an NPD G matrix together with missing standard errors for the covariance parameters tends to indicate an over-specified or incorrectly specified model. If the G matrix is NPD, if the model is incorrectly specified, or if there is a numerical precision issue, then different results can occur with the same data set and model. This type of behavior does not happen often, but it can happen.

I hope the above information is helpful to you.

---------------------  end response  ---------------------------

blar
Calcite | Level 5 (Accepted Solution)

Disclaimer: not a statistician; went to school for numerical analysis and haven't actively practiced rigorous mathematics in a couple of years.

My guess is that the source of the difference isn't a change in the algorithm, or even the version of SAS, but a combination of a change from a 32-bit SAS (your 9.3 results show an environment of W32_7PRO) to a 64-bit one (9.4 results show X64_S08R2) and some precision hijinks caused by the zero-variance component noted above.

To do a little bit of confirmation, I ran the code above (slightly modified to output the G matrix and a detailed iteration history from GLIMMIX) on a single machine with both 32-bit 9.3 TS1M2 and 64-bit 9.3 TS1M2 running on the same Win7 workstation. Same thing: the results don't match each other, and they don't match what was posted above, either.
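The modification was along these lines (a sketch: the G option on a RANDOM statement prints the estimated G matrix, ITDETAILS expands the iteration history with gradients, and adding ASYCOV, an assumption beyond what I actually ran, would also print the asymptotic covariance matrix of the covariance parameter estimates):

Proc GLIMMIX data=worms method=laplace itdetails asycov;
   Class site AEZ year;
   Model density = AEZ year AEZ*year / Dist=negbin;
   Random intercept / subject=site(AEZ) G;        /* print estimated G matrix */
   random year / type=ar(1) subject=site(AEZ) G;
run;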

The scary part, to me, is that I see statisticians advocate leaving zero-variance components in a model because 'the model takes care of it' -- see http://support.sas.com/resources/papers/proceedings12/332-2012.pdf, page 10, for an example. I don't know if I can agree with that sentiment based on what I've seen here. But again, not a statistician. I just know that if you fill a matrix with a number of zero or near-zero entries and then try to work with it from there, you are going to run into issues.
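As a toy illustration of that last point (not from the thread; a PROC IML sketch): invert a 'Hessian' whose diagonal carries a near-zero entry like the 8.51E-18 variance above, and a tiny absolute perturbation, the sort of difference a 32-bit versus 64-bit floating-point path can produce, changes the huge entries of the inverse by a factor of two:

proc iml;
   /* toy 'Hessian' with a near-zero diagonal entry */
   H1 = {1.0e-17 0, 0 2.5};
   /* same matrix after a tiny absolute perturbation */
   H2 = {2.0e-17 0, 0 2.5};
   /* the leading entry of the inverse swings from 1E17 to 5E16 */
   print (inv(H1))[label='inv(H1)'] (inv(H2))[label='inv(H2)'];
quit;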

pdiff
Obsidian | Level 7

The more I think on this, the more that explanation makes sense, and it's an obvious one I had not considered. My understanding on zero-variance components is to use an option such as NOBOUND to allow them to become negative; this gains some advantages in Type I error rates and can be interpreted in terms of intra-class correlations. Thanks for your thoughts on this... and for the reminder that I need to bug IT about updating all our SAS venues! 🙂
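For the record, NOBOUND is an option on the PROC GLIMMIX statement; a sketch of what I mean (it removes the lower-bound constraints so variance components may go negative):

Proc GLIMMIX data=worms method=laplace nobound;
   Class site AEZ year;
   Model density = AEZ year AEZ*year / Dist=negbin;
   Random intercept / subject=site(AEZ);   /* variance now allowed to be negative */
run;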

Bill

lvm
Rhodochrosite | Level 12 lvm
Rhodochrosite | Level 12

SAS developers apparently were not interested in determining why you got different results with different versions, because all the versions were giving meaningless results given the chosen model and data. Machine precision can make a difference when things blow up. I know from SAS developers that 32-bit and 64-bit machines can give different (incorrect) answers when there are numerical problems with a given data set. I think you have an example of this. Results should always be the same when things are "working correctly."

The issue of G not being positive definite is a tricky one. It does not mean that things blew up or that one cannot use the results. The safe answer is to change the model whenever you get the warning, but I think this is a bit of an overreaction, based on experience. SAS puts the warning in the log to cover themselves, because the theory does break down when G is NPD. However, in some cases it does not matter, especially with variance-component models (which is not the model the OP is using). For instance, I have explored this a lot with variance-component models where one of the components is 0. I have compared the numerical results (test statistics, expected values, SEs, etc.) for data sets when the 0 variance term was kept in the model and when it was removed. In typical experimental design scenarios, the results are identical. I won't get into the always controversial issue of negative variances (that would be a different model).

For more complex random effects, it is trickier to evaluate. You should always think about your results when G is NPD. If one is getting missing values for the estimated SEs of variance-covariance parameters, then you should definitely consider another model. But things can work well numerically with a positive semidefinite G. For instance, MIXED/GLIMMIX gives a nice way to restrict an (almost) unstructured G to be at least positive semidefinite: the TYPE=CHOL or TYPE=FA0(#) option. This can be numerically more stable than TYPE=UN. You are not dealing with this situation, but I bring it up to give some background.
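To make that concrete, a sketch of the kind of RANDOM statement I mean (not the OP's model; it constrains G to be at least positive semidefinite through a Cholesky or factor-analytic parameterization):

Proc GLIMMIX data=worms method=laplace;
   Class site AEZ year;
   Model density = AEZ year AEZ*year / Dist=negbin;
   /* Cholesky-root parameterization of an unstructured G */
   random year / type=chol subject=site(AEZ);
   /* or a reduced-rank factor-analytic structure, e.g.:   */
   * random year / type=fa0(2) subject=site(AEZ);
run;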

I support the statements in that 2012 Global Forum paper that was cited. All zeros in G are not equally bad. It depends.
