Statistical Procedures

Programming the statistical procedures from SAS
BookmarkSubscribeRSS Feed
ColinGradstudent
Calcite | Level 5

Hello All,

I am looking for some advice or guidance in implementing a user defined variance function for a negative binomial error distribution in PROC GLIMMIX (preferrably) or PROC NLMIXED. Basically, I am trying to implement the alternative variance function described in a recent Ecology article (Linden & Mantyniemi. 2011. Ecology 92: 1414-1421). Their innovation was basically a reformulation of the variance function to include a second estimable overdispersion parameter:

sigma^2 = omega*mu + theta*mu^2

This is in contrast to the more conventional variance function sigma^2 = mu + k*mu^2 which is the default for dist=negbin in GLIMMIX (and NLMIXED I believe). My most successful attempts have still resulted in an error stating that omega and theta are undefined...

The authors of the paper provided some very simple MATLAB code, but it seems like they did their 'parameter estimation' by brute force, because they require lists of values for omega and theta as part of the input data. Is there a way to write out the programming statements so that SAS does the heavy lifting of estimating both of these overdispersion parameters?

Thanks in advance for any help or advice.

cheers,

Colin

3 REPLIES 3
SteveDenham
Jade | Level 19

I believe you are going to have to use NLMIXED, if you want to change the variance function. The last example in the NLMIXED documentation gives an example of how to use the general() distribution for the log likelihood.

A search on SAS-L may turn up some valuable sources as well.  Look for posts by Dale McLerran and David Cassell, in particular.

Steve Denham

ColinGradstudent
Calcite | Level 5

With the help of Adam Smith, who posted on this topic a couple years ago on SAS-L, I believe we have come up with a working implementation using NLMIXED of this extended parameterization of the Negative Binomial Distribution variance function as per Linden et al. (2011). Hopefully this proves useful for anybody else looking to use this formulation of the NB. Of course any comments or improvements are welcome... especially ways to efficiently expand this to larger models. Anyway, here are the salient bits:

/* Example dataset */

data counts;

   input ni @@;

   sub = _n_;

   do i=1 to ni;

      input x y @@;

      output;

   end;

   datalines;

  1 29 0

  6  2 0 82 5 33 0 15 2 35 0 79 0

19 81 0 18 0 85 0 99 0 20 0 26 2 29 0 91 2 37 0 39 0  9 1 33 0

     3 0 60 0 87 2 80 0 75 0  3 0 63 1

  9 18 0 64 0 80 0  0 0 58 0  7 0 81 0 22 3 50 0

15 91 0  2 1 14 0  5 2 27 1  8 1 95 0 76 0 62 0 26 2  9 0 72 1

    98 0 94 0 23 1

  2 34 0 95 0

18 48 1  5 0 47 0 44 0 27 0 88 0 27 0 68 0 84 0 86 0 44 0 90 0

    63 0 27 0 47 0 25 0 72 0 62 1

13 28 1 31 0 63 0 14 0 74 0 44 0 75 0 65 0 74 1 84 0 57 0 29 0

    41 0

  9 42 0  8 0 91 0 20 0 23 0 22 0 96 0 83 0 56 0

  3 64 0 64 1 15 0

  4  5 0 73 2 50 1 13 0

  2  0 0 41 0

20 21 0 58 0  5 0 61 1 28 0 71 0 75 1 94 16 51 4 51 2 74 0  1 1

    34 0  7 0 11 0 60 3 31 0 75 0 62 0 54 1

  2 66 1 13 0

  5 83 7 98 1 11 1 28 0 18 0

17 29 5 79 0 39 2 47 2 80 1 19 0 37 0 78 1 26 0 72 1  6 0 50 3

    50 4 97 0 37 2 51 0 45 0

17 47 0 57 0 33 0 47 0  2 0 83 0 74 0 93 0 36 0 53 0 26 0 86 0

     6 0 17 0 30 0 70 1 99 0

  7 91 0 25 1 51 4 20 0 61 1 34 0 33 2

14 60 0 87 0 94 0 29 0 41 0 78 0 50 0 37 0 15 0 39 0 22 0 82 0

    93 0  3 0

16 68 0 26 1 19 0 60 1 93 3 65 0 16 0 79 0 14 0  3 1 90 0 28 3

    82 0 34 0 30 0 81 0

19 48 3 48 1 43 2 54 0 45 9 53 0 14 0 92 5 21 1 20 0 73 0 99 0

    66 0 86 2 63 0 10 0 92 14 44 1 74 0

  8 34 1 44 0 62 0 21 0  7 0 17 0  0 2 49 0

13 11 0 27 2 16 1 12 3 52 1 55 0  2 6 89 5 31 5 28 3 51 5 54 13

    64 0

  9  3 0 36 0 57 0 77 0 41 0 39 0 55 0 57 0 88 1

  7  2 0 80 0 41 1 20 0  2 0 27 0 40 0

18 73 1 66 0 10 0 42 0 22 0 59 9 68 0 34 1 96 0 30 0 13 0 35 0

    51 2 47 0 60 1 55 4 83 3 38 0

17 96 0 40 0 34 0 59 0 12 1 47 0 93 0 50 0 39 0 97 0 19 0 54 0

    11 0 29 0 70 2 87 0 47 0

13 59 0 96 0 47 1 64 0 18 0 30 0 37 0 36 1 69 0 78 1 47 1 86 0

    88 0

15 66 0 45 1 96 1 17 0 91 0  4 0 22 0  5 2 47 0 38 0 80 0  7 1

    38 1 33 0 52 0

12 84 6 60 1 33 1 92 0 38 0  6 0 43 3 13 2 18 0 51 0 50 4 68 0

;

proc sort; by sub i; run;

proc print; run;

title 'Linden parameterization';

proc nlmixed data=counts;

*omega = 1;  *constrains model to quadratic mean-variance relationship, NB2;

             *omega starting value in parms statement should be greater than 1 when estimating

              Quasi-Poisson model to avoid execution error due to zero denominator;

*theta = 0;  *constrains model to linear mean-variance relationship, i.e., NB1, quasiPoisson;

*Starting fixed effect parameter estimates are the rounded coefficient estimates from the NB model;

*Starting value for omega should be > 1 (or at least not 1) to prevent dividing by 0 in calculation of r;

parms b_0=-1 b_1=0 omega=4 theta=1;

eta_lambda = b_0 + b_1*x + u;  /* subject level random intercept */

lambda = exp(eta_lambda);

r = lambda / (omega - 1 + theta*lambda);

p = r / (r + lambda);

loglike = lgamma(y+r) - lgamma(y+1) - lgamma(r) + r*log(p) + y*log(1-p);

/* fitting the model */

model y ~ general(loglike);

random u ~ normal(0, exp(2*log_sdSUB)) subject=sub;

run;

/* Alternatively, replacing r, p, and loglike with the following gives the equivalent hierarchical gamma-poisson formulation */

alpha = linp / (omega - 1 + theta*linp);

beta = 1 / (omega - 1 + theta*linp);

loglike = lgamma(y+alpha) - lgamma(y+1) - lgamma(alpha) + alpha*log(beta/(1+beta)) + y*log(1/(1+beta));

/* Compare against GLIMMIX results */

TITLE 'Negative binomial';

proc glimmix data=counts method=quad;

    class sub;

    model y = x / link = log s dist=nb;

    random int / subject=sub;

run;

SteveDenham
Jade | Level 19

This is why the listserv is such a valuable resource.  This looks like what I wouldn't have been able to come up with myself, but I knew existed, at least close in form.

Steve Denham

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 1573 views
  • 3 likes
  • 2 in conversation