BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
gregking
Calcite | Level 5

i am trying to use mixed-effect logistic regression (using proc glimmix) to estimate variance of non-subject-level random error term, together with some subject-level random effects. trying to use something like:

 

proc glimmix data=testdata;

    class  somegroup;

    model y=x1 x2 / dist=binomial link=logit  solution;


   random intercept / residual type=vc;

 

   ** or:   ;

   ** random  _residual_;

   **  getting some number;


   random intercept / subject=somegroup;
run;

 

proc glimmix returns estimate for the random residual, often time some number close to 1. I am not sure if this estimated number really represent or can be interpreted as the variance of the random error term. Can someone please help?   Thanks!

 

1 ACCEPTED SOLUTION

Accepted Solutions
SteveDenham
Jade | Level 19

I have some issues with what is supposed to be going on.  First, the code for the data to be generated has the variable 'L', but it is not referenced or given a value anywhere.  Second, I don't know how many observations are being generated.

 

I made some guesses - looping your code with 1000 observations and defining L at each iteration as L=n/1000, I get a dataset with 27 1's and 973 0's.  When I plug in the expected values for the random variates (0 for rannor, 0.5 for ranuni and L), I get a predicted value of 22 1's.   This is with p=exp(-4.5) = 0.011 -> expected y =0.011/0.5 = 2.2%; The RANDOM intercept statement leads to a non-positive G matrix, so I commented it out.

 

Using an ESTIMATE statement, I calculated the expected value of y in the logit space at 0.5 (the mean value for L).  It was -3.80, with 95% confidence bounds of (-4.26, -3.33), not quite including the -4.5.

 

So my thought is that the addition of the normal error may be affecting your ability to recover the generating parameters, but since the logit is strongly non-linear, the influence of positive values of the normal variate may be enough to move the needle (so to speak) toward a slightly larger estimate for the logit.

 

SteveDenham

 

View solution in original post

4 REPLIES 4
SteveDenham
Jade | Level 19

As no subject is specified for the R side variance component, the result is a measure of overdispersion..  I think this page of the GLIMMIX documentation covers your question:

https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.5&docsetId=statug&docsetTarget=statu... 

In this case, the approach lifts the constraint on the scale parameter for the binomial distribution, which is ordinarily fixed at 1.  If your result is close to 1, I would think there is little evidence for overdispersion.  You could check the information criteria and the chi square/df ratio to see what is going on.

 

SteveDenham

gregking
Calcite | Level 5
Thanks for providing the information, this is very helpful for me. The estimated phi is around 1.25, can this be interpreted as the variance of non-subject-specific random error? Or it is a parameter to indicate degree of overdispersion? Thanks!
gregking
Calcite | Level 5

the problem i need to solve is like this: first I created some binary outcome test data:

p = 1/(1+exp(6.0 - 3.0 * L + 1.0*RANNOR(111)));
if RANUNI(222) < p then
     y=1;

else

      y=0;

 

Then I try to see if I could recover the original parameters, including the variance of the zero mean error term in the exponent, by running proc glimmix:

 

proc glimmix data=testdata;

      model y=L / dist=binomial link=logit  solution;

      random intercept;

run;

 

but the estimated parameters are all very different from those that generated the testdata. Is there anything in glimmix that can do that?  or the problem itself is wrong?  Thanks!

 

SteveDenham
Jade | Level 19

I have some issues with what is supposed to be going on.  First, the code for the data to be generated has the variable 'L', but it is not referenced or given a value anywhere.  Second, I don't know how many observations are being generated.

 

I made some guesses - looping your code with 1000 observations and defining L at each iteration as L=n/1000, I get a dataset with 27 1's and 973 0's.  When I plug in the expected values for the random variates (0 for rannor, 0.5 for ranuni and L), I get a predicted value of 22 1's.   This is with p=exp(-4.5) = 0.011 -> expected y =0.011/0.5 = 2.2%; The RANDOM intercept statement leads to a non-positive G matrix, so I commented it out.

 

Using an ESTIMATE statement, I calculated the expected value of y in the logit space at 0.5 (the mean value for L).  It was -3.80, with 95% confidence bounds of (-4.26, -3.33), not quite including the -4.5.

 

So my thought is that the addition of the normal error may be affecting your ability to recover the generating parameters, but since the logit is strongly non-linear, the influence of positive values of the normal variate may be enough to move the needle (so to speak) toward a slightly larger estimate for the logit.

 

SteveDenham

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 397 views
  • 0 likes
  • 2 in conversation