Solved: can one use proc glimmix to estimate idiosyncratic random error varian...

gregking · Posted 02-16-2021 08:19 PM

i am trying to use mixed-effect logistic regression (using proc glimmix) to estimate variance of non-subject-level random error term, together with some subject-level random effects. trying to use something like:

proc glimmix data=testdata;

class somegroup;

model y=x1 x2 / dist=binomial link=logit solution;

random intercept / residual type=vc;

** or: ;

** random _residual_;

** getting some number;

random intercept / subject=somegroup;
run;

proc glimmix returns estimate for the random residual, often time some number close to 1. I am not sure if this estimated number really represent or can be interpreted as the variance of the random error term. Can someone please help? Thanks!

SteveDenham · Posted 02-18-2021 08:52 AM

I have some issues with what is supposed to be going on. First, the code for the data to be generated has the variable 'L', but it is not referenced or given a value anywhere. Second, I don't know how many observations are being generated.

I made some guesses - looping your code with 1000 observations and defining L at each iteration as L=n/1000, I get a dataset with 27 1's and 973 0's. When I plug in the expected values for the random variates (0 for rannor, 0.5 for ranuni and L), I get a predicted value of 22 1's. This is with p=exp(-4.5) = 0.011 -> expected y =0.011/0.5 = 2.2%; The RANDOM intercept statement leads to a non-positive G matrix, so I commented it out.

Using an ESTIMATE statement, I calculated the expected value of y in the logit space at 0.5 (the mean value for L). It was -3.80, with 95% confidence bounds of (-4.26, -3.33), not quite including the -4.5.

So my thought is that the addition of the normal error may be affecting your ability to recover the generating parameters, but since the logit is strongly non-linear, the influence of positive values of the normal variate may be enough to move the needle (so to speak) toward a slightly larger estimate for the logit.

SteveDenham

View solution in original post

SteveDenham · Posted 02-17-2021 08:04 AM

As no subject is specified for the R side variance component, the result is a measure of overdispersion.. I think this page of the GLIMMIX documentation covers your question:

https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.5&docsetId=statug&docsetTarget=statu...

In this case, the approach lifts the constraint on the scale parameter for the binomial distribution, which is ordinarily fixed at 1. If your result is close to 1, I would think there is little evidence for overdispersion. You could check the information criteria and the chi square/df ratio to see what is going on.

SteveDenham

gregking · Posted 02-17-2021 09:43 AM

Thanks for providing the information, this is very helpful for me. The estimated phi is around 1.25, can this be interpreted as the variance of non-subject-specific random error? Or it is a parameter to indicate degree of overdispersion? Thanks!

gregking · Posted 02-17-2021 10:02 PM

the problem i need to solve is like this: first I created some binary outcome test data:

p = 1/(1+exp(6.0 - 3.0 * L + 1.0*RANNOR(111)));
if RANUNI(222) < p then
y=1;

else

y=0;

Then I try to see if I could recover the original parameters, including the variance of the zero mean error term in the exponent, by running proc glimmix:

proc glimmix data=testdata;

model y=L / dist=binomial link=logit solution;

random intercept;

run;

but the estimated parameters are all very different from those that generated the testdata. Is there anything in glimmix that can do that? or the problem itself is wrong? Thanks!

SteveDenham · Posted 02-18-2021 08:52 AM

I have some issues with what is supposed to be going on. First, the code for the data to be generated has the variable 'L', but it is not referenced or given a value anywhere. Second, I don't know how many observations are being generated.

I made some guesses - looping your code with 1000 observations and defining L at each iteration as L=n/1000, I get a dataset with 27 1's and 973 0's. When I plug in the expected values for the random variates (0 for rannor, 0.5 for ranuni and L), I get a predicted value of 22 1's. This is with p=exp(-4.5) = 0.011 -> expected y =0.011/0.5 = 2.2%; The RANDOM intercept statement leads to a non-positive G matrix, so I commented it out.

Using an ESTIMATE statement, I calculated the expected value of y in the logit space at 0.5 (the mean value for L). It was -3.80, with 95% confidence bounds of (-4.26, -3.33), not quite including the -4.5.

So my thought is that the addition of the normal error may be affecting your ability to recover the generating parameters, but since the logit is strongly non-linear, the influence of positive values of the normal variate may be enough to move the needle (so to speak) toward a slightly larger estimate for the logit.

SteveDenham

can one use proc glimmix to estimate idiosyncratic random error variance?

Re: can one use proc glimmix to estimate idiosyncratic random error variance?

Re: can one use proc glimmix to estimate idiosyncratic random error variance?

Re: can one use proc glimmix to estimate idiosyncratic random error variance?

Re: can one use proc glimmix to estimate idiosyncratic random error variance?

Re: can one use proc glimmix to estimate idiosyncratic random error variance?

can one use proc glimmix to estimate idiosyncratic random error variance?

Re: can one use proc glimmix to estimate idiosyncratic random error variance?

Re: can one use proc glimmix to estimate idiosyncratic random error variance?

Re: can one use proc glimmix to estimate idiosyncratic random error variance?

Re: can one use proc glimmix to estimate idiosyncratic random error variance?

Re: can one use proc glimmix to estimate idiosyncratic random error variance?

SAS Innovate 2025: Call for Content