i am trying to use mixed-effect logistic regression (using proc glimmix) to estimate variance of non-subject-level random error term, together with some subject-level random effects. trying to use something like:
proc glimmix data=testdata;
class somegroup;
model y=x1 x2 / dist=binomial link=logit solution;
random intercept / residual type=vc;
** or: ;
** random _residual_;
** getting some number;
random intercept / subject=somegroup;
run;
proc glimmix returns estimate for the random residual, often time some number close to 1. I am not sure if this estimated number really represent or can be interpreted as the variance of the random error term. Can someone please help? Thanks!
I have some issues with what is supposed to be going on. First, the code for the data to be generated has the variable 'L', but it is not referenced or given a value anywhere. Second, I don't know how many observations are being generated.
I made some guesses - looping your code with 1000 observations and defining L at each iteration as L=n/1000, I get a dataset with 27 1's and 973 0's. When I plug in the expected values for the random variates (0 for rannor, 0.5 for ranuni and L), I get a predicted value of 22 1's. This is with p=exp(-4.5) = 0.011 -> expected y =0.011/0.5 = 2.2%; The RANDOM intercept statement leads to a non-positive G matrix, so I commented it out.
Using an ESTIMATE statement, I calculated the expected value of y in the logit space at 0.5 (the mean value for L). It was -3.80, with 95% confidence bounds of (-4.26, -3.33), not quite including the -4.5.
So my thought is that the addition of the normal error may be affecting your ability to recover the generating parameters, but since the logit is strongly non-linear, the influence of positive values of the normal variate may be enough to move the needle (so to speak) toward a slightly larger estimate for the logit.
SteveDenham
As no subject is specified for the R side variance component, the result is a measure of overdispersion.. I think this page of the GLIMMIX documentation covers your question:
In this case, the approach lifts the constraint on the scale parameter for the binomial distribution, which is ordinarily fixed at 1. If your result is close to 1, I would think there is little evidence for overdispersion. You could check the information criteria and the chi square/df ratio to see what is going on.
SteveDenham
the problem i need to solve is like this: first I created some binary outcome test data:
p = 1/(1+exp(6.0 - 3.0 * L + 1.0*RANNOR(111)));
if RANUNI(222) < p then
y=1;
else
y=0;
Then I try to see if I could recover the original parameters, including the variance of the zero mean error term in the exponent, by running proc glimmix:
proc glimmix data=testdata;
model y=L / dist=binomial link=logit solution;
random intercept;
run;
but the estimated parameters are all very different from those that generated the testdata. Is there anything in glimmix that can do that? or the problem itself is wrong? Thanks!
I have some issues with what is supposed to be going on. First, the code for the data to be generated has the variable 'L', but it is not referenced or given a value anywhere. Second, I don't know how many observations are being generated.
I made some guesses - looping your code with 1000 observations and defining L at each iteration as L=n/1000, I get a dataset with 27 1's and 973 0's. When I plug in the expected values for the random variates (0 for rannor, 0.5 for ranuni and L), I get a predicted value of 22 1's. This is with p=exp(-4.5) = 0.011 -> expected y =0.011/0.5 = 2.2%; The RANDOM intercept statement leads to a non-positive G matrix, so I commented it out.
Using an ESTIMATE statement, I calculated the expected value of y in the logit space at 0.5 (the mean value for L). It was -3.80, with 95% confidence bounds of (-4.26, -3.33), not quite including the -4.5.
So my thought is that the addition of the normal error may be affecting your ability to recover the generating parameters, but since the logit is strongly non-linear, the influence of positive values of the normal variate may be enough to move the needle (so to speak) toward a slightly larger estimate for the logit.
SteveDenham
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.