Hi Experts,
What is the way to calculate BLUP (best linear unbiased prediction) and the 95% confidence interval in probability scale (not logit) using GLIMMIX (or other PROC)? My application is to obtain the cluster level (random effect) predictions of the binary outcome. Fixed effects are also included in the model. I understand I can get Solution for Random Effects but I have difficulties to convert this linear-scale (logit) to the original scale, which is more interpretable within a caterpillar plot. The code I used is followed. Looking forward to hearing from you. Thanks.
PROC glimmix data=&datain ;
class &catvar randvar ;
model yvar = &catvar &numvar / s dist=bin link=logit ;
random intercept/ subject=randvar solution cl ;
ods output SolutionR =_rand ParameterEstimates=_fix ;
run;
Any individual BLUP can be obtained with an ESTIMATE statement. For instance, the third level of your fixed effect and second level of your (random) subject:
estimate 'f3s2' int 1 &catvar 0 0 1 | int 1 / subject 0 1 cl ilink;
If you just wanted the random portion of this (same as the Solution table for the random effect):
estimate 's2' | int 1 / subject 0 1 cl ilink;
Note that the inverse link is obtained. The limits of the confidence interval are obtained by using the inverse link function on the confidence limits of the logits.
You can ]have any number of estimate statements, but can get tedious with many statements. There are ways of writing macros to do some of the tedious stuff (but you have to know a bit about macro programming). See:
http://support.sas.com/kb/37/109.html
Thanks Ivm, this is great!
I was comparing the result from ESTIMATE statement to the predicted values from OUTPUT OUT statement. The scenario is to obtain the shrunk prediction using a simple model without fixed effect. From my experience the results are not consistent and I was wondering if this is a valid comparison. The code is like this:
* 1 Using ESTIMATE statement;
PROC glimmix data=&datain ;
class randvar ;
model yvar = / s dist=bin link=logit ;
random intercept/ subject=randvar solution cl ;
ESTIMATE ‘subject1’ | int 1 / subject 1 cl ilink;
run;
*2 Getting average of predicted value;
PROC glimmix data=&datain ;
class randvar ;
model yvar = / s dist=bin link=logit ;
random intercept/ subject=randvar solution cl ;
output out=_pred_out pred (blup ilink)=PredProb ;
run;
proc means data=_pred_out;
class randvar;
var PredProb;
run;
If the underlying distribution for the binomial is such that the mean probability is something other than 0.5, the second method will lead to a biased result, probably due to regression toward the mean (quoting, probably incorrectly, from Walt Stroup here). The "mean" of several probablilies is not well estimated by a simple average. Hence, the use of the "overall BLUP" (for want of a better term) that is obtained from the ESTIMATE statement. Things are done in the "linked" space, where values are more meaningfully averaged, and then put back into the original probability scale.
Steve Denham
Thanks Steve. I mentioned “the scenario is to obtain the shrunk prediction using a simple model without fixed effect”; the predicted value is the same for all observations within each cluster (subject) under such a model. It was just for checking purpose using OUTPUT OUT. I confused everyone. I agree on all you said, much appreciated.
My code is to give you individual BLUPS for any specified subject. You are averaging all the BLUPs, which does not have any inherent meaning. In particular, the standard error (or standard deviation) of the estimated BLUPs across all subjects is not comparable to standard errors for invidual subjects. As Steve mentioned, the data scale is different for the analysis.By the way, your ESTIMATE statement is only giving you the additive u (deviation)blup prediction, not the predictions in your output file (the latter is giving you fixed intercept + random subject effect, all on the inverse link scale). If you use:
ESTIMATE ‘subject1’ int 1 | int 1 / subject 1 cl ilink;
output out=_pred_out pred(blup ilink)=PredProb stderr(ilink blup)=stderr ;
your will get the exact same value in the estimate table as you get in the output file for subject 1. (Note that there is a "int 1" before the | and an "int 1" after the |.
Thanks Ivm again for helping me.
ESTIMATE ‘subject1’ int 1 | int 1 / subject 1 cl ilink works really well for a model without any fixed effect. I am trying to work out why both mean u and upper u appear to be 1.0 for my data when the fixed effects are added. And let me know if I made a mistake again.
Hi Ivm,
Sorry I thought I got the solotion.
I still need help for the use of ESTIMATE ‘subject1’ int 1 | int 1 / subject 1 cl ilink when fixed effects (especially numerical) are added. I thought adding the fixed effects can improve predictions and confidence interval should be narrower. But my results were that the confidence interval is much wider than a model without any fixed effect. The predictions could also move towards one direction (can explain more for this later). Could you shed more light and confirm the use of the above ESTIMATE statement?
Thanks,
George
Fair enough Ivm. Let’s be more specific.
A dummy dataset is attached. The variable school is the cluster (subject) level field representing the random effect. Student is the lower level variable. Final_pass is a binary outcome variable. Other are fixed effect: school_type and entry_score. For your reference my program is also attached.
The result for school A (as an example) is quite different when the entry_score is added as a fixed effect. As you indicated in your last message, the ESTIMATE statement may need a change when a fixed effect is included. Let’s use only entry_score as the fixed effect and see how the estimates added up. Thanks for your continued help.
Model | Label | Estimate (logit) | Mean u | Std err | Lower u | Upper u |
Without fixed effect | School A | 1.4105 | 0.8038 | 0.1092 | 0.4973 | 0.9444 |
Entry_score as fixed effect | School A | 0.3145 | 0.578 | 0.2299 | 0.1746 | 0.8986 |
Attaching the data again using non-SAS dataset format. Had problem of attaching SAS dataset.
I am in doubt whether it is possible to get BLUP in this model. As far as I know, it is not possible to get any unbiased estimator of Odds ratios in a fixed effect model. And since the predictions of the random effects should be adjusted for the estimated Odds ratios of the fixed effects it should also not be possible to give unbiased predictions of random effects.
In practice, however, maksimum likelihood estimates, as GLIMMIX calculates, are good if just the sample size are not extreme small.
Thanks JacobSimonsen for your comments, appreciated.
I respectfully disagree with Jacob. Getting empirical BLUPs is straightforward with GLMMs. All covered in Stroup's book "Generalized Linear Mixed Models" (2013). There can be a small bias in parameter estimates, but not enough to prevent the calculation.
Syntax will depend on the exact form of your model and random statements. If you have one covariable (X), then the first "int 1" portion of your estimate statement would be changed to:
"int 1 X 5.5"
if you wanted to get the BLUP for X=5.5.
It is great to see the clarification.
Ivm you are the closest so far towards the final solution. Hopefully it is not too far. I really appreciate your continued effort.
As a correction I have marked the post as unsolved based on the original question. Sorry for any confusion.
If anyone else there has experience in obtaining BLUP and CI with mixed model (fixed + random), please help. Thanks.
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.