BookmarkSubscribeRSS Feed
GZ
Fluorite | Level 6 GZ
Fluorite | Level 6

Hi Experts,

What is the way to calculate BLUP (best linear unbiased prediction) and the 95% confidence interval in probability scale (not logit) using GLIMMIX (or other PROC)?  My application is to obtain the cluster level (random effect) predictions of the binary outcome.   Fixed effects are also included in the model.  I understand I can get Solution for Random Effects but I have difficulties to convert this linear-scale (logit) to the original scale, which is more interpretable within a caterpillar plot.  The code I used is followed.  Looking forward to hearing from you.   Thanks.

 

PROC glimmix data=&datain ;

   class &catvar randvar ;

   model yvar  = &catvar &numvar / s dist=bin link=logit ;

   random intercept/ subject=randvar solution cl ;

   ods output SolutionR =_rand  ParameterEstimates=_fix ;

run;

24 REPLIES 24
lvm
Rhodochrosite | Level 12 lvm
Rhodochrosite | Level 12

Any individual BLUP can be obtained with an ESTIMATE statement. For instance, the third level of your fixed effect and second level of your (random) subject:

estimate 'f3s2' int 1 &catvar 0 0 1 | int 1 / subject 0 1 cl ilink;

If you just wanted the random portion of this (same as the Solution table for the random effect):

estimate 's2'  | int 1 / subject 0 1 cl ilink;

 

Note that the inverse link is obtained. The limits of the confidence interval are obtained by using the inverse link function on the confidence limits of the logits.

 

You can ]have any number of estimate statements, but can get tedious with many statements. There are ways of writing macros to do some of the tedious stuff (but you have to know a bit about macro programming). See:

http://support.sas.com/kb/37/109.html

 

GZ
Fluorite | Level 6 GZ
Fluorite | Level 6

Thanks Ivm, this is great!

 

I was comparing the result from ESTIMATE statement to the predicted values from OUTPUT OUT statement. The scenario is to obtain the shrunk prediction using a simple model without fixed effect.  From my experience the results are not consistent and I was wondering if this is a valid comparison.  The code is like this:

 

* 1 Using ESTIMATE statement;

PROC glimmix data=&datain ;

   class randvar ;

   model yvar  = / s dist=bin link=logit ;

   random intercept/ subject=randvar solution cl ;

   ESTIMATE ‘subject1’ | int 1 / subject 1 cl ilink;

run;

 

*2  Getting average of predicted value;

PROC glimmix data=&datain ;

   class randvar ;

   model yvar  = / s dist=bin link=logit ;

   random intercept/ subject=randvar solution cl ;

   output out=_pred_out pred (blup ilink)=PredProb ;

run;

proc means data=_pred_out;

 class  randvar;

 var  PredProb;

run;

SteveDenham
Jade | Level 19

If the underlying distribution for the binomial is such that the mean probability is something other than 0.5, the second method will lead to a biased result, probably due to regression toward the mean (quoting, probably incorrectly, from Walt Stroup here).  The "mean" of several probablilies is not well estimated by a simple average.  Hence, the use of the "overall BLUP" (for want of a better term) that is obtained from the ESTIMATE statement.  Things are done in the "linked" space, where values are more meaningfully averaged, and then put back into the original probability scale.

 

Steve Denham

GZ
Fluorite | Level 6 GZ
Fluorite | Level 6

Thanks Steve. I mentioned “the scenario is to obtain the shrunk prediction using a simple model without fixed effect”; the predicted value is the same for all observations within each cluster (subject) under such a model. It was just for checking purpose using OUTPUT OUT. I confused everyone.     I agree on all you said, much appreciated.

lvm
Rhodochrosite | Level 12 lvm
Rhodochrosite | Level 12

My code is to give you individual BLUPS for any specified subject. You are averaging all the BLUPs, which does not have any inherent meaning. In particular, the standard error (or standard deviation) of the estimated BLUPs across all subjects is not comparable to standard errors for invidual subjects. As Steve mentioned, the data scale is different for the analysis.By the way, your ESTIMATE statement is only giving you the additive u (deviation)blup  prediction, not the predictions in your output file (the latter is giving you fixed intercept + random subject effect, all on the inverse link scale). If you use:

ESTIMATE ‘subject1’ int 1 | int 1 / subject 1 cl ilink;

output out=_pred_out pred(blup ilink)=PredProb stderr(ilink blup)=stderr ;

your will get the exact same value in the estimate table as you get in the output file for subject 1. (Note that there is a "int 1" before the | and an "int 1" after the |.

GZ
Fluorite | Level 6 GZ
Fluorite | Level 6

Thanks Ivm again for helping me.

 

ESTIMATE ‘subject1’ int 1 | int 1 / subject 1 cl ilink   works really well for a model without any fixed effect. I am trying to work out why both mean u and upper u appear to be 1.0 for my data when the fixed effects are added.   And let me know if I made a mistake again.

GZ
Fluorite | Level 6 GZ
Fluorite | Level 6

Hi Ivm,

 

Sorry I thought I got the solotion.

 

I still need help for the use of ESTIMATE ‘subject1’ int 1 | int 1 / subject 1 cl ilink when fixed effects (especially numerical) are added. I thought adding the fixed effects can improve predictions and confidence interval should be narrower. But my results were that the confidence interval is much wider than a model without any fixed effect. The predictions could also move towards one direction (can explain more for this later). Could you shed more light and confirm the use of the above ESTIMATE statement? 

 

Thanks,

George

lvm
Rhodochrosite | Level 12 lvm
Rhodochrosite | Level 12
The estimate statement I gave you is appropriate only when you don't have any fixed effects (other than the intercept). Also, if you want subject 3, the last part would be "subject 0 0 1;", and so on. I don't know what your fixed effect terms are.

GZ
Fluorite | Level 6 GZ
Fluorite | Level 6

Fair enough Ivm. Let’s be more specific.

 

A dummy dataset is attached. The variable school is the cluster (subject) level field representing the random effect. Student is the lower level variable. Final_pass is a binary outcome variable. Other are fixed effect: school_type and entry_score. For your reference my program is also attached.

 

The result for school A (as an example) is quite different when the entry_score is added as a fixed effect. As you indicated in your last message, the ESTIMATE statement may need a change when a fixed effect is included. Let’s use only entry_score as the fixed effect and see how the estimates added up.  Thanks for your continued help.

 

Model

Label

Estimate (logit)

Mean u

Std err

Lower u

Upper u

Without fixed effect

School A

1.4105

0.8038

0.1092

0.4973

0.9444

Entry_score as fixed effect

School A

0.3145

0.578

0.2299

0.1746

0.8986

GZ
Fluorite | Level 6 GZ
Fluorite | Level 6

Attaching the data again using non-SAS dataset format.  Had problem of attaching SAS dataset.

JacobSimonsen
Barite | Level 11

I am in doubt whether it is possible to get BLUP in this model. As far as I know, it is not possible to get any unbiased estimator of Odds ratios in a fixed effect model. And since the predictions of the random effects should be adjusted for the estimated Odds ratios of the fixed effects it should also not be possible to give unbiased predictions of random effects.

In practice, however, maksimum likelihood estimates, as GLIMMIX calculates, are good if just the sample size are not extreme small.

GZ
Fluorite | Level 6 GZ
Fluorite | Level 6

Thanks   for your comments, appreciated.

lvm
Rhodochrosite | Level 12 lvm
Rhodochrosite | Level 12

I respectfully disagree with Jacob. Getting empirical BLUPs is straightforward with GLMMs. All covered in Stroup's book "Generalized Linear Mixed Models" (2013). There can be a small bias in parameter estimates, but not enough to prevent the calculation.

 

Syntax will depend on the exact form of your model and random statements. If you have one covariable (X), then the first "int 1" portion of your estimate statement would be changed to:

"int 1 X 5.5"

if you wanted to get the BLUP for X=5.5.

 

GZ
Fluorite | Level 6 GZ
Fluorite | Level 6

It is great to see the clarification.

 

Ivm you are the closest so far towards the final solution.  Hopefully it is not too far.  I really appreciate your continued effort.

 

As a correction I have marked the post as unsolved based on the original question.  Sorry for any confusion.

 

If anyone else there has experience in obtaining BLUP and CI with mixed model (fixed + random), please help.  Thanks.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 24 replies
  • 5437 views
  • 15 likes
  • 4 in conversation