I am not sure what else you are looking for. One needs to see your full model code to give any more specific advice. You have the general syntax to use. Based on what you showed, we can't tell if you are doing things appropriately. By the way, chapters 6 and 8 in SAS for Mixed Models, 2nd edition (Littell et al. 2006) gives plenty of examples and explanations of this.
I was looking at your table for school A. If you used the code:
estimate 's2' | int 1 / subject 0 1 cl ilink;
for the situation with fixed effects (you give a categorical and a continuous predictor in your code), then you are getting the EBLUP for the last level of the categorical variable (always a coefficient of 0) and for the continuous predictor with a value of 0 (which may be impossible). You are not getting any kind of 'global mean'. If you want the EBLUP for the second level of the categorical predictor (C) and when the continuous predictor (say X) has a value of 3.5, then you use:
estimate 'example' int 1 X 3.5 C 0 1 | int 1 / subject 1 cl ilink;
In general, you will get different EBLUPS with and without fixed effect predictors, and the SEs will be smaller with the fixed effects.
As title indicates I am interested in predictions for the random variable only but the fixed effects are included to adjust for the difference among subjects. A ‘global mean’ is sought for each subject and it is often adjusted due to the difference in characteristics or makeup of each subject. My specific request is to preserve the predictions (not the distance to average) and keep in the scale of proportion (not logit) for easier interpretation.
I hope I have made myself clear for what I am look for. As we know prediction of random effect has been widely used for many years in a number of areas, e.g. tree or animal breeding and hospital or school performance measurement. I thought there is an easier way.
ESTIMATE statement is great for obtaining an estimate with a specific set of effects but I don’t fully know if it can estimate random effect as described. Thanks for the guidance and I hope I have a solution.
For my learning, I tried the code regarding “If you used the code:
estimate 's2' | int 1 / subject 0 1 cl ilink;
for the situation with fixed effects (you give a categorical and a continuous predictor in your code), then you are getting the EBLUP for the last level of the categorical variable (always a coefficient of 0) and for the continuous predictor with a value of 0”
I wonder if you mean
estimate 's2' int 1 | int 1 / subject 0 1 cl ilink;
is equivalent to:
estimate 's2' int 1 school_type 0 1 entry_score 0 | int 1 / subject 0 1 cl ilink;
If I run the following code using the dataset I sent, different results show. I was also removing 'int 1' before vertical bar (this is in your statement) but I could not get the same results from the two estimate statements.
PROC glimmix data=blup.school NOCLPRINT MAXLMMUPDATE=100 ;
class school school_type;
model final_pass = entry_score school_type / s cl dist=bin link=logit ;
random intercept/ subject=school solution cl ;
estimate 's2: statement 1' int 1 | int 1 / subject 0 1 cl ilink;
estimate 's2: statement 2' int 1 school_type 0 1 entry_score 0 | int 1 / subject 0 1 cl ilink;
run;
Label | Estimate | Standard Error |
s2: statement 1 | 0.01563 | 0.8291 |
s2: statement 2 | -0.7348 | 0.8021 |
Estimate statements are tricky, and these two do not mean the same thing. I will try to send you a more detailed response soon.
The statement
estimate 's2: 1' int 1 | int 1 / subject 0 1 cl ilink e ;
only is of direct use if there are no fixed effects in the model (only the intercept). The statement can be used with factors or covariates, but the meaning is different in each case. In fact, this statement is likely not giving you what you want with fixed effects. By the way, it is useful to add an E option to these statements to see what GLIMMIX is doing, because the procedure fills in the blanks when the fixed effects are not explicitly given. For instance, with a continuous variable, the procedure is using 0 for X; this is meaningless for you, since the smallest value of your covariable is 36. I know you want to get a correction for fixed effects, but if there are fixed effects, then your EBLUPs really should be for specific values of the fixed effects. See my comments below on different variations of your possible model.
This estimate statement makes sense here without fixed effects. You get a logit of 0.564 and SE of 0.532).
PROC glimmix data=school NOCLPRINT MAXLMMUPDATE=100 ;
title 'no fixed effects';
class school school_type;
model final_pass = / s cl dist=bin link=logit ;
random intercept/ subject=school solution cl ;
estimate 's2: 1' int 1 | int 1 / subject 0 1 cl ilink e ;
run;
For all the runs from now on, I give the simple version of the BLUP statement (allowing for defaults in GLIMMIX), listed as "1", and then I give the exact same statement where the missing terms are explicitly given (listed as "1b"). That is the first two estimate statements give the exact same answer for each run. As you can see, the first estimate statement really is doing the calculation for entry_score=0 (an impossible value for this data set). You average value of the covariable is about 70, so I added an estimate statement with this value. You can see that one is now getting a similar EBLUP as found with no fixed effects. The SE is different.
PROC glimmix data=school NOCLPRINT MAXLMMUPDATE=100 ;
title 'only continuous fixed';
class school school_type;
model final_pass = entry_score / s cl dist=bin link=logit ;
random intercept/ subject=school solution cl ;
estimate 's2: 1' int 1 | int 1 / subject 0 1 cl ilink e ;
estimate 's2: 1b' int 1 entry_score 0 | int 1 / subject 0 1 cl ilink e;
estimate 's2: X=70' int 1 entry_score 70 | int 1 / subject 0 1 cl ilink e;
run;
Things are trickier with a factor (class variable). Your data set is not balanced -- there are about twice as many public as private schools. Note that here your first estimate statement really is giving the EBLUP for the LSMEAN of the two types of schools (1b) (this is not the marginal means because of the imbalance). You can kind of recover an EBLUP similar to the no-fixed-effect run by using the 0.16 and 0.84 coefficients for school type (1c estimate) (not same ratio as actual data because this model fitting is not for a linear model -- there is skewness that affects results).
I actually think you should be looking at EBLUPs for each school type, not trying to get a global shool type. I give those estimate statements.
PROC glimmix data=school NOCLPRINT MAXLMMUPDATE=100 ;
title 'only categorical fixed';
class school school_type;
model final_pass = school_type / s cl dist=bin link=logit ;
random intercept / subject=school solution cl ;
estimate 's2: 1' int 1 | int 1 / subject 0 1 cl ilink e ; *identical to next one;
estimate 's2: 1b' int 1 school_type 0.5 0.5 | int 1 / subject 0 1 cl ilink e ;
estimate 's2: 1c' int 1 school_type 0.16 0.84 | int 1 / subject 0 1 cl ilink e ;
*^because of imbalance in numbers in two school types, coefficients are not equal above;
estimate 's2: sch type 1' int 1 school_type 1 0 | int 1 / subject 0 1 cl ilink e;
estimate 's2: sch type 2' int 1 school_type 0 1 | int 1 / subject 0 1 cl ilink e;
run;
Thinks get even trickier with continuous and factor both in the model. As mentioned above, there is imbalance in observations in the two school types. Plus, the distribution of the continuous variable is not the same for the two school types. Thus, it is very hard to define a global type of central fixed effect. As you can see below, if you use the first estimate statement, you are getting the EBLUP for X=0 and the LSMEAN of the two school types (not the marginal means) (compare 1 and 1b results). Not a useful result. By trial and error, one can get a kind of central-fixed-effect EBLUP with the third estimate statement (1d). This is a bit strange because it is for X=36 (about the minimum observed X), and for the LSMEANS of the categorical variable. It is likely due to the combination of imbalance and different distributions for X. This is giving you about the same as the first run with no fixed effects. I would prefer if you got EBLUPs for each school type at the mean of the X (about 70). See these estimate statements.
PROC glimmix data=school NOCLPRINT MAXLMMUPDATE=100 ;
title 'categorical and continuous fixed';
class school school_type;
model final_pass = entry_score school_type / s cl dist=bin link=logit ;
random intercept/ subject=school solution cl ;
estimate 's2: 1' int 1 | int 1 / subject 0 1 cl ilink e ;
estimate 's2: 1b' int 1 school_type 0.5 0.5 entry_score 0 | int 1 / subject 0 1 cl ilink e;
*^X=0 is meaningless, but that is what you get with first estimate statement;
estimate 's2: X=36 (1d)' int 1 school_type 0.5 0.5 entry_score 36 | int 1 / subject 0 1 cl ilink e;
estimate 's2: sch typ 1 X=70' int 1 school_type 1 0 entry_score 70 | int 1 / subject 0 1 cl ilink e ;
estimate 's2: sch typ 2 X=70' int 1 school_type 0 1 entry_score 70 | int 1 / subject 0 1 cl ilink e ;
run;
Overall, as I stated in other posts, I don't think there is a good way of doing what you originally wanted. That is, I feel you need to be explicit for the fixed effects if they are in the model.
Here are my big thanks Ivm for your step-by-step demonstration on using ESTIMATE statement, very much appreciated. I learnt a lot.
In reality there are more fixed effects with unbalanced data so it is not easy to explicitly and objectively specify the central points for all that needed. This might be an improvement area for future release or there might be some existing options we haven't found?
I doubt if this is something for a future release, since it is not part of model fitting of post-model-fitting testing or estimation. This will have to be a judgement call of the researcher.
BYLEVEL option is already available to LSMEANS statement (allows separate margins be used for each level of the fixed effect); its addition to ESTIMATE statement should be possible in my view.
Here is a link for the option if anyone is interested in more details:
Go to the following link if anyone is interested further: LSMEAN for random variables in mixed model
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.