When we specify a mixed model we have fixed effects and random effects. The general equation for the model is Y=X*beta + Z*gamma + error. The betas are the fixed-effect parameters and the gammas are the random-effect parameters. Now Z has to be a subset of X, namely the columns of Z are a subset of the columns of X. Also, since the gammas are random, they are distributed as N(0,G), where G is the covariance matrix of the gammas.
The beta parameters are interpreted as population parameters and the gamma parameters as subject-specific parameters. So for example, let's say we were analyzing the effect of two treatments, a and b, and we randomly chose 10 clinics from which to select subjects for the study. Since the clinics are a random sample of all the clinics in the population, we can regard the clinics as random effects. More to the point, a simple model would look like:
Y = beta0 + beta1*time + beta2*trt + gamma0_i + gamma1_i*time + error
So now the ith clinic adds its own intercept and time parameter to the model.
The conditional model is E(Y|gamma_i)=beta0+beta1*time+beta2*trt+gamma0_i+gamma1_i*time. This model takes each clinic's own contribution into account, hence we are conditioning Y on gamma_i.
On the other hand, the marginal model is E(Y)=beta0+beta1*time+beta2*trt. The random effects drop out, since in the marginal model we are averaging over all random effects and, as stated earlier, the random effects have mean zero.
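The step from the conditional mean to the marginal mean can be written out with the law of iterated expectations. This is only a sketch in the notation of the clinic example, and the one assumption it uses is the one already stated, that the gammas have mean zero:

```latex
\begin{aligned}
E(Y) &= E\big[\,E(Y \mid \gamma_i)\,\big] \\
     &= \beta_0 + \beta_1\,\text{time} + \beta_2\,\text{trt}
        + E(\gamma_{0i}) + E(\gamma_{1i})\,\text{time} \\
     &= \beta_0 + \beta_1\,\text{time} + \beta_2\,\text{trt}
\end{aligned}
```

The cancellation is this clean because the model is linear in the random effects.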
The residuals are the same as always: the observed Y minus the predicted Y. But depending on whether you use the conditional or the marginal model to predict Y, you will get different values for the residuals.
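To make the distinction concrete, here is a minimal sketch in Python of both kinds of residuals for a single clinic, taking a residual as observed minus predicted. Everything numeric in it is made up for illustration (the fixed-effect estimates, the clinic's predicted random effects, and the observations), and the function names are hypothetical; in practice these quantities would come from the fitted model.

```python
# Hypothetical fixed-effect estimates (population level): intercept, time, trt.
BETA = {"intercept": 10.0, "time": 2.0, "trt": 1.5}

# Hypothetical predicted random effects for one clinic: its own shift in
# intercept and in time slope, relative to the population values.
GAMMA_CLINIC = {"intercept": -1.0, "time": 0.5}


def marginal_prediction(time, trt):
    """X*beta only: the population-average prediction."""
    return BETA["intercept"] + BETA["time"] * time + BETA["trt"] * trt


def conditional_prediction(time, trt, gamma):
    """X*beta + Z*gamma: the prediction for this particular clinic."""
    return (marginal_prediction(time, trt)
            + gamma["intercept"] + gamma["time"] * time)


# Made-up observations from the clinic: (time, trt, observed y).
observations = [(0, 1, 10.9), (1, 1, 12.8), (2, 1, 15.6)]

marginal_residuals = []
conditional_residuals = []
for time, trt, y in observations:
    # Residual = observed minus predicted, under each model.
    marginal_residuals.append(y - marginal_prediction(time, trt))
    conditional_residuals.append(
        y - conditional_prediction(time, trt, GAMMA_CLINIC))

for (time, _, _), rm, rc in zip(observations,
                                marginal_residuals,
                                conditional_residuals):
    print(f"time={time}: marginal={rm:+.2f}, conditional={rc:+.2f}")
```

The two sets of residuals differ by exactly the clinic's Z*gamma contribution at each time point. In PROC MIXED, the OUTP= and OUTPM= options on the MODEL statement write out the conditional and marginal predicted values, respectively, along with the corresponding residuals.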
Hopefully this helps. Let me know if I can clear anything up.
I agree with most of your description here regarding marginal and conditional models. However, it is not correct to state that "Z has to be a subset of X". There is no such requirement.
The X design matrix is constructed according to the variables named on the MODEL statement in PROC MIXED (expanded according to formatted values if a variable also appears on the CLASS statement). The Z design matrix is constructed according to the variables named on the RANDOM statement (also expanded according to formatted values if the variable is named on the CLASS statement). It is usually the case that variables named on the RANDOM statement are also named on the MODEL statement, but it is not a necessity.
It should be noted that effects for variables named on the RANDOM statement are assumed to be normally distributed with mean zero. If the variable is also named on the MODEL statement, then the random effects are assumed to be normally distributed with mean as determined by the fixed effect estimate.
Thanks for the correction, Dale. I was thinking of the two-stage random effects formulation, where we first formulate Y_i=Z_i*beta_i + error_i and then model the beta_i in terms of population parameters: beta_i=A_i*beta + gamma_i.
Putting these two models together we have Y_i = (Z_i*A_i)*beta + Z_i*gamma_i + error_i.
If we let X_i=Z_i*A_i then we are back to the mixed model formula with ONE exception. As stated in Fitzmaurice's Applied Longitudinal Analysis, p. 203:
"The two stage formulation requires that the design matrix for the fixed effects has the special structure X_i=Z_i*A_i where A_i contains only between subject (or time-invariant) covariates and Z_i contains only within-subject (or time-varying) covariates. This form of the design matrix for the fixed effects implies that any time-varying covariates must be specified as random effects to ensure their inclusion in the model for the population mean response."
More importantly, it goes on to say: "This constraint is unnecessary and, in many settings, it can be somewhat inconvenient."
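The special structure X_i=Z_i*A_i is easy to check numerically. Below is a minimal sketch in Python for the intercept/time/trt example from earlier in the thread. It assumes, purely for illustration, that trt is constant within unit i (so it can sit in A_i) and that there are three time points; the matmul helper is a hypothetical plain-Python stand-in for a matrix library.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a_ik * b_kj for a_ik, b_kj in zip(row, col))
             for col in zip(*b)]
            for row in a]


# Within-subject (time-varying) design: intercept and time at three visits.
times = [0, 1, 2]
Z_i = [[1, t] for t in times]

# Between-subject (time-invariant) covariate, assumed constant within unit i.
trt_i = 1

# Stage 2: beta_i = A_i*beta + gamma_i, with beta = (beta0, beta1, beta2):
#   intercept_i = beta0 + beta2*trt_i + gamma0_i
#   slope_i     = beta1               + gamma1_i
A_i = [[1, 0, trt_i],
       [0, 1, 0]]

# Implied fixed-effects design matrix: columns are intercept, time, trt.
X_i = matmul(Z_i, A_i)

for row in X_i:
    print(row)
```

Every column of Z_i (intercept and time) reappears as a column of X_i, which is where the "subset" idea comes from: under this formulation a time-varying covariate can only reach the population mean through Z_i.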
Dale, as for the initial question, do you think my reasoning about which residuals to use is correct?
Message was edited by: trekvana