I'm conducting an MLM analysis using proc mixed. The level 1 variables are dummy-coded variables for grade comparisons (Dummy_FS = freshmen vs seniors; Dummy_SS = sophomores vs seniors; Dummy_JS = juniors vs seniors). Note that some institutions do not have freshmen, others do not have sophomores, and others do not have juniors.
I requested the solutions for the random effects, and for some institutions I got zero estimates for some of the dummy variables. Those zero estimates correspond to institutions that have no data available to compute those estimates. For example, looking at the attached table, Institution 372 does not have juniors or sophomores, hence the zero estimates in Dummy_FS and Dummy_SS.
My question is: what does the zero estimate mean? Specifically, I'd like to know if those institutions with no information in specific comparisons, are being considered in the overall estimation of the random slopes or if they are not being considered in the estimation. How should I interpret the zero estimates and the standard error of prediction for those institutions?
It would be helpful to see your code and your dataset (or if your data cannot be shared, a dataset that closely resembles yours and provides similar results).
Why are you concocting your own dummy variables for the grade factor rather than using the CLASS statement?
@MOA wrote:
Specifically, I'd like to know if those institutions with no information in specific comparisons, are being considered in the overall estimation of the random slopes or if they are not being considered in the estimation.
Random slopes of what? You haven't mentioned the model you are fitting or any other variables that are being used.
As the other respondent said, we really need to see your code.
I'm fitting an MLM with the three dummy variables as level 1 predictors. I'm estimating the fixed effects and the random slopes for the dummy variables. Here's the code:
proc mixed data=MLM_grade method=ml covtest;
  class inst_ID;
  model dv = Dummy_FS Dummy_SS Dummy_JS / solution;
  random intercept Dummy_FS Dummy_SS Dummy_JS / subject=inst_ID s;
run;
I used dummy variables instead of the class statement since I am comparing the results across different programs (Mplus, SPSS, HLM). Attached is the data.
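On the original question, a sketch of why the zeros appear may help. It assumes the default TYPE=VC structure on the RANDOM statement, so that the random effects within an institution are modeled as uncorrelated. The notation is not from the posts above: $z_{ik}$ is the column of institution $i$'s random-effects design matrix for dummy $k$, $V_i$ is the marginal covariance matrix of that institution's responses, $\sigma_0^2$ is the intercept variance, $\sigma_k^2$ is the variance component for slope $k$, and $\sigma^2$ is the residual variance.

```latex
\hat{\gamma}_{ik}
  = \hat{\sigma}_k^2 \, z_{ik}^{\top} \hat{V}_i^{-1} \left( y_i - X_i \hat{\beta} \right),
\qquad
V_i = \sigma_0^2 \, \mathbf{1}\mathbf{1}^{\top}
      + \sum_{k} \sigma_k^2 \, z_{ik} z_{ik}^{\top}
      + \sigma^2 I .

% If institution i has no students in the group coded by dummy k, then z_{ik} = 0:
z_{ik} = 0
\;\Longrightarrow\;
\hat{\gamma}_{ik} = 0
\quad\text{and}\quad
V_i \text{ does not depend on } \sigma_k^2 .
```

Under these assumptions, a zero solution is the predictor shrunk all the way to the prior mean of zero. Such an institution carries no information about $\sigma_k^2$, but it still contributes to the intercept variance, the residual variance, and the fixed effects for the grades it does have. The standard error of prediction for that slope should then be roughly $\sqrt{\hat{\sigma}_k^2}$, which reflects the spread of the slopes across institutions rather than anything observed in that institution.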
I'll assume it is your intent to compare the mean DV across four grades, or to compare the first 3 means (freshman, sophomore, junior) to the fourth (senior).
Grade is categorical. Although it is true that ANOVA is equivalent to regression on a set of dummy variables, I don't think random slopes for dummy variables make any sense.
If you want to compare estimates of mean DV by grade across software packages, it does not matter how grade is coded; any coding system will give you the same means. If you want to compare parameter estimates, then you will need to use similar coding systems. By default, the MIXED procedure uses the last level as the reference level, but you can control that using the REF= option on the CLASS statement.
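As a rough sketch of matching the coding systems, the steps below build a single grade variable from the three dummies and then set the reference level explicitly. The names grade and MLM_grade2 and the 1-4 coding (4 = senior) are assumptions made for illustration, and the random intercept is only a placeholder to keep the example short.

```sas
/* Sketch only: grade and MLM_grade2 are hypothetical names.          */
/* Assumes each student has exactly one of the dummies set to 1,      */
/* or all three set to 0 (senior).                                    */
data MLM_grade2;
  set MLM_grade;
  if nmiss(Dummy_FS, Dummy_SS, Dummy_JS) > 0 then grade = .;
  else if Dummy_FS = 1 then grade = 1;   /* freshman  */
  else if Dummy_SS = 1 then grade = 2;   /* sophomore */
  else if Dummy_JS = 1 then grade = 3;   /* junior    */
  else grade = 4;                        /* senior    */
run;

proc mixed data=MLM_grade2 method=ml covtest;
  /* ref='4' makes seniors the reference level, matching the Dummy_* coding */
  class inst_ID grade(ref='4');
  model dv = grade / solution;
  random intercept / subject=inst_ID;
run;
```

With seniors as the reference level, the three grade coefficients in the fixed-effects solution should correspond to the Dummy_FS, Dummy_SS and Dummy_JS coefficients from a model with the same random part.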
Assuming that multiple observations (students?) for each level of grade within the same inst_ID are subsamples and are not independent, I suggest this code:
proc mixed data=MLM_grade method=ml covtest;
  class inst_ID grade;
  model dv = grade / solution;
  random inst_id inst_id*grade / solution;
  /* pair-wise comparisons among grades */
  lsmeans grade / pdiff adjust=simulate(seed=123);
  /* comparison of each grade to control */
  lsmeans grade / adjust=dunnett pdiff=control('4');
run;
An equivalent syntax for the RANDOM statement is
random intercept grade / subject=inst_id solution;
Either RANDOM statement produces an estimate of variance among inst_id, an estimate of variance among groups of students assigned to the same grade within inst_id, and an estimate of variance among students within groups (residual).
This model assumes that data are missing at random, but in fact you are missing appreciably more data on freshman and sophomore grade levels than on junior and senior. The number of students is extremely unbalanced within each inst_id and within groups within inst_id. These issues may make this model inappropriate.
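One way to see the extent of the imbalance is to count students per grade within each institution. This is only a sketch, and it assumes the data set has a grade variable as in the code above.

```sas
/* Sketch: cell counts of students by institution and grade.        */
/* Assumes a variable named grade exists in MLM_grade.              */
proc freq data=MLM_grade;
  tables inst_ID*grade / norow nocol nopercent;
run;
```

Cells with a count of zero identify the institutions that have no students in a given grade.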