Good afternoon,
I am running hierarchical linear models, and the book I am reading, Multilevel Models: Applications Using SAS by Wang, Xie, and Fisher, recommends the following test for deciding whether independent variables should be treated as having random or fixed effects:
proc mixed method = reml covtest;
  class site;
  model depvar = indvar1 indvar2 indvar3 indvar4 indvar5 site_indvar5
        / solution ddfm=bw notest;
  weight pweight;
  random int indvar1 indvar2 indvar3 indvar4 indvar5 / subject=site g type=vc;
run;
Then check which variables have a significant Pr > Z in the Covariance Parameter Estimates table in the output. The authors say something in passing that makes me think I should run these tests without control variables and use those results, but I am not 100 percent certain.
My hesitation comes from the full model I have run:
proc mixed method = ml covtest ic;
  class site;
  model depvar = indvar1 indvar2 indvar3 indvar4 indvar5 site_indvar5
                 age gender education / solution ddfm=bw notest;
  weight newweight;
  random int indvar1 indvar3 indvar5 / subject=site g type=vc;
run;
indvar1, indvar3, and indvar5 were significant in the test for randomness. However, this full model also produces covariance parameter estimates, and in it indvar1 no longer has a significant Pr > Z. If I stop treating it as random, the results of the model change substantially.
Is there a straightforward way to interpret this? Or are there additional tests I can run?
I would never let the choice of whether a term is 'random' or 'fixed' depend solely on the results of a hypothesis test, let alone the Wald test that the COVTEST option uses in PROC MIXED. First, consider the study design and the inferences you wish to make. Second, consider whether you have enough levels of a random variable to estimate its associated covariance parameters. If this number is low (and I mean fewer than 10 or 20 levels), then unless design considerations override, you should consider treating the variable as fixed. And if the factor is a repeated measure, you may wish to consider it as both fixed and random, where the random part is a structured deviation from the overall fixed part.
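For reference, a sketch of what the Pr > Z column from the COVTEST option is based on (the symbols below are my own notation, not SAS output names). For a variance component $\sigma^2_k$, PROC MIXED reports a Wald Z statistic built from the estimate and its asymptotic (large-sample) standard error, and for a variance the p-value is the one-sided upper tail:

$$
Z_k = \frac{\hat\sigma^2_k}{\widehat{\mathrm{SE}}\left(\hat\sigma^2_k\right)}, \qquad \Pr > Z = P\left(N(0,1) > Z_k\right)
$$

Because the sampling distribution of a variance estimate is usually skewed and the standard error is only asymptotically valid, this normal approximation can be poor when the number of clusters is small. That is one more reason not to base the fixed/random decision on it alone.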
With that said, if you are unsure whether a random effect should be included, seriously consider moving to PROC GLIMMIX, where the COVTEST statement lets you test several hypotheses of interest and approaches them through likelihood ratio tests.
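As a rough sketch of the likelihood ratio idea (my notation; the same logic applies whether COVTEST does it for you or you compare two PROC MIXED fits by hand), suppose you want to know whether the indvar1 slope variance $\sigma^2_1$ is needed. Fit the model with and without that random term, keeping the fixed effects identical (required when comparing REML likelihoods), and take the difference of the -2 Res Log Likelihood values from the two Fit Statistics tables, where $\ell_R$ denotes the restricted log likelihood:

$$
\mathrm{LRT} = \left(-2\ell_R^{\text{reduced}}\right) - \left(-2\ell_R^{\text{full}}\right), \qquad H_0:\ \sigma^2_1 = 0
$$

Because $H_0$ puts a variance on the boundary of its parameter space (it cannot be negative), the usual $\chi^2_1$ reference distribution is conservative. With TYPE=VC (no covariances among the random effects) and a single variance being tested, the asymptotic null distribution is a 50:50 mixture of $\chi^2_0$ and $\chi^2_1$, so for $\mathrm{LRT} > 0$:

$$
p = \tfrac{1}{2}\, P\left(\chi^2_1 \ge \mathrm{LRT}\right)
$$

Purely illustrative arithmetic, not from this thread's data: a difference of 3.84 gives a naive $\chi^2_1$ p-value of about 0.05 and a mixture p-value of about 0.025.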
SteveDenham
Thank you for your helpful response.
By levels, I usually think of how the data are nested. My model is two-level if it is nested in counties, and three-level if it is nested in counties and also in states. I presume you mean something else by this?
In terms of design, I am reading about a number of controversies. I've seen statisticians argue for letting everything be estimated as random, and others argue for limiting random effects as much as possible for model stability. I've seen others focus on whether the slope and intercept between X and Y vary across sites (counties, states, etc.). I've also seen it proposed that we emphasize whether the mean and variance of the independent variables' coefficients vary across sites.
Do you have any suggestions or recommended readings on what to emphasize theoretically and in study design? For the dependent variables, I am looking at groups that have several theoretically rich connections to the locations they are clustered in.
We are dealing with a nomenclature problem. I understand your definition of level; what I was referring to was how many separate "instances" or "clusters" you have for each of your random effects.
So, on to interpretation. Once you add in the effects of your control variables, indvar1's variance component is no longer "significant". I don't want to sound rude, but so what if it isn't "significant"? We often keep "nonsignificant" terms among the fixed effects; for example, we would keep a main effect in the model if the interaction of that effect with another was "significant". It comes down to consideration of the design and your interpretation of the variable.
SteveDenham
Thank you for your quick response. Yes, this is what I thought you meant, as that makes sense conceptually. I just wanted to make sure, since I am still learning and may miss a lot. I have over 100 sites in this instance.
Your second point makes sense. I do not know why I had not considered that earlier. I am beginning to realize that relying solely on the statistical test was a bit misguided. Thank you for your time.
Your post brought this paper to mind. You might find it interesting.