CuriousMind
Obsidian | Level 7

Hi everyone, I am new to the community and am having trouble writing the syntax for my specific data in SAS. This is my first time doing a mixed model analysis, so I apologize if the question is not well thought out. I have a 3-level nested hierarchical model: consider the famous example of students (level 1) nested in classes (level 2), where the classes are nested in schools (level 3). I have no covariates at level 1; covariates X and B at level 2; and covariates C, D, E, F at level 3. Controlling for covariates B, C, D, E, F, I want to see the fixed as well as random effects of X on the response Y.

 

I have added a sample data set (in .xlsx). This is a cross-sectional study. To serve my analysis goal, I think one of these two models will do (I will choose the better one based on AIC/BIC):

1. PROC MIXED data=SampleData covtest noclprint method=REML;
class level2 level3;
model Y = X B C D E F / ddfm=SATTERTHWAITE;
random intercept / sub=level3;
random intercept X / sub=level2(level3) type=UN;
run;

 

2. PROC MIXED data=SampleData covtest noclprint method=REML;
class level2 level3;
model Y = X B C D E F / ddfm=SATTERTHWAITE;
random intercept X / sub=level3 type=UN;
random intercept X / sub=level2(level3) type=UN;
run;

To test the assumptions, I am reading about a macro for testing a 2-level multilevel model. However, my model is a 3-level model. Are there any resources on how to test the assumptions for a 3-level model? And how badly are we impacted if those assumptions are not met?

1 ACCEPTED SOLUTION

sld
Rhodochrosite | Level 12

Thank you for clarifying your data set structure. I'm more sure now that we're on the same page.

 

Focusing on the covariate X: Because there are multiple (X,Y) observations (one for each level of Level2) within each level of Level3, we (or more properly, the statistical model) are able to fit a regression of Y on X for each level of Level3; as you note, this set of regressions may have appreciable variance among intercepts, variance among slopes, and covariance between intercepts and slopes. These (co)variances are derived from the multiple Level3 regressions. Consequently, although you can assess whether there are random intercepts and random slopes, I'd say that assessment is "among" levels of Level3; there is no random intercept/slope among levels of Level2 because the model is using the different levels of Level2 (within each level of Level3) to define the regressions. I hope that makes sense.
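One way to write this down as an equation (a sketch only; the notation is an assumption and is not spelled out anywhere in this thread): let i index Level1 units, j index Level2 units, and k index Level3 units, with random intercepts and random X slopes varying among levels of Level3 and only a random intercept for Level2 within Level3. Only X is given a random slope here, to keep the expression short.

```latex
\begin{aligned}
Y_{ijk} &= \beta_0 + \beta_X X_{jk} + \beta_B B_{jk}
          + \beta_C C_k + \beta_D D_k + \beta_E E_k + \beta_F F_k \\
        &\quad + u_{0k} + u_{1k} X_{jk} + v_{jk} + e_{ijk}, \\
\begin{pmatrix} u_{0k} \\ u_{1k} \end{pmatrix}
  &\sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
     \begin{pmatrix} \tau_{00} & \tau_{01} \\ \tau_{01} & \tau_{11} \end{pmatrix} \right), \quad
v_{jk} \sim N(0, \sigma_v^2), \quad
e_{ijk} \sim N(0, \sigma_e^2).
\end{aligned}
```

Here u_{0k} and u_{1k} are the Level3 random intercept and slope (the TYPE=UN block), v_{jk} is the Level2-within-Level3 random intercept, and e_{ijk} is the Level1 residual.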

 

I failed to define "Xmean" and "Bmean". Xmean is the mean of the X values over the levels of Level2 for each level of Level3; it's like moving the X values up a tier, from Level2 to Level3, as if Xmean were measured at Level3. Bmean is the same thing for B. I hope that makes sense, too. This concept is addressed in the Singer paper (SES and MEANSES) I linked in an earlier response. Although I didn't intend them as centered variables, they certainly could be, and are in the Singer paper. If you center them correctly, both should be variable (i.e., not constant zero, although the mean would be zero). Should you center? Your call. If the model includes interactions (including polynomial terms, like X*X), then centering is very useful and potentially does reduce collinearity. In a model without interactions, it's less critical, I think. Centering doesn't hurt; you just have to rescale results to un-do centering if you want results on the original scales. Should you include Xmean and Bmean? Again, your call.
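A minimal sketch of how Xmean and Bmean could be computed, assuming the data set is named SampleData, the ID variables are level2 and level3, and X and B are constant within each level of Level2. The names Level2Rows, Level3Means, SampleData2, Xc, and Bc are hypothetical, and the centered versions are optional.

```sas
proc sql;
  /* One row per Level2 unit, so each Level2 unit counts once in the mean */
  create table Level2Rows as
  select distinct level3, level2, X, B
  from SampleData;

  /* Mean of the Level2 covariates within each level of Level3 */
  create table Level3Means as
  select level3,
         mean(X) as Xmean,
         mean(B) as Bmean
  from Level2Rows
  group by level3;

  /* Attach the means to every observation; Xc and Bc are the
     group-mean-centered versions (hypothetical names) */
  create table SampleData2 as
  select a.*,
         b.Xmean,
         b.Bmean,
         a.X - b.Xmean as Xc,
         a.B - b.Bmean as Bc
  from SampleData as a
       left join Level3Means as b
       on a.level3 = b.level3;
quit;
```

Because X differs among levels of Level2 within a level of Level3, Xc computed this way is not constant zero; it only averages to zero within each level of Level3.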

 

If it were me, because there are no covariates at Level1, I would compute the mean Y over the levels of Level1 for each level of Level2 within each level of Level3 and then use the mean Y as the response in the simpler, two-level model. Nothing wrong with an easier life  🙂 You would then be able to omit the second RANDOM statement. If the number of levels of Level1 is the same for all combinations of Level2 and Level3, then the statistical tests for fixed effects will be very similar, if not identical, to those from the three-level model. If the number of levels of Level1 varies dramatically among combinations of Level2 and Level3, then I might keep the three-level model.
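A rough sketch of that collapse, assuming X and B are constant within each level of Level2 and C, D, E, F are constant within each level of Level3 (so MAX simply picks up the single value). The names ClassMeans, Ymean, and nLevel1 are hypothetical.

```sas
/* Collapse Level1: one row per Level2-within-Level3 combination */
proc sql;
  create table ClassMeans as
  select level3, level2,
         mean(Y)  as Ymean,    /* mean response over Level1 units */
         count(Y) as nLevel1,  /* number of nonmissing Level1 responses */
         max(X) as X, max(B) as B,
         max(C) as C, max(D) as D, max(E) as E, max(F) as F
  from SampleData
  group by level3, level2;
quit;

/* Two-level model: Level2 means nested in Level3, one RANDOM statement */
proc mixed data=ClassMeans covtest noclprint method=REML;
  class level3;
  model Ymean = X B C D E F / ddfm=satterthwaite solution;
  random intercept X / subject=level3 type=UN;
run;
```

Looking at the distribution of nLevel1 (for example with PROC FREQ) is one way to judge whether the Level1 counts are balanced enough for the two-level model to stand in for the three-level one.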

 

I haven't looked in any detail at the paper you found with the macro for assessing assumptions. If you adequately understand how the macro is addressing assumptions, and know what the assumptions are and how to extract what you need from the MIXED procedure, you theoretically would be able to extend the methods to a three-level model. In a sense, your statistical model is a multiple regression in a mixed model, so you have all the assumptions associated with multiple regression plus the assumptions associated with a mixed model. A busy task, but not horribly difficult.
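As a starting point only, and not the macro from the paper, here is a sketch of how the pieces needed for assumption checks might be pulled out of PROC MIXED for a three-level model. It assumes the SampleData structure described in this thread; the output data set names CondResid, MargResid, and RandomEffects are hypothetical.

```sas
ods graphics on;

proc mixed data=SampleData covtest noclprint method=REML;
  class level2 level3;
  /* RESIDUAL requests residual panels when ODS Graphics is enabled;
     OUTP= holds conditional and OUTPM= marginal predictions and residuals */
  model Y = X B C D E F / ddfm=satterthwaite solution residual
                          outp=CondResid outpm=MargResid;
  /* SOLUTION on RANDOM produces the predicted random effects (BLUPs) */
  random intercept X / subject=level3 type=UN solution;
  random intercept / subject=level2(level3) solution;
  ods output SolutionR=RandomEffects;
run;

/* Normality of the Level1 (conditional) residuals */
proc univariate data=CondResid normal;
  var Resid;
  qqplot Resid / normal(mu=est sigma=est);
run;

/* Homogeneity of variance and linearity: residuals against predicted values */
proc sgplot data=CondResid;
  scatter x=Pred y=Resid;
  refline 0 / axis=y;
run;

/* Normality of the predicted Level3 random slopes for X */
proc univariate data=RandomEffects normal;
  where Effect = 'X';
  var Estimate;
  qqplot Estimate / normal(mu=est sigma=est);
run;
```

The same residual plots can be repeated against each covariate to look at linearity. The random intercepts at each level can be examined like the slopes by subsetting RandomEffects differently.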

 

Good luck!


20 REPLIES
Rick_SAS
SAS Super FREQ

The keywords to search for are 

SAS hierarchical model "proc mixed"

 

A paper that seems to present a step-by-step analysis with explanation is Suzuki and Sheu (1999). The UCLA site also has examples from Singer's seminar on multilevel models.

CuriousMind
Obsidian | Level 7

Thank you, Rick, for the references. The second link only covers predictors up to level 2. The first link does consider a 3-level model, but it includes no random effect at level 3, so I am not entirely sure how to write mine. I am playing with different syntaxes and all of them seem to give me some output. My only concern is whether I am writing it correctly.

Rick_SAS
SAS Super FREQ

I think you need to write out the statistical model and post the structure/design of your data. Then someone can help you translate it into PROC MIXED or GLIMMIX. For example, if there are repeated measurements (e.g., the students took multiple tests), that will affect how you write the model.

 

A nice overview of the REPEATED and RANDOM statements is provided in Tao, Kiernan, and Gibbs (2015).  Their earlier paper (2012) also is worth reading.

 

CuriousMind
Obsidian | Level 7
Thanks, Rick, I have added a sample data set. I hope somebody will be able to help me out with the code.
sld
Rhodochrosite | Level 12

You are more likely to get responses if you provide the sample data as a CSV file. Those of us without SPSS won't easily be able to use the file you've provided.

 

CuriousMind
Obsidian | Level 7
Thanks, sld. Edited. CSV format is not supported, so I uploaded the data in .xlsx.
ballardw
Super User

If your data is in a SAS data set then instructions at https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... show how to create datastep code that you can paste into the forum using the code box icon {i} or attach as a text file. Then we can run the code and generate a data set with the same characteristics as your data.

 

Anything where we have to read Excel has a very high probability of something differing, because either 1) PROC IMPORT will make guesses that don't actually match the data after it has been filtered through Excel, or 2) we have to pick options that may not match.

CuriousMind
Obsidian | Level 7

Thanks ballardw, I tried to create a text file but it's taking me forever. I attached the data to give the reader an idea of what my data look like, and I explained what the variables represent and what my mixed model setting is. All I need to know is how I should write the code here. Perhaps I can include my code and somebody can have a look to see if it's correct.

sld
Rhodochrosite | Level 12

To do justice to this analysis, you'll need to know more than just how to write the code.

 

My suggestion is to study the link that @Rick_SAS provided

 

http://www.ats.ucla.edu/stat/sas/seminars/sas_mlm/mlm_sas_seminar.htm

 

and the corresponding paper

 

https://www.ida.liu.se/~732G34/info/singer.pdf

 

until you understand the nature of this model. In particular, note the use of mean covariate values at level2 for covariates measured at level1. If you understand this paper (and the helpful UCLA website), you will be able to extend the concepts to your scenario and, even more importantly, to build the "right" model and interpret it appropriately. As you study, it might be useful and more intuitive to extract a subset of your study; for example, get rid of level1 by computing means, use one factor from level2 (now the new level1) and one factor from level3 (now the new level2).

 

In addition to all that, remember to pay attention to assumptions of linearity, normality and homogeneity of variance.

 

More in-depth resources are the texts by Snijders & Bosker (Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling) and Raudenbush & Bryk (Hierarchical Linear Models: Applications and Data Analysis Methods). This kind of model is complicated, and that's why there are whole books on the topic 🙂

 

 

CuriousMind
Obsidian | Level 7
Yes @sld, I will go over these resources. Thank you 🙂
sld
Rhodochrosite | Level 12

Great! I'll look forward to seeing what you come up with.

 

CuriousMind
Obsidian | Level 7

Hi @sld, I read some resources on multilevel models, including the ones @Rick_SAS suggested. I think I now have a better understanding of how the model should look in my case. I have updated the question with a couple of new inquiries. Thanks.

sld
Rhodochrosite | Level 12

Either I'm not understanding your study design or it is not described correctly.

 

I'm looking at the sample dataset you attached. If variables C, D, E, and F are covariates measured at Level3, then I would expect to see the same value for each variable (C, D, E, or F) for all observations with the same value of Level3. For example, if Level3=1 then C=12 regardless of the values of Level2 and Level1. However, I see different values for C for Level3=1. Likewise, I would expect to see the same value for variable X or B for all observations with the same value of Level2. I'm perplexed.

 

IF your dataset had an appropriate multilevel structure (which I am not yet convinced of) and IF I correctly understand your design (which I also am not yet convinced of), then I would consider the following model:

 

PROC MIXED data=SampleData covtest;
  class level2 level3;
  model Y = C D E F Xmean Bmean X B / ddfm=SATTERTHWAITE s;
  random intercept X B / subject=level3 type=UN;
  random intercept / subject=level2;
run;

BUT I would think of this model as merely a first attempt, definitely not a final model. Even if this model is correct to some degree, there are many data characteristics and assumptions to be assessed (normality, homogeneity of variance, linearity, multicollinearity issues, which TYPE to use in the RANDOM statement, etc.).

 

If you want to respond to this message, please post a reply rather than edit your original message. It will be easier to track the discussion that way.

 

CuriousMind
Obsidian | Level 7

@sld You are right, my sample data was not structured correctly. I am really embarrassed and apologize for that. Please have a look at this data file now where I have designed it exactly how my data look.

 

My objective is to see how the exposure (X) contributes to the response (Y). The fixed effect of X is the primary interest; however, I would like to examine whether the random intercept (and the slope for X) varies by level2 and/or level3 (the most parsimonious model will be selected by AIC/BIC). The other variables are socio-economic variables that I just want to control for.

 

Grand-mean centering can be done for my variables, but I assume group-mean centering cannot, since the centered variable would have only 0 values (for example, Xgroupmean=0 for all children). In my data, a raw score of 0 has a significant meaning. Do I still need to center? Why did you center only the level2 variables but not the level3 ones? I read in the literature that centering usually helps with multicollinearity. Would it then be good practice to center all my variables at the grand mean?

 

Checking assumptions is another issue I am having. Not much has been written about it for 3-level data, and very few papers even test the assumptions. I am researching how I can extend the macro for 2-level data to my case to test the assumptions and to see which ones are not met. Or can I treat my data as a 2-level model, since there are no level-1 covariates? That would make my life significantly easier.

 

I really appreciate your input @sld. Thank you. Please feel free to ask me any questions that are still unclear. 
