Mixed models are a sophisticated statistical technique that extends traditional linear models by incorporating both fixed and random effects. This allows for a more flexible analysis of data, particularly for complex datasets that have hierarchical or nested structures. SAS software provides powerful tools for fitting mixed models. Let’s explore the fascinating world of mixed model analysis with SAS! In this post, we will introduce you to the basics and provide examples to get you started.
What are Mixed Models?
Mixed models, also known as hierarchical linear models or multilevel models, are essential for researchers and analysts in many fields, including agriculture, biology, medicine, and the social sciences, to name just a few!
Before we get to the examples, let’s define some key concepts. Mixed models can contain three different types of effects: fixed effects, random effects, and repeated measures.
Fixed effects are factors whose levels are selected by a nonrandom process or whose levels make up the entire population of possible levels. They can be either main effects or interactions. For example, in a drug study, a researcher wants to compare the effects of three drugs (A, B, and C). The researcher is interested only in comparing these three drugs and knows which drugs they are before the experiment begins. This makes drug a fixed effect in our problem. Another way to look at this is to imagine that a second researcher wants to replicate the original study. Would this second researcher need to use the same drugs as the first researcher? Yes. That also makes the variable drug a fixed effect.
Random effects are factors that have many possible levels, from which the researcher randomly selects a subset to include in the study. The inference from this analysis is directed toward the entire population of levels, not only the subset of levels included in the study. For example, in the previous drug study, suppose four clinics were selected at random from the population of clinics in a region. The researcher wants to make an inference about the drug effects across the entire population of clinics, not just the ones in the study. Another way to look at this is to return to the second researcher replicating the original study. Would the second researcher need to use the same clinics as the first researcher? No. That makes the variable clinic a random effect.
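As a rough sketch of how these two definitions translate into SAS syntax, the drug study could be specified in PROC MIXED as shown below. The data set name drug_study and the response variable response are hypothetical names used only for illustration; the post does not show this data.

```sas
/* Hypothetical data set: one row per patient, recording the
   patient's clinic, assigned drug (A, B, or C), and a numeric response */
proc mixed data=drug_study;
   class drug clinic;
   /* Drug is a fixed effect: only drugs A, B, and C are of interest */
   model response = drug / solution;
   /* Clinic is a random effect: the four clinics are treated as a
      random sample from the population of clinics in the region */
   random clinic;
run;
```

Drug appears on the MODEL line because inference is about those specific drugs, while clinic appears only on the RANDOM line, so PROC MIXED estimates a clinic variance component rather than separate fixed clinic effects.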
Repeated measures arise when data are collected from the same subject at multiple time points. Let’s look at the drug study again. Consider a scenario where the response is the change in a person’s blood pressure, recorded each hour for the eight hours following drug administration. Clearly, the assumption of independence is violated: we would expect measurements taken on the same person to be more similar to each other than measurements taken on different people. We would also expect observations on the same subject that are closer together in time to be more similar than those farther apart. These hourly blood pressure measurements are repeated measures.
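Here is a minimal sketch of how this repeated measures structure might be declared in PROC MIXED. It assumes a hypothetical data set bp_hourly with one row per patient per hour, a patient identifier patient that is unique across drugs, and a response variable bp_change. The first-order autoregressive structure, AR(1), is only one plausible choice for equally spaced time points, not a recommendation from the original study.

```sas
/* Hypothetical data set: one row per patient per hour (hours 1-8) */
proc mixed data=bp_hourly;
   class patient drug hour;
   model bp_change = drug hour drug*hour / solution;
   /* Measurements on the same patient are correlated. AR(1) assumes
      the correlation decays as the time between measurements grows.
      If patient IDs repeat across drugs, use SUBJECT=patient(drug). */
   repeated hour / subject=patient type=ar(1);
run;
```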
Why use Mixed Models?
To achieve the best possible tests for our fixed effects, we need to account for the additional source of variability introduced into our problem by the random selection of levels of the random effect. We also want to properly accommodate any lack of independence when it is part of the problem structure. What do I mean by additional variability in our problem? Let’s look at this example.
An engineer wants to test the strength of three adhesives that are used as bonding agents at a toy company. Seven toys are randomly selected from a population of toys and used for the strength test. The adhesive brands are a, b, and c. Each toy has three locations where pieces are to be connected, and each of the three adhesives is used on one connection point of every toy. The amount of pressure required to break each bond is recorded.
Adhesive, our treatment effect, is a fixed effect because only three adhesives are used in the study. The engineer is interested in only making inferences about these three adhesives. The toy, a blocking effect, is a random effect because the seven toys are randomly selected from the population of toys. The inference about the treatment means is made over the entire population of the toys.
During the initial data exploration, we create the series plot above. Some would look at this image and say that they would use adhesive b, since its breaking strength is higher than that of the other adhesives for most of the toys. However, I want you to look at something else. Observe the difference in the breaking strength for toy 5. Next, consider the difference in the breaking strength for toy 7. This is a visualization of the additional variability that I was referencing. Failing to account for this additional variability results in incorrectly calculated denominators for the tests of fixed effects, leading to incorrect p-values and decisions.
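A minimal sketch of how the adhesive study could be analyzed as a randomized complete block design in PROC MIXED, with toy as a random block, is shown below. The data set adhesive_strength and the response variable pressure are hypothetical names, since the post does not show the data.

```sas
/* Hypothetical data set: 21 rows, one per toy-adhesive combination */
proc mixed data=adhesive_strength;
   class toy adhesive;
   /* Adhesive (a, b, c) is the fixed treatment effect */
   model pressure = adhesive;
   /* Toy is a random block effect: its variance component is
      estimated and taken into account in the test for adhesive */
   random toy;
   /* Pairwise comparisons of adhesive means with a Tukey adjustment */
   lsmeans adhesive / diff adjust=tukey;
run;
```

Declaring toy on the RANDOM line is what lets the analysis account for the toy-to-toy variability visible in the series plot, instead of treating the seven toys as the only toys of interest.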
Key Features of Mixed Models in SAS
Flexibility – Mixed models can handle a wide range of data structures, including longitudinal data (repeated measures), nested data, and crossed random effects.
Correlation Structures – SAS allows users to specify various covariance structures for the random effects and for the residual errors (for example, with repeated measures), providing a better fit for complex data. (This topic will need its own post.)
Variance Components – Mixed models can estimate variance components for different levels of the data hierarchy, helping to understand the sources of variability.
Model Diagnostics – SAS provides tools for assessing model fit, including residual plots and goodness-of-fit statistics.
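To see variance components and diagnostics in practice, the following sketch reuses a hypothetical adhesive_strength data set (an assumption; the post does not show it). The COVTEST option adds approximate Wald tests to the Covariance Parameter Estimates table, PLOTS=RESIDUALPANEL requests a panel of residual plots through ODS Graphics, and the Fit Statistics table (which includes AIC and BIC) is displayed by default.

```sas
ods graphics on;

/* Hypothetical data set: one row per toy-adhesive combination */
proc mixed data=adhesive_strength method=reml covtest
           plots=residualpanel;
   class toy adhesive;
   model pressure = adhesive;
   /* The variance of toy appears in the Covariance Parameter
      Estimates table alongside the residual variance */
   random toy;
run;
```

Treat the Wald tests for variance components with some caution in small studies like this one, since they rely on large-sample approximations.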
PROC MIXED
PROC MIXED is used for fitting general linear mixed models. It allows for the inclusion of both fixed and random effects and provides a variety of options for specifying covariance structures.
Example 1: General Linear Mixed Model
Let’s start with an education example where we have collected gains in scores on a standardized test for 1,515 fourth-grade students in all schools in a district. Gain is defined as the score at the end of the year minus the score at the beginning of the year. The students’ genders and ethnicities, as well as the identification numbers of the students’ teachers, were recorded. In this setup, school, ethnicity, and gender are fixed effects, while teacher is random because it represents a sample from the population of teachers who could teach at the schools.
proc mixed data=test_scores;
class school ethnicity gender teacher;
model gain = school ethnicity gender / solution;
random int / subject=teacher;
run;
In this example, test_scores is our data set. The variables school, ethnicity, gender, and teacher are defined to be categorical using the CLASS statement. On the MODEL line, gain is the response variable, and school, ethnicity, and gender are the fixed effects. The SOLUTION option requests that SAS display the parameter estimates for the fixed effects. The RANDOM line specifies a random intercept for each teacher (INT with SUBJECT=TEACHER), which accounts for variability among teachers.
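If it helps to connect this to the simpler syntax, the RANDOM line above should be equivalent to listing teacher directly as a random effect, as in the sketch below. The SUBJECT= form lets PROC MIXED process the data teacher by teacher, which is typically more efficient for large data sets like this one. If teacher identification numbers were reused across schools (an assumption the post does not address), the teacher effect would need to be nested within school.

```sas
proc mixed data=test_scores;
   class school ethnicity gender teacher;
   model gain = school ethnicity gender / solution;
   /* Same teacher variance component as RANDOM INT / SUBJECT=TEACHER.
      If teacher IDs repeat across schools, use RANDOM TEACHER(SCHOOL) */
   random teacher;
run;
```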
Example 2: Mixed Model with Random Slopes
Now, let’s consider a more complex example where both intercepts and slopes vary among individuals. This is called a random coefficient model. Suppose we are studying the effect of a treatment on patients’ blood pressure, and we believe that both the intercept and the slope for baseline blood pressure vary among patients.
proc mixed data=blood_pressure;
class patient treatment;
model bp = treatment baseline / solution;
random intercept baseline / subject=patient type=un;
run;
In this example, blood_pressure is our data set. The variables patient and treatment are defined to be categorical using the CLASS statement. On the MODEL line, bp is the response variable, and treatment and baseline are the fixed effects. These fixed effects estimate the overall population intercept and slope. However, we believe that both the intercept and the slope for baseline vary among patients. The RANDOM statement accounts for this by adding a random intercept and a random baseline slope for each patient, which even allows us to estimate each patient’s deviations from the population intercept and slope. The TYPE=UN option specifies an unstructured covariance structure for these random coefficients (the G matrix), so the intercepts and slopes can have different variances and can be correlated.
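The patient-level deviations mentioned above can be displayed directly. As a sketch, adding the SOLUTION and G options to the RANDOM line of Example 2 requests the predicted random intercept and slope for each patient and the estimated G matrix; the rest of the code is unchanged.

```sas
proc mixed data=blood_pressure;
   class patient treatment;
   model bp = treatment baseline / solution;
   /* SOLUTION prints each patient's predicted deviations (EBLUPs)
      from the population intercept and baseline slope.
      G prints the estimated G matrix (the block for the first patient). */
   random intercept baseline / subject=patient type=un solution g;
run;
```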
PROC GLIMMIX
PROC GLIMMIX extends the capabilities of PROC MIXED by fitting generalized linear mixed models (GzLMMs). This means that it can handle non-normal response distributions, such as binary or count data. In addition, it relates the expected value of the response to the linear predictor through a link function. If this sounds familiar, that’s because these are features of PROC GENMOD, brought into the mixed model world.
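As a sketch of the count-data case, suppose (hypothetically) that the drug study recorded the number of adverse events for each patient in a variable n_events, stored in a data set adverse_events. A Poisson distribution with a log link is a common starting point for counts; checking for overdispersion is not addressed here.

```sas
/* Hypothetical data set: one row per patient with drug, clinic,
   and a count of adverse events */
proc glimmix data=adverse_events;
   class drug clinic;
   /* Poisson distribution with a log link for a count response */
   model n_events = drug / dist=poisson link=log solution;
   /* Random intercept for each randomly selected clinic */
   random intercept / subject=clinic;
run;
```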
Example 3: Adapting Our General Linear Mixed Model to Logistic Regression
Let’s return to our first example. Rather than focus on the gain in score achieved by each student, let’s make the response an indicator variable, passing, that records whether the student passed (1) or failed (0). The predictors and the setup remain the same.
proc glimmix data=test_scores;
class school ethnicity gender teacher;
model passing(event='1') = school ethnicity gender / dist=binary link=logit solution;
random int / subject=teacher;
run;
Note the similarity to the code block of example 1. In our GLIMMIX code, we have changed the response variable on the MODEL line to our new indicator variable, passing. The EVENT='1' response option tells GLIMMIX to model the probability of passing; by default, it would model the probability of the lower ordered value, 0 (failing). We have also added the DIST= and LINK= options to the MODEL line. Our new response variable no longer follows a normal distribution. It is a dichotomous response and follows a binary distribution, which moves our example into the world of logistic regression. The link function typically associated with logistic regression is the logit link. Our random effect of teacher is still present on the RANDOM line.
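GLIMMIX offers a few options that are often useful for a binary model like this one. The following sketch assumes the same test_scores data set. METHOD=LAPLACE requests a Laplace approximation to the marginal likelihood instead of the default pseudo-likelihood method (RSPL), and the ILINK option on the LSMEANS statement reports the gender LS-means on the probability scale in addition to the logit scale.

```sas
proc glimmix data=test_scores method=laplace;
   class school ethnicity gender teacher;
   /* Model the probability that passing = 1 */
   model passing(event='1') = school ethnicity gender
         / dist=binary link=logit solution;
   random int / subject=teacher;
   /* ILINK back-transforms the LS-means to estimated probabilities */
   lsmeans gender / ilink;
run;
```

Whether Laplace or the default pseudo-likelihood is preferable depends on the data; comparing both is a reasonable check rather than a rule.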
Conclusion
Mixed model analysis provides a flexible approach for analyzing complex data structures, accommodating both fixed and random effects. SAS software offers powerful procedures to perform these analyses efficiently. By understanding the basic concepts and syntax, you can start exploring the potential of mixed models in your research. Be sure to check out the documentation of both PROC MIXED and PROC GLIMMIX for additional information and examples. You can also watch the YouTube video made by John Gottula and read this paper, which offers some tips and strategies for mixed modeling in SAS.
Feel free to experiment with different models and data structures to see how mixed models can enhance your analyses.
Find more articles from SAS Global Enablement and Learning here.