Paper 1146-2021
Author
Walter Stroup, University of Nebraska-Lincoln
Abstract
Over the past two decades, generalized linear mixed models (GLMMs) – mixed models for non-normal data such as proportions, counts, time to event, and so forth – have become standard tools for statistical analysis. Since its introduction, PROC GLIMMIX has been the primary GLMM procedure for SAS/STAT(r). The most recent edition of SAS(r) for Mixed Models includes three chapters on using GLIMMIX for GLMMs. PROC GLIMMIX is an excellent frequentist tool. However, Bayesian approaches are becoming increasingly important. Many academic journals prefer – some even require – Bayesian analysis. Even when not required, Bayesian methods allow us to use what we know prior to, or in the early stages of, an investigation. PROC BGLIMM is a new SAS/STAT procedure that makes Bayesian implementation of GLMMs relatively easy. BGLIMM uses syntax similar to PROC GLIMMIX, but there are some differences. This tutorial presents what you need to know to get started using PROC BGLIMM. We will use GLMM examples from SAS for Mixed Models, but with a Bayesian twist.
Watch the presentation
Watch Bayesian Analysis of GLMMs Using PROC BGLIMM as presented by the author on the SAS Users YouTube channel.
Introduction
Note to readers: the following is as much of the proceedings paper as I was able to upload. The SAS programs, output and interpretations are here in more or less readable form. Equations did not upload cleanly. You can download the PDF file (Stroup-1146.pdf), which has the paper as I wrote it in intact form, or the presentation slides (140530_SASGF2021_BLGLIMM_PDF_version.pdf).
Mixed models are important tools for analyzing data from many types of studies, including longitudinal or repeated measures, multi-level or split-plot experiments, blocked designs with incomplete blocks or missing data and multi-location studies. Linear mixed models (LMMs) accommodate response variables assumed to follow a normal (hereafter referred to as a Gaussian) distribution. Generalized linear mixed models (GLMMs) extend mixed model theory and methods to accommodate non-Gaussian responses such as categorical or count data. Because the LMM is a special case of the GLMM – a GLMM with Gaussian data – the acronym GLMM is used for the rest of this paper. In the SAS ® system, PROC GLIMMIX is the preeminent mixed model procedure, allowing users to work with both LMMs and GLMMs. PROC GLIMMIX uses a frequentist approach to estimation and inference. The next section, “GLMM Basics” gives an overview of what this entails, but the basic idea is that frequentist statistics focus on obtaining estimates and standard errors that are used to construct test statistics for significance testing or to construct confidence intervals.
Bayesian statistics provide an alternative approach. Bayesian statistics focus on incorporating prior information to obtain posterior distributions of the statistics of interest. Posterior distributions combine what you know in advance – the prior – with the data you observe. Bayesian statistics are becoming increasingly important for data analysis. One reason is that many academic journals now discourage classical significance testing in favor of Bayesian analysis. Some journals have even banned significance tests, in effect requiring Bayesian analysis. The second important reason is that the statistics obtained from frequentist methods are essentially equivalent to Bayesian analysis with a “non-informative” prior. In other words, frequentist methods implicitly assume that you know nothing until you analyze the data. In reality, this is rarely, if ever, true (if you really know nothing before analyzing the data, you probably should not be doing the study!). Many data sets come from studies that may be part of a series – phases of clinical trials leading to the licensing of a pharmaceutical product – or similar experiments by graduate students with the same advisor.
PROC MCMC is the SAS system’s original and all-purpose Bayesian procedure. Although you can use PROC MCMC for mixed models, the syntax is similar to PROC NLMIXED, which can be inconvenient for less-than-simple models or models with CLASS variables. PROC BGLIMM was developed to allow you to use syntax similar to PROC GLIMMIX, making Bayesian analysis more accessible for GLMMs.
This tutorial introduces PROC BGLIMM using three examples from SAS for Mixed Models: Introduction and Basic Applications (Stroup, et al. 2018).
A multi-clinic trial with binomial data from Chapter 11 of SAS for Mixed Models.
A multi-level (split-plot) experiment with count data from Chapter 13.
Repeated measures (longitudinal) data from Chapter 8.
The next section, "GLMM Basics," covers the GLMM setting, essential definitions and terminology, and an overview and comparison of PROC GLIMMIX and PROC BGLIMM estimation and inference.
GLMM BASICS
The classical format for a statistical model is the equation
Response variable = systematic component + random component
Think of the systematic component as the fixed effects, e.g., in ANOVA-type models with treatment effects or in linear regression models. In mixed models, the random component is really two distinct components, random model effects such as block effects, and residual effects, such as those that account for serial covariance in a longitudinal mixed model. The classical response = fixed + random format works well if you can assume that the data follow a Gaussian distribution, but in many cases this is not true. Table 1 shows common types of response variables in the left-hand column and types of model effects across the top row.
Response Variable                        Model Effect Types
Gaussian                                 Fixed: Categorical (CLASS) or Continuous
Categorical (binomial, multinomial)      Random: Model effect ("G-side")
Continuous proportion (beta)             Random: Residual / Covariance structure ("R-side")
Count (Poisson, negative binomial)
time-to-event
etc...
Table 1. Response variable and model effect types covered by LMM and GLMM
Replacing the classical model format, the defining elements of the GLMM are as follows:
Distributions:
$\mathbf{y} \mid \mathbf{b} \sim f(\mathbf{y} \mid \mathbf{b})$; $f$ denotes some distribution, e.g., one of those listed above
$\mathbf{b} \sim N(\mathbf{0}, \mathbf{G})$
$\mathbf{R}$ is a scale matrix, e.g., $\sigma^2 \mathbf{I}$ or a residual covariance structure
Link function: $\boldsymbol{\eta} = g(\boldsymbol{\mu})$, where $\boldsymbol{\mu} = E(\mathbf{y} \mid \mathbf{b})$
Linear Predictor: $\boldsymbol{\eta} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{b}$
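For example, anticipating the multi-clinic binomial example in the next section, these elements specialize to
$$Y_{ij} \mid \mathbf{b} \sim \text{Binomial}(N_{ij}, \pi_{ij}), \qquad \mathbf{b} \sim N(\mathbf{0}, \mathbf{G}), \qquad \eta_{ij} = \log\!\left(\frac{\pi_{ij}}{1-\pi_{ij}}\right), \qquad \boldsymbol{\eta} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{b}.$$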
PROC MIXED allows you to work in the first row of Table 1. PROCs BGLIMM, GLIMMIX, MCMC and NLMIXED allow you to work with all of the rows.
FREQUENTIST GLMM ESTIMATION AND INFERENCE
The guiding principle of GLMM estimation is maximum likelihood (ML). You obtain estimates of the fixed effects by maximizing the log-likelihood, $\ell(\boldsymbol{\beta}, \boldsymbol{\sigma}; \mathbf{y}) = \log f(\mathbf{y})$, where $f(\mathbf{y}) = \int f(\mathbf{y} \mid \mathbf{b}) f(\mathbf{b})\, d\mathbf{b}$. In general, this integral is intractable.
SAS software uses two approximation strategies: linearization (pseudo-likelihood) and integral approximation (quadrature and Laplace). The mixed model equations for Gaussian data, the PROC MIXED default used to obtain REML estimates of the variance components and solutions for $\boldsymbol{\beta}$ and $\mathbf{b}$, are special cases of pseudo-likelihood (PL). The REML version of PL is the PROC GLIMMIX default. Laplace and quadrature are PROC GLIMMIX options. PROC NLMIXED uses quadrature only.
Estimates and standard errors computed from PL or integral approximation are used to compute test statistics, p-values and confidence intervals. Classical frequentist inference makes extensive use of significance testing.
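As a minimal sketch of how these estimation strategies are requested in PROC GLIMMIX (the data set and variable names here are placeholders, not from this paper):

proc glimmix data=mydata method=laplace; /* default is METHOD=RSPL, the REML version of
                                            pseudo-likelihood; METHOD=QUADRATURE requests
                                            adaptive Gauss-Hermite quadrature */
   class block trt;
   model y/n = trt;
   random intercept / subject=block;
run;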
BAYESIAN ESTIMATION AND INFERENCE
The primary tool of Bayesian inference is the posterior distribution,
$$f(\boldsymbol{\beta}, \boldsymbol{\sigma} \mid \mathbf{y}, \mathbf{b}) = \frac{f(\mathbf{y}, \mathbf{b} \mid \boldsymbol{\beta}, \boldsymbol{\sigma})\, f(\boldsymbol{\beta}, \boldsymbol{\sigma})}{\iint f(\mathbf{y}, \mathbf{b} \mid \boldsymbol{\beta}, \boldsymbol{\sigma})\, f(\boldsymbol{\beta}, \boldsymbol{\sigma})\, d\boldsymbol{\beta}\, d\boldsymbol{\sigma}},$$
where $\boldsymbol{\sigma}$ denotes the vector of covariance parameters and $f(\boldsymbol{\beta}, \boldsymbol{\sigma})$ denotes the prior distribution of the fixed effects and covariance parameters. The function $f(\mathbf{y}, \mathbf{b} \mid \boldsymbol{\beta}, \boldsymbol{\sigma})$ is the same likelihood used in frequentist estimation, defined by the GLMM distributions $f(\mathbf{y} \mid \mathbf{b})$ and $f(\mathbf{b})$.
As with the GLMM likelihood, the integrals required for the posterior distribution are generally intractable. Unlike frequentist estimation, neither linearization nor integral approximation are viable options. Instead, Bayesian methods approximate the posterior distribution by simulation.
PROC BGLIMM involves the following steps:
Specify the GLMM (i.e., the defining elements listed above). This step specifies 𝑓(𝑦|𝑏) and 𝑓(𝑏).
Specify the prior distributions 𝑓(𝛽, 𝜎). The BGLIMM procedure uses pre-programmed default priors. Depending on the model and data, you may or may not need to replace them with more appropriate prior distributions.
PROC BGLIMM implements three computational steps. It is beyond the scope of this tutorial to provide technical description of the algorithms used to implement these steps. The following is simply a list of the steps and their purpose:
Tuning. This step uses the data, model and priors to come up with an approximation of the posterior distributions to be sampled.
Burn-in. This is the initial sampling of the posterior distributions. A coin-flip exercise in an introductory statistics class provides a simple analogy: the proportion of flips resulting in heads vacillates early in the sampling, but eventually settles down to a reasonable approximation of the probability of a head.
Sampling the posterior distribution. As the name implies, the posterior distribution of each parameter is sampled with the goal of getting a sufficiently accurate and detailed characterization.
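The tuning step is automatic, but burn-in and posterior sampling are controlled directly in the PROC statement. A minimal sketch (the data set and model are placeholders; the option values mirror those used later in this paper):

proc bglimm data=mydata
      seed=12345    /* fixes the random-number stream so runs are reproducible */
      nbi=2500      /* number of burn-in iterations */
      nmc=100000    /* number of posterior samples drawn after burn-in */
      thin=10;      /* keep every 10th sample to reduce autocorrelation */
   class trt;
   model y = trt;
run;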
The end result is a mean, standard deviation and quantiles of interest for the posterior distribution of each element of 𝛽 and each covariance parameter. You can use BGLIMM’s ESTIMATE statement to define estimable functions of the 𝛽 that address objectives of your data analysis. Two forms of credible intervals are commonly used for Bayesian inference:
the highest posterior density (HPD) interval. This is the narrowest interval between the lower and upper bound that contains a given percent of the posterior distribution.
quantile-based interval. For example, you can construct a 95% credible interval using the 2.5 and 97.5 percentiles of the posterior distribution as the lower and upper bounds.
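For example, given the posterior sample data set produced by the OUTPOST= option (illustrated in Example 1 below, where the data set is named COUT and contains a variable named log_odds_ratio), a quantile-based 95% credible interval is just a pair of percentiles:

proc univariate data=cout noprint;
   var log_odds_ratio;
   output out=ci pctlpts=2.5 50 97.5 pctlpre=pct; /* creates pct2_5, pct50, pct97_5 */
run;
proc print data=ci;
run;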
There are a number of diagnostics you should check before using PROC BGLIMM results. Diagnostics are introduced in the context of the examples in the following sections. These examples cover problems you are likely to encounter, how to interpret the relevant diagnostics, and PROC BGLIMM options you can use to address these problems in order to obtain a useable analysis.
EXAMPLE 1: MULTI-CLINIC TRIAL WITH BINOMIAL DATA
The first example is the Beitler and Landis (Biometrics, 1985) multi-clinic data set that appears in SAS for Mixed Models (2018), Chapter 11, Section 11.3. Two treatments, CNTL and DRUG, are compared at eight clinics sampled from a target population. At the $j$th clinic, $N_{ij}$ patients are assigned to the $i$th treatment. The response variable, denoted $Y_{ij}$, is the number of patients having a favorable outcome.
The first step in the analysis of these data is to identify an appropriate statistical model. You can do this by following a process presented in SAS for Mixed Models, Chapter 5. List the sources of variation separately for the “study” design and the treatment design. The study design, also called the “experiment design,” describes the components of the design before being assigned treatments or treatment levels. In this case, the study design consists of clinics and groups of patients within clinics. The treatment design is simply CNTL and DRUG. Groups are randomly assigned to treatments. Table 2 shows the sources of variation.
STUDY DESIGN              TREATMENT DESIGN         COMBINED
SOURCE            DF      SOURCE           DF      SOURCE                          DF
clinic             7                               clinic                           7
                          treatment         1      treatment                        1
group(clinic)      8                               group(clinic) | treatment
                                                     a.k.a. clinic x treatment
                                                     a.k.a. unit of observation  8-1=7
TOTAL             15                               TOTAL                           15
Table 2. Sources of Variation for Multi-Clinic Example Data
In the combined column, read “group(clinic) | treatment” as “group within clinic after accounting for treatment.” List the rows so that treatment appears in the row immediately above the unit to which it is assigned – in this case, group(clinic).
The next step is to write model effects associated with each source of variation and the assumed probability distribution of any effects considered to be random. Tables 3 and 4 show two scenarios that lead to plausible mixed models. Table 3 describes the elements of a logit-normal GLMM, so-called because it uses a logit link function, $\eta_{ij} = \log[\pi_{ij}/(1-\pi_{ij})]$, where $\pi_{ij}$ denotes the probability of a favorable outcome for the $ij$th clinic-treatment combination, and all random effects are assumed to follow a normal (Gaussian) distribution. Note that clinic effects are random because the clinics in the study are a representative sample. Also note that there is a random effect for the group-level source of variation. You should include this term to account for variation in $\pi_{ij}$ at this level – if you don't, over-dispersion may cause misleading results.
Note that group(clinic) is the unit on which the observations ($Y_{ij}$) are taken. Hence, you can refer to it as the "unit level" effect, unit being shorthand for unit of observation. Specifying the model requires stating the assumed distribution of the observations, conditional on the random effects, at the unit level, and relating its expected value to the linear predictor, e.g., through the inverse link function as shown here.
SOURCE                         EFFECT DISTRIBUTION                 OBSERVATION AND MODEL
clinic                         $c_j \sim N(0, \sigma_C^2)$
treatment                      $\tau_i$ (fixed)
group(clinic) | treatment      $u_{ij} \sim N(0, \sigma_{CT}^2)$   $Y_{ij} \mid c_j, u_{ij} \sim \text{Binomial}(N_{ij}, \pi_{ij})$,
  a.k.a. clinic x treatment                                        where $\text{logit}(\pi_{ij}) = \eta + \tau_i + c_j + u_{ij}$
  a.k.a. unit of observation
Table 3. Model Effects and Distributions Defining Logit-normal GLMM
Table 4 gives the elements of a beta-binomial mixed model, a commonly used alternative to the logit-normal GLMM for binomial data. The difference between the two models is the way group-level variation in the probability of a favorable outcome is modeled. The logit-normal GLMM includes a random effect in the linear predictor. The beta-binomial models the probability as a random variable, assuming that it follows a beta distribution. Table 4 gives the Ferrari and Cribari-Neto (2004) parameterization of the beta in terms of its mean ($\mu$) and scale parameter ($\phi$). Alternatively, you can write the distribution in its conventional math-stat form, $\text{Beta}(\alpha, \beta)$, where $\alpha = \mu\phi$ and $\beta = (1-\mu)\phi$.
SOURCE                         EFFECT DISTRIBUTION                           OBSERVATION AND MODEL
clinic                         $c_j \sim N(0, \sigma_C^2)$
treatment                      $\tau_i$ (fixed)
group(clinic) | treatment      $\pi_{ij} \sim \text{Beta}(\mu_{ij}, \phi)$   $Y_{ij} \mid \pi_{ij} \sim \text{Binomial}(N_{ij}, \pi_{ij})$,
  a.k.a. clinic x treatment                                                  where $\text{logit}(\mu_{ij}) = \eta + \tau_i + c_j$
  a.k.a. unit of observation
Table 4. Model Effects and Distributions Defining Beta-Binomial Mixed Model
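Under the Ferrari and Cribari-Neto parameterization, the mean and variance of the beta-distributed probability are (a standard result, stated here for reference)
$$E(\pi_{ij}) = \mu_{ij}, \qquad \operatorname{Var}(\pi_{ij}) = \frac{\mu_{ij}(1 - \mu_{ij})}{1 + \phi},$$
so the scale parameter $\phi$ governs the group-level variation in the probability of a favorable outcome.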
You can use PROC BGLIMM (or PROC GLIMMIX) to implement the logit-normal GLMM, but not the beta-binomial. This is because BGLIMM and GLIMMIX are limited to models with Gaussian random effects, whereas the beta-binomial has a non-Gaussian random effect, $\pi_{ij} \sim \text{Beta}(\mu_{ij}, \phi)$. (Although beyond the scope of this tutorial, you can implement the beta-binomial with PROC MCMC.)
The beta-binomial is shown here to illustrate the difference between a “sensible” model and an improperly specified model. What constitutes a “sensible” model? You must have a one-to-one correspondence between sources of variation and model effects or parameters that account for them. Table 5 illustrates the difference.
SOURCE       LOGIT-NORMAL   BETA-BINOMIAL   NAIVE GLMM          BETA-BIN W/ UNIT EFFECT
             (sensible)     (sensible)      (over-dispersion)   (unit confounding)
clinic       $c_j$          $c_j$           $c_j$               $c_j$
treatment    $\tau_i$       $\tau_i$        $\tau_i$            $\tau_i$
unit         $u_{ij}$       $\phi$          (none)              $u_{ij}$, $\phi$ (confounded)
Table 5. Sensible and Improperly Specified Mixed Models for Multi-Clinic Binomial Data
Both the logit-normal and beta-binomial meet the sensible model criterion. The "naive GLMM" lacks any term that accounts for variation at the unit level, making the model vulnerable to over-dispersion. The $u_{ij}$ effect is thus needed for the logit-normal model, but if you include it in the beta-binomial model, it is confounded with the beta distribution's scale parameter ($\phi$), which is not good.
The remainder of this section focuses on implementing the logit-normal model with PROC BGLIMM.
BASIC PROC BGLIMM STATEMENTS FOR THE LOGIT-NORMAL GLMM
PROC BGLIMM uses syntax borrowed from PROC GLIMMIX and, for repeated measures, from PROC MIXED. To illustrate, here are the basic GLIMMIX and BGLIMM statements. First, the GLIMMIX statements for the logit-normal GLMM from SAS for Mixed Models:
proc glimmix data=multi_clinic;
class clinic treatment;
model fav/nij = treatment;
random intercept treatment / subject=clinic;
lsmeans treatment / ilink diff oddsratio cl;
run;
Now the BGLIMM statements:
proc bglimm data=multi_clinic plots=(trace autocorr density)
diagnostics=all outpost=cout;
class clinic treatment;
model fav/nij = treatment / init=pinit;
random intercept treatment / subject=clinic;
estimate "CNTL" intercept 1 treatment 1 0;
estimate "DRUG" intercept 1 treatment 0 1;
estimate "log_odds_ratio" treatment 1 -1;
ods output estimates=model_scale;
run;
Notice that the CLASS, MODEL and RANDOM statements for the two procedures are identical. The FAV/NIJ (events/trials) syntax is specific to the binomial distribution. Also notice that there are no statements in the basic PROC BGLIMM program specifying prior distributions. BGLIMM has pre-programmed priors that it uses by default. These may or may not be appropriate, depending on the model and data. The additional PLOTS and DIAGNOSTICS options in the PROC BGLIMM statement should be considered standard operating procedure, used either to verify that the results are useable or to identify problems that need to be fixed, such as inappropriate default priors, before the output can be regarded as useable. The INIT=PINIT option in the MODEL statement causes the default starting values used for the fixed effects to be included in the SAS listing. Consider these to be part of the diagnostics. PROC BGLIMM does not have an LSMEANS statement; you must write the ESTIMATE statements that correspond to least squares means and treatment differences. PROC BGLIMM also lacks an ILINK option and only computes model-scale (in this case, logit-scale) estimates. The OUTPOST option in the PROC statement and the ODS OUTPUT statement give you two ways to obtain data-scale statistics. These are explained below in the subsection entitled "Post-Processing to Obtain Data-Scale Estimates."
DIAGNOSTICS
The PLOTS, DIAGNOSTICS=ALL and INIT=PINIT options produce items in the SAS listing that help you identify problems and decide if the results are useable. This section has two parts: a list of the items and a brief description of their purpose, and the diagnostics produced by the default PROC BGLIMM statements given above and their interpretation.
List of Diagnostics
The PLOTS option gives you three plots: autocorrelation, density and trace. Trace plots are also called "caterpillar" plots, because ideally they should look like a "fuzzy caterpillar," which indicates the sampling algorithm has produced a reasonable approximation of the posterior distribution. The algorithms used to sample the posterior distributions are vulnerable to autocorrelation. The autocorrelation plot shows how quickly autocorrelation decreases as lag increases. The density plot, as the name implies, shows the posterior density produced by the sampling algorithm.
DIAGNOSTICS=ALL produces the following:
Effective Sample Size (ESS). Autocorrelation reduces posterior sampling efficiency. ESS gives a measure of the sample size after adjusting for autocorrelation. Low efficiency, per se, is not a problem, but ESS should be, at the very least, >1000 in order to have an accurate approximation of the posterior distribution.
Heidelberger-Welch. These are stationarity tests – is the posterior distribution sampling giving consistent results from beginning (immediately after burn-in) to end? The key columns in the SAS listing will say “passed” or “failed.”
Raftery-Lewis. Addresses the accuracy of the posterior distribution's percentile estimates. The key statistic is the dependence factor. Ideally, it should be approximately 1.
Geweke. Compares estimates early and late in the sampling process. Use the Geweke statistics to test the null hypothesis that they are acceptably similar. You want to see p-values consistent with failing to reject the null hypothesis.
The listing also includes the number of burn-in iterations, the posterior density sample size and the priors the BGLIMM procedure uses by default.
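For reference, ESS is conventionally computed as
$$\text{ESS} = \frac{n}{1 + 2\sum_{k=1}^{\infty} \hat{\rho}_k},$$
where $n$ is the posterior sample size and $\hat{\rho}_k$ is the lag-$k$ autocorrelation; the denominator is the "autocorrelation time" reported next to ESS in the listings below.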
Results of Preliminary PROC BGLIMM Run with Default Settings
Figure 1 shows two diagnostic plots. On the left is the plot for the logit-normal GLMM’s intercept parameter produced by BGLIMM program given above. On the right is an example of what the plot should look like in a run whose results are useable. The plots for the other parameters are not shown, but they are similar to these.
Figure 1. Example Diagnostic Plots
The left-hand plot is what you do not want to see. The trace plot shows erratic and inconsistent sampling of the posterior distribution. It looks more like a seismograph during an earthquake than the “fuzzy caterpillar” plot on the right. The right-hand plot shows consistent sampling from beginning to end. The autocorrelation plot on the left also signals trouble. Autocorrelation should drop to zero quickly, as it does by lag 10 in the right-hand plot. The BGLIMM run with default settings produces autocorrelation that never goes to zero, even at lag 50. If autocorrelation does not drop to zero at least by lag 25, consider the results unusable.
Output 1 shows results produced by the DIAGNOSTICS=ALL option. Statistics that signal trouble are highlighted in bold.
Output 1. Diagnostic Statistics from PROC BGLIMM Logit-normal Program with Default Settings
Effective Sample Sizes
Parameter      ESS      Autocorrelation Time   Efficiency
Intercept      174.1    28.7273                0.0348
trt cntl       301.6    16.5782                0.0603
trt drug           .          .                     .
Random VC(1)   886.7     5.6389                0.1773
Random VC(2)   722.3     6.9225                0.1445

Geweke Diagnostics
Parameter       z         Pr > |z|
Intercept      -0.4815    0.6302
trt cntl       -0.2990    0.7649
trt drug             .         .
Random VC(1)    0.5412    0.5884
Random VC(2)    0.9604    0.3369

Raftery-Lewis Diagnostics
(Quantile=0.025  Accuracy=+/-0.005  Probability=0.95  Epsilon=0.001)
                       Number of Samples
Parameter      Burn-In   Total   Minimum   Dependence Factor
Intercept           29   32909      3746              8.7851
trt cntl            18   19114      3746              5.1025
trt drug             .       .         .                   .
Random VC(1)         4    4636      3746              1.2376
Random VC(2)         4    4714      3746              1.2584

Heidelberger-Welch Diagnostics
Stationarity Test
Parameter      Cramer-von Mises Stat   p-Value   Test Outcome   Iterations Discarded
Intercept      0.0951                  0.6092    Passed            0
trt cntl       0.2343                  0.2098    Passed            0
trt drug       .                       .         .                 .
Random VC(1)   0.2008                  0.2659    Passed         2000
Random VC(2)   0.0432                  0.9160    Passed            0

Half-Width Test
Parameter      Half-Width   Mean      Relative Half-Width   Test Outcome
Intercept      0.0875       -0.4288   -0.2041               Failed
trt cntl       0.0630       -0.9815   -0.0641               Passed
trt drug       .            .         .                     .
Random VC(1)   0.0655        1.8446    0.0355               Passed
Random VC(2)   0.0855        1.2928    0.0661               Passed
The "Effective Sample Sizes" table ESS and "Efficiency" results indicate trouble. All ESS values are less than 1000. The Geweke statistics look okay here, but you should always check them. The Raftery-Lewis dependence factors for "Random VC(1)" and "VC(2)" – the clinic and clinic x treatment variances ($\sigma_C^2$ and $\sigma_{CT}^2$), respectively – are acceptably close to one, but those for the intercept and "trt cntl" effects are much too high. Rules of thumb being at best approximate, if you see dependence factors greater than 3 or 4, consider your results to be unusable. Finally, the Heidelberger-Welch half-width test for the intercept failed.
ADDRESSING PROBLEMS
The problems identified by the above diagnostics can occur for several reasons. The number of burn-in iterations or the size of the posterior distribution sampling may be inadequate. You can use thinning – that is, taking only every 5th, 10th, or even 50th posterior density sample instead of every sample – to reduce autocorrelation. The default starting values for the fixed effect parameters may not be appropriate. The default priors may be too diffuse, including values that are technically in the parameter space but highly implausible. Or the default priors may be where the parameters are not – that is, the most likely values of the parameter may be in an extreme tail of the prior distribution. Output 2 shows the defaults used by PROC BGLIMM for the logit-normal GLMM.
Output 2. Sampling, Thinning, Starting Values and Priors for Default Logit-normal BGLIMM Run
Model Information
Burn-In Size        500
Simulation Size    5000
Thinning              1

Initial Values for Fixed Effects
Parameter    Value
Intercept   -0.3102
trt cntl    -0.4040

Priors for Fixed Effects
Parameter    Prior
Intercept    Constant
trt cntl     Constant

Priors for Scale and Covariance Parameters
Parameter           Prior
Random Cov (Diag)   Inverse Gamma (Shape=2, Scale=2)
The "Model Information" table shows the default burn-in size (number of burn-in iterations), the "simulation size" (number of samples of the posterior density) and thinning. Thinning equal to one means no thinning was done. The "Initial Values for Fixed Effects" table shows the starting values for the intercept and the CNTL treatment effect. Compare these to their estimates from PROC GLIMMIX, -0.46 and -0.75. You can use the estimates from GLIMMIX as starting values instead of letting BGLIMM's algorithm choose starting values.
The default priors for the fixed effects are "constant" – essentially uniform distributions between plus and minus infinity. Overly diffuse priors that include highly implausible parameter values can cause problems – given that the actual values of the fixed effects are unlikely to be very far from their GLIMMIX estimates, the "constant" prior may be an issue. The prior for both variance components is an inverse gamma with shape and scale parameters equal to two. Figure 2 shows a plot of the inverse gamma(2,2) distribution with vertical lines showing the GLIMMIX estimates of the clinic variance ($\sigma_C^2$) and the clinic x treatment variance ($\sigma_{CT}^2$).
Figure 2. Plot of PROC BGLIMM Default Prior for Variance Components
The lower vertical bar is the GLIMMIX estimate of the clinic x treatment variance, $\hat{\sigma}_{CT}^2$. The upper vertical bar is the GLIMMIX estimate of the clinic variance, $\hat{\sigma}_C^2$. You can see that the most likely value of the clinic x treatment variance is in the extreme lower tail of the default prior. Most of the probability mass of the inverse gamma(2,2) distribution is located over values that are implausibly high for $\sigma_{CT}^2$. The clinic variance shows the opposite problem, although to a lesser extent – its most likely value is in the upper tail of the default prior.
Christiansen, et al. (2011) suggest a strategy for selecting an appropriate prior for a variance component. They suggest using precision, defined as the inverse of the variance, $1/\sigma^2$, and identifying a gamma distribution whose mode is approximately equal to the most likely value of the parameter being estimated (you can use the estimates from PROC GLIMMIX as a logical most likely value) and whose upper and lower extreme quantiles (say the 1st and 99th) correspond to values below and above which are thought to be highly unlikely. The mode of a gamma distribution with shape parameter $a$ and scale parameter $b$ is $(a-1)b$. You can specify the mode, try several values of $a$, solve for $b$ and evaluate the resulting distribution.
The following is an example of a program you can use for prior hunting. The example is for the clinic x treatment variance. The PROC GLIMMIX estimate using the pseudo-likelihood default is approximately 0.059. The estimate using GLIMMIX's METHOD=QUADRATURE option is approximately 0.01. These translate to precisions of 17 and 100, respectively. The program uses a mode between the two, in this example 40. Use the PDF and CDF functions to create the distribution for each shape-scale combination. The PROC PRINT statement allows you to examine the lower and upper quantiles of the candidate distributions. The PROC GPLOT statement allows you to visualize the candidate distributions.
The program statements are as follows:
data gamma_prior;
mode=40;
/* A=shape parameter */
/* B=scale parameter */
do a=1.01,1.05,1.1,1.25,1.5,2,3,5,10,20;
b=mode/(a-1);
do precision=0 to 200;
pdf_gamma=pdf("gamma",precision,a,b);
cdf_gamma=cdf("gamma",precision,a,b);
output;
end;
end;
run;
proc sort data=gamma_prior; by a;
proc print;
where (0.009<cdf_gamma<0.011 or 0.989<cdf_gamma<0.991);
run;
proc gplot data=gamma_prior;
by a;
plot pdf_gamma*precision/href=17,40,100;
plot cdf_gamma*precision/vref=0.01,0.99;
run;
Some trial and error, and art along with science, are involved at this point. The distribution selected is gamma with shape=3 and scale=20. Figure 3 shows plots of the p.d.f. and c.d.f.
Figure 3. Plots of the Prior Distribution Selected for the Clinic x Treatment Variance
You can see that the p.d.f. has a mode of 40 and contains the precision values 17 and 100 well within its range. The c.d.f. plot shows that the lower and upper extreme quantiles are just under 10 and just under 200, respectively. Translated to the variance scale, this prior covers a range from just over 0.005 to just over 0.1, the range considered plausible. Using the same strategy, a suitable prior for the clinic precision is gamma with shape=1.5 and scale=1.
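As a quick check, a minimal sketch using the QUANTILE function to translate the selected precision prior back to the variance scale:

data _null_;
   a = 3;                                    /* shape of the selected gamma prior */
   b = 20;                                   /* scale of the selected gamma prior */
   prec_lo = quantile("gamma", 0.01, a, b);  /* 1st percentile of the precision */
   prec_hi = quantile("gamma", 0.99, a, b);  /* 99th percentile of the precision */
   var_lo  = 1 / prec_hi;                    /* variance = 1 / precision */
   var_hi  = 1 / prec_lo;
   put prec_lo= prec_hi= var_lo= var_hi=;
run;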
REVISED PROC BGLIMM PROGRAM
The following shows the PROC BGLIMM program modified to include the options needed to obtain useable results. The new options are discussed below the program. The program:
proc bglimm data=clinics seed=81152097
nbi=2500 nmc=100000 thin=10
plots=(trace autocorr density) diagnostics=all
statistics(percent=(2.5 50 97.5))
outpost=cout;
class clinic trt;
model fav/Nij=trt /
init=(list=(-0.46 -0.75) pinit) coeffprior=normal(var=9);
random intercept / subject=clinic covprior=igamma(shape=1.5, scale=1);
random trt / subject=clinic covprior=igamma(shape=3, scale=0.05);
estimate "CNTL" intercept 1 trt 1 0;
estimate "DRUG" intercept 1 trt 0 1;
estimate "log odds-ratio" trt 1 -1;
ods output estimates=modelscale;
title "BGLIMM - all needed options";
run;
The NBI, NMC and THIN options increase burn-in iterations, posterior distribution sample size and thinning, respectively. The INIT=(LIST=) option allows you to specify starting values for the fixed effect parameters – here, the PROC GLIMMIX estimates -0.46 and -0.75. The COEFFPRIOR option specifies a prior distribution for these effects. Note that your options for COEFFPRIOR are either "constant" or "normal." VAR= allows you to specify the variance, but the normal prior is always centered at a mean of zero. The COVPRIOR option specifies a prior for the variance of random model effects. Notice that you can have multiple RANDOM statements, which allows you to use different priors for each variance. The COVPRIOR must be specified in terms of the inverse gamma distribution. The required syntax is IGAMMA(SHAPE=, SCALE=). Use the following result: if the precision $1/\sigma^2 \sim \text{gamma}(a, b)$, then $\sigma^2 \sim \text{inverse gamma}(\text{shape}=a, \text{scale}=1/b)$. For example, the gamma(shape=3, scale=20) prior for the clinic x treatment precision translates to an inverse gamma(shape=3, scale=0.05) prior for the clinic x treatment variance. Output 3 shows selected results.
Effective Sample Sizes
Parameter      ESS       Autocorrelation Time   Efficiency
Intercept       2952.3   3.3872                 0.2952
trt cntl       10000.0   1.0000                 1.0000
trt drug             .        .                      .
Random1 Var     7644.9   1.3081                 0.7645
Random2 Var     8756.1   1.1421                 0.8756

Raftery-Lewis Diagnostics
(Quantile=0.025  Accuracy=+/-0.005  Probability=0.95  Epsilon=0.001)
                       Number of Samples
Parameter      Burn-In   Total   Minimum   Dependence Factor
Intercept            6    6350      3746              1.6951
trt cntl             2    3834      3746              1.0235
trt drug             .       .         .                   .
Random1 Var          2    3803      3746              1.0152
Random2 Var          2    3650      3746              0.9744

Posterior Summaries and Intervals
Parameter        N       Mean      Standard Deviation   95% HPD Interval
Intercept      10000    -0.4491    0.5504               (-1.6003,  0.5728)
trt cntl       10000    -0.7454    0.3053               (-1.3201, -0.1303)
trt drug           0          .         .                        .
Random1 Var    10000     2.1165    1.5208               ( 0.3744,  4.6572)
Random2 Var    10000     0.0261    0.0281               (0.00444,  0.0639)

Posterior Summaries
                                                        Percentiles
Parameter        N       Mean      Standard Deviation    2.5        50        97.5
Intercept      10000    -0.4491    0.5504               -1.5574    -0.4418    0.6304
trt cntl       10000    -0.7454    0.3053               -1.3539    -0.7410   -0.1619
trt drug           0          .         .                     .          .         .
Random1 Var    10000     2.1165    1.5208                0.6293     1.7386    5.7644
Random2 Var    10000     0.0261    0.0281                0.00701    0.0193    0.0845

Results from ESTIMATE Statements
Label             Mean      Standard Deviation   95% HPD Interval
CNTL             -1.1946    0.5612               (-2.3136, -0.0881)
DRUG             -0.4491    0.5504               (-1.6003,  0.5728)
log odds-ratio   -0.7454    0.3053               (-1.3201, -0.1303)
Output 3. Selected Results from Useable PROC BGLIMM Run for Beitler-Landis Data
The "Effective Sample Sizes" table shows acceptable ESS and efficiency numbers. All Raftery-Lewis dependence factors are <2. In the interest of space, the other diagnostics are not shown here, but they are similarly acceptable, indicating that this run has produced useable results. The "Posterior Summaries and Intervals" table shows the mean, standard deviation and 95% highest posterior density (HPD) intervals for the model fixed effects and the variance components. The estimates in the "Mean" column are similar to those obtained with PROC GLIMMIX. Note that the fixed effect component of the linear predictor, $\eta + \tau_i$, is not full rank. PROC BGLIMM uses the same convention – setting the last effect, in this case $\tau_{DRUG}$, to zero – as other SAS linear model procedures. The "Posterior Summaries" table gives the percentiles specified by the STATISTICS(PERCENT=) option instead of HPD intervals. The "Results from ESTIMATE Statements" table gives the posterior distribution mean, standard deviation and 95% HPD intervals from the ESTIMATE statements.
Note that all of these statistics are on the model scale – in this case the logit scale. Because PROC BGLIMM has no ILINK option, you must output these results and use post-processing steps to obtain data scale estimates. The most obvious of these are the probabilities of a favorable outcome for each treatment, $\pi_{CNTL}$ and $\pi_{DRUG}$, the difference between the two, $\pi_{DRUG} - \pi_{CNTL}$, and the odds ratio, $[\pi_{DRUG}/(1-\pi_{DRUG})]\,/\,[\pi_{CNTL}/(1-\pi_{CNTL})]$. There are two ways to do this. These are shown in the next section.
POST-PROCESSING TO OBTAIN DATA-SCALE ESTIMATES
You can either use the data set produced by the OUTPOST option in the PROC statement or the ESTIMATES from the ODS OUTPUT statement to obtain data-scale estimates.
Data Scale Estimates Using OUTPOST and the %SUMINT Macro
After you run the PROC BGLIMM program that produces useable output, use the following statements:
data datasc;
set cout;
pr_cntl=logistic(cntl);
pr_drug=logistic(drug);
ProbDiff=logistic(drug)-logistic(cntl);
OddsRatio=exp(log_odds_ratio);
run;
%sumint(data=datasc, var=pr_cntl: pr_drug: ProbDiff: OddsRatio)
The OUTPOST option creates a data set of posterior samples for the model fixed effects and ESTIMATE statement results. In this example, the data set is called COUT. The DATA step creates a new data set, DATASC, with the favorable outcome probabilities, labelled PR_CNTL and PR_DRUG, their difference, PROBDIFF, and the odds ratio. The LOGISTIC function implements the logit's inverse link, $\pi = 1/(1 + e^{-\eta})$. The difference estimates the log of the odds ratio, so EXP(LOG_ODDS_RATIO) gives you the estimated odds ratio. The %SUMINT macro computes and prints inferential statistics for the terms listed after VAR= for the data set given by DATA=<data set name>. Output 4 shows the results.
Posterior Summaries and Intervals
Parameter    N       Mean     Standard Deviation   95% HPD Interval
OddsRatio   10000    0.4970   0.1532               (0.2251, 0.8004)
pr_cntl     10000    0.2461   0.0990               (0.0750, 0.4475)
pr_drug     10000    0.3965   0.1224               (0.1679, 0.6394)
ProbDiff    10000    0.1504   0.0667               (0.0248, 0.2800)
Output 4. Data Scale Estimates from OUTPOST and %SUMINT
Data Scale Estimates Using ODS OUTPUT
Instead of using the OUTPOST data set and the %SUMINT macro, you can use the following statements:
data lsm; set modelscale;
if label='CNTL' or label='DRUG';
prob=1/(1+exp(-Mean));
lower=1/(1+exp(-HPDLower));
upper=1/(1+exp(-HPDUpper));
data oddsratio; set modelscale;
if label='log odds-ratio';
OddsRatio=exp(Mean);
lower=exp(HPDLower);
upper=exp(HPDUpper);
proc print data=lsm;
proc print data=oddsratio;
run;
MODELSCALE is the data set produced by the ODS OUTPUT statement. These DATA steps produce two data sets. The one called LSM uses the logit's inverse link to compute the estimates and HPD interval lower and upper bounds for $\pi_{CNTL}$ and $\pi_{DRUG}$. The data set called ODDSRATIO computes the estimates and HPD bounds for the odds ratio. Use the IF LABEL= statement to select the items to be included in each data set; the LABEL names are the labels used in the ESTIMATE statements. Both data sets use the results of the ESTIMATE statements instead of the posterior sample data set, so the results differ from the %SUMINT listing above. If you want the estimated difference between the probabilities, use OUTPOST and %SUMINT – it is more convenient. Output 5 shows the results using the ODS OUTPUT approach.
Obs   Label   Mean      StdDev   HPDLower   HPDUpper   prob      lower     upper
1     CNTL   -1.1946    0.5612   -2.3136    -0.0881    0.23245   0.09001   0.47798
2     DRUG   -0.4491    0.5504   -1.6003     0.5728    0.38957   0.16793   0.63942

Obs   Label            Mean      StdDev   HPDLower   HPDUpper   OddsRatio   lower     upper
1     log odds-ratio  -0.7454    0.3053   -1.3201    -0.1303    0.47452     0.26712   0.87780
Output 5. Data Scale Estimates from ODS OUTPUT and Follow-up DATA Steps
The Mean, StdDev, HPDLower and HPDUpper values are model scale values from the BGLIMM ESTIMATE statements. The three columns on the right, labelled “prob” or “OddsRatio” and “lower” and “upper” are the data scale estimates.
EXAMPLE 2: MULTI-LEVEL DESIGN WITH COUNT DATA
This example appears in SAS for Mixed Models (2018) in Chapter 13, Section 13.3. The data are from an agricultural experiment comparing two cultivation methods and two seed mixes. Each of six fields is divided into two sections, called whole plots. Each whole plot is randomly assigned to a method so that both methods appear at each field. Each whole plot is divided into two split plots to which mixes are randomly assigned. Table 6 shows the sources of variation.
EXPERIMENT DESIGN           TREATMENT DESIGN         COMBINED
SOURCE              DF      SOURCE            DF     SOURCE                     DF
field                5                               field                       5
                            method             1     method                      1
whole-plot(field)    6                               field x method (wp)     6-1=5
                            mix                1     mix                         1
                            method x mix       1     method x mix                1
split-plot(wp)      12                               sp(wp) | method, mix  12-2=10
TOTAL               23                               TOTAL                      23
Table 6. Sources of Variation for Multi-Level Field Trial
The response variable is a discrete count. SAS for Mixed Models describes two “sensible” models, the Poisson-normal GLMM and the negative binomial GLMM. This section presents the latter, as it is the model of choice for these data, and the negative binomial illustrates PROC BGLIMM options that would not be used with the Poisson-normal model.
Table 7 uses the combined sources of variation column from Table 6 to show how the negative binomial arises in the context of this experiment.
SOURCE                 EFFECT DISTRIBUTION                         OBSERVATION AND MODEL
field                  $f_k \sim N(0, \sigma_F^2)$
method                 $\alpha_i$ (fixed)
field x method (wp)    $w_{ik} \sim N(0, \sigma_W^2)$
mix                    $\gamma_j$ (fixed)
method x mix           $(\alpha\gamma)_{ij}$ (fixed)
sp(wp)|method,mix      $u_{ijk} \sim \text{Gamma}(1/\phi, \phi)$   $y_{ijk} \mid f_k, w_{ik}, u_{ijk} \sim \text{Poisson}(\lambda_{ijk} u_{ijk})$
  a.k.a. unit of obs.
Table 7. Model Effects and Distribution Defining Poisson-Gamma Process
The method and mix effects are denoted by $\alpha_i$ and $\gamma_j$, respectively. The negative binomial arises from a Poisson-gamma process. Variation at the unit of observation level is assumed to follow a gamma distribution with mean 1 and variance $\phi$. If you integrate out the unit level term, $y_{ijk} \sim \text{NB}(\lambda_{ijk}, \phi)$, where NB denotes negative binomial. Thus, the following elements define the GLMM:
Distributions: $y_{ijk} \mid f_k, w_{ik} \sim \text{NB}(\lambda_{ijk}, \phi)$, $f_k \sim N(0, \sigma_F^2)$, $w_{ik} \sim N(0, \sigma_W^2)$
Link function: $\eta_{ijk} = \log(\lambda_{ijk})$
Linear predictor: $\eta_{ijk} = \eta + \alpha_i + \gamma_j + (\alpha\gamma)_{ij} + f_k + w_{ik}$
With PROC BGLIMM, replacing the effects part of the linear predictor, $\eta + \alpha_i + \gamma_j + (\alpha\gamma)_{ij}$, with its cell-means, full-rank equivalent makes it easier to use the ESTIMATE statement to compute terms of interest. The cell-means form can be written $\eta_{ijk} = \eta_{ij} + f_k + w_{ik}$, where $\eta_{ij} = \eta + \alpha_i + \gamma_j + (\alpha\gamma)_{ij}$.
The CLASS, MODEL and RANDOM statements for this model are:
class field method mix;
model count=method*mix / noint distribution=negbin;
random intercept method/ subject=field;
As with Example 1, the default BGLIMM settings give unusable results. In the interest of space, not all diagnostics are shown. Output 6 shows two examples: the Geweke statistics and the plots for the negative binomial scale parameter, $\phi$.
Geweke Diagnostics
Parameter          z         Pr > |z|
method 1*mix 1     1.8922    0.0585
method 1*mix 2     2.3868    0.0170
method 2*mix 1     0.9584    0.3379
method 2*mix 2     1.8992    0.0575
Scale             -2.7547    0.0059
Random VC(1)       1.4784    0.1393
Random VC(2)      -0.4486    0.6537
Output 6. Example diagnostics: PROC BGLIMM negative binomial GLMM with default settings
Four of the seven parameters on the “Geweke Diagnostics” table show low p-values. For the scale parameter, p<0.01. The trace and autocorrelation plots are unacceptable. To obtain useable results, you must work through the same set of steps as illustrated in the previous section. The following program statements are the result:
proc bglimm data=sp_counts plots=(trace autocorr density)
nbi=2500 nmc=1000000 thin=100 seed=20210209
diagnostics=all dic outpost=model_scale;
class field method mix;
model count=method*mix / noint distribution=negbin
init=(list=(2.5, 2.9, 0.8, 2.0) Pinit)
scaleprior=gamma(shape=20,iscale=25);
random intercept method/ subject=field
covprior=igamma(shape=4,scale=3);
estimate "LSM_11" method*mix 1 0 0 0;
estimate "LSM_12" method*mix 0 1 0 0;
estimate "LSM_21" method*mix 0 0 1 0;
estimate "LSM_22" method*mix 0 0 0 1;
estimate "interaction" method*mix 1 -1 -1 1;
estimate "Method Main Effect" method*mix 1 1 -1 -1 / divisor=2;
estimate "Mix Main Effect" method*mix 1 -1 1 -1 / divisor=2;
ods output estimates=link_scale;
title "Negative Binomial GLMM";
run;
The burn-in, posterior sampling and thinning options may look extreme, but computing time is minimal, and they guarantee useable results. The INIT option uses PROC GLIMMIX results as starting values for the METHOD*MIX ($\eta_{ij}$) effects. Changing from the "constant" to the "normal" prior has little effect with this model.
The COVPRIOR in the RANDOM statement is based on using the estimates from PROC GLIMMIX of the field and field x method variances, $\hat{\sigma}_F^2$ and $\hat{\sigma}_W^2$, as most likely values. Given that they are similar in value, the same inverse gamma with shape=4 and scale=3 is used for both. The SCALEPRIOR option allows you to specify a prior for the negative binomial scale parameter. The search using the Christiansen, et al. approach described in the previous section resulted in selecting a gamma prior with shape=20 and scale=0.04. Note that BGLIMM requires the gamma distribution to be specified using the inverse scale, hence ISCALE=25 rather than SCALE=0.04.
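A minimal sketch of the SCALE-to-ISCALE conversion (the values are the ones selected above):

data _null_;
   shape  = 20;
   scale  = 0.04;
   iscale = 1 / scale;           /* = 25, the value passed to SCALEPRIOR */
   mode   = (shape - 1) * scale; /* gamma mode = (shape - 1) x scale = 0.76 */
   put iscale= mode=;
run;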
The ESTIMATE statements define means, interaction and main effect differences. These are given on the model (log) scale in the SAS listing. You can use either the %SUMINT macro with the OUTPOST=MODEL_SCALE data set or the ODS OUTPUT data set to compute data-scale equivalents. In the interest of space, only the OUTPOST plus %SUMINT program statements and output are shown.
The SAS statements:
data data_scale;
set model_scale;
Lambda_11=exp(LSM_11);
Lambda_12=exp(LSM_12);
Lambda_21=exp(LSM_21);
Lambda_22=exp(LSM_22);
Method_Diff_Ratio=exp(Method_Main_Effect);
Mix_Diff_Ratio=exp(Mix_Main_Effect);
run;
%sumint(data=data_scale, var=Lambda_11: Lambda_12: Lambda_21: Lambda_22:
interaction: Method_Diff_Ratio: Mix_Diff_Ratio)
Output 7 shows the listing.
Posterior Summaries and Intervals
Parameter            N       Mean      Standard Deviation   95% HPD Interval
Lambda_11           10000   19.2095   18.9908               ( 0.7638, 52.9988)
Lambda_12           10000   31.6319   31.0823               ( 1.4792, 86.1046)
Lambda_21           10000    4.2671    3.5069               ( 0.4212, 10.6353)
Lambda_22           10000   13.5674   11.0600               ( 1.0802, 32.3851)
interaction         10000    0.6695    0.8461               (-0.9995,  2.3477)
Method_Diff_Ratio   10000    3.6754    2.7677               ( 0.3665,  8.3976)
Mix_Diff_Ratio      10000    0.4728    0.2049               ( 0.1595,  0.8911)
Output 7. Data-Scale Results of Negative Binomial Split-Plot GLMM Analysis
The LAMBDA_11, etc. terms give the rate parameter estimates, $\lambda_{ij}$, for the four method-mix combinations. The "interaction" term remains on the model scale. There is no need to express it on the data scale, as its only purpose is to assess the magnitude of the method x mix interaction. The HPD interval includes zero; using the interval as the criterion, you could conclude that there is insufficient evidence to reject the hypothesis of no interaction. The METHOD_DIFF_RATIO term estimates $\bar{\lambda}_{1}/\bar{\lambda}_{2}$, where $\bar{\lambda}_{i}$ is the rate parameter for the $i$th method averaged over both mixes. The model-scale difference is $\log(\bar{\lambda}_{1}) - \log(\bar{\lambda}_{2}) = \log(\bar{\lambda}_{1}/\bar{\lambda}_{2})$, hence exponentiating gives the estimated ratio. The MIX_DIFF_RATIO is similarly defined, but for mix instead of method. Think of these as data scale main effect measures. Although not shown here, you could also obtain statistics for differences between selected LSM terms, similar to the way PROBDIFF was defined in the previous section; see the sketch below.
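For example, a minimal sketch of one such difference, mirroring the PROBDIFF computation in Example 1 (the variable name Diff_11_12 is introduced here for illustration):

data lsm_diff;
   set model_scale;
   /* data-scale difference between mixes 1 and 2 under method 1 */
   Diff_11_12 = exp(LSM_11) - exp(LSM_12);
run;
%sumint(data=lsm_diff, var=Diff_11_12)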
EXAMPLE 3: REPEATED MEASURES DATA
This example is from Chapter 8 of SAS for Mixed Models (2018). It appears as "Respiratory Data" in a repeated measures example in Littell, Pendergast and Natarajan (2000), and in Chapter 4 of SAS for Linear Models, 4th Edition (Littell, et al., 2002). The data are from a trial comparing a standard asthma treatment (A), a test drug (C) and a placebo (P). Each treatment was randomly assigned to 24 patients (72 patients total participated in the trial). The response variable, FEV1, is a measure of breathing ability. A baseline measurement (BASEFEV1) was taken on each patient, then measurements were taken every hour for eight hours starting at one hour after treatment application. FEV1 can be assumed to follow a Gaussian distribution. Table 8 shows the experiment design, treatment design and combined sources of variation.
EXPERIMENT DESIGN               TREATMENT                COMBINED
SOURCE                 DF       SOURCE          DF       SOURCE                        DF
                                drug             2       drug                           2
patient           72-1=71                                patient(drug)
                                                           a.k.a. between         71-2=69
                                hour             7       hour                           7
                                drug x hour     14       drug x hour                   14
occasion(patient)                                        occ(patient)|drug
  72(8-1) = 504                                            a.k.a. within    504-7-14 = 483
TOTAL                 575                                TOTAL                        575
Table 8. Sources of Variation for Respiratory Repeated Measures Data
In repeated measures terminology, patient(drug) is called the between subjects effect and the occ(patient)|drug (occasion within patient after accounting for drug) residual term is called the within subjects effect. Following SAS for Mixed Models, the LMM of choice has the following elements:
Distributions
$\mathbf{Y}_{ij} \sim N(\boldsymbol{\mu}_{ij}, \mathbf{R})$, where $\mathbf{Y}_{ij}$, the vector of FEV1 measures on the $j$th patient assigned to the $i$th drug over the hours of observation, is assumed Gaussian, $\boldsymbol{\mu}_{ij}$ is the corresponding mean vector and $\mathbf{R}$ is the covariance matrix modeling serial correlation among the within subjects effects. SAS for Mixed Models uses the AR(1) covariance model.
$p_{j(i)} \sim N(0, \sigma_P^2)$ is the between subjects effect
Linear Predictor: $\mu_{ijk} = \eta + \delta_i + h_k + (\delta h)_{ik} + p_{j(i)}$, where $\delta_i$ and $h_k$ denote drug and time (hour) effects, respectively.
The covariance parameters for the AR(1) $\mathbf{R}$ matrix are $\rho$ for autocorrelation and $\sigma^2$ for the residual (within subjects) variance. You can include the baseline covariate, BASEFEV1, by modifying the linear predictor to $\mu_{ijk} = \eta + \beta_0 X_{ij} + \delta_i + h_k + (\delta h)_{ik} + p_{j(i)}$, where $X_{ij}$ denotes BASEFEV1 for the $j$th patient assigned to the $i$th drug.
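For reference, the AR(1) $\mathbf{R}$ matrix for the eight hourly measurements has the standard form, with $(k, k')$ element equal to $\sigma^2 \rho^{|k - k'|}$:
$$\mathbf{R} = \sigma^2 \begin{bmatrix} 1 & \rho & \rho^2 & \cdots & \rho^7 \\ \rho & 1 & \rho & \cdots & \rho^6 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^7 & \rho^6 & \rho^5 & \cdots & 1 \end{bmatrix}$$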
Figure 4 shows the interaction plot, which you can obtain using the PLOT=MEANPLOT option in the PROC GLIMMIX LSMEANS statement. The interaction plot is instructive, because it helps you visualize the objectives of this study and the requirements of the model needed to address the objectives.
Figure 4. Interaction Plot showing Change in FEV1 over hours by treatment.
The treatment is initially effective for the A and C drugs but not the placebo. Over time, FEV1 decreases as the drugs’ effectiveness wears off. The objective of the study is to see if there is a difference between the treatments as measured by initial effectiveness and by how long the effect lasts.
There are two ways to approach this objective. One is to model HOUR as a CLASS variable and track the simple effects A - P and C - P at each hour $k$, where $k = 1, 2, \ldots, 8$. The other is to redefine the linear predictor in terms of an unequal slopes linear regression model, i.e., $\mu_{ijk} = \alpha_i + \beta_0 X_{ij} + \beta_{1i} t_k + p_{j(i)}$, where $t_k$ denotes the hour, and compare the intercept ($\alpha_i$) and slope ($\beta_{1i}$) coefficients.
NOTE: The documentation for PROC BGLIMM uses this data set as an example. In one passage, the documentation gives what it calls “Model 2” and “Model 3.” The MODEL statements for each are:
Model 2: fev1=basefev1 drug hour;
Model 3: fev1=basefev1 drug|hour;
The documentation characterizes the difference between model 2 and model 3 as “minor,” and shows results for model 2 but not model 3. Perhaps the difference is “minor” from a programming point of view, but the difference is not “minor” from a statistical practice point of view. Model 2 is incapable of addressing the objectives of this study and therefore fails the “sensible model” criterion. To address this study’s objectives, the drug x hour interaction must be included in the model in some form.
The PROC BGLIMM statements for simple effects and unequal slopes approaches are given as follows. For the simple effects approach, it is much more convenient to specify the cell-means form of the model. The statements are:
proc bglimm data=fev1uni nbi=1000 nmc=50000 thin=5 seed=97816352
plots=(trace autocorr density) diagnostics=all;
class Drug Patient Hour;
model FEV1 = BaseFev1 Drug*Hour;
random int / subject=Patient(drug);
repeated Hour / subject=Patient(Drug) type=ar(1) r rcorr;
run;
Notice that PROC BGLIMM borrows the syntax of the PROC MIXED REPEATED statement to specify the covariance model for the within subjects effect. The default priors work well for this model, but increasing burn-in, posterior sampling and thinning improves the results. There are twenty-four DRUG*HOUR least squares means, and sixteen simple effects comparing drugs A and C to the placebo at each hour, for a total of forty ESTIMATE statements. In the interest of space, only one example of each type of statement is shown.
You can access the SAS files for all examples in this tutorial at ... (I need to find out from SAS where...)
The example ESTIMATE statements are:
estimate "LSM A at hour 1" intercept 1 basefev1 2.6493
Drug*Hour 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0;
estimate "A vs P at hour 1"
Drug*Hour 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 0 0 0 0 0 0;
The coefficients for the least squares means (LSM) obtain the adjusted mean at the average BASEFEV1 covariate value of 2.6493. If you are in doubt, use the E option of the PROC GLIMMIX LSMEANS statement to get the coefficients needed for these statements, as sketched below.
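A minimal sketch of this coefficient check (the data set and variable names are those used in this example):

proc glimmix data=fev1uni;
   class Drug Patient Hour;
   model FEV1 = BaseFev1 Drug*Hour;
   random intercept / subject=Patient(Drug);
   lsmeans Drug*Hour / e;   /* E prints the coefficient vector for each LS-mean */
run;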
The statements for the unequal slopes approach are as follows:
data fev1uni;
set fev1uni;
H=hour;
proc bglimm data=fev1uni nbi=1000 nmc=50000 thin=5 seed=44672057
plots=(trace autocorr density) diagnostics=all;
class Drug Patient Hour;
model FEV1 = BaseFev1 drug H(drug);
random int / subject=Patient(drug);
repeated Hour / subject=Patient(Drug) type=ar(1) r rcorr;
estimate "intercept - Drug A" intercept 1 basefev1 2.6493 drug 1 0 0;
estimate "intercept - Drug C" intercept 1 basefev1 2.6493 drug 0 1 0;
estimate "intercept - Drug P" intercept 1 basefev1 2.6493 drug 0 0 1;
estimate "slope - Drug A" H(drug) 1 0 0;
estimate "slope - Drug C" H(drug) 0 1 0;
estimate "slope - Drug P" H(drug) 0 0 1;
estimate "intercept A vs C" drug 1 -1 0;
estimate "intercept A vs P" drug 1 0 -1;
estimate "intercept C vs P" drug 0 1 -1;
estimate "slope A vs C" H(drug) 1 -1 0;
estimate "slope A vs P" H(drug) 1 0 -1;
estimate "slope C vs P" H(drug) 0 1 -1;
run;
You need to define a second variable for HOUR – in this case with the DATA step statement H=HOUR – and use one as the direct regression variable in the MODEL statement and the other as a CLASS variable for the REPEATED statement. The "INTERCEPT - DRUG A" etc. ESTIMATE statements obtain the estimated regression intercepts, adjusted for the baseline covariate, and the "SLOPE - DRUG A" etc. statements obtain the estimated slopes, $\hat{\beta}_{1i}$. The next six statements compare intercepts and slopes among the DRUG treatments.
Output 8 shows the covariance parameter estimates obtained using the cell means model. Output 9 shows the least squares means listing from the cell means model. Output 10 shows the simple effect difference results.
Posterior Summaries and Intervals
Parameter         N       Mean     Standard Deviation   95% HPD Interval
Residual Var     10000    0.0846   0.0104               (0.0664, 0.1058)
Residual AR(1)   10000    0.5386   0.0550               (0.4278, 0.6407)
Random Var       10000    0.2458   0.0453               (0.1612, 0.3341)
Output 8. Covariance parameter estimates.
"Residual Var" gives the within subjects variance statistics; "Mean" gives you $\hat{\sigma}^2$. "Residual AR(1)" gives the autocorrelation parameter, $\hat{\rho}$. "Random Var" gives the between subjects variance, $\hat{\sigma}_P^2$.
Results from ESTIMATE Statements
Label              Mean     Standard Deviation   95% HPD Interval
LSM A at hour 1    3.4739   0.1178               (3.2501, 3.7107)
LSM C at hour 1    3.6903   0.1187               (3.4705, 3.9330)
LSM P at hour 1    2.8264   0.1183               (2.5956, 3.0589)
LSM A at hour 2    3.3963   0.1175               (3.1685, 3.6287)
LSM C at hour 2    3.6251   0.1194               (3.3849, 3.8521)
LSM P at hour 2    2.8916   0.1176               (2.6606, 3.1195)
LSM A at hour 3    3.1839   0.1181               (2.9470, 3.4114)
LSM C at hour 3    3.5764   0.1190               (3.3390, 3.8059)
LSM P at hour 3    2.8974   0.1186               (2.6571, 3.1239)
LSM A at hour 4    3.0460   0.1175               (2.8098, 3.2714)
LSM C at hour 4    3.4424   0.1190               (3.2178, 3.6824)
LSM P at hour 4    2.8713   0.1180               (2.6237, 3.0877)
LSM A at hour 5    3.0533   0.1185               (2.8278, 3.2873)
LSM C at hour 5    3.2487   0.1187               (3.0172, 3.4829)
LSM P at hour 5    2.7686   0.1183               (2.5390, 2.9997)
LSM A at hour 6    2.9800   0.1183               (2.7533, 3.2130)
LSM C at hour 6    3.0845   0.1186               (2.8482, 3.3096)
LSM P at hour 6    2.8160   0.1179               (2.5817, 3.0423)
LSM A at hour 7    2.8687   0.1181               (2.6414, 3.0987)
LSM C at hour 7    2.9764   0.1181               (2.7273, 3.1924)
LSM P at hour 7    2.7859   0.1180               (2.5595, 3.0183)
LSM A at hour 8    2.8581   0.1171               (2.6377, 3.0913)
LSM C at hour 8    3.0093   0.1175               (2.7724, 3.2359)
LSM P at hour 8    2.7330   0.1188               (2.4999, 2.9644)
Output 9. Drug x Hour Least Squares Means for Repeated Measures Data
Results from ESTIMATE Statements
Label              Mean     Standard Deviation   95% HPD Interval
A vs P at hour 1   0.6475   0.1652               ( 0.3224, 0.9630)
C vs P at hour 1   0.8639   0.1683               ( 0.5235, 1.1894)
A vs P at hour 2   0.5048   0.1648               ( 0.1606, 0.8070)
C vs P at hour 2   0.7335   0.1678               ( 0.4078, 1.0694)
A vs P at hour 3   0.2865   0.1651               (-0.0478, 0.6028)
C vs P at hour 3   0.6789   0.1687               ( 0.3453, 1.0093)
A vs P at hour 4   0.1747   0.1649               (-0.1565, 0.4920)
C vs P at hour 4   0.5711   0.1680               ( 0.2481, 0.9090)
A vs P at hour 5   0.2847   0.1658               (-0.0489, 0.6031)
C vs P at hour 5   0.4801   0.1677               ( 0.1611, 0.8200)
A vs P at hour 6   0.1641   0.1651               (-0.1476, 0.4993)
C vs P at hour 6   0.2685   0.1669               (-0.0668, 0.5785)
A vs P at hour 7   0.0828   0.1654               (-0.2411, 0.4083)
C vs P at hour 7   0.1905   0.1673               (-0.1365, 0.5220)
A vs P at hour 8   0.1251   0.1659               (-0.1932, 0.4553)
C vs P at hour 8   0.2763   0.1667               (-0.0641, 0.5925)
Output 10. Simple effect differences between drugs (A, C) and placebo (P)
Output 10 is the PROC BGLIMM analog to the results you would get with the SLICEDIFF=HOUR option in PROC GLIMMIX.
Output 11 gives the ESTIMATE statement results from the unequal slopes regression model.
Results from ESTIMATE Statements
Label                 Mean      Standard Deviation   95% HPD Interval
intercept - Drug A    3.5169    0.1205               ( 3.2852,  3.7577)
intercept - Drug C    3.8065    0.1197               ( 3.5730,  4.0403)
intercept - Drug P    2.8859    0.1199               ( 2.6516,  3.1233)
slope - Drug A       -0.0887    0.0117               (-0.1115, -0.0651)
slope - Drug C       -0.1057    0.0117               (-0.1285, -0.0825)
slope - Drug P       -0.0160    0.0119               (-0.0390,  0.00708)
intercept A vs C     -0.2896    0.1690               (-0.6066,  0.0506)
intercept A vs P      0.6310    0.1706               ( 0.3002,  0.9703)
intercept C vs P      0.9206    0.1703               ( 0.5897,  1.2525)
slope A vs C          0.0170    0.0166               (-0.0166,  0.0486)
slope A vs P         -0.0727    0.0168               (-0.1057, -0.0399)
slope C vs P         -0.0897    0.0167               (-0.1236, -0.0578)
Output 11. Unequal slopes linear regression repeated measures model estimates and differences
You can see that the estimated intercepts for drugs A and C are both greater than for the placebo (3.51 and 3.81 versus 2.89), indicating that initial breathing ability is greater for the two drugs than for the placebo. On the other hand, the estimated slopes for drugs A and C are -0.0887 and -0.1057, respectively, versus -0.0160 for the placebo. The placebo's HPD credible interval for slope includes zero, whereas the intervals for drugs A and C do not. The placebo's regression is essentially flat because its initial effect is negligible.
Conclusion
In the SAS system hierarchy, PROC BGLIMM occupies a space between PROC GLIMMIX and PROC MCMC. You can use PROC BGLIMM for any GLMM that PROC GLIMMIX can implement. The advantage of the BGLIMM procedure is that it allows access to Bayesian estimation and inference using the same CLASS, MODEL and RANDOM syntax as PROC GLIMMIX. The main difference between BGLIMM and GLIMMIX is that because Bayesian analysis depends on complex simulation algorithms, there are more diagnostics that must be monitored and more options that you may need to use in order to get a useable analysis.
PROC MCMC uses syntax borrowed from PROC NLMIXED. This means that it does not have a CLASS statement, making it more tedious to specify models, especially effects models with factorial treatment structures. On the other hand, unlike PROC BGLIMM, you can use PROC MCMC to fit nonlinear models, semiparametric models, and mixed models with non-Gaussian random effects, such as beta-binomial and Poisson-gamma models. PROC MCMC also provides more flexibility in specifying prior distributions. For example, you can use priors centered at the starting value for fixed effects, where BGLIMM limits you to “constant” or zero-centered Gaussian priors.
Finally, when you use PROC BGLIMM with non-Gaussian response variables, there is an additional post-processing step, either using the %SUMINT macro or the ODS OUTPUT data set, to get data-scale estimates and inferential statistics. The lack of an ILINK option means an extra step, but as we saw in Examples 1 and 2, it is not a difficult step.
PROC BGLIMM makes Bayesian analysis highly accessible to data analysts who have PROC GLIMMIX experience but are new to the Bayesian world.
References
Beitler, P.J. and J.R. Landis. 1985. “A Mixed-Effects Model for Categorical Data.” Biometrics, 41:991-1000.
Christiansen, R., W. Johnson, A. Branscum and T.E. Hanson. 2011. Bayesian Ideas and Data Analysis: An Introduction for Scientists and Statisticians. Boca Raton, FL: CRC Press.
Ferrari, S.L.P. and F. Cribari-Neto. 2004. "Beta Regression for Modelling Rates and Proportions." Journal of Applied Statistics, 31:799-815.
Littell, R.C, J. Pendergast and R. Natarajan. 2000. “Modeling Covariance Structure in the Analysis of Repeated Measures Data.” Statistics in Medicine, 19:1793-1819.
Littell, R.C., W.W. Stroup and R.J. Freund. 2002. SAS® for Linear Models, 4th edition. Cary, NC: SAS Institute, Inc.
Stroup, W.W. 2013. Generalized Linear Mixed Models: Modern Concepts, Methods and Applications. Boca Raton, FL: CRC Press.
Stroup, W.W., G.A. Milliken, E.A. Claassen and R.D. Wolfinger. 2018. SAS ® for Mixed Models: Introduction and Basic Applications. Cary, NC: SAS Institute Inc.
Acknowledgements
Thanks to Oliver Schabenberger, who developed PROC GLIMMIX and was a valued partner in GLMM's early days; to Fang Chen, who developed PROC BGLIMM and provided helpful advice about the procedure; and to George Milliken, Russ Wolfinger and Elizabeth Claassen, my SAS® for Mixed Models co-authors and esteemed mixed model brain trust.
Recommended Reading
SAS ® for Mixed Models: Introduction and Basic Applications
Generalized Linear Mixed Models: Modern Concepts, Methods and Applications