Walter Stroup, University of NebraskaLincoln
Over the past two decades, generalized linear models (GLMMs) mixed models for nonnormal data such as proportions, counts, time to event, and so forth have become standard tools for statistical analysis. Since its introduction, PROC GLIMMIX has been the primary GLMM procedure for SAS/STAT(r). The most recent edition of SAS(r) for Mixed Models includes three chapters on using GLIMMIX for GLMMs. PROC GLIMMIX is an excellent frequentist tool. However, Bayesian approaches are becoming increasingly important. Many academic journals prefer some even require Bayesian analysis. Even when not required, Bayesian methods allow us to use what we know prior to, or in the early stages of, an investigation. PROC BGLIMM is a new SAS/STAT procedure that makes Bayesian implementation of GLMMs relatively easy. BGLIMM uses syntax similar to PROC GLIMMIX, but there are some differences. This tutorial presents what you need to know to get started using PROC BGLIMM. We will use GLMM examples from SAS for Mixed Models, but with a Bayesian twist.
Watch Bayesian Analysis of GLMMs Using PROC BGLIMM as presented by the author on the SAS Users YouTube channel.
Note to readers: the following is as much of the proceedings paper as I was able to upload. The SAS programs, output and interpretations are here in more of less readable form. Equations did not upload, so you will see blanks where those are supposed to be. You can download the PDF file (Stroup1146.pdf), which has the paper as I wrote it in intact form, or the presentation slides (140530_SASGF2021_BLGLIMM_PDF_version.pdf)
Mixed models are important tools for analyzing data from many types of studies, including longitudinal or repeated measures, multilevel or splitplot experiments, blocked designs with incomplete blocks or missing data and multilocation studies. Linear mixed models (LMMs) accommodate response variables assumed to follow a normal (hereafter referred to as a Gaussian) distribution. Generalized linear mixed models (GLMMs) extend mixed model theory and methods to accommodate nonGaussian responses such as categorical or count data. Because the LMM is a special case of the GLMM – a GLMM with Gaussian data – the acronym GLMM is used for the rest of this paper. In the SAS^{®} system, PROC GLIMMIX is the preeminent mixed model procedure, allowing users to work with both LMMs and GLMMs. PROC GLIMMIX uses a frequentist approach to estimation and inference. The next section, “GLMM Basics” gives an overview of what this entails, but the basic idea is that frequentist statistics focus on obtaining estimates and standard errors that are used to construct test statistics for significance testing or to construct confidence intervals.
Bayesian statistics provide an alternative approach. Bayesian statistics focus on incorporating prior information to obtain posterior distributions of the statistics of interest. Posterior distributions combine what you know in advance – the prior – with the data you observe. Bayesian statistics are becoming increasingly important for data analysis. One reason is that many academic journals now discourage classical significance testing in favor of Bayesian analysis. Some journals have even banned significance tests, in effect requiring Bayesian analysis. The second important reason is that the statistics obtained from frequentist methods are essentially equivalent to Bayesian analysis with a “noninformative” prior. In other words, frequentist methods implicitly assume that you know nothing until you analyze the data. In reality, this is rarely, if ever, true (if you really know nothing before analyzing the data, you probably should not be doing the study!). Many data sets come from studies that may be part of a series – phases of clinical trials leading to the licensing of a pharmaceutical product – or similar experiments by graduate students with the same advisor.
PROC MCMC is the SAS system’s original and allpurpose Bayesian procedure. Although you can use PROC MCMC for mixed models, the syntax is similar to PROC NLMIXED, which can be inconvenient for lessthansimple models or models with CLASS variables. PROC BGLIMM was developed to allow you to use syntax similar to PROC GLIMMIX, making Bayesian analysis more accessible for GLMMs.
This tutorial introduces PROC BGLIMM using three examples from SAS for Mixed Models: Introduction and Basic Applications (Stroup, et al. 2018).
The next section called “GLMM Basics,” covers the GLMM setting, essential definitions and terminology, and an overview and comparison of PROC GLIMMIX and PROC BGLIMM estimation and inference.
The classical format for a statistical model is the equation
Response variable = systematic component + random component
Think of the systematic component as the fixed effects, e.g., in ANOVAtype models with treatment effects or in linear regression models. In mixed models, the random component is really two distinct components, random model effects such as block effects, and residual effects, such as those that account for serial covariance in a longitudinal mixed model. The classical response = fixed + random format works well if you can assume that the data follow a Gaussian distribution, but in many cases this is not true. Table 1 shows common types of response variables in the lefthand column and types of model effects across the top row.
Response Variable 
Fixed 
Random 

Categorical (CLASS) 
Continuous 
Model effect 
Residual / Covariance structure 

Gaussian 



“Rside” 
Categorical binomial multinomial 



“Gside” 
Continuous proportion beta 




Count Poisson negative binomial 




timetoevent 




etc... 



Table 1. Response variable and model effect types covered by LMM and GLMM
Replacing the classical model format, the defining elements of the GLMM are as follows:
PROC MIXED allows you to work in the first row of Table 1. PROCs BGLIMM, GLIMMIX, MCMC and NLMIXED allow you to work with all of the rows.
The guiding principle of GLMM estimation is maximum likelihood (ML). You obtain estimates of the fixed effects by maximizing the loglikelihood, , where = . In general, this integral is intractable.
SAS software uses two approximation strategies: linearization (pseudolikelihood) and integral approximation (quadrature and Laplace). The mixed model equations for Gaussian data, the PROC MIXED default used to obtain REML estimates of the variance components and ML solutions for and , are special cases of pseudolikelihood (PL). The REML version of PL is the PROC GLIMMIX default. Laplace and quadrature are PROC GLIMMIX options. PROC NLMIXED uses quadrature only.
Estimates and standard errors computed from PL or integral approximation are used to compute test statistics, pvalues and confidence intervals. Classical frequentist inference makes extensive use of significance testing.
The primary tool of Bayesian inference is the posterior distribution,
𝑓(𝛽, 𝜎𝑦, 𝑏) =
𝑓(𝑦,𝑏𝛽,𝜎)𝑓(𝛽,𝜎)
∬ 𝑓(𝑦,𝑏𝛽,𝜎)𝑓(𝛽,𝜎)𝑑𝛽𝑑𝜎
where 𝜎 denotes the vector of covariance parameters and 𝑓(𝛽, 𝜎) denotes the prior distribution of the fixed effects and covariance parameters. The function 𝑓(𝑦, 𝑏𝛽, 𝜎) is the same likelihood used in frequentist estimation, defined by the GLMM distributions 𝑓(𝑦𝑏) and 𝑓(𝑏).
As with the GLMM likelihood, the integrals required for the posterior distribution are generally intractable. Unlike frequentist estimation, neither linearization nor integral
approximation are viable options. Instead, Bayesian methods approximate the posterior distribution by simulation.
PROC BGLIMM involves the following steps:
The end result is a mean, standard deviation and quantiles of interest for the posterior distribution of each element of 𝛽 and each covariance parameter. You can use BGLIMM’s ESTIMATE statement to define estimable functions of the 𝛽 that address objectives of your data analysis. Two forms of credible intervals are commonly used for Bayesian inference:
There are a number of diagnostics you should check before using PROC BGLIMM results. Diagnostics are introduced in the context of the examples in the following sections. These examples cover problems you are likely to encounter, how to interpret the relevant diagnostics, and PROC BGLIMM options you can use to address these problems in order to obtain a useable analysis.
The first example is the Beitler and Landis (Biometrics, 1985) multiclinic data set that appears in SAS for Mixed Models (2018), Chapter 11, Section 11.3. Two treatments, CNTL and DRUG, are compared at eight clinics sampled from a target population. At the clinic, patients are assigned to the treatment. The response variable, denoted , is the number of patients having a favorable outcome.
The first step in the analysis of these data is to identify an appropriate statistical model. You can do this by following a process presented in SAS for Mixed Models, Chapter 5. List the sources of variation separately for the “study” design and the treatment design. The study design, also called the “experiment design,” describes the components of the design before being assigned treatments or treatment levels. In this case, the study design consists of clinics and groups of patients within clinics. The treatment design is simply CNTL and DRUG. Groups are randomly assigned to treatments. Table 2 shows the sources of variation.
STUDY DESIGN 
TREATMENT DESIGN 
COMBINED 

SOURCE 
DF 
SOURCE 
DF 
SOURCE 
DF 
clinic 
7 


clinic 
7 


treatment 
1 
treatment 
1 
group(clinic) 
8 


group(clinic)  treatment a.k.a. clinic x treatment a.k.a. unit of observation 
81=7 
TOTAL 
15 


TOTAL 
15 
Table 2. Sources of Variation for MultiClinic Example Data
In the combined column, read “group(clinic)  treatment” as “group within clinic after accounting for treatment.” List the rows so that treatment appears in the row immediately above the unit to which it is assigned – in this case, group(clinic).
The next step is to write model effects associated with each source of variation and the assumed probability distribution of any effects considered to be random. Tables 3 and 4 show two scenarios that lead to plausible mixed models. Table 3 describes the elements of a logitnormal GLMM, socalled because it the uses a logit link function, , where denotes the probability of a favorable outcome for the clinictreatment combination, and all random effects are assumed to follow a normal (Gaussian) distribution. Note that clinics effects are random because the clinics in the study are a representative sample. Also note that there is a random effect for the group level source of variation. You should include this term to account for variation in at this level – if you don’t, overdispersion may cause misleading results.
Note that group(clinic) is the unit on which the observations ( ) are taken. Hence, you can refer to it as the “unit level” effect, unit being shorthand for unit of observation. Specifying the model requires stating the assumed distribution of the observations, conditional on the random effects, at the unit level, and relating its expected value to the linear predictor, e.g., through the inverse link function as shown here.
SOURCE 
EFFECT DISTRIBUTION 
OBSERVATION AND MODEL 
clinic 


treatment 


group(clinic)  treatment a.k.a. clinic x treatment a.k.a. unit of observation 
Table 3. Model Effects and Distributions Defining Logitnormal GLMM
Table 4 gives the elements of a betabinomial mixed model, a commonly used alternative to the logitnormal GLMM for binomial data. The difference between the two models is the way grouplevel variation in the probability of a favorable outcome is modeled. The logitnormal GLMM includes a random effect in the linear predictor. The betabinomial models the probability as a random variable, assuming that it follows a beta distribution. Table 4 gives the Ferrari and CribariNeto (2004) parameterization of the beta in terms of its mean ( ) and scale parameter ( ). Alternatively, you can write the distribution in its conventional mathstat form, , where and .
SOURCE 
EFFECT DISTRIBUTION 
OBSERVATION AND MODEL 
clinic 


treatment 


group(clinic)  treatment a.k.a. clinic x treatment a.k.a. unit of observation 

Table 4. Model Effects and Distributions Defining BetaBinomial Mixed Model
You can use PROC BGLIMM (or PROC GLIMMIX) to implement the logitnormal GLMM, but not the betabinomial. This is because BGLIMM and GLIMMIX are limited to models with Gaussian random effects, whereas the betabinomial has a nonGaussian random effect, . (Although beyond the scope of this tutorial, you can implement the betabinomial with PROC MCMC)
The betabinomial is shown here to illustrate the difference between a “sensible” model and an improperly specified model. What constitutes a “sensible” model? You must have a onetoone correspondence between sources of variation and model effects or parameters that account for them. Table 5 illustrates the difference.

LOGITNORMAL 
BETABINOMIAL 
NAIVE GLMM 
BETABIN W/ 
SOURCE 
sensible 
sensible 
OverDispersion 
Unit Confounding 
clinic 

treatment 

unit 

, 
Table 5. Sensible and Improperly Specified Mixed Models for MultiClinic Binomial Data
Both the logitnormal and betabinomial meet the sensible model criterion. The “naive GLMM” lacks any term that accounts for variation at the unit level, making the model vulnerable to overdispersion. The effect is thus needed for the logitnormal model, but if you include it in the betabinomial model, it is confounded with the beta distribution’s scale parameter ( ), which is not good.
The remainder of this section focuses on implementing the logitnormal model with PROC BGLIMM.
PROC BGLIMM uses syntax borrowed from PROC GLIMMIX and, for repeated measures, from PROC MIXED. To illustrate, here are the basic GLIMMIX and BGLIMM statements. First, the GLIMMIX statements for the logitnormal GLMM from SAS for Mixed Models:
proc glimmix data=multi_clinic;
class clinic treatment;
model fav/nij = treatment;
random intercept treatment / subject=clinic;
lsmeans treatments / ilink diff oddsratio cl;
run;
Now the BGLIMM statements:
proc bglimm data=multi_clinic plots=(trace autocorr density)
diagnostics=all outpost=cout;
class clinic treatment;
model fav/nij = treatment / init=pinit;
random intercept treatment / subject=clinic;
estimate “CNTL” intercept 1 treatment 1 0;
estimate “DRUG” intercept 1 treatment 0 1;
estimate “log_odds_ratio” treatment 1 1;
ods output estimates=model_scale;
run;
Notice that the CLASS, MODEL and RANDOM statements for the two procedures are identical. The FAV/NIJ (events/trials) syntax is specific to the binomial distribution. Also notice that there are no statements in the basic PROC BGLIMM program specifying prior distributions. BGLIMM has preprogrammed priors that it uses by default. These may or may not be appropriate, depending on the model and data. The additional options in the PROC BGLIMM statement for PLOT and DIAGNOSTICS should be considered standard operating procedure to either verify that the results are useable or to identify problems that need to be fixed, such as inappropriate default priors, before the output can be regarded as useable. The INIT=PINIT option in the MODEL statement causes the default starting values used for the fixed effects to be included is the SAS listing. Consider these to be part of the diagnostics. PROC BGLIMM does not have an LSMEANS statement. You must write the ESTIMATE statements that correspond to least squares means and treatment differences. PROC BGLIMM also lacks an ILINK option. PROC BGLIMM only computes modelscale (in this case, logit scale) estimates. The OUTPOST option in the PROC statement and the ODS OUTPUT statement give you two ways to obtain datascale statistics. These are explained below in the subsection entitled “PostProcessing to Obtain DataScale Statistics.”
The PLOT, DIAGNOSTICS=ALL and INIT=PINIT produce items in the SAS listing the help you identify problems and to decide if the results are useable. This section has two parts: a list of the items and a brief description of their purpose, and diagnostics produced by the default PROC BGLIMM statements given above and their interpretation.
The PLOT option gives you three plots: autocorrelation, density and trace. Trace plots are also called “caterpillar” plots, because ideally they should look like a “fuzzy caterpillar,” which indicates the sampling algorithm has produced a reasonable approximation of the posterior distribution. The algorithms used to sample the posterior distributions are vulnerable to autocorrelation. The autocorrelation plot shows how quickly autocorrelation decreases as lag increases. The density plot, as the name implies, shows the posterior density produced by the sampling algorithm.
DIAGNOSTICS=ALL produces the following:
The listing also includes the number of burnin iterations, the posterior density sample size and the priors the BGLIMM procedure uses by default.
Figure 1 shows two diagnostic plots. On the left is the plot for the logitnormal GLMM’s intercept parameter produced by BGLIMM program given above. On the right is an example of what the plot should look like in a run whose results are useable. The plots for the other parameters are not shown, but they are similar to these.
Figure 1. Example Diagnostic Plots
The lefthand plot is what you do not want to see. The trace plot shows erratic and inconsistent sampling of the posterior distribution. It looks more like a seismograph during an earthquake than the “fuzzy caterpillar” plot on the right. The righthand plot shows consistent sampling from beginning to end. The autocorrelation plot on the left also signals trouble. Autocorrelation should drop to zero quickly, as it does by lag 10 in the righthand plot. The BGLIMM run with default settings produces autocorrelation that never goes to zero, even at lag 50. If autocorrelation does not drop to zero at least by lag 25, consider the results unusable.
Output 1 shows results produced by the DIAGNOSTICS=ALL option. Statistics that signal trouble are highlighted in bold.
Output 1. Diagnostic Statistics from PROC BGLIMM Logitnormal Program with Default Settings
Effective Sample Sizes 

Parameter 
ESS 
Autocorrelation 
Efficiency 
Intercept 
174.1 
28.7273 
0.0348 
trt cntl 
301.6 
16.5782 
0.0603 
trt drug 
. 
. 
. 
Random VC(1) 
886.7 
5.6389 
0.1773 
Random VC(2) 
722.3 
6.9225 
0.1445 
Geweke Diagnostics 

Parameter 
z 
Pr > z 
Intercept 
0.4815 
0.6302 
trt cntl 
0.2990 
0.7649 
trt drug 
. 
. 
Random VC(1) 
0.5412 
0.5884 
Random VC(2) 
0.9604 
0.3369 
RafteryLewis Diagnostics 

Quantile=0.025 Accuracy=+/0.005 Probability=0.95 Epsilon=0.001 

Parameter 
Number of Samples 
Dependence 

BurnIn 
Total 
Minimum 

Intercept 
29 
32909 
3746 
8.7851 
trt cntl 
18 
19114 
3746 
5.1025 
trt drug 
. 
. 
. 
. 
Random VC(1) 
4 
4636 
3746 
1.2376 
Random VC(2) 
4 
4714 
3746 
1.2584 
HeidelbergerWelch Diagnostics 

Parameter 
Stationarity Test 
HalfWidth Test 

Cramervon 
pValue 
Test 
Iterations 
HalfWidth 
Mean 
Relative 
Test 

Intercept 
0.0951 
0.6092 
Passed 
0 
0.0875 
0.4288 
0.2041 
Failed 

trt cntl 
0.2343 
0.2098 
Passed 
0 
0.0630 
0.9815 
0.0641 
Passed 

trt drug 
. 
. 

. 
. 
. 
. 


Random VC(1) 
0.2008 
0.2659 
Passed 
2000 
0.0655 
1.8446 
0.0355 
Passed 

Random VC(2) 
0.0432 
0.9160 
Passed 
0 
0.0855 
1.2928 
0.0661 
Passed 

The “Effective Sample Sizes” table ESS and “Efficiency” results indicate trouble. All ESS values are less than 1000. The Geweke statistics look okay here, but you should always check them. The RafteryLewis dependence factors for “Random VC(1)” and “VC(2),” the clinic and clinic x treatment variance ( and ) respectively, are acceptably close to one, but those for the intercept and “trt cntl” effects ( and ) are much too high. Rules of thumb being at best approximate, if you see dependence factors greater than 3 or 4, consider your results to be unusable. Finally, the HeidelbergerWelch halfwidth test for intercept failed.
The problems identified by the above diagnostics can occur for several reasons. The number of burnin iterations or the size of the posterior distribution sampling may be inadequate. You can use thinning, that is, only taking every 5^{th} or 10^{th} or even 50^{th} posterior density sample instead of every sample, to reduce autocorrelation. The default starting values for the fixed effect parameters may not be appropriate. The default priors may be too diffuse, including values that are technically in the parameter space, but highly implausible. Or the default priors may be where the parameters are not, that is, the most likely values of the parameter may be in an extreme tail of the prior distribution. Output 2 shows the defaults used by PROC BGLIMM for the logitnormal GLMM.
Output 2. Sampling, Thinning, Starting Values and Priors for Default Logitnormal BGLIMM Run
Model Information 

BurnIn Size 
500 
Simulation Size 
5000 
Thinning 
1 
Initial Values for Fixed Effects 

Parameter 
Value 
Intercept 
0.3102 
trt cntl 
0.4040 
Priors for Fixed Effects 

Parameter 
Prior 
Intercept 
Constant 
trt cntl 
Constant 
Priors for Scale and Covariance Parameters 

Parameter 
Prior 
Random Cov (Diag) 
Inverse Gamma (Shape=2, Scale=2) 
The “Model Information” table shows the default burnin size (number of burnin iterations), the “simulation size” (number of samples of the posterior density) and thinning. Thinning equal one means no thinning was done. The “Initial Values for Fixed Effects” table shows the starting values for and . Compare these to their estimates from PROC GLIMMIX, and . You can use the estimates from GLIMMIX as starting values instead of letting BGLIMM’s algorithm choose starting values.
The default priors for and are “constant” – essentially uniform distributions between plus and minus infinity. Overly diffuse priors that include highly implausible parameter values can cause problems – given that the actual values of and are unlikely to be very far from their GLIMMIX estimates, the “constant” prior may be an issue. The prior for both variance components is an inverse gamma with shape and scale parameters equal to two. Figure 2 shows a plot of the inverse gamma(2,2) distribution with vertical lines showing the GLIMMIX estimates of and .
Figure 2. Plot of PROC BGLIMM Default Prior for Variance Components
The lower vertical bar is the GLIMMIX estimate of the clinic x treatment variance, . The upper vertical bar is the GLIMMIX estimate . You can see that the most likely value of the clinic x treatment variance is in the extreme lower tail of the default prior. Most of the probability mass of the inverse gamma(2,2) distribution is located over values that are implausibly high for . The clinic variance shows the opposite problem, although to a lesser extent – its most likely value is in the upper tail of the default prior.
Christiansen, et al. (2011) suggest a strategy for selecting an appropriate prior for a variance component. They suggest using precision, defined as the inverse of the variance, denoted here as , and identifying a gamma distribution whose mode is approximately equal to the most likely value of the parameter being estimated (you can use the estimates from PROC GLIMMIX as a logical most likely value) and whose upper and lower extreme quantiles (say the 1^{st} and 99^{th}) correspond to values below and above which are thought to be highly unlikely. The mode of a gamma distribution with shape parameter and scale parameter is . You can specify the mode, try several values of , solve for and evaluate the resulting distribution.
The following is an example a program you can use for prior hunting. The example is for the clinic x treatment variance. The PROC GLIMMIX estimate using the pseudolikelihood default is . The estimate using GLIMMIX’s METHOD=QUADRATURE option is . These translate to precisions of 17 and 100 respectively. The program uses a mode between the two, in this example 40. Use the PDF and CDF functions to create the distribution for each combination. The PROC PRINT statement allows you to examine the lower and upper quantiles of the candidate distributions. The PROC GPLOT statement allows you to visualize the candidate distributions.
The program statements are as follows:
data gamma_prior;
mode=40;
/* A=shape parameter */
/* B=scale parameter */
do a=1.01,1.05,1.1,1.25,1.5,2,3,5,10,20;
b=mode/(a1);
do precision=0 to 200;
pdf_gamma=pdf("gamma",precision,a,b);
cdf_gamma=cdf("gamma",precision,a,b);
output;
end;
end;
proc sort data=gamma_prior; by a;
proc print;
where (0.009<cdf_gamma<0.011 or 0.989<cdf_gamma<0.991);
run;
proc gplot data=gamma_prior;
by a;
plot pdf_gamma*precision/href=17,40,100;
plot cdf_gamma*precision/vref=0.01,0.99;
run;
Some trial and error, and art along with science, are involved at this point. The distribution selected is gamma with shape=3 and scale=20. Figure 3 shows plots of the p.d.f. and c.d.f.
Figure 3. Plots of the Prior Distribution Selected for the Clinic x Treatment Variance
You can see that p.d.f. has a mode of 40 and contains the precision values 17 and 100 well within its range. The c.d.f. plot shows that the lower and upper extreme quantiles are just under 10 and just under 200, respectively. Translated to the variance scale, this prior covers a range between to just over 0.1, the range considered plausible. Using the same strategy, a suitable prior for the clinic precision is gamma with shape=1.5 and scale=1.
The following shows the PROC BGLIMM program modified to include the options needed to obtain useable results. These options appear in bold. The program:
proc bglimm data=clinics seed=81152097
nbi=2500 nmc=100000 thin=10
plots=(trace autocorr density) diagnostics=all
statistics(percent=(2.5 50 97.5))
outpost=cout;
class clinic trt;
model fav/Nij=trt /
init=(list=(0.46 0.75) pinit) coeffprior=normal(var=9);
random intercept / subject=clinic covprior=igamma(shape=1.5, scale=1);
random trt / subject=clinic covprior=igamma(shape=3, scale=0.05);
estimate "CNTL" intercept 1 trt 1 0;
estimate "DRUG" intercept 1 trt 0 1;
estimate "log oddsratio" trt 1 1;
ods output estimates=modelscale;
title "BGLIMM  all needed options";
run;
The NBI, NMC and THIN options increase burnin iterations, posterior distribution sample size and thinning, respectively. The INIT=(LIST=( option allows you to specify starting values for the fixed effect parameters and . The COEFFPRIOR option specifies a prior distribution for these effects. Note that your options for COEFFPRIOR are either “constant” or “normal.” VAR allows you to specify the variance, but the normal prior is always centered at a mean of zero. The COVPRIOR option specifies a prior for the variance of random model effects. Notice that you can have multiple RANDOM statements, which allows you to use different priors for each variance. The COVPRIOR must be specified in terms of the inverse gamma distribution. The required syntax is IGAMMA (SHAPE= , SCALE= ). Use the following result: is equivalent to . For example, the prior for the clinic x treatment precision translates to an prior for the clinic x treatment variance. Output 3 shows selected results.
Effective Sample Sizes 

Parameter 
ESS 
Autocorrelation 
Efficiency 
Intercept 
2952.3 
3.3872 
0.2952 
trt cntl 
10000.0 
1.0000 
1.0000 
trt drug 
. 
. 
. 
Random1 Var 
7644.9 
1.3081 
0.7645 
Random2 Var 
8756.1 
1.1421 
0.8756 
RafteryLewis Diagnostics 

Quantile=0.025 Accuracy=+/0.005 Probability=0.95 Epsilon=0.001 

Parameter 
Number of Samples 
Dependence 

BurnIn 
Total 
Minimum 

Intercept 
6 
6350 
3746 
1.6951 
trt cntl 
2 
3834 
3746 
1.0235 
trt drug 
. 
. 
. 
. 
Random1 Var 
2 
3803 
3746 
1.0152 
Random2 Var 
2 
3650 
3746 
0.9744 
Posterior Summaries and Intervals 

Parameter 
N 
Mean 
Standard 
95% HPD Interval 

Intercept 
10000 
0.4491 
0.5504 
1.6003 
0.5728 
trt cntl 
10000 
0.7454 
0.3053 
1.3201 
0.1303 
trt drug 
0 
. 
. 
. 
. 
Random1 Var 
10000 
2.1165 
1.5208 
0.3744 
4.6572 
Random2 Var 
10000 
0.0261 
0.0281 
0.00444 
0.0639 
Posterior Summaries 

Parameter 
N 
Mean 
Standard 
Percentiles 

2.5 
50 
97.5 

Intercept 
10000 
0.4491 
0.5504 
1.5574 
0.4418 
0.6304 
trt cntl 
10000 
0.7454 
0.3053 
1.3539 
0.7410 
0.1619 
trt drug 
0 
. 
. 
. 
. 
. 
Random1 Var 
10000 
2.1165 
1.5208 
0.6293 
1.7386 
5.7644 
Random2 Var 
10000 
0.0261 
0.0281 
0.00701 
0.0193 
0.0845 
Results from ESTIMATE Statements 

Label 
Mean 
Standard 
95% HPD Interval 

CNTL 
1.1946 
0.5612 
2.3136 
0.0881 
DRUG 
0.4491 
0.5504 
1.6003 
0.5728 
log oddsratio 
0.7454 
0.3053 
1.3201 
0.1303 
Output 3. Selected Results from Useable PROC BGLIMM Run for BeitlerLandis Data
The “Effective Samples Sizes” table shows acceptable ESS and efficiency numbers. All RafteryLewis dependence factors are <2. In the interest of space, the other diagnostics are not shown here, but they are similarly acceptable, indicating that this run has produced useable results. The “Posterior Summaries and Intervals” table shows the mean, standard deviation and 95% highest posterior density (HPD) intervals for the model fixed effects and the variance components. The estimates in the “Mean” column are similar to those obtained with PROC GLIMMIX. Note that the fixed effect component of the linear predictor, is not full rank. PROC BGLIMM uses the same convention – setting the last effect, in this case , to zero – as other SAS linear model procedures. The “Posterior Summaries” table gives percentiles specified by the STATISTICS(PERCENT= option instead of HPD intervals. The “Results from ESTIMATE Statements” table give the posterior distribution mean, standard deviation and 95% HPD intervals from the ESTIMATE statements.
Note that all of these statistics are on the model scale – in this case the logit scale. Because PROC BGLIMM has no ILINK option, you must output these results and use postprocessing steps to obtain data scale estimates. The most obvious of these are the probabilities of a favorable outcome for each treatment, and , the difference between the two, , and the oddsratio, . There are two ways to do this. These are shown in the next section.
You can either use the data set produced by the OUTPOST option in the PROC statement or the ESTIMATES from the ODS OUTPUT statement to obtain datascale estimates.
After you run the PROC BGLIMM program that produces useable output, use the following statements:
data datasc;
set cout;
pr_cntl=logistic(cntl);
pr_drug=logistic(drug);
ProbDiff=logistic(drug)logistic(cntl);
OddsRatio=exp(log_odds_ratio);
run;
%sumint(data=datasc, var=pr_cntl: pr_drug: ProbDiff: OddsRatio)
The OUTPOST option creates a data set of posterior samples for the model fixed effects and ESTIMATES. In this example, the data set is called COUT. The DATA step creates a new data set, DATASC, with the favorable outcome probabilities, labelled PR_CNTL and PR_DRUG, their difference, PROBDIFF, and the oddsratio. The LOGISTIC function implements the logit’s inverse link, . The difference estimates the log of the oddsratio, so gives you the estimated oddsratio. The %SUMIT macro computes and prints inferential statistics for the terms listed after VAR= for the data set given by DATA=<data set name>. Output 4 shows the results.
Posterior Summaries and Intervals 

Parameter 
N 
Mean 
Standard 
95% HPD Interval 

OddsRatio 
10000 
0.4970 
0.1532 
0.2251 
0.8004 
pr_cntl 
10000 
0.2461 
0.0990 
0.0750 
0.4475 
pr_drug 
10000 
0.3965 
0.1224 
0.1679 
0.6394 
ProbDiff 
10000 
0.1504 
0.0667 
0.0248 
0.2800 
Output 4. Data Scale Estimates from OUTPOST and %SUMINT
Instead of using the OUTPOST data set and the %SUMINT macro, you can use the following statements:
data lsm; set modelscale;
if label='CNTL' or label='DRUG';
prob=1/(1+exp(Mean));
lower=1/(1+exp(HPDLower));
upper=1/(1+exp(HPDUpper));
data oddsratio; set modelscale;
if label='log odds ratio';
OddsRatio=exp(Mean);
lower=exp(HPDLower);
upper=exp(HPDUpper);
proc print data=lsm;
proc print data=oddsratio;
run;
MODELSCALE is the data set produced by the ODS OUTPUT statement. These DATA steps produce two data sets. The one called LSM uses the logit’s inverse link to compute the estimates and HPD interval lower and bounds for and . The data set called ODDSRATIO computes the estimates and HPD bounds for the oddsratio. Use the IF LABEL= statement to select items to be included in each data set. The LABEL names are the labels used in the ESTIMATE statements. Both data sets use the results of the ESTIMATE statements instead of the posterior sample data set, so the results differ from the %SUMINT listing above. If you want the estimated difference between the probabilities, use OUTPOST and %SUMINT – it is more convenient. Output 5 shows the results using the ODS OUTPUT approach.
Obs 
Label 
Mean 
StdDev 
HPDLower 
HPDUpper 
prob 
lower 
upper 
1 
CNTL 
1.1946 
0.5612 
2.3136 
0.0881 
0.23245 
0.09001 
0.47798 
2 
DRUG 
0.4491 
0.5504 
1.6003 
0.5728 
0.38957 
0.16793 
0.63942 
Obs 
Label 
Mean 
StdDev 
HPDLower 
HPDUpper 
OddsRatio 
lower 
upper 
1 
log oddsratio 
0.7454 
0.3053 
1.3201 
0.1303 
0.47452 
0.26712 
0.87780 
Output 5. Data Scale Estimates from ODS OUTPUT and Followup DATA Steps
The Mean, StdDev, HPDLower and HPDUpper values are model scale values from the BGLIMM ESTIMATE statements. The three columns on the right, labelled “prob” or “OddsRatio” and “lower” and “upper” are the data scale estimates.
This example appears in SAS for Mixed Models (2018) in Chapter 13, Section 13.3. The data are from an agricultural experiment comparing two cultivation methods and two seed mixes. Each of six fields is divided into two sections, called whole plots. Each whole plot is randomly assigned to a method so that both methods appear at each field. Each whole plot is divided into two split plots to which mixes are randomly assigned. Table 6 shows the sources of variation.
EXPERIMENT DESIGN 
TREATMENT DESIGN 
COMBINED 

SOURCE 
DF 
SOURCE 
DF 
SOURCE 
DF 
field 
5 


field 
5 


method 
1 
method 
1 
wholeplot(field) 
6 


field x method (wp) 
61=5 


mix 
1 
mix 
1 


method x mix 
1 
method x mix 
1 
splitplot(wp) 
12 


sp (wp)  method, mix 
122=10 
TOTAL 
23 


TOTAL 
23 
Table 6. Sources of Variation for MultiLevel Field Trial
The response variable is a discrete count. SAS for Mixed Models describes two “sensible” models, the Poissonnormal GLMM and the negative binomial GLMM. This section presents the latter, as it is the model of choice for these data, and the negative binomial illustrates PROC BGLIMM options that would not be used with the Poissonnormal model.
Table 7 uses the combined sources of variation column from Table 6 to show how the negative binomial arises in the context of this experiment.
SOURCE 
EFFECT DISTRIBUTION 
OBSERVATION AND MODEL 
field 


method 


field x method (wp) 


mix 


method x mix 


sp(wp)method,mix a.k.a. unit of obs. 
Table 7. Model Effects and Distribution Defining PoissonGamma Process
The method and mix effects are denoted by and , respectively. The negative binomial arises from a Poissongamma process. Variation at the unit of observation level is assumed to follow a gamma distribution with If you integrate out the unit level term, , where NB denotes negative binomial. Thus, the following elements define the GLMM:
With PROC BGLIMM, replacing the effects part of the linear predictor, , with its cell means, full rank equivalent makes it easier to use the ESTIMATE statement to compute terms of interest. The cells means form can be written .
The CLASS, MODEL and RANDOM statements for this model are:
class field method mix;
model count=method*mix / noint distribution=negbin;
random intercept method/ subject=field;
As with Example 1, the default BGLIMM settings give unusable results. In the interest of space, not all diagnostics are shown. Output 6 shows two examples, the Geweke statistics and the plots for the negative binomial scale parameter, .
Geweke Diagnostics 

Parameter 
z 
Pr > z 
method 1*mix 1 
1.8922 
0.0585 
method 1*mix 2 
2.3868 
0.0170 
method 2*mix 1 
0.9584 
0.3379 
method 2*mix 2 
1.8992 
0.0575 
Scale 
2.7547 
0.0059 
Random VC(1) 
1.4784 
0.1393 
Random VC(2) 
0.4486 
0.6537 
Output 6. Example diagnostics: PROC BGLIMM negative binomial GLMM with default settings
Four of the seven parameters on the “Geweke Diagnostics” table show low pvalues. For the scale parameter, p<0.01. The trace and autocorrelation plots are unacceptable. To obtain useable results, you must work through the same set of steps as illustrated in the previous section. The following program statements are the result:
proc bglimm data=sp_counts plots=(trace autocorr density)
nbi=2500 nmc=1000000 thin=100 seed=20210209
diagnostics=all dic outpost=model_scale;
class field method mix;
model count=method*mix / noint distribution=negbin
init=(list=(2.5, 2.9, 0.8, 2.0) Pinit)
scaleprior=gamma(shape=20,iscale=25);
random intercept method/ subject=field
covprior=igamma(shape=4,scale=3);
estimate "LSM_11" method*mix 1 0 0 0;
estimate "LSM_12" method*mix 0 1 0 0;
estimate "LSM_21" method*mix 0 0 1 0;
estimate "LSM_22" method*mix 0 0 0 1;
estimate "interaction" method*mix 1 1 1 1;
estimate "Method Main Effect" method*mix 1 1 1 1 / divisor=2;
estimate "Mix Main Effect" method*mix 1 1 1 1 / divisor=2;
ods output estimates=link_scale;
title "Negative Binomial GLMM";
run;
The burnin, posterior sampling and thinning options may look extreme, but computing time is minimal, and they guarantee useable results. The INIT option uses PROC GLIMMIX results as starting values for the METHOD*MIX ( ) effects. Changing from the “constant” to the “normal” prior has little effect with this model.
The COVPRIOR in the RANDOM statement is based on using estimates from PROC GLIMMIX, and , as most likely values. Given that they are similar in value, the same inverse gamma with shape=4 and scale =3 is used for both. The SCALEPRIOR option allows you to specify a prior for the negative binomial scale parameter. The search using the Christiansen, et al. approach described in the previous section resulted in selecting a prior. Note that BGLIMM requires the gamma distribution to be specified using the inverse scale, hence ISCALE=25 rather than SCALE=0.04.
The ESTIMATE statements define means, interaction and main effect differences. These are given on the model (log) scale in the SAS listing. You can use either the %SUMINT macro with the OUTPOST=MODEL_SCALE data set or the ODS OUTPUT data set to compute datascale equivalents. In the interest of space, only the OUTPOST plus %SUMINT program statements and output are shown.
The SAS statements:
data data_scale;
set model_scale;
Lambda_11=exp(LSM_11);
Lambda_12=exp(LSM_12);
Lambda_21=exp(LSM_21);
Lambda_22=exp(LSM_22);
Method_Diff_Ratio=exp(Method_Main_Effect);
Mix_Diff_Ratio=exp(Mix_Main_Effect);
%sumint(data=data_scale, var=Lambda_11: Lambda_12: Lambda_21: Lambda_22:
interaction: Method_Diff_Ratio: Mix_Diff_Ratio)
Output 7 shows the listing.
Posterior Summaries and Intervals 

Parameter 
N 
Mean 
Standard 
95% HPD Interval 

Lambda_11 
10000 
19.2095 
18.9908 
0.7638 
52.9988 
Lambda_12 
10000 
31.6319 
31.0823 
1.4792 
86.1046 
Lambda_21 
10000 
4.2671 
3.5069 
0.4212 
10.6353 
Lambda_22 
10000 
13.5674 
11.0600 
1.0802 
32.3851 
interaction 
10000 
0.6695 
0.8461 
0.9995 
2.3477 
Method_Diff_Ratio 
10000 
3.6754 
2.7677 
0.3665 
8.3976 
Mix_Diff_Ratio 
10000 
0.4728 
0.2049 
0.1595 
0.8911 
Output 7. DataScale Results of Negative Binomial SplitPlot GLMM Analysis
The LAMBDA_11, etc. terms give the rate parameter estimates, for the four methodmix combinations. The “interaction” term remains on the model scale. There is no need to express it on the data scale, as its only purpose is to assess the magnitude of the method x mix interaction. The HPD interval includes zero; using interval as the criterion, you could conclude that there is insufficient evidence to reject the hypothesis of no interaction. The METHOD_DIFF_RATIO” term estimates where is the rate parameter for the method averaged over both mixes. The difference = , hence . The “MIX_DIFF_RATIO” is similarly defined, but for mix instead of method. Think of these as data scale main effect measures. Although not shown here, you could also obtain statistics for differences between selected LSM terms, similar to the way PROBDIFF was defined in the previous section.
This example is from Chapter 8 of SAS for Mixed Models (2018). It appears as “Respiratory Data” for a repeated measures example in Littell, Pendergast and Natarijan (2000), and in Chapter 4 of SAS for Linear Models, 4^{th} Edition (Littell, et al., 2002). The data are from a trial comparing a standard asthma treatment (A), a test drug (C) and a placebo (P). Treatments were randomly assigned to 24 patients (72 patients total participated in the trial). The response variable, FEV1, is a measure of breathing ability. A baseline measurement (BASEFEV1) was taken on each patient, then measurements every hour for eight hours starting at one hour after application. FEV1 can be assumed to follow a Gaussian distribution. Table 8 shows the experiment design, treatment design and combined sources of variation.
EXPERIMENT DESIGN 
TREATMENT 
COMBINED 

SOURCE 
DF 
SOURCE 
DF 
SOURCE 
DF 


drug 
2 
drug 
2 
patient 
721=71 


patient(drug) a.k.a. between 
712=69 


hour 
7 
hour 
7 


drug x hour 
14 
drug x hour 
14 
occasion(patient) 
72(81) = 504 


occ(patient)drug a.k.a. within 
504714 = 493 
TOTAL 
575 


TOTAL 
575 
Table 8. Sources of Variation for Respiratory Repeated Measures Data
In repeated measures terminology, patient(drug) is called the between subjects effect and the occ(patient)drug (occasion within patient after accounting for drug) residual term is called the within subjects effect. Following SAS for Mixed Models, the LMM of choice has the following elements:
The covariance parameters for the for the matrix are for autocorrelation and for residual (within subjects) variance. You can include the baseline covariance, BASEFEV1, by modifying the linear predictor to , where denotes BASEFEV1 for the patient assigned to the drug.
Figure 4 shows the interaction plot, which you can obtain using the PLOT=MEANPLOT option in the PROC GLIMMIX LSMEANS statement. The interaction plot is instructive, because it helps you visualize the objectives of this study and the requirements of the model needed to address the objectives.
Figure 4. Interaction Plot showing Change in FEV1 over hours by treatment.
The treatment is initially effective for the A and C drugs but not the placebo. Over time, FEV1 decreases as the drugs’ effectiveness wears off. The objective of the study is to see if there is a difference between the treatments as measured by initial effectiveness and by how long the effect lasts.
There are two ways to approach this objective. One is to model HOUR as a CLASS variable and track the simple effects and at each hour , where . The other is to redefine in terms of an unequal slopes linear regression model, i.e., , where denotes the hour, and compare intercept and slope coefficients.
NOTE: The documentation for PROC BGLIMM uses this data set as an example. In one passage, the documentation gives what it calls “Model 2” and “Model 3.” The MODEL statements for each are:
The documentation characterizes the difference between model 2 and model 3 as “minor,” and shows results for model 2 but not model 3. Perhaps the difference is “minor” from a programming point of view, but the difference is not “minor” from a statistical practice point of view. Model 2 is incapable of addressing the objectives of this study and therefore fails the “sensible model” criterion. To address this study’s objectives, the drug x hour interaction must be included in the model in some form.
The PROC BGLIMM statements for simple effects and unequal slopes approaches are given as follows. For the simple effects approach, it is much more convenient to specify the cellmeans form of the model. The statements are:
proc bglimm data=fev1uni nbi=1000 nmc=50000 thin=5 seed=97816352
plots=(trace autocorr density) diagnostics=all;
class Drug Patient Hour;
model FEV1 = BaseFev1 Drug*Hour;
random int / subject=Patient(drug);
repeated Hour / subject=Patient(Drug) type=ar(1) r rcorr;
run;
Notice that PROC BGLIMM barrows the syntax of the PROC MIXED REPEATED statement to specify the covariance model for the within subjects effect. The default priors work well for this model, but increasing burnin, posterior sampling and thinning improves the results. There are twentyfour DRUG*HOUR least squares means, and sixteen simple effects comparing drug A and C to the placebo at each hour, for a total to forty ESTIMATE statements. In the interest of space, only one example of each type of statement is shown.
You can access the SAS files for all examples in this tutorial at ... (I need to find out from SAS where...)
The example ESTIMATE statements are:
estimate "LSM A at hour 1" intercept 1 basefev1 2.6493
Drug*Hour 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0;
estimate "A vs P at hour 1"
Drug*Hour 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0;
The coefficients for least squares means (LSM) obtain the adjusted mean at the average BASEFEV1 covariate value of 2.6493. If you are in doubt, use the E option of the PROC GLIMMIX LSMEANS statement to get the coefficients needed for these statements.
The statements for the unequal slopes approach are as follows:
data fev1uni;
set fev1uni;
H=hour;
proc bglimm data=fev1uni nbi=1000 nmc=50000 thin=5 seed=44672057
plots=(trace autocorr density) diagnostics=all;
class Drug Patient Hour;
model FEV1 = BaseFev1 drug H(drug);
random int / subject=Patient(drug);
repeated Hour / subject=Patient(Drug) type=ar(1) r rcorr;
estimate "intercept  Drug A" intercept 1 basefev1 2.6493 drug 1 0 0;
estimate "intercept  Drug C" intercept 1 basefev1 2.6493 drug 0 1 0;
estimate "intercept  Drug P" intercept 1 basefev1 2.6493 drug 0 0 1;
estimate "slope  Drug A" H(drug) 1 0 0;
estimate "slope  Drug C" H(drug) 0 1 0;
estimate "slope  Drug P" H(drug) 0 0 1;
estimate "intercept A vs C" drug 1 1 0;
estimate "intercept A vs P" drug 1 0 1;
estimate "intercept C vs P" drug 0 1 1;
estimate "slope A vs C" H(drug) 1 1 0;
estimate "slope A vs P" H(drug) 1 0 1;
estimate "slope C vs P" H(drug) 0 1 1;
run;
You need to define a second variable for HOUR – in this case the DATA step with H=HOUR – and use one as the direct regression variable in the MODEL statement and the other as a CLASS variable for the REPEATED statement. The “INTERCEPT – DRUG A” etc. ESTIMATE statements obtain the estimated regression intercepts, adjusted for the baseline covariate, and the “SLOPE – DRUG A” etc. statements obtain , the slopes. The next six statements compare intercepts and slopes among the DRUG treatments.
Output 8 shows the covariance parameter estimates obtained using the cell means model. Output 9 shows the least squares means listing from the cell means model. Output 10 shows the simple effect difference results.
Posterior Summaries and Intervals 

Parameter 
N 
Mean 
Standard 
95% HPD Interval 

Residual Var 
10000 
0.0846 
0.0104 
0.0664 
0.1058 
Residual AR(1) 
10000 
0.5386 
0.0550 
0.4278 
0.6407 
Random Var 
10000 
0.2458 
0.0453 
0.1612 
0.3341 
Output 8. Covariance parameter estimates.
“Residual Var” gives the within subjects variance statistics. “Mean” gives you . “Residual AR(1)” give the autocorrelation parameter, . “Random Var” gives the between subjects variance, .
Results from ESTIMATE Statements 

Label 
Mean 
Standard 
95% HPD Interval 

LSM A at hour 1 
3.4739 
0.1178 
3.2501 
3.7107 
LSM C at hour 1 
3.6903 
0.1187 
3.4705 
3.9330 
LSM P at hour 1 
2.8264 
0.1183 
2.5956 
3.0589 
LSM A at hour 2 
3.3963 
0.1175 
3.1685 
3.6287 
LSM C at hour 2 
3.6251 
0.1194 
3.3849 
3.8521 
LSM P at hour 2 
2.8916 
0.1176 
2.6606 
3.1195 
LSM A at hour 3 
3.1839 
0.1181 
2.9470 
3.4114 
LSM C at hour 3 
3.5764 
0.1190 
3.3390 
3.8059 
LSM P at hour 3 
2.8974 
0.1186 
2.6571 
3.1239 
LSM A at hour 4 
3.0460 
0.1175 
2.8098 
3.2714 
LSM C at hour 4 
3.4424 
0.1190 
3.2178 
3.6824 
LSM P at hour 4 
2.8713 
0.1180 
2.6237 
3.0877 
LSM A at hour 5 
3.0533 
0.1185 
2.8278 
3.2873 
LSM C at hour 5 
3.2487 
0.1187 
3.0172 
3.4829 
LSM P at hour 5 
2.7686 
0.1183 
2.5390 
2.9997 
LSM A at hour 6 
2.9800 
0.1183 
2.7533 
3.2130 
LSM C at hour 6 
3.0845 
0.1186 
2.8482 
3.3096 
LSM P at hour 6 
2.8160 
0.1179 
2.5817 
3.0423 
LSM A at hour 7 
2.8687 
0.1181 
2.6414 
3.0987 
LSM C at hour 7 
2.9764 
0.1181 
2.7273 
3.1924 
LSM P at hour 7 
2.7859 
0.1180 
2.5595 
3.0183 
LSM A at hour 8 
2.8581 
0.1171 
2.6377 
3.0913 
LSM C at hour 8 
3.0093 
0.1175 
2.7724 
3.2359 
LSM P at hour 8 
2.7330 
0.1188 
2.4999 
2.9644 
Output 9. Drug x Treatment Least Squares Means for Repeated Measures Data
Results from ESTIMATE Statements 

Label 
Mean 
Standard 
95% HPD Interval 

A vs P at hour 1 
0.6475 
0.1652 
0.3224 
0.9630 
C vs P at hour 1 
0.8639 
0.1683 
0.5235 
1.1894 
A vs P at hour 2 
0.5048 
0.1648 
0.1606 
0.8070 
C vs P at hour 2 
0.7335 
0.1678 
0.4078 
1.0694 
A vs P at hour 3 
0.2865 
0.1651 
0.0478 
0.6028 
C vs P at hour 3 
0.6789 
0.1687 
0.3453 
1.0093 
A vs P at hour 4 
0.1747 
0.1649 
0.1565 
0.4920 
C vs P at hour 4 
0.5711 
0.1680 
0.2481 
0.9090 
A vs P at hour 5 
0.2847 
0.1658 
0.0489 
0.6031 
C vs P at hour 5 
0.4801 
0.1677 
0.1611 
0.8200 
A vs P at hour 6 
0.1641 
0.1651 
0.1476 
0.4993 
C vs P at hour 6 
0.2685 
0.1669 
0.0668 
0.5785 
A vs P at hour 7 
0.0828 
0.1654 
0.2411 
0.4083 
C vs P at hour 7 
0.1905 
0.1673 
0.1365 
0.5220 
A vs P at hour 8 
0.1251 
0.1659 
0.1932 
0.4553 
C vs P at hour 8 
0.2763 
0.1667 
0.0641 
0.5925 
Output 10. Simple effect differences between drugs (A, C) and placebo (P)
Output 10 is the PROC BGLIMM analog to the results you would get with the SLICEDIFF=HOUR option in PROC GLIMMIX.
Output 11 gives the ESTIMATE statement results from the unequal slopes regression model.
Results from ESTIMATE Statements 

Label 
Mean 
Standard 
95% HPD Interval 

intercept  Drug A 
3.5169 
0.1205 
3.2852 
3.7577 
intercept  Drug C 
3.8065 
0.1197 
3.5730 
4.0403 
intercept  Drug P 
2.8859 
0.1199 
2.6516 
3.1233 
slope  Drug A 
0.0887 
0.0117 
0.1115 
0.0651 
slope  Drug C 
0.1057 
0.0117 
0.1285 
0.0825 
slope  Drug P 
0.0160 
0.0119 
0.0390 
0.00708 
intercept A vs C 
0.2896 
0.1690 
0.6066 
0.0506 
intercept A vs P 
0.6310 
0.1706 
0.3002 
0.9703 
intercept C vs P 
0.9206 
0.1703 
0.5897 
1.2525 
slope A vs C 
0.0170 
0.0166 
0.0166 
0.0486 
slope A vs P 
0.0727 
0.0168 
0.1057 
0.0399 
slope C vs P 
0.0897 
0.0167 
0.1236 
0.0578 
Output 11. Unequal slopes linear regression repeated measures model estimates and differences
You can see that the estimated intercepts for drug A and C and both greater than for the placebo (3.51 and 3.81 versus 2.89) indicating that initial breathing ability is greater for the two drugs than the placebo. On the other hand, the estimated slopes for drug A and C are and respectively versus for the placebo. The placebo’s HPD credible interval for slope includes zero, whereas the intervals for drugs A and C do not. The placebo’s regression is essentially flat because its initial effect is negligible.
In the SAS system hierarchy, PROC BGLIMM occupies a space between PROC GLIMMIX and PROC MCMC. You can use PROC BGLIMM for any GLMM that PROC GLIMMIX can implement. The advantage of the BGLIMM procedure is that it allows to access to Bayesian estimation and inference using the same CLASS, MODEL and RANDOM syntax as PROC GLIMMIX. The main difference between BGLIMM and GLIMMIX is that because Bayesian analysis depends on complex simulation algorithms, there are more diagnostics that must be monitored and more options that you may need to use in order to get a useable analysis.
PROC MCMC uses syntax borrowed from PROC NLMIXED. This means that it does not have a CLASS statement, making it more tedious to specify models, especially effects models with factorial treatment structures. On the other hand, unlike PROC BGLIMM, you can use PROC MCMC to fit nonlinear models, semiparametric models, and mixed models with nonGaussian random effects, such as betabinomial and Poissongamma models. PROC MCMC also provides more flexibility in specifying prior distributions. For example, you can use priors centered at the starting value for fixed effects, where BGLIMM limits you to “constant” or zerocentered Gaussian priors.
Finally, when you use PROC BGLIMM with nonGaussian response variables, these is an additional postprocessing step, either using the %SUMINT macro or the ODS OUTPUT data set, in order to get data scale estimates inferential statistics. The lack of an ILINK option means an extra step, but as we see in Examples 1 and 2, it is not a difficult step.
PROC BGLIMM makes Bayesian analysis highly accessible to data analysts who have PROC GLIMMIX experience but are new to the Bayesian world.
Beitler, P.J. and J.R. Landis. 1985. “A MixedEffects Model for Categorical Data.” Biometrics, 41:9911000.
Christiansen, R., W. Johnson, A. Branscum and T.E. Hanson. 2011. Bayesian Ideas and Data Analysis: An Introduction for Scientists and Statisticians. Boca Raton, FL: CRC Press.
Ferrari, S. L. P., and CribariNeto, F. 2004. “Beta Regression for Modelling Rates and Proportions.” Journal of Applied Statistics 31:799–815
Littell, R.C, J. Pendergast and R. Natarajan. 2000. “Modeling Covariance Structure in the Analysis of Repeated Measures Data.” Statistics in Medicine, 19:17931819.
Littell, R.C., W.W. Stroup and R.J. Freund. 2002. SAS^{®} for Linear Models, 4^{th} edition. Cary, NC: SAS Institute, Inc.
Stroup, W.W. 2013. Generalized Linear Mixed Models: Modern Concepts, Methods and Applications. Boca Raton, FL: CRC Press.
Stroup, W.W., G.A. Milliken, E.A. Claassen and R.D. Wolfinger. 2018. SAS^{®} for Mixed Models: Introduction and Basic Applications. Cary, NC: SAS Institute Inc.
Acknowledge Oliver Schabenberger, who developed PROC GLIMMIX and was a valued partner in GLMM’s early days. Fang Chen, who developed PROC BGLIMM and who provided helpful advice about the procedure. George Milliken, Russ Wolfinger and Elizabeth Claassen, my SAS^{®} for Mixed Models coauthors and esteemed mixed model brain trust.
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.