
Bayesian Analysis of GLMMs Using PROC BGLIMM

Paper 1146-2021
Author 

Walter Stroup, University of Nebraska-Lincoln

Abstract

Over the past two decades, generalized linear mixed models (GLMMs) – mixed models for non-normal data such as proportions, counts, time to event, and so forth – have become standard tools for statistical analysis. Since its introduction, PROC GLIMMIX has been the primary GLMM procedure for SAS/STAT®. The most recent edition of SAS® for Mixed Models includes three chapters on using GLIMMIX for GLMMs. PROC GLIMMIX is an excellent frequentist tool. However, Bayesian approaches are becoming increasingly important. Many academic journals prefer, and some even require, Bayesian analysis. Even when not required, Bayesian methods allow us to use what we know prior to, or in the early stages of, an investigation. PROC BGLIMM is a new SAS/STAT procedure that makes Bayesian implementation of GLMMs relatively easy. BGLIMM uses syntax similar to PROC GLIMMIX, but there are some differences. This tutorial presents what you need to know to get started using PROC BGLIMM. We will use GLMM examples from SAS for Mixed Models, but with a Bayesian twist.

 

Watch the presentation

Watch Bayesian Analysis of GLMMs Using PROC BGLIMM as presented by the author on the SAS Users YouTube channel.

 

Introduction

Note to readers: the following is as much of the proceedings paper as I was able to upload. The SAS programs, output, and interpretations are here in more or less readable form. Equations did not upload, so you will see blanks where those are supposed to be. You can download the PDF file (Stroup-1146.pdf), which has the paper as I wrote it in intact form, or the presentation slides (140530_SASGF2021_BLGLIMM_PDF_version.pdf).

 

Mixed models are important tools for analyzing data from many types of studies, including longitudinal or repeated measures studies, multi-level or split-plot experiments, blocked designs with incomplete blocks or missing data, and multi-location studies. Linear mixed models (LMMs) accommodate response variables assumed to follow a normal (hereafter referred to as a Gaussian) distribution. Generalized linear mixed models (GLMMs) extend mixed model theory and methods to accommodate non-Gaussian responses such as categorical or count data. Because the LMM is a special case of the GLMM – a GLMM with Gaussian data – the acronym GLMM is used for the rest of this paper. In the SAS® system, PROC GLIMMIX is the preeminent mixed model procedure, allowing users to work with both LMMs and GLMMs. PROC GLIMMIX uses a frequentist approach to estimation and inference. The next section, “GLMM Basics,” gives an overview of what this entails, but the basic idea is that frequentist statistics focus on obtaining estimates and standard errors that are used to construct test statistics for significance testing or to construct confidence intervals.

Bayesian statistics provide an alternative approach. Bayesian statistics focus on incorporating prior information to obtain posterior distributions of the statistics of interest. Posterior distributions combine what you know in advance – the prior – with the data you observe. Bayesian statistics are becoming increasingly important for data analysis. One reason is that many academic journals now discourage classical significance testing in favor of Bayesian analysis. Some journals have even banned significance tests, in effect requiring Bayesian analysis. The second important reason is that the statistics obtained from frequentist methods are essentially equivalent to Bayesian analysis with a “non-informative” prior. In other words, frequentist methods implicitly assume that you know nothing until you analyze the data. In reality, this is rarely, if ever, true (if you really know nothing before analyzing the data, you probably should not be doing the study!). Many data sets come from studies that may be part of a series – phases of clinical trials leading to the licensing of a pharmaceutical product – or similar experiments by graduate students with the same advisor.

PROC MCMC is the SAS system’s original and all-purpose Bayesian procedure. Although you can use PROC MCMC for mixed models, the syntax is similar to PROC NLMIXED, which can be inconvenient for less-than-simple models or models with CLASS variables. PROC BGLIMM was developed to allow you to use syntax similar to PROC GLIMMIX, making Bayesian analysis more accessible for GLMMs.  

This tutorial introduces PROC BGLIMM using three examples from SAS for Mixed Models: Introduction and Basic Applications (Stroup, et al. 2018).

  • A multi-clinic trial with binomial data from Chapter 11 of SAS for Mixed Models.
  • A multi-level (split-plot) experiment with count data from Chapter 13.
  • Repeated measures (longitudinal) data from Chapter 8.

The next section, “GLMM Basics,” covers the GLMM setting, essential definitions and terminology, and an overview and comparison of PROC GLIMMIX and PROC BGLIMM estimation and inference.

GLMM BASICS

The classical format for a statistical model is the equation

Response variable = systematic component + random component

Think of the systematic component as the fixed effects – for example, treatment effects in ANOVA-type models or regression coefficients in linear regression models. In mixed models, the random component is really two distinct components: random model effects, such as block effects, and residual effects, such as those that account for serial covariance in a longitudinal mixed model. The classical response = fixed + random format works well if you can assume that the data follow a Gaussian distribution, but in many cases this is not true. Table 1 shows common types of response variables in the left-hand column and types of model effects across the top row.

Response Variable                        Fixed                                 Random
                                         Categorical (CLASS) | Continuous     Model effect ("G-side") | Residual / Covariance structure ("R-side")
Gaussian
Categorical: binomial, multinomial
Continuous proportion: beta
Count: Poisson, negative binomial
time-to-event
etc...

 Table 1. Response variable and model effect types covered by LMM and GLMM

Replacing the classical model format, the defining elements of the GLMM are as follows:

  • Distributions:
    • y | b ~ f(y | b), where f(·) denotes some distribution, e.g., one of those listed above
    • E(y | b) = μ, the conditional mean of the observations given the random model effects
    • b ~ N(0, G), where G is the covariance matrix of the random model effects
    • the conditional variance of y | b involves a scale matrix, e.g., a scale parameter times the identity or a residual covariance matrix
  • Link function: η = g(μ), e.g., the logit or log link
  • Linear Predictor: η = Xβ + Zb

PROC MIXED allows you to work in the first row of Table 1. PROCs BGLIMM, GLIMMIX, MCMC and NLMIXED allow you to work with all of the rows.

FREQUENTIST GLMM ESTIMATION AND INFERENCE

The guiding principle of GLMM estimation is maximum likelihood (ML). You obtain estimates of the fixed effects by maximizing the log-likelihood, log f(y; β, σ), where f(y; β, σ) = ∫ f(y | b) f(b) db. In general, this integral is intractable.

SAS software uses two approximation strategies: linearization (pseudo-likelihood) and integral approximation (quadrature and Laplace). The mixed model equations for Gaussian data, the PROC MIXED default used to obtain REML estimates of the variance components and solutions for β and b, are special cases of pseudo-likelihood (PL). The REML version of PL is the PROC GLIMMIX default. Laplace and quadrature are PROC GLIMMIX options. PROC NLMIXED uses quadrature only.
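To make the choice of estimation strategy concrete, here is a minimal sketch of how you would switch PROC GLIMMIX from its pseudo-likelihood default to integral approximation. It assumes the multi-clinic binomial data set (multi_clinic) used in Example 1 below; the only change from the default analysis is the METHOD= option.

proc glimmix data=multi_clinic method=quadrature;
   /* default is METHOD=RSPL (REML pseudo-likelihood);            */
   /* METHOD=LAPLACE or METHOD=QUADRATURE requests integral        */
   /* approximation of the marginal likelihood instead             */
   class clinic treatment;
   model fav/nij = treatment;
   random intercept treatment / subject=clinic;
run;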

Estimates and standard errors computed from PL or integral approximation are used to compute test statistics, p-values and confidence intervals. Classical frequentist inference makes extensive use of significance testing.

BAYESIAN ESTIMATION AND INFERENCE

The primary tool of Bayesian inference is the posterior distribution,

f(β, σ | y, b) = f(y, b | β, σ) f(β, σ) / ∬ f(y, b | β, σ) f(β, σ) dβ dσ,

where σ denotes the vector of covariance parameters and f(β, σ) denotes the prior distribution of the fixed effects and covariance parameters. The function f(y, b | β, σ) is the same likelihood used in frequentist estimation, defined by the GLMM distributions f(y | b) and f(b).

 

As with the GLMM likelihood, the integrals required for the posterior distribution are generally intractable. Unlike frequentist estimation, neither linearization nor integral approximation is a viable option. Instead, Bayesian methods approximate the posterior distribution by simulation.

 

PROC BGLIMM involves the following steps:

  • Specify the GLMM (i.e., the defining elements listed above). This step specifies 𝑓(𝑦|𝑏) and 𝑓(𝑏).
  • Specify the prior distributions 𝑓(𝛽, 𝜎). The BGLIMM procedure uses pre-programmed default priors. Depending on the model and data, you may or may not need to
    replace them with more appropriate prior distributions.
  • PROC BGLIMM implements three computational steps. It is beyond the scope of this tutorial to provide a technical description of the algorithms used to implement these steps. The following is simply a list of the steps and their purpose:
    • Tuning. This step uses the data, model and priors to come up with an approximation of the posterior distributions to be sampled.
    • Burn-in. This is the initial sampling of the posterior distributions. A coin-flip exercise from an introductory statistics class provides a simple analogy: the proportion of flips resulting in heads vacillates early in the sampling, but eventually settles down to a reasonable approximation of the probability of a head.
    • Sampling the posterior distribution. As the name implies, the posterior distribution of each parameter is sampled with the goal of getting a sufficiently accurate and detailed characterization.

The end result is a mean, standard deviation and quantiles of interest for the posterior distribution of each element of 𝛽 and each covariance parameter. You can use BGLIMM’s ESTIMATE statement to define estimable functions of the 𝛽 that address objectives of your data analysis. Two forms of credible intervals are commonly used for Bayesian inference:

  • the highest posterior density (HPD) interval. This is the narrowest interval between the lower and upper bound that contains a given percent of the posterior distribution.
  • quantile-based interval. For example, you can construct a 95% credible interval using the 2.5 and 97.5 percentiles of the posterior distribution as the lower and
    upper bounds.
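As a minimal sketch of the quantile-based interval, assume a posterior sample data set named cout produced by PROC BGLIMM's OUTPOST= option (used in the examples below) with a column named log_odds_ratio. PROC UNIVARIATE extracts the percentiles directly:

proc univariate data=cout noprint;
   var log_odds_ratio;
   /* the 2.5th and 97.5th percentiles form a 95% quantile-based credible interval */
   output out=ci pctlpts=2.5 50 97.5 pctlpre=pct;
run;
proc print data=ci;
run;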

There are a number of diagnostics you should check before using PROC BGLIMM results. Diagnostics are introduced in the context of the examples in the following sections. These examples cover problems you are likely to encounter, how to interpret the relevant diagnostics, and PROC BGLIMM options you can use to address these problems in order to obtain a useable analysis.

EXAMPLE 1: MULTI-CLINIC TRIAL WITH BINOMIAL DATA

The first example is the Beitler and Landis (Biometrics, 1985) multi-clinic data set that appears in SAS for Mixed Models (2018), Chapter 11, Section 11.3. Two treatments, CNTL and DRUG, are compared at eight clinics sampled from a target population. At the ith clinic, Nij patients are assigned to the jth treatment. The response variable, denoted FAVij, is the number of patients having a favorable outcome.

The first step in the analysis of these data is to identify an appropriate statistical model. You can do this by following a process presented in SAS for Mixed Models, Chapter 5. List the sources of variation separately for the “study” design and the treatment design. The study design, also called the “experiment design,” describes the components of the design before being assigned treatments or treatment levels. In this case, the study design consists of clinics and groups of patients within clinics. The treatment design is simply CNTL and DRUG. Groups are randomly assigned to treatments. Table 2 shows the sources of variation.

STUDY DESIGN                TREATMENT DESIGN          COMBINED
SOURCE            DF        SOURCE           DF       SOURCE                          DF
clinic             7                                  clinic                           7
                            treatment         1       treatment                        1
group(clinic)      8                                  group(clinic) | treatment
                                                      a.k.a. clinic x treatment
                                                      a.k.a. unit of observation   8-1=7
TOTAL             15                                  TOTAL                           15

Table 2. Sources of Variation for Multi-Clinic Example Data

In the combined column, read “group(clinic) | treatment” as “group within clinic after accounting for treatment.” List the rows so that treatment appears in the row immediately above the unit to which it is assigned – in this case, group(clinic).

The next step is to write model effects associated with each source of variation and the assumed probability distribution of any effects considered to be random. Tables 3 and 4 show two scenarios that lead to plausible mixed models. Table 3 describes the elements of a logit-normal GLMM, so-called because it uses a logit link function, ηij = log[πij / (1 − πij)], where πij denotes the probability of a favorable outcome for the ijth clinic-treatment combination, and all random effects are assumed to follow a normal (Gaussian) distribution. Note that clinic effects are random because the clinics in the study are a representative sample. Also note that there is a random effect for the group level source of variation. You should include this term to account for variation in πij at this level – if you don’t, over-dispersion may cause misleading results.

Note that group(clinic) is the unit on which the observations (FAVij) are taken. Hence, you can refer to it as the “unit level” effect, unit being shorthand for unit of observation. Specifying the model requires stating the assumed distribution of the observations, conditional on the random effects, at the unit level, and relating its expected value to the linear predictor, e.g., through the inverse link function.

SOURCE                                                                SOURCE EFFECT DISTRIBUTION     OBSERVATION AND MODEL
clinic
treatment
group(clinic) | treatment (a.k.a. clinic x treatment, a.k.a. unit of observation)

Table 3. Model Effects and Distributions Defining Logit-normal GLMM

Table 4 gives the elements of a beta-binomial mixed model, a commonly used alternative to the logit-normal GLMM for binomial data. The difference between the two models is the way group-level variation in the probability of a favorable outcome is modeled. The logit-normal GLMM includes a random effect in the linear predictor. The beta-binomial models the probability as a random variable, assuming that it follows a beta distribution. Table 4 gives the Ferrari and Cribari-Neto (2004) parameterization of the beta in terms of its mean (μ) and scale parameter (φ). Alternatively, you can write the distribution in its conventional math-stat form, beta(α, β), where α = μφ and β = (1 − μ)φ.
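The two parameterizations are easy to move between. The following is a small sketch that converts the mean/scale form to the conventional form and evaluates the resulting density with the SAS PDF function; the values mu=0.3 and phi=10 are illustrative only, not taken from the example data.

data beta_check;
   mu=0.3; phi=10;                 /* illustrative mean and scale parameter      */
   alpha=mu*phi;                   /* conventional first shape parameter         */
   beta=(1-mu)*phi;                /* conventional second shape parameter        */
   do p=0.05 to 0.95 by 0.05;
      dens=pdf("beta", p, alpha, beta);   /* density of the conventional beta(alpha, beta) */
      output;
   end;
run;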

SOURCE                                                                SOURCE EFFECT DISTRIBUTION     OBSERVATION AND MODEL
clinic
treatment
group(clinic) | treatment (a.k.a. clinic x treatment, a.k.a. unit of observation)

Table 4. Model Effects and Distributions Defining Beta-Binomial Mixed Model

You can use PROC BGLIMM (or PROC GLIMMIX) to implement the logit-normal GLMM, but not the beta-binomial. This is because BGLIMM and GLIMMIX are limited to models with Gaussian random effects, whereas the beta-binomial has a non-Gaussian random effect, the beta-distributed probability. (Although beyond the scope of this tutorial, you can implement the beta-binomial with PROC MCMC.)

The beta-binomial is shown here to illustrate the difference between a “sensible” model and an improperly specified model. What constitutes a “sensible” model? You must have a one-to-one correspondence between sources of variation and model effects or parameters that account for them. Table 5 illustrates the difference.

 

SOURCE       LOGIT-NORMAL    BETA-BINOMIAL    NAIVE GLMM           BETA-BINOMIAL WITH UNIT EFFECT
             (sensible)      (sensible)       (over-dispersion)    (unit confounding)
clinic
treatment
unit

Table 5. Sensible and Improperly Specified Mixed Models for Multi-Clinic Binomial Data

Both the logit-normal and beta-binomial meet the sensible model criterion. The “naive GLMM” lacks any term that accounts for variation at the unit level, making the model vulnerable to over-dispersion. The unit-level random effect is thus needed for the logit-normal model, but if you include it in the beta-binomial model, it is confounded with the beta distribution’s scale parameter (φ), which is not good.

The remainder of this section focuses on implementing the logit-normal model with PROC BGLIMM.

BASIC PROC BGLIMM STATEMENTS FOR THE LOGIT-NORMAL GLMM

PROC BGLIMM uses syntax borrowed from PROC GLIMMIX and, for repeated measures, from PROC MIXED. To illustrate, here are the basic GLIMMIX and BGLIMM statements. First, the GLIMMIX statements for the logit-normal GLMM from SAS for Mixed Models:

proc glimmix data=multi_clinic;
   class clinic treatment;
   model fav/nij = treatment;
   random intercept treatment / subject=clinic;
   lsmeans treatment / ilink diff oddsratio cl;
run;

 

Now the BGLIMM statements:

proc bglimm data=multi_clinic plots=(trace autocorr density)
      diagnostics=all outpost=cout;
   class clinic treatment;
   model fav/nij = treatment / init=pinit;
   random intercept treatment / subject=clinic;
   estimate "CNTL" intercept 1 treatment 1 0;
   estimate "DRUG" intercept 1 treatment 0 1;
   estimate "log_odds_ratio" treatment 1 -1;
   ods output estimates=model_scale;
run;

 

Notice that the CLASS, MODEL and RANDOM statements for the two procedures are identical. The FAV/NIJ (events/trials) syntax is specific to the binomial distribution. Also notice that there are no statements in the basic PROC BGLIMM program specifying prior distributions. BGLIMM has pre-programmed priors that it uses by default. These may or may not be appropriate, depending on the model and data. The additional PLOTS and DIAGNOSTICS options in the PROC BGLIMM statement should be considered standard operating procedure, either to verify that the results are useable or to identify problems that need to be fixed, such as inappropriate default priors, before the output can be regarded as useable. The INIT=PINIT option in the MODEL statement causes the default starting values used for the fixed effects to be included in the SAS listing. Consider these to be part of the diagnostics. PROC BGLIMM does not have an LSMEANS statement. You must write the ESTIMATE statements that correspond to least squares means and treatment differences. PROC BGLIMM also lacks an ILINK option: it only computes model-scale (in this case, logit scale) estimates. The OUTPOST option in the PROC statement and the ODS OUTPUT statement give you two ways to obtain data-scale statistics. These are explained below in the subsection entitled “Post-Processing to Obtain Data-Scale Estimates.”

DIAGNOSTICS

The PLOTS, DIAGNOSTICS=ALL and INIT=PINIT options produce items in the SAS listing that help you identify problems and decide if the results are useable. This section has two parts: a list of the items and a brief description of their purpose, and the diagnostics produced by the default PROC BGLIMM statements given above and their interpretation.

List of Diagnostics 

The PLOTS option gives you three plots: autocorrelation, density and trace. Trace plots are also called “caterpillar” plots, because ideally they should look like a “fuzzy caterpillar,” which indicates the sampling algorithm has produced a reasonable approximation of the posterior distribution. The algorithms used to sample the posterior distributions are vulnerable to autocorrelation. The autocorrelation plot shows how quickly autocorrelation decreases as lag increases. The density plot, as the name implies, shows the posterior density produced by the sampling algorithm.

DIAGNOSTICS=ALL produces the following:

  • Effective Sample Size (ESS). Autocorrelation reduces posterior sampling efficiency. ESS gives a measure of the sample size after adjusting for autocorrelation. Low efficiency, per se, is not a problem, but ESS should be, at the very least, >1000 in order to have an accurate approximation of the posterior distribution.
  • Heidelberger-Welch. These are stationarity tests – is the posterior distribution sampling giving consistent results from beginning (immediately after burn-in) to end? The key columns in the SAS listing will say “passed” or “failed.”
  • Raftery-Lewis. Addresses the accuracy of the posterior distribution’s percentile estimates. The key statistic is the dependence factor. Ideally, it should be approximately 1.
  • Geweke. Compares estimates early and late in the sampling process. Use the Geweke statistics to test the null hypothesis that they are acceptably similar. You want to see p-values consistent with failing to reject the null hypothesis.

The listing also includes the number of burn-in iterations, the posterior density sample size and the priors the BGLIMM procedure uses by default.

Results of Preliminary PROC BGLIMM Run with Default Settings

Figure 1 shows two diagnostic plots. On the left is the plot for the logit-normal GLMM’s intercept parameter produced by BGLIMM program given above. On the right is an example of what the plot should look like in a run whose results are useable. The plots for the other parameters are not shown, but they are similar to these.

  

Figure 1. Example Diagnostic Plots 

The left-hand plot is what you do not want to see. The trace plot shows erratic and inconsistent sampling of the posterior distribution. It looks more like a seismograph during an earthquake than the “fuzzy caterpillar” plot on the right. The right-hand plot shows consistent sampling from beginning to end. The autocorrelation plot on the left also signals trouble. Autocorrelation should drop to zero quickly, as it does by lag 10 in the right-hand plot. The BGLIMM run with default settings produces autocorrelation that never goes to zero, even at lag 50. If autocorrelation does not drop to zero at least by lag 25, consider the results unusable.  

Output 1 shows results produced by the DIAGNOSTICS=ALL option. Statistics that signal trouble are highlighted in bold.

Output 1. Diagnostic Statistics from PROC BGLIMM Logit-normal Program with Default Settings

Effective Sample Sizes

Parameter      ESS      Autocorrelation Time    Efficiency
Intercept      174.1    28.7273                 0.0348
trt cntl       301.6    16.5782                 0.0603
trt drug       .        .                       .
Random VC(1)   886.7    5.6389                  0.1773
Random VC(2)   722.3    6.9225                  0.1445

 

Geweke Diagnostics

Parameter      z          Pr > |z|
Intercept      -0.4815    0.6302
trt cntl       -0.2990    0.7649
trt drug       .          .
Random VC(1)   0.5412     0.5884
Random VC(2)   0.9604     0.3369

 

Raftery-Lewis Diagnostics
Quantile=0.025 Accuracy=+/-0.005 Probability=0.95 Epsilon=0.001

Parameter      Burn-In    Total     Minimum    Dependence Factor
Intercept      29         32909     3746       8.7851
trt cntl       18         19114     3746       5.1025
trt drug       .          .         .          .
Random VC(1)   4          4636      3746       1.2376
Random VC(2)   4          4714      3746       1.2584

Heidelberger-Welch Diagnostics

               Stationarity Test                                         Half-Width Test
Parameter      Cramer-von Mises   p-Value   Outcome   Iter. Discarded    Half-Width   Mean      Rel. Half-Width   Outcome
Intercept      0.0951             0.6092    Passed    0                  0.0875       -0.4288   -0.2041           Failed
trt cntl       0.2343             0.2098    Passed    0                  0.0630       -0.9815   -0.0641           Passed
trt drug       .                  .                   .                  .            .         .
Random VC(1)   0.2008             0.2659    Passed    2000               0.0655       1.8446    0.0355            Passed
Random VC(2)   0.0432             0.9160    Passed    0                  0.0855       1.2928    0.0661            Passed

 

The “Effective Sample Sizes” table ESS and “Efficiency” results indicate trouble. All ESS values are less than 1000. The Geweke statistics look okay here, but you should always check them. The Raftery-Lewis dependence factors for “Random VC(1)” and “VC(2),” the clinic and clinic x treatment variances respectively, are acceptably close to one, but those for the intercept and “trt cntl” effects are much too high. Rules of thumb being at best approximate, if you see dependence factors greater than 3 or 4, consider your results to be unusable. Finally, the Heidelberger-Welch half-width test for the intercept failed.

ADDRESSING PROBLEMS

The problems identified by the above diagnostics can occur for several reasons. The number of burn-in iterations or the size of the posterior distribution sampling may be inadequate. You can use thinning, that is, only taking every 5th or 10th or even 50th posterior density sample instead of every sample, to reduce autocorrelation. The default starting values for the fixed effect parameters may not be appropriate. The default priors may be too diffuse, including values that are technically in the parameter space, but highly implausible. Or the default priors may be where the parameters are not, that is, the most likely values of the parameter may be in an extreme tail of the prior distribution. Output 2 shows the defaults used by PROC BGLIMM for the logit-normal GLMM.
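Thinning is normally requested with the THIN= option shown in the revised program later in this section. Purely to illustrate what thinning does, the following minimal sketch keeps every 10th draw from the OUTPOST= data set cout produced by the default run:

data cout_thin10;
   set cout;
   if mod(_n_, 10) = 1;   /* keep draws 1, 11, 21, ... to reduce autocorrelation between retained samples */
run;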

Output 2. Sampling, Thinning, Starting Values and Priors for Default Logit-normal BGLIMM Run

Model Information
Burn-In Size       500
Simulation Size    5000
Thinning           1

Initial Values for Fixed Effects
Parameter    Value
Intercept    -0.3102
trt cntl     -0.4040

Priors for Fixed Effects
Parameter    Prior
Intercept    Constant
trt cntl     Constant

Priors for Scale and Covariance Parameters
Parameter            Prior
Random Cov (Diag)    Inverse Gamma (Shape=2, Scale=2)

 

The “Model Information” table shows the default burn-in size (number of burn-in iterations), the “simulation size” (number of samples of the posterior density) and thinning. Thinning equal to one means no thinning was done. The “Initial Values for Fixed Effects” table shows the starting values for the intercept and the CNTL treatment effect. Compare these to their estimates from PROC GLIMMIX, approximately -0.46 and -0.75. You can use the estimates from GLIMMIX as starting values instead of letting BGLIMM’s algorithm choose starting values.

The default priors for the intercept and treatment effect are “constant” – essentially uniform distributions between plus and minus infinity. Overly diffuse priors that include highly implausible parameter values can cause problems – given that the actual values of these parameters are unlikely to be very far from their GLIMMIX estimates, the “constant” prior may be an issue. The prior for both variance components is an inverse gamma with shape and scale parameters equal to two. Figure 2 shows a plot of the inverse gamma(2,2) distribution with vertical lines showing the GLIMMIX estimates of the clinic and clinic x treatment variances.

 

Figure 2. Plot of PROC BGLIMM Default Prior for Variance Components

The lower vertical bar is the GLIMMIX estimate of the clinic x treatment variance. The upper vertical bar is the GLIMMIX estimate of the clinic variance. You can see that the most likely value of the clinic x treatment variance is in the extreme lower tail of the default prior. Most of the probability mass of the inverse gamma(2,2) distribution is located over values that are implausibly high for the clinic x treatment variance. The clinic variance shows the opposite problem, although to a lesser extent – its most likely value is in the upper tail of the default prior.

Christiansen, et al. (2011) suggest a strategy for selecting an appropriate prior for a variance component. They suggest working with the precision, defined as the inverse of the variance, and identifying a gamma distribution whose mode is approximately equal to the most likely value of the parameter being estimated (you can use the estimates from PROC GLIMMIX as a logical most likely value) and whose upper and lower extreme quantiles (say the 1st and 99th) correspond to values below and above which are thought to be highly unlikely. The mode of a gamma distribution with shape parameter a and scale parameter b is (a - 1)b. You can specify the mode, try several values of a, solve for b and evaluate the resulting distribution.

The following is an example of a program you can use for prior hunting. The example is for the clinic x treatment variance. The PROC GLIMMIX estimate using the pseudo-likelihood default and the estimate using GLIMMIX’s METHOD=QUADRATURE option translate to precisions of 17 and 100, respectively. The program uses a mode between the two, in this example 40. Use the PDF and CDF functions to create the distribution for each (a, b) combination. The PROC PRINT step allows you to examine the lower and upper quantiles of the candidate distributions. The PROC GPLOT step allows you to visualize the candidate distributions.

The program statements are as follows: 

data gamma_prior;
   mode=40;
   /* A=shape parameter */
   /* B=scale parameter */
   do a=1.01,1.05,1.1,1.25,1.5,2,3,5,10,20;
      b=mode/(a-1);
      do precision=0 to 200;
         pdf_gamma=pdf("gamma",precision,a,b);
         cdf_gamma=cdf("gamma",precision,a,b);
         output;
      end;
   end;
run;
proc sort data=gamma_prior;
   by a;
run;
proc print;
   where (0.009<cdf_gamma<0.011 or 0.989<cdf_gamma<0.991);
run;
proc gplot data=gamma_prior;
   by a;
   plot pdf_gamma*precision/href=17,40,100;
   plot cdf_gamma*precision/vref=0.01,0.99;
run;

 

Some trial and error, and art along with science, are involved at this point. The distribution selected is gamma with shape=3 and scale=20. Figure 3 shows plots of the p.d.f. and c.d.f.

  

Figure 3. Plots of the Prior Distribution Selected for the Clinic x Treatment Variance

You can see that the p.d.f. has a mode of 40 and contains the precision values 17 and 100 well within its range. The c.d.f. plot shows that the lower and upper extreme quantiles are just under 10 and just under 200, respectively. Translated to the variance scale, this prior covers a range from a little over 0.005 to just over 0.1, the range considered plausible. Using the same strategy, a suitable prior for the clinic precision is a gamma with shape=1.5 and scale=1.

REVISED PROC BGLIMM PROGRAM

The following shows the PROC BGLIMM program modified to include the options needed to obtain useable results. These options appear in bold. The program:

proc bglimm data=clinics seed=81152097
      nbi=2500 nmc=100000 thin=10
      plots=(trace autocorr density) diagnostics=all
      statistics(percent=(2.5 50 97.5))
      outpost=cout;
   class clinic trt;
   model fav/Nij=trt /
      init=(list=(-0.46 -0.75) pinit) coeffprior=normal(var=9);
   random intercept / subject=clinic covprior=igamma(shape=1.5, scale=1);
   random trt / subject=clinic covprior=igamma(shape=3, scale=0.05);
   estimate "CNTL" intercept 1 trt 1 0;
   estimate "DRUG" intercept 1 trt 0 1;
   estimate "log odds-ratio" trt 1 -1;
   ods output estimates=modelscale;
   title "BGLIMM - all needed options";
run;

 

The NBI, NMC and THIN options increase burn-in iterations, posterior distribution sample size and thinning, respectively. The INIT=(LIST=()) option allows you to specify starting values for the fixed effect parameters, here the intercept and the CNTL treatment effect. The COEFFPRIOR option specifies a prior distribution for these effects. Note that your options for COEFFPRIOR are either “constant” or “normal.” VAR= allows you to specify the variance, but the normal prior is always centered at a mean of zero. The COVPRIOR option specifies a prior for the variance of random model effects. Notice that you can have multiple RANDOM statements, which allows you to use different priors for each variance. The COVPRIOR must be specified in terms of the inverse gamma distribution. The required syntax is IGAMMA(SHAPE=, SCALE=). Use the following result: a gamma(shape=a, scale=b) prior on the precision is equivalent to an inverse gamma(shape=a, scale=1/b) prior on the variance. For example, the gamma(shape=3, scale=20) prior for the clinic x treatment precision translates to an igamma(shape=3, scale=0.05) prior for the clinic x treatment variance. Output 3 shows selected results.
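The following sketch works through that translation for the clinic x treatment variance. The gamma(shape=3, scale=20) prior on the precision corresponds to an igamma(shape=3, scale=0.05) prior on the variance, and the prior's extreme precision percentiles map to the variance scale; the QUANTILE call uses the unit-scale gamma and then multiplies by the scale parameter.

data prior_translate;
   a=3; b=20;                              /* gamma(shape, scale) prior on the precision       */
   ig_shape=a;                             /* equivalent inverse gamma shape for the variance  */
   ig_scale=1/b;                           /* equivalent inverse gamma scale: 1/20 = 0.05      */
   prec_lo=b*quantile("gamma", 0.01, a);   /* 1st percentile of the precision                  */
   prec_hi=b*quantile("gamma", 0.99, a);   /* 99th percentile of the precision                 */
   var_lo=1/prec_hi;                       /* smallest plausible variance                      */
   var_hi=1/prec_lo;                       /* largest plausible variance                       */
run;
proc print data=prior_translate;
run;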

Effective Sample Sizes

Parameter      ESS        Autocorrelation Time    Efficiency
Intercept      2952.3     3.3872                  0.2952
trt cntl       10000.0    1.0000                  1.0000
trt drug       .          .                       .
Random1 Var    7644.9     1.3081                  0.7645
Random2 Var    8756.1     1.1421                  0.8756

 

Raftery-Lewis Diagnostics
Quantile=0.025 Accuracy=+/-0.005 Probability=0.95 Epsilon=0.001

Parameter      Burn-In    Total    Minimum    Dependence Factor
Intercept      6          6350     3746       1.6951
trt cntl       2          3834     3746       1.0235
trt drug       .          .        .          .
Random1 Var    2          3803     3746       1.0152
Random2 Var    2          3650     3746       0.9744

 

Posterior Summaries and Intervals

Parameter      N        Mean       Standard Deviation    95% HPD Interval
Intercept      10000    -0.4491    0.5504                -1.6003    0.5728
trt cntl       10000    -0.7454    0.3053                -1.3201    -0.1303
trt drug       0        .          .                     .          .
Random1 Var    10000    2.1165     1.5208                0.3744     4.6572
Random2 Var    10000    0.0261     0.0281                0.00444    0.0639

 

Posterior Summaries

                                                          Percentiles
Parameter      N        Mean       Standard Deviation    2.5        50         97.5
Intercept      10000    -0.4491    0.5504                -1.5574    -0.4418    0.6304
trt cntl       10000    -0.7454    0.3053                -1.3539    -0.7410    -0.1619
trt drug       0        .          .                     .          .          .
Random1 Var    10000    2.1165     1.5208                0.6293     1.7386     5.7644
Random2 Var    10000    0.0261     0.0281                0.00701    0.0193     0.0845

 

Results from ESTIMATE Statements

Label             Mean       Standard Deviation    95% HPD Interval
CNTL              -1.1946    0.5612                -2.3136    -0.0881
DRUG              -0.4491    0.5504                -1.6003    0.5728
log odds-ratio    -0.7454    0.3053                -1.3201    -0.1303

Output 3. Selected Results from Useable PROC BGLIMM Run for Beitler-Landis Data

The “Effective Sample Sizes” table shows acceptable ESS and efficiency numbers. All Raftery-Lewis dependence factors are less than 2. In the interest of space, the other diagnostics are not shown here, but they are similarly acceptable, indicating that this run has produced useable results. The “Posterior Summaries and Intervals” table shows the mean, standard deviation and 95% highest posterior density (HPD) intervals for the model fixed effects and the variance components. The estimates in the “Mean” column are similar to those obtained with PROC GLIMMIX. Note that the fixed effect component of the linear predictor, Xβ, is not full rank. PROC BGLIMM uses the same convention – setting the last effect, in this case the DRUG effect, to zero – as other SAS linear model procedures. The “Posterior Summaries” table gives the percentiles specified by the STATISTICS(PERCENT=) option instead of HPD intervals. The “Results from ESTIMATE Statements” table gives the posterior distribution mean, standard deviation and 95% HPD intervals from the ESTIMATE statements.

Note that all of these statistics are on the model scale – in this case the logit scale. Because PROC BGLIMM has no ILINK option, you must output these results and use post-processing steps to obtain data scale estimates. The most obvious of these are the probabilities of a favorable outcome for each treatment, the difference between the two probabilities, and the odds-ratio. There are two ways to do this. These are shown in the next section.

POST-PROCESSING TO OBTAIN DATA-SCALE ESTIMATES

You can either use the data set produced by the OUTPOST option in the PROC statement or the ESTIMATES from the ODS OUTPUT statement to obtain data-scale estimates.

Data Scale Estimates Using OUTPOST and the %SUMINT Macro

After you run the PROC BGLIMM program that produces useable output, use the following statements:

data datasc;
   set cout;
   pr_cntl=logistic(cntl);
   pr_drug=logistic(drug);
   ProbDiff=logistic(drug)-logistic(cntl);
   OddsRatio=exp(log_odds_ratio);
run;
%sumint(data=datasc, var=pr_cntl: pr_drug: ProbDiff: OddsRatio)

 

The OUTPOST option creates a data set of posterior samples for the model fixed effects and ESTIMATE statement results. In this example, the data set is called COUT. The DATA step creates a new data set, DATASC, with the favorable outcome probabilities, labelled PR_CNTL and PR_DRUG, their difference, PROBDIFF, and the odds-ratio. The LOGISTIC function implements the logit’s inverse link, 1/(1+exp(-x)). The difference between the two treatment logits estimates the log of the odds-ratio, so exponentiating it gives you the estimated odds-ratio. The %SUMINT macro computes and prints inferential statistics for the terms listed after VAR= for the data set given by DATA=<data set name>. Output 4 shows the results.

Posterior Summaries and Intervals

Parameter    N        Mean      Standard Deviation    95% HPD Interval
OddsRatio    10000    0.4970    0.1532                0.2251    0.8004
pr_cntl      10000    0.2461    0.0990                0.0750    0.4475
pr_drug      10000    0.3965    0.1224                0.1679    0.6394
ProbDiff     10000    0.1504    0.0667                0.0248    0.2800

  Output 4. Data Scale Estimates from OUTPOST and %SUMINT

Data Scale Estimates Using ODS OUTPUT

Instead of using the OUTPOST data set and the %SUMINT macro, you can use the following statements:

data lsm;
   set modelscale;
   if label='CNTL' or label='DRUG';
   prob=1/(1+exp(-Mean));
   lower=1/(1+exp(-HPDLower));
   upper=1/(1+exp(-HPDUpper));
run;
data oddsratio;
   set modelscale;
   if label='log odds-ratio';
   OddsRatio=exp(Mean);
   lower=exp(HPDLower);
   upper=exp(HPDUpper);
run;
proc print data=lsm;
run;
proc print data=oddsratio;
run;

 

MODELSCALE is the data set produced by the ODS OUTPUT statement. These DATA steps produce two data sets. The one called LSM uses the logit’s inverse link to compute the estimates and HPD interval lower and upper bounds for the CNTL and DRUG probabilities. The data set called ODDSRATIO computes the estimates and HPD bounds for the odds-ratio. Use the IF LABEL= statement to select items to be included in each data set. The LABEL names are the labels used in the ESTIMATE statements. Both data sets use the results of the ESTIMATE statements instead of the posterior sample data set, so the results differ from the %SUMINT listing above. If you want the estimated difference between the probabilities, use OUTPOST and %SUMINT – it is more convenient. Output 5 shows the results using the ODS OUTPUT approach.

Obs    Label    Mean       StdDev    HPDLower    HPDUpper    prob       lower      upper
1      CNTL     -1.1946    0.5612    -2.3136     -0.0881     0.23245    0.09001    0.47798
2      DRUG     -0.4491    0.5504    -1.6003     0.5728      0.38957    0.16793    0.63942

Obs    Label             Mean       StdDev    HPDLower    HPDUpper    OddsRatio    lower      upper
1      log odds-ratio    -0.7454    0.3053    -1.3201     -0.1303     0.47452      0.26712    0.87780

Output 5. Data Scale Estimates from ODS OUTPUT and Follow-up DATA Steps

The Mean, StdDev, HPDLower and HPDUpper values are model scale values from the BGLIMM ESTIMATE statements. The three columns on the right, labelled “prob” or “OddsRatio” and “lower” and “upper” are the data scale estimates.

 

EXAMPLE 2: MULTI-LEVEL DESIGN WITH COUNT DATA

This example appears in SAS for Mixed Models (2018) in Chapter 13, Section 13.3. The data are from an agricultural experiment comparing two cultivation methods and two seed mixes. Each of six fields is divided into two sections, called whole plots. Each whole plot is randomly assigned to a method so that both methods appear at each field. Each whole plot is divided into two split plots to which mixes are randomly assigned. Table 6 shows the sources of variation.

EXPERIMENT DESIGN             TREATMENT DESIGN           COMBINED
SOURCE              DF        SOURCE            DF       SOURCE                        DF
field                5                                   field                          5
                              method             1       method                         1
whole-plot(field)    6                                   field x method (wp)        6-1=5
                              mix                1       mix                            1
                              method x mix       1       method x mix                   1
split-plot(wp)      12                                   sp(wp) | method, mix     12-2=10
TOTAL               23                                   TOTAL                         23

 Table 6. Sources of Variation for Multi-Level Field Trial

The response variable is a discrete count. SAS for Mixed Models describes two “sensible” models, the Poisson-normal GLMM and the negative binomial GLMM. This section presents the latter, as it is the model of choice for these data, and the negative binomial illustrates PROC BGLIMM options that would not be used with the Poisson-normal model.

Table 7 uses the combined sources of variation column from Table 6 to show how the negative binomial arises in the context of this experiment.

SOURCE                                        EFFECT DISTRIBUTION     OBSERVATION AND MODEL
field
method
field x method (wp)
mix
method x mix
sp(wp) | method, mix (a.k.a. unit of obs.)

 Table 7. Model Effects and Distribution Defining Poisson-Gamma Process

The negative binomial arises from a Poisson-gamma process: conditional on the model effects and a unit of observation level effect, the counts are Poisson, and variation at the unit of observation level is assumed to follow a gamma distribution. If you integrate out the unit level term, the resulting distribution of the counts is negative binomial, denoted NB. Thus, the following elements define the GLMM:

  • Distributions: the counts, conditional on the random effects, are NB with rate parameter λ and a scale parameter; the field and field x method (whole-plot) effects are Gaussian with mean zero
  • Link function: η = log(λ)
  • Linear predictor: an intercept plus fixed method, mix and method x mix effects, plus the random field and field x method effects

With PROC BGLIMM, replacing the effects form of the fixed part of the linear predictor with its cell-means, full-rank equivalent makes it easier to use the ESTIMATE statement to compute terms of interest. In the cell-means form, the fixed part of the linear predictor is a single mean for each method x mix combination.

The CLASS, MODEL and RANDOM statements for this model are:

class field method mix;
model count=method*mix / noint distribution=negbin;
random intercept method/ subject=field;

 

As with Example 1, the default BGLIMM settings give unusable results. In the interest of space, not all diagnostics are shown. Output 6 shows two examples, the Geweke statistics and the plots for the negative binomial scale parameter.

Geweke Diagnostics

Parameter         z          Pr > |z|
method 1*mix 1    1.8922     0.0585
method 1*mix 2    2.3868     0.0170
method 2*mix 1    0.9584     0.3379
method 2*mix 2    1.8992     0.0575
Scale             -2.7547    0.0059
Random VC(1)      1.4784     0.1393
Random VC(2)      -0.4486    0.6537

Output 6. Example diagnostics: PROC BGLIMM negative binomial GLMM with default settings

Four of the seven parameters on the “Geweke Diagnostics” table show low p-values. For the scale parameter, p<0.01. The trace and autocorrelation plots are unacceptable. To obtain useable results, you must work through the same set of steps as illustrated in the previous section. The following program statements are the result:

proc bglimm data=sp_counts plots=(trace autocorr density)
      nbi=2500 nmc=1000000 thin=100 seed=20210209
      diagnostics=all dic outpost=model_scale;
   class field method mix;
   model count=method*mix / noint distribution=negbin
      init=(list=(2.5, 2.9, 0.8, 2.0) Pinit)
      scaleprior=gamma(shape=20,iscale=25);
   random intercept method/ subject=field
      covprior=igamma(shape=4,scale=3);
   estimate "LSM_11" method*mix 1 0 0 0;
   estimate "LSM_12" method*mix 0 1 0 0;
   estimate "LSM_21" method*mix 0 0 1 0;
   estimate "LSM_22" method*mix 0 0 0 1;
   estimate "interaction" method*mix 1 -1 -1 1;
   estimate "Method Main Effect" method*mix 1 1 -1 -1 / divisor=2;
   estimate "Mix Main Effect" method*mix 1 -1 1 -1 / divisor=2;
   ods output estimates=link_scale;
   title "Negative Binomial GLMM";
run;

 

The burn-in, posterior sampling and thinning options may look extreme, but computing time is minimal, and they guarantee useable results. The INIT option uses PROC GLIMMIX results as starting values for the METHOD*MIX effects. Changing from the “constant” to the “normal” prior has little effect with this model.

The COVPRIOR in the RANDOM statement is based on using the variance component estimates from PROC GLIMMIX as most likely values. Given that they are similar in value, the same inverse gamma with shape=4 and scale=3 is used for both. The SCALEPRIOR option allows you to specify a prior for the negative binomial scale parameter. The search using the Christiansen, et al. approach described in the previous section resulted in selecting a gamma prior with shape=20 and scale=0.04. Note that BGLIMM requires the gamma distribution to be specified using the inverse scale, hence ISCALE=25 rather than SCALE=0.04.

The ESTIMATE statements define means, interaction and main effect differences. These are given on the model (log) scale in the SAS listing. You can use either the %SUMINT macro with the OUTPOST=MODEL_SCALE data set or the ODS OUTPUT data set to compute data-scale equivalents. In the interest of space, only the OUTPOST plus %SUMINT program statements and output are shown.

The SAS statements:

data data_scale;
   set model_scale;
   Lambda_11=exp(LSM_11);
   Lambda_12=exp(LSM_12);
   Lambda_21=exp(LSM_21);
   Lambda_22=exp(LSM_22);
   Method_Diff_Ratio=exp(Method_Main_Effect);
   Mix_Diff_Ratio=exp(Mix_Main_Effect);
run;
%sumint(data=data_scale, var=Lambda_11: Lambda_12: Lambda_21: Lambda_22:
        interaction: Method_Diff_Ratio: Mix_Diff_Ratio)

   

Output 7 shows the listing.

Posterior Summaries and Intervals

Parameter            N        Mean       Standard Deviation    95% HPD Interval
Lambda_11            10000    19.2095    18.9908               0.7638     52.9988
Lambda_12            10000    31.6319    31.0823               1.4792     86.1046
Lambda_21            10000    4.2671     3.5069                0.4212     10.6353
Lambda_22            10000    13.5674    11.0600               1.0802     32.3851
interaction          10000    0.6695     0.8461                -0.9995    2.3477
Method_Diff_Ratio    10000    3.6754     2.7677                0.3665     8.3976
Mix_Diff_Ratio       10000    0.4728     0.2049                0.1595     0.8911

Output 7. Data-Scale Results of Negative Binomial Split-Plot GLMM Analysis

The LAMBDA_11, etc. terms give the rate parameter estimates for the four method-mix combinations. The “interaction” term remains on the model scale. There is no need to express it on the data scale, as its only purpose is to assess the magnitude of the method x mix interaction. The HPD interval includes zero; using the interval as the criterion, you could conclude that there is insufficient evidence to reject the hypothesis of no interaction. The METHOD_DIFF_RATIO term estimates the ratio of the two method rate parameters, each averaged over both mixes: the “Method Main Effect” ESTIMATE is a difference of logs, which is the log of that ratio, so exponentiating it gives the estimated ratio. The MIX_DIFF_RATIO is similarly defined, but for mix instead of method. Think of these as data scale main effect measures. Although not shown here, you could also obtain statistics for differences between selected LSM terms, similar to the way PROBDIFF was defined in the previous section.
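As a minimal sketch of such a difference, assuming the OUTPOST=MODEL_SCALE data set from the program above, the following computes the data-scale difference between the two mixes within each method and summarizes it with %SUMINT; the variable names Mix_Diff_Method1 and Mix_Diff_Method2 are illustrative, not from the paper.

data lsm_diffs;
   set model_scale;
   Mix_Diff_Method1=exp(LSM_12)-exp(LSM_11);   /* mix 2 minus mix 1 rate, method 1 */
   Mix_Diff_Method2=exp(LSM_22)-exp(LSM_21);   /* mix 2 minus mix 1 rate, method 2 */
run;
%sumint(data=lsm_diffs, var=Mix_Diff_Method1: Mix_Diff_Method2)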

EXAMPLE 3: REPEATED MEASURES DATA

This example is from Chapter 8 of SAS for Mixed Models (2018). It appears as “Respiratory Data” in a repeated measures example in Littell, Pendergast and Natarajan (2000), and in Chapter 4 of SAS for Linear Models, 4th Edition (Littell, et al., 2002). The data are from a trial comparing a standard asthma treatment (A), a test drug (C) and a placebo (P). Each treatment was randomly assigned to 24 patients (72 patients total participated in the trial). The response variable, FEV1, is a measure of breathing ability. A baseline measurement (BASEFEV1) was taken on each patient, then measurements were taken every hour for eight hours, starting one hour after application. FEV1 can be assumed to follow a Gaussian distribution. Table 8 shows the experiment design, treatment design and combined sources of variation.

EXPERIMENT DESIGN               TREATMENT DESIGN           COMBINED
SOURCE                DF        SOURCE           DF        SOURCE                          DF
                                drug              2        drug                             2
patient          72-1=71                                   patient(drug)
                                                           a.k.a. between              71-2=69
                                hour              7        hour                             7
                                drug x hour      14        drug x hour                     14
occasion(patient)
   72(8-1)=504                                             occ(patient) | drug
                                                           a.k.a. within          504-7-14=483
TOTAL                575                                   TOTAL                          575

  Table 8. Sources of Variation for Respiratory Repeated Measures Data

In repeated measures terminology, patient(drug) is called the between subjects effect and the occ(patient)|drug (occasion within patient after accounting for drug) residual term is called the within subjects effect. Following SAS for Mixed Models, the LMM of choice has the following elements:

  • Distributions
    • y_ij | b_ij ~ N(μ_ij, R), where y_ij denotes the vector of FEV1 measures on the jth patient assigned to the ith drug over the eight hours of observation, μ_ij is the corresponding mean vector and R is the 8 x 8 covariance matrix modeling serial correlation among the within subjects effects. SAS for Mixed Models uses the AR(1) covariance model.
    • b_ij, the between subjects (patient within drug) effect, is Gaussian with mean zero
  • Linear Predictor: an intercept plus drug, hour and drug x hour effects, plus the between subjects effect b_ij, where drug and hour denote the treatment and time (hour) effects, respectively.

The covariance parameters for the R matrix are the AR(1) autocorrelation and the residual (within subjects) variance. You can include the baseline covariate, BASEFEV1, by adding a regression term for it to the linear predictor, where the regressor value is BASEFEV1 for the jth patient assigned to the ith drug.

Figure 4 shows the interaction plot, which you can obtain using the PLOT=MEANPLOT option in the PROC GLIMMIX LSMEANS statement. The interaction plot is instructive, because it helps you visualize the objectives of this study and the requirements of the model needed to address the objectives.

Figure 4. Interaction Plot showing Change in FEV1 over hours by treatment.

The treatment is initially effective for the A and C drugs but not the placebo. Over time, FEV1 decreases as the drugs’ effectiveness wears off. The objective of the study is to see if there is a difference between the treatments as measured by initial effectiveness and by how long the effect lasts.

There are two ways to approach this objective. One is to model HOUR as a CLASS variable and track the simple effects of drug A versus placebo and drug C versus placebo at each hour. The other is to redefine the drug and hour part of the model as an unequal slopes linear regression, with a separate intercept and slope over hour for each drug, and compare the intercept and slope coefficients.

NOTE: The documentation for PROC BGLIMM uses this data set as an example. In one passage, the documentation gives what it calls “Model 2” and “Model 3.” The MODEL statements for each are:

  • Model 2: fev1=basefev1 drug hour;
  • Model 3: fev1=basefev1 drug|hour;

The documentation characterizes the difference between model 2 and model 3 as “minor,” and shows results for model 2 but not model 3. Perhaps the difference is “minor” from a programming point of view, but the difference is not “minor” from a statistical practice point of view. Model 2 is incapable of addressing the objectives of this study and therefore fails the “sensible model” criterion. To address this study’s objectives, the drug x hour interaction must be included in the model in some form.

The PROC BGLIMM statements for simple effects and unequal slopes approaches are given as follows. For the simple effects approach, it is much more convenient to specify the cell-means form of the model. The statements are:

proc bglimm data=fev1uni nbi=1000 nmc=50000 thin=5 seed=97816352
      plots=(trace autocorr density) diagnostics=all;
   class Drug Patient Hour;
   model FEV1 = BaseFev1 Drug*Hour;
   random int / subject=Patient(drug);
   repeated Hour / subject=Patient(Drug) type=ar(1) r rcorr;
run;

 

Notice that PROC BGLIMM borrows the syntax of the PROC MIXED REPEATED statement to specify the covariance model for the within subjects effect. The default priors work well for this model, but increasing burn-in, posterior sampling and thinning improves the results. There are twenty-four DRUG*HOUR least squares means, and sixteen simple effects comparing drugs A and C to the placebo at each hour, for a total of forty ESTIMATE statements. In the interest of space, only one example of each type of statement is shown.

You can access the SAS files for all examples in this tutorial at ... (I need to find out from SAS where...)

The example ESTIMATE statements are:

estimate "LSM A at hour 1"  intercept 1 basefev1 2.6493

   Drug*Hour 1 0 0 0 0 0 0 0  0 0 0 0 0 0 0 0  0 0 0 0 0 0 0 0;

estimate "A vs P at hour 1" 

   Drug*Hour 1 0 0 0 0 0 0 0  0 0 0 0 0 0 0 0  -1 0 0 0 0 0 0 0;

 

The coefficients for the least squares means (LSM) obtain the adjusted mean at the average BASEFEV1 covariate value of 2.6493. If you are in doubt, use the E option of the PROC GLIMMIX LSMEANS statement to get the coefficients needed for these statements.
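If you want to verify the covariate value, a quick sketch with PROC MEANS (assuming the fev1uni data set used above) reproduces the average BASEFEV1 plugged into the LSM ESTIMATE statements:

proc means data=fev1uni mean;
   var basefev1;   /* average baseline FEV1; 2.6493 is the value used in the ESTIMATE statements */
run;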

The statements for the unequal slopes approach are as follows:

data fev1uni;
   set fev1uni;
   H=hour;
run;
proc bglimm data=fev1uni nbi=1000 nmc=50000 thin=5 seed=44672057
      plots=(trace autocorr density) diagnostics=all;
   class Drug Patient Hour;
   model FEV1 = BaseFev1 drug H(drug);
   random int / subject=Patient(drug);
   repeated Hour / subject=Patient(Drug) type=ar(1) r rcorr;
   estimate "intercept - Drug A" intercept 1 basefev1 2.6493 drug 1 0 0;
   estimate "intercept - Drug C" intercept 1 basefev1 2.6493 drug 0 1 0;
   estimate "intercept - Drug P" intercept 1 basefev1 2.6493 drug 0 0 1;
   estimate "slope - Drug A" H(drug) 1 0 0;
   estimate "slope - Drug C" H(drug) 0 1 0;
   estimate "slope - Drug P" H(drug) 0 0 1;
   estimate "intercept A vs C" drug 1 -1 0;
   estimate "intercept A vs P" drug 1 0 -1;
   estimate "intercept C vs P" drug 0 1 -1;
   estimate "slope A vs C" H(drug) 1 -1 0;
   estimate "slope A vs P" H(drug) 1 0 -1;
   estimate "slope C vs P" H(drug) 0 1 -1;
run;

 

You need to define a second variable for HOUR – in this case the DATA step with H=HOUR – and use one as the direct (continuous) regression variable in the MODEL statement and the other as a CLASS variable for the REPEATED statement. The “INTERCEPT - DRUG A” etc. ESTIMATE statements obtain the estimated regression intercepts, adjusted for the baseline covariate, and the “SLOPE - DRUG A” etc. statements obtain the estimated slopes. The next six statements compare intercepts and slopes among the DRUG treatments.

Output 8 shows the covariance parameter estimates obtained using the cell means model. Output 9 shows the least squares means listing from the cell means model. Output 10 shows the simple effect difference results.

Posterior Summaries and Intervals

Parameter         N        Mean      Standard Deviation    95% HPD Interval
Residual Var      10000    0.0846    0.0104                0.0664    0.1058
Residual AR(1)    10000    0.5386    0.0550                0.4278    0.6407
Random Var        10000    0.2458    0.0453                0.1612    0.3341
Output 8. Covariance parameter estimates.

“Residual Var” gives the within subjects variance statistics; the “Mean” column gives you its posterior mean. “Residual AR(1)” gives the autocorrelation parameter. “Random Var” gives the between subjects variance.
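A useful by-product of these estimates is the implied correlation between FEV1 measures taken on the same patient h hours apart. For a random patient intercept combined with an AR(1) residual structure, that correlation is (between variance + within variance x rho**h) divided by (between variance + within variance); this is a standard mixed model result rather than something reported by PROC BGLIMM. The following sketch evaluates it at the posterior means from Output 8:

data implied_corr;
   sigma2_between=0.2458;   /* "Random Var" posterior mean       */
   sigma2_within =0.0846;   /* "Residual Var" posterior mean     */
   rho           =0.5386;   /* "Residual AR(1)" posterior mean   */
   do h=1 to 7;             /* lag in hours between measurements */
      corr=(sigma2_between + sigma2_within*rho**h)
          /(sigma2_between + sigma2_within);
      output;
   end;
run;
proc print data=implied_corr;
run;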

 

Results from ESTIMATE Statements

Label              Mean      Standard Deviation    95% HPD Interval
LSM A at hour 1    3.4739    0.1178                3.2501    3.7107
LSM C at hour 1    3.6903    0.1187                3.4705    3.9330
LSM P at hour 1    2.8264    0.1183                2.5956    3.0589
LSM A at hour 2    3.3963    0.1175                3.1685    3.6287
LSM C at hour 2    3.6251    0.1194                3.3849    3.8521
LSM P at hour 2    2.8916    0.1176                2.6606    3.1195
LSM A at hour 3    3.1839    0.1181                2.9470    3.4114
LSM C at hour 3    3.5764    0.1190                3.3390    3.8059
LSM P at hour 3    2.8974    0.1186                2.6571    3.1239
LSM A at hour 4    3.0460    0.1175                2.8098    3.2714
LSM C at hour 4    3.4424    0.1190                3.2178    3.6824
LSM P at hour 4    2.8713    0.1180                2.6237    3.0877
LSM A at hour 5    3.0533    0.1185                2.8278    3.2873
LSM C at hour 5    3.2487    0.1187                3.0172    3.4829
LSM P at hour 5    2.7686    0.1183                2.5390    2.9997
LSM A at hour 6    2.9800    0.1183                2.7533    3.2130
LSM C at hour 6    3.0845    0.1186                2.8482    3.3096
LSM P at hour 6    2.8160    0.1179                2.5817    3.0423
LSM A at hour 7    2.8687    0.1181                2.6414    3.0987
LSM C at hour 7    2.9764    0.1181                2.7273    3.1924
LSM P at hour 7    2.7859    0.1180                2.5595    3.0183
LSM A at hour 8    2.8581    0.1171                2.6377    3.0913
LSM C at hour 8    3.0093    0.1175                2.7724    3.2359
LSM P at hour 8    2.7330    0.1188                2.4999    2.9644

Output 9. Drug x Hour Least Squares Means for Repeated Measures Data

 

Results from ESTIMATE Statements

Label               Mean      Standard Deviation    95% HPD Interval
A vs P at hour 1    0.6475    0.1652                0.3224     0.9630
C vs P at hour 1    0.8639    0.1683                0.5235     1.1894
A vs P at hour 2    0.5048    0.1648                0.1606     0.8070
C vs P at hour 2    0.7335    0.1678                0.4078     1.0694
A vs P at hour 3    0.2865    0.1651                -0.0478    0.6028
C vs P at hour 3    0.6789    0.1687                0.3453     1.0093
A vs P at hour 4    0.1747    0.1649                -0.1565    0.4920
C vs P at hour 4    0.5711    0.1680                0.2481     0.9090
A vs P at hour 5    0.2847    0.1658                -0.0489    0.6031
C vs P at hour 5    0.4801    0.1677                0.1611     0.8200
A vs P at hour 6    0.1641    0.1651                -0.1476    0.4993
C vs P at hour 6    0.2685    0.1669                -0.0668    0.5785
A vs P at hour 7    0.0828    0.1654                -0.2411    0.4083
C vs P at hour 7    0.1905    0.1673                -0.1365    0.5220
A vs P at hour 8    0.1251    0.1659                -0.1932    0.4553
C vs P at hour 8    0.2763    0.1667                -0.0641    0.5925

   Output 10. Simple effect differences between drugs (A, C) and placebo (P)

Output 10 is the PROC BGLIMM analog to the results you would get with the SLICEDIFF=HOUR option in PROC GLIMMIX.

Output 11 gives the ESTIMATE statement results from the unequal slopes regression model.

Results from ESTIMATE Statements

Label                 Mean       Standard Deviation    95% HPD Interval
intercept - Drug A    3.5169     0.1205                3.2852     3.7577
intercept - Drug C    3.8065     0.1197                3.5730     4.0403
intercept - Drug P    2.8859     0.1199                2.6516     3.1233
slope - Drug A        -0.0887    0.0117                -0.1115    -0.0651
slope - Drug C        -0.1057    0.0117                -0.1285    -0.0825
slope - Drug P        -0.0160    0.0119                -0.0390    0.00708
intercept A vs C      -0.2896    0.1690                -0.6066    0.0506
intercept A vs P      0.6310     0.1706                0.3002     0.9703
intercept C vs P      0.9206     0.1703                0.5897     1.2525
slope A vs C          0.0170     0.0166                -0.0166    0.0486
slope A vs P          -0.0727    0.0168                -0.1057    -0.0399
slope C vs P          -0.0897    0.0167                -0.1236    -0.0578

Output 11. Unequal slopes linear regression repeated measures model estimates and differences

You can see that the estimated intercepts for drugs A and C are both greater than for the placebo (3.51 and 3.81 versus 2.89), indicating that initial breathing ability is greater for the two drugs than for the placebo. On the other hand, the estimated slopes for drugs A and C are -0.089 and -0.106 respectively, versus -0.016 for the placebo. The placebo's HPD credible interval for the slope includes zero, whereas the intervals for drugs A and C do not. The placebo's regression is essentially flat because its initial effect is negligible.

Conclusion

In the SAS system hierarchy, PROC BGLIMM occupies a space between PROC GLIMMIX and PROC MCMC. You can use PROC BGLIMM for any GLMM that PROC GLIMMIX can implement. The advantage of the BGLIMM procedure is that it allows access to Bayesian estimation and inference using the same CLASS, MODEL and RANDOM syntax as PROC GLIMMIX. The main difference between BGLIMM and GLIMMIX is that because Bayesian analysis depends on complex simulation algorithms, there are more diagnostics that must be monitored and more options that you may need to use in order to get a useable analysis.

PROC MCMC uses syntax borrowed from PROC NLMIXED. This means that it does not have a CLASS statement, making it more tedious to specify models, especially effects models with factorial treatment structures. On the other hand, unlike PROC BGLIMM, you can use PROC MCMC to fit nonlinear models, semiparametric models, and mixed models with non-Gaussian random effects, such as beta-binomial and Poisson-gamma models. PROC MCMC also provides more flexibility in specifying prior distributions. For example, you can use priors centered at the starting value for fixed effects, where BGLIMM limits you to “constant” or zero-centered Gaussian priors.

Finally, when you use PROC BGLIMM with non-Gaussian response variables, there is an additional post-processing step, either using the %SUMINT macro or the ODS OUTPUT data set, in order to get data scale estimates and inferential statistics. The lack of an ILINK option means an extra step, but as we saw in Examples 1 and 2, it is not a difficult step.

PROC BGLIMM makes Bayesian analysis highly accessible to data analysts who have PROC GLIMMIX experience but are new to the Bayesian world.

References

Beitler, P.J. and J.R. Landis. 1985. “A Mixed-Effects Model for Categorical Data.” Biometrics, 41:991-1000.

Christiansen, R., W. Johnson, A. Branscum and T.E. Hanson. 2011. Bayesian Ideas and Data Analysis: An Introduction for Scientists and Statisticians. Boca Raton, FL: CRC Press.

Ferrari, S. L. P., and Cribari-Neto, F. 2004. “Beta Regression for Modelling Rates and Proportions.” Journal of Applied Statistics 31:799–815

Littell, R.C, J. Pendergast and R. Natarajan. 2000. “Modeling Covariance Structure in the Analysis of Repeated Measures Data.” Statistics in Medicine, 19:1793-1819.

Littell, R.C., W.W. Stroup and R.J. Freund. 2002. SAS® for Linear Models, 4th edition. Cary, NC: SAS Institute, Inc.

Stroup, W.W. 2013. Generalized Linear Mixed Models: Modern Concepts, Methods and Applications. Boca Raton, FL: CRC Press.

Stroup, W.W., G.A. Milliken, E.A. Claassen and R.D. Wolfinger. 2018. SAS® for Mixed Models: Introduction and Basic Applications. Cary, NC: SAS Institute Inc.

Acknowledgements

The author acknowledges Oliver Schabenberger, who developed PROC GLIMMIX and was a valued partner in GLMM's early days; Fang Chen, who developed PROC BGLIMM and who provided helpful advice about the procedure; and George Milliken, Russ Wolfinger and Elizabeth Claassen, my SAS® for Mixed Models co-authors and esteemed mixed model brain trust.

Recommended Reading

  • SAS® for Mixed Models: Introduction and Basic Applications
  • Generalized Linear Mixed Models: Modern Concepts, Methods and Applications