PROC BGLIMM: The Smooth Transition to Bayesian Analysis

Many analysts are interested in taking models they currently have and transitioning them to the Bayesian realm. Most leap from their favorite classical analysis procedure directly to PROC MCMC, the general-purpose Bayesian procedure. PROC MCMC is a powerful procedure and capable of completing very complex analyses. The leap to PROC MCMC might be a challenge if one is not accustomed to coding their own link functions or dummy-coding categorical variables. In this blog, we will discuss another procedure that can ease this transition from classical to Bayesian, PROC BGLIMM.

PROC BGLIMM (Bayesian Generalized Linear Mixed Models) is a very direct way to move many of your classical analyses to Bayesian. Think of PROC BGLIMM as an alternative to GLIMMIX. For this reason, classical models that were performed in REG, GLM, GLMSELECT, GENMOD, MIXED, and GLIMMIX can be performed within PROC BGLIMM.

Let's begin with an example to illustrate my point. Suppose that a toy manufacturer was interested in comparing the breaking strength of three different adhesives. They randomly select seven toys off the assembly line. In this instance, we are wanting to perform inference across all toys and not just those in the study. This results in adhesive being a fixed effect and toy a random effect. The following PROC MIXED code would be the way to analyze this in a classical sense.

proc mixed data=sasuser.toy;
   class adhesive toy;
   model pressure=adhesive / solution ddfm=kr;
   random toy;
run;

If we were to move this model to a Bayesian approach using PROC MCMC, the complexity of the code will increase. Without a CLASS statement, we would either use arrays or create the design variables on our own for the analysis. In our example, we used an array. The priors selected for this example were non-informative due to the lack of any additional knowledge about our problem.

proc mcmc data=toy seed=27513 diag=all dic outpost=mixed
          propcov=quanew thin=25 nbi=5000 ntu=5000 nmc=500000
          plots(smooth)=all mchistory=brief stats=all;
   array beta[3];
   parms beta: 0;
   parms s2t 1;
   parms s2g 1;
   prior beta: ~ normal(0, var = 1e5);
   prior s2: ~ igamma(2.001, scale = 1.001);
   random gamma ~ normal(0,var=s2g) subject=toy
   monitor=(gamma) namesuffix=position;
   mu = beta[adhesivebeta] + gamma;
   model pressure ~ normal(mu, var = s2t);
   title "Bayesian Analysis of the Toy Data Set";
run;

Would PROC BGLIMM have kept the complexity of the code simple? Yes! PROC BGLIMM does have a CLASS statement that avoids the issue with arrays or generating our own design variables. By default, the priors for the analysis are the non-informative type. From looking at the code below, you likely see aspects of other familiar procedures. This is what makes PROC BGLIMM a smoother transition to Bayesian.

proc bglimm data=sasuser.toy seed=8675309;
   class adhesive toy;
   model pressure=adhesive / dist=normal link=identity;
   random int / sub=toy;
run;

Let's take a moment and discuss details about this procedure. There is a suite, 13 in total, of covariance structures that can be selected both on the G and R side for mixed models. Allowances are made for covariance heterogeneity modeling. Models can be compared using the DIC (deviance information criteria) statistics. This is like the AIC/BIC statistic in classical analysis.

Since BGLIMM is Bayesian generalized linear modeling, there are several response distributions that can be used: binomial, exponential, gamma, geometric, inverse gaussian, negative binomial, normal, poisson, and binary. There are also several link functions that can be selected: log, logit, probit, inverse, identity, loglog, complimentary loglog, and powerminus2.

PROC BGLIMM has built in priors for you to select across the different parameters used within your analysis. For the fixed effect parameter (betas), your prior options are flat/constant and normal. For the scale parameter, your prior options are inverse gamma, gamma, and improper. For the G-side covariance parameters, your options are inverse wishart, inverse gamma, uniform, halfcauchy, halfnormal, and siwishart. For the R-side covariance parameters, your options are inverse wishart and inverse gamma.

As you can see, you will have the ability to build many diverse types of models very quickly within the Bayesian structure using PROC BGLIMM. Do realize that if your model does venture outside the bounds of the options within PROC BGLIMM all is not lost. You are welcome to then move to PROC MCMC to model this more complex analysis.

Let's look at a few other examples moving models from your favorite classical procedures to Bayesian using BGLIMM.

In the class data set, the response variable weight is being regressed against the predictor variables height, age, and gender. In the BGLIMM code, the COEFFPRIOR statement is changing the default flat prior of the betas to a normal prior with a mean of zero and variance of one million. The random seed is set for replication across multiple runs of PROC BGLIMM.

*Simple Linear Regression with Class Variable (GLM);
proc glm data=sashelp.class;
   class gender;
   model Weight = Height Age gender;
run;

*Simple Linear Regression with Class Variable (BGLIMM);
proc bglimm data=sashelp.class seed=8675309;
   class sex;
   model Weight = Height Age Sex / dist=normal coeffprior=normal(variance=1e6);
run;

In the crab data set, the response variable satellites represent the count of male horseshoe crabs orbiting around the nest of a female horseshoe crab. Like GENMOD and GLIMMIX, the DIST= and LINK= options establish our generalized linear model type.

*Poisson Regression with Random Effects (GLIMMIX);
proc glimmix data=work.crab;
   class color spine site;
   model satellites = color spine weight width / dist=poi link=log solution;
   random int / subject=site;
run;

*Poisson Regression with Random Effects (BGLIMM);
proc bglimm data=work.crab seed=8675309 diag=all plots=all;
   class color spine site;
   model satellites = color spine weight width / dist=poisson link=log;
   random int / subject=site;
run;

In the heartrate data set, the average heartrate is being compared across different drug treatments. Each patient was measured hourly establishing a repeated measures analysis. Note that BGLIMM returns to the use of the REPEATED statement for R-side random effects.

*Normal Response with Repeated Measures (MIXED);
proc mixed data=work.heartrate;
   class drug hours;
   model heartrate = baseline drug drug*baseline / solution ddfm=kr2;
   repeated hours/ type=un subject=patient;
run;

*Normal Response with Repeated Measures (BGLIMM);
proc bglimm data=work.heartrate seed=8675309 diag=all plots=all;
   class drug hours patient;
   model heartrate = baseline drug drug*baseline / dist=normal;
   repeated hours/ type=un sub=patient;
run;

Give PROC BGLIMM a try with your classical models and see how you like it. If you want to move to PROC MCMC, check out this repository full of information or this Special Collection offered from SAS Publications.

PROC BGLIMM: The Smooth Transition to Bayesian Analysis

Free course: Data Literacy Essentials

Get Started