You have explored your data and generated your model using a Bayesian perspective. It is now time to use this model to score new observations. What are your options for Bayesian scoring? This blog will address two ways that you can perform Bayesian scoring on your new data.
Scoring data is nothing new to a statistical analyst. From code statements to score statements, we have generated predictions from our classical models for some time. What makes Bayesian scoring different?
In the classical (frequentist) world, your resulting model yields a single parameter estimate, beta-hat, for each term in your model. With these estimated, we take our new observation and "plug and chug" out our single prediction, y-hat, for each of the new observations. Recall that Bayesian statistics yield posterior distributions rather than single estimates. During the MCMC algorithm, we save iterations that represent a sampling of the posterior distributions of each of our parameters. So, we now have more than just one beta-hat for each parameter in our problem. Each saved iteration is a realization of what we believe the values of the parameters are at one moment or iteration.
To score a new observation, in the Bayesian perspective, we take each observation, one at a time, and compute a y-hat prediction for each saved iteration (indicated by the subscript in the previous equation). This essentially creates a list of predictions for each new observation. This list is a sample of the posterior predictive distribution for that new observation.
From this posterior sample, we compute posterior summary statistics and intervals that can be used to answer questions as needed.
So, how do we score our Bayesian models within SAS? One way is to use the PREDDIST statement within the MCMC procedure. Do not add the PREDDIST statement to the procedure until you have confirmed that you have a converged chain. Doing so will potentially waste time as the scoring performed using a non-converged chain will be useless. Within this statement, you provide the name of a data set that contains your new observations to score. Please be sure that the variable names in the new data set match those in the training data set. Options allow you to control what posterior summary statistics will be displayed to the screen. It is in your best interest to save the scored chain to a SAS data set for any other use.
In this example, we have an already converged chain that we wish to score. The PREDDIST statement is bringing in the observations to score from the new_birth dataset using the covariates option. The scored information will be stored in the dataset scored. The posterior summary results will be stored in scored_summaries.
ods select none;
proc mcmc data=sasuser.birth plots=none seed=27513 outpost=birthout1
propcov=quanew nbi=5000 ntu=5000 nmc=400000 thin=10;
parms (beta0 beta1 beta2 beta3 beta4) 0;
prior beta1 ~ normal(1.0986,var=0.00116);
prior beta0 beta2 beta3 beta4 ~ normal(0, var=100);
p = logistic(beta0 + beta1*alcohol + beta2*hist_hyp +
beta3*mother_wt + beta4*prev_pretrm);
model low ~ binary(p);
preddist outpred=scored covariates=new_birth stats(percent=50)=summary;
ods output predsummaries=scored_summaries;
run;
ods select all;
proc print data=scored_summaries;
title "Posterior Summaries of Scored Observations";
run;
Select any image to see a larger version. Mobile users: To view the images, select the "Full" version at the bottom of the page.
The predictions returned, in our example, are 0s and 1s reflective of the fact that the response variable low was noted as binary in the model statement. PREDDIST calculated the predicted logits on each iteration, for each new observation, proceeded to perform the inverse link back to a probability, and based on comparison on a random uniform draw declares each 0/1 prediction. The posterior means you see in the provided output are the means of these 0/1 predictions, thus the proportion of 1s for each new observation.
PREDDIST does have a few limitations. The number of new observations to be scored is capped at 50,000. Secondly, if you use the GENERAL or DGENERAL function, PREDDIST will be ignored. What alternative could you use if this situation presented itself? You could use PROC IML.
To use PROC IML for Bayesian scoring, I will assume that you have saved the converged chain from the PROC MCMC run (birthout1) and that you have a new data set containing your observations to be scored (new_birth).
After starting PROC IML, we begin by calling in the saved chain and new observations into two matrices, parms and newobs respectively. Listing the names of the columns in the braces ({ }) restricts the matrix to just the columns that we need and also sets their order. It is important to make sure that the columns for the parameters and the columns of the new observations are in matching orders within the matrices.
proc iml;
*putting saved chain into a matrix;
use birthout1;
read all var {beta0 beta1 beta2 beta3 beta4} into parms;
close birthout1;
*putting new observations into a matrix with variables in correct order;
use new_birth;
read all var {alcohol hist_hyp mother_wt prev_pretrm} into newobs;
close new_birth;
Since we included an intercept, we will generate a column of 1s and append this to the front of the new observations’ matrix, newobs.
*creating and appending column of 1s for the intercept in the model;
intercept= j(nrow(newobs),1,1);
newobs = intercept || newobs;
Next, we multiply the parameters matrix against the transpose of the observation matrix. The transpose ensures the correct dimension match. Note that this matrix multiplication is doing exactly what we want. Each new observation is being applied against each saved iteration and yielding a list of predictions, yhat.
*matrix multiply to make the logits for each new observation for each iteration;
yhat = parms*newobs`;
We now have a list, posterior sample, of predicted logits for each of the new observations. To get back to predicted probabilities, we will undo the logit transformation. First, we will create a matrix e, the same dimension as the yhat matrix, filled with the value of the mathematical constant e. Then, use this matrix e to exponentiate each logit value within the matrix yhat.
*make a matrix of value e and then exponentiate each logit value;
e = j(nrow(yhat),ncol(yhat),constant("e"));
eta = e ## (yhat);
Next, we take these exponentiated versions of the logits and process the inverse link of the logit transformation.
*undo the logit transformation to return to predicted probability;
prob = eta / (eta+1);
The resulting matrix, prob, now contains predicted probabilities for each new observation. Remember that we have as many predicted probabilities for each new observation as there were saved iterations from the saved chain. We then can use this to generate any posterior summary statistics we wish. In my example, I generate the posterior mean of each posterior predictive distribution.
*find posterior mean of probability across all saved iterations;
pred = prob[:,];
print pred;
quit;
It is noted that we could continue with the final step and declare actual 0/1 predictions based on these probabilities and then calculate the proportion of 1s thus like PREDDIST. In our example, we stopped at the point of the predicted probability and focused on the posterior mean of that distribution.
Do note that this inverse link step was needed due to the analysis being a logistic regression. If you were performing any other analysis, you would have to correctly undo the link function that was used. In the case of the link being identity, no additional inverse link work would be needed.
Also, this generated a posterior predictive distribution for the average response. If you were wanting to create the posterior for an individual prediction, say in the case of linear regression, there would be an additional step needed to generate the error element to add to the prediction. To do this, you would recall that the variance parameter was also saved in the chain. You would, at each iteration, draw a value from a normal distribution with a zero mean and a variance equal to the value of the variance component at that iteration. Repeat this to get a vector of errors. These random errors would then be added to the prediction of the observation at each respective iteration.
For additional information about the PREDDIST statement within PROC MCMC, check out this link to the documentation.
For more information about PROC IML, check out this link to the documentation and also The Do Loop blog written by Rick Wicklin.
... View more