The PREDDIST statement in PROC MCMC allows one to generate a data set containing predictive draws of the dependent variable. It is unclear to me exactly how these draws are collected from the MCMC chain. In particular, is the chain thinned in any way in order to (hopefully) make the draws statistically independent? The documentation doesn't appear to give any details about this, except that if one doesn't specify NSIM, it defaults to NMC, the number of iterations used in the chain (after burn-in), in which case the draws would not be independent. If one specifies a value of NSIM < NMC, which iterations' draws are used?
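To make the thinning question concrete, here is a minimal Python sketch. It uses a hypothetical AR(1) process with coefficient 0.9 as a stand-in for an autocorrelated MCMC trace (it is not PROC MCMC output). Keeping every 10th draw lowers the lag-1 autocorrelation from about 0.9 to about 0.9^10 ≈ 0.35, so thinning makes stored draws less correlated but not independent.

```python
import random


def ar1_chain(n, phi, seed=1):
    """Simulate a stationary AR(1) chain x[t] = phi * x[t-1] + noise.

    Hypothetical stand-in for an autocorrelated MCMC trace, not PROC MCMC output.
    """
    rng = random.Random(seed)
    sd = (1 - phi ** 2) ** 0.5  # innovation sd chosen so the marginal variance is 1
    x = [rng.gauss(0.0, 1.0)]   # start in the stationary distribution
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, sd))
    return x


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[i] - m) * (x[i + 1] - m) for i in range(n - 1))
    den = sum((v - m) ** 2 for v in x)
    return num / den


chain = ar1_chain(200_000, phi=0.9)
thinned = chain[::10]  # keep one iteration in every 10
print(round(lag1_autocorr(chain), 2))    # close to 0.9
print(round(lag1_autocorr(thinned), 2))  # close to 0.9**10, roughly 0.35
```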
Many thanks
Jonathan
After discussing with SAS support, the following is my understanding of how PREDDIST works. Suppose PROC MCMC is called using NMC=x, so x iterations of the MCMC sampler will be performed. Suppose that NSIM=y is specified, requesting y draws from the posterior predictive distribution of the outcome.
To produce the y draws from the posterior predictive distribution, PROC MCMC samples y parameter values, with replacement, from the x posterior samples produced by NMC=x. For each sampled parameter value, it then simulates a value of the outcome from its distribution conditional on that parameter value.
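The resample-then-simulate scheme described above can be sketched as follows. This is a hypothetical illustration, not PROC MCMC's actual implementation: it assumes a normal outcome model with parameters mu and sigma, and the helper name `posterior_predictive` and the toy posterior draws are made up for the example.

```python
import random


def posterior_predictive(post_mu, post_sigma, nsim, seed=1):
    """Hypothetical sketch of resample-then-simulate, assuming y ~ N(mu, sigma).

    post_mu, post_sigma: the stored posterior draws (one pair per iteration).
    nsim: number of predictive draws requested (plays the role of NSIM=).
    """
    rng = random.Random(seed)
    nmc = len(post_mu)
    draws = []
    for _ in range(nsim):
        i = rng.randrange(nmc)  # pick one stored iteration, with replacement
        # simulate the outcome conditional on that iteration's parameter values
        draws.append(rng.gauss(post_mu[i], post_sigma[i]))
    return draws


# Made-up posterior draws, purely for illustration.
rng = random.Random(0)
post_mu = [rng.gauss(5.0, 0.1) for _ in range(1000)]
post_sigma = [abs(rng.gauss(2.0, 0.05)) for _ in range(1000)]

pred = posterior_predictive(post_mu, post_sigma, nsim=5000)
print(len(pred))  # 5000
```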
This approach would seem to be valid if the y parameter values obtained by drawing with replacement from the x iterations were i.i.d. However, it seems that this potentially does not hold. Suppose, for example, that we (perhaps stupidly) choose NMC=10 and NSIM=100000. Then each value of the outcome is drawn conditional on one of only 10 parameter values. These 10 parameter values are probably correlated to some extent, and moreover many draws of the outcome are then made conditional on the same parameter value. In this (perhaps contrived) scenario, the 100000 values would not (I contend) be valid draws from the posterior predictive distribution.
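The contrived NMC=10, NSIM=100000 scenario can be checked with a few lines of Python, assuming the with-replacement scheme described above. The sketch only tracks which stored iteration each predictive draw is conditioned on: at most 10 distinct parameter values are ever used, each reused roughly 10000 times, so the 100000 predictive values are a mixture over just those 10 (possibly correlated) draws.

```python
import random
from collections import Counter

# Numbers from the scenario above: NMC=10 stored posterior draws, NSIM=100000 requested.
NMC, NSIM = 10, 100_000
rng = random.Random(2)

# Index of the stored iteration chosen (with replacement) for each predictive draw.
picks = [rng.randrange(NMC) for _ in range(NSIM)]
reuse = Counter(picks)

print(len(reuse))           # number of distinct parameter values used: at most 10
print(max(reuse.values()))  # heaviest reuse of one value: around NSIM / NMC = 10000
```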
I have a question.
If nmc=100000, thin=10 and outpred=outpred1, does this mean I will get a data set named outpred1 that contains 10000 observations?
Thx.
(In reply to @jwb133's explanation above.)
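For the THIN= part of the question, the arithmetic is that keeping one iteration in every 10 out of 100000 leaves 10000 stored posterior samples. The sketch below assumes every 10th index is kept starting from the first; the exact offset PROC MCMC uses is an assumption, though it does not change the count. Whether the OUTPRED= data set also has 10000 rows depends on how the NSIM= default interacts with thinning, which this thread does not settle.

```python
# Settings from the question (assumed): NMC=100000, THIN=10.
nmc, thin = 100_000, 10

# Keep one iteration in every `thin`; the starting offset is an assumption.
kept = range(0, nmc, thin)
print(len(kept))  # 10000
```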