PROC MCMC preddist thinning

02-27-2017 10:26 AM

The PREDDIST statement in PROC MCMC generates a dataset of predictive draws of the dependent variable. It is unclear to me exactly how these draws are collected from the MCMC chain. In particular, is the chain thinned in any way so that the draws are (hopefully) approximately statistically independent? The documentation doesn't appear to give any details, except that if NSIM is not specified it defaults to NMC, the number of iterations in the chain after burn-in, in which case the draws would not be independent. If one specifies NSIM<NMC, which iterations' draws are used?

Many thanks

Jonathan

Posted in reply to jwb133

06-08-2017 07:23 AM

After discussing with SAS support, the following is my understanding of how PREDDIST works. Suppose PROC MCMC is called with NMC=x, so that x iterations of the MCMC sampler are performed, and that NSIM=y is specified, requesting y draws from the posterior predictive distribution of the outcome.

To produce the y draws from the posterior predictive distribution, PROC MCMC samples y parameter values, with replacement, from the NMC=x samples from the posterior distribution. For each sampled parameter value, it then simulates a value of the outcome from its distribution conditional on that parameter value.
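To make the scheme concrete, here is a minimal Python sketch of the resampling as I understand it from the description above. It is not SAS code and not PROC MCMC's actual implementation. The function name `posterior_predictive_draws`, the normal outcome model, and the stand-in "chain" (independent normal draws rather than real MCMC output) are all assumptions made for illustration.

```python
import random

def posterior_predictive_draws(chain, nsim, simulate_outcome, rng):
    """Resample nsim parameter values from the chain with replacement,
    then simulate one outcome per resampled parameter value."""
    draws = []
    for _ in range(nsim):
        theta = rng.choice(chain)  # pick one saved iteration, with replacement
        draws.append(simulate_outcome(theta, rng))
    return draws

# Hypothetical setup: the chain holds posterior draws of a normal mean,
# and the outcome is normal with that mean and a known SD of 1.
rng = random.Random(2017)
chain = [rng.gauss(5.0, 0.2) for _ in range(1000)]  # stand-in for NMC=1000 saved draws

def simulate_outcome(mu, rng):
    return rng.gauss(mu, 1.0)

pred = posterior_predictive_draws(chain, nsim=200,
                                  simulate_outcome=simulate_outcome, rng=rng)
print(len(pred))  # prints 200
```

Here `nsim` plays the role of NSIM and `len(chain)` the role of NMC. Nothing in this scheme thins the chain: any saved iteration can be selected, and the same one can be selected more than once.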

This approach would seem to be valid if the y parameter values obtained by drawing with replacement from the NMC=x iterations were i.i.d. However, it seems that this may not hold. Suppose, for example, that we (perhaps stupidly) choose NMC=10 and NSIM=100000. Then every simulated outcome is drawn conditional on one of only 10 parameter values. These 10 parameter values are probably correlated to some extent, and moreover many draws of the outcome variable are made conditional on the same parameter value. In this (perhaps contrived) scenario, the 100000 values would not be (I contend) valid draws from the posterior predictive distribution.
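The contrived NMC=10, NSIM=100000 case can be checked with a short Python sketch. It assumes the with-replacement scheme described above and uses 10 independent normal values as a stand-in for the saved parameter draws (a real chain would typically be autocorrelated as well). The normal outcome model is likewise only an illustrative assumption.

```python
import random
from collections import Counter

rng = random.Random(1)
nmc, nsim = 10, 100_000

# Stand-in for the 10 saved parameter values from the chain.
chain = [rng.gauss(0.0, 1.0) for _ in range(nmc)]

# Resample iteration indices with replacement, then simulate one outcome each.
picked = [rng.randrange(nmc) for _ in range(nsim)]
outcomes = [rng.gauss(chain[i], 1.0) for i in picked]

counts = Counter(picked)
distinct = len(counts)
print(distinct)              # number of distinct parameter values used, at most nmc
print(max(counts.values()))  # how often the most-used parameter value was reused
```

However large NSIM is, the outcomes are conditioned on no more than 10 distinct parameter values, so they reflect those 10 values rather than the full posterior. At least one value is reused at least NSIM/NMC = 10000 times.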

Posted in reply to jwb133

07-27-2017 02:59 PM

I have a question.

If nmc=100000, thin=10 and outpred=outpred1, does this mean I will have a dataset named outpred1 that contains 10000 observations?

Thx.
