Teketo
Calcite | Level 5

Hello,

 

I am doing a hierarchical Bayesian analysis using PROC MCMC. I have a three-level model; however, PROC MCMC is not working for me. It takes too long to finish, even for the empty model.

I started the analysis with the empty model, and it takes more than 10 minutes to finish. Here is the sample code I used:

 

proc mcmc data=care seed=10 nmc=200000 nbi=10000 thin=2 outpost=xcare dic;
   parms beta0 sig2 delta2;
   prior beta0 ~ normal(0, var=1000);
   prior sig2 ~ igamma(shape=0.1, scale=0.01);
   prior delta2 ~ igamma(shape=0.1, scale=0.01);
   mu = beta0;
   random gamma ~ normal(0, var=sig2) subject=region;
   /* clusters are nested within region */
   random delta ~ normal(0, var=delta2) subject=clusterXregion;
   p = logistic(mu + gamma + delta);
   model use ~ binary(p);
run;

 

Moreover, when I include fixed effects and random slopes, the program stops.

 

I really appreciate your support in this regard.

 

With kind regards

Teketo

1 ACCEPTED SOLUTION
SAS_Rob
SAS Employee
It is hard to say for sure without knowing more about the data and the levels of cluster and regions, but initially I would say that NMC=200000 is the likely culprit. Why have you set it so large?

3 REPLIES
ballardw
Super User

First thing I see is that your NMC and NBI options are one to two orders of magnitude greater than the default of 1000. Did you try with the defaults? How long did that take?
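
A minimal sketch of such a trial run, reusing the data set and variable names from the question (care, use, region, clusterXregion). The output data set name xcare_trial is hypothetical. NMC= and NBI= are simply omitted so that the defaults apply, and the real time reported in the SAS log for the step gives a baseline to compare against the long run:

```sas
/* Trial run: NMC= and NBI= are omitted, so the defaults (1000 each) apply. */
/* xcare_trial is a hypothetical output name, not part of the original post. */
proc mcmc data=care seed=10 outpost=xcare_trial;
   parms beta0 sig2 delta2;
   prior beta0 ~ normal(0, var=1000);
   prior sig2 ~ igamma(shape=0.1, scale=0.01);
   prior delta2 ~ igamma(shape=0.1, scale=0.01);
   random gamma ~ normal(0, var=sig2) subject=region;
   random delta ~ normal(0, var=delta2) subject=clusterXregion;
   p = logistic(beta0 + gamma + delta);
   model use ~ binary(p);
run;
```

A chain this short is unlikely to be long enough for inference; it is only meant to show how the run time changes with the number of iterations.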

 

Also from the documentation details on computational resources:

 

Computational Resources

It is impossible to estimate how long it will take for a general Markov chain to converge to its stationary distribution. It takes a skilled and thoughtful analysis of the chain to decide whether it has converged to the target distribution and whether the chain is mixing rapidly enough. In some cases, you might be able to estimate how long a particular simulation might take. The running time of a program that does not have RANDOM statements is approximately linear to the following factors: the number of samples in the input data set, the number of simulations, the number of blocks in the program, and the speed of your computer. For an analysis that uses a data set of size nsamples, a simulation length of nsim, and a block design of nblocks, PROC MCMC evaluates the log-likelihood function the following number of times, excluding the tuning phase:

 

\[ \mathit{nsamples} \times \mathit{nsim} \times \mathit{nblocks} \]

The faster your computer evaluates a single log-likelihood function, the faster this program runs. Suppose you have nsamples equal to 200, nsim equal to 55,000, and nblocks equal to 3. PROC MCMC evaluates the log-likelihood function approximately $3.3\times 10^7$ times. If your computer can evaluate the log likelihood for one observation $10^6$ times per second, this program takes approximately a half a minute to run. If you want to increase the number of simulations five-fold, the run time increases approximately five-fold.

 

 

Note that the above is without RANDOM statements. Each RANDOM statement adds one pass through the input data at each iteration. So how big is your data set?
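
As a rough worked comparison for the settings in the question: THIN=2 only reduces how many draws are saved, so the sampler still runs all NMC + NBI iterations. Relative to the defaults of 1000 each, that is

\[ \frac{\mathit{nmc} + \mathit{nbi}}{\mathit{nmc}_{\mathrm{default}} + \mathit{nbi}_{\mathrm{default}}} = \frac{200000 + 10000}{1000 + 1000} = 105 \]

Assuming the run time is roughly proportional to the number of iterations (the documentation states this only for programs without RANDOM statements, so treat it as an order-of-magnitude guide), a run of about 10 minutes (600 seconds) at these settings would correspond to about $600 / 105 \approx 6$ seconds at the defaults.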

 

 

 

 

 
