- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Is there a way to use a bayesian approach to compare the means of two independent samples? Can proc mcmc handle something like this.
For example, I have two independent samples from the same underlying population:
Sample 1: n = 500, mean = 83.8, standard deviation = 24.7
Sample 2: n = 1000, mean = 86.1, standard deviation = 25.8
Is there a way I can model the difference between the means using proc mcmc without the underlying data?
My only thought is to create a posterior distribution like this:
/* Compute the posterior distributions for the population means */
data post;
do i = 1 to 10000;
do x= 0 to 100 by .1;
sample1 = pdf('Normal', x, 83.8, 24.7);
sample2 = pdf('Normal', x, 86.1, 25.8);
output;
end;
end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes. See this StackExchange article for references and a discussion that shows how to define the likelihood and priors: "Bayesian equivalent of two sample t-test." The article uses R and someone reproduced it in Python.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You could try BAYES statement of PROC GENMOD .
Or calling @StatDave
proc genmod data=sashelp.heart; class status; model weight = status / dist=normal link=identity; lsmeans status/diff cl; bayes seed=1 coeffprior=normal; run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
To figure this out here is simulated data. The outcome is score. We will try to estimate the coefficient paramater value for group using a linear regression
/**
Simulate data
**/
data _sim1;
array test [4] _temporary_ (.03184 .08917 .19745 .68152);
array control [4] _temporary_ (.02376 .10239 .20877 .66508);
do i= 1 to 500;
test_dist= rand('tabled', of test[*]);
group = 1;
if test_dist= 1 then score= 0;
if test_dist= 2 then score= 33.33;
if test_dist= 3 then score= 66.67;
if test_dist= 4 then score= 100;
output;
end;
do i= 1 to 10000;
group = 2;
control_dist= rand('tabled', of control[*]);
if control_dist= 1 then score= 0;
if control_dist= 2 then score= 33.33;
if control_dist= 3 then score= 66.67;
if control_dist= 4 then score= 100;
output;
end;
run;
Using proc genmode to esitmate the paramater values. Not sure if it is better to treat group as a class variable or just numeric.
proc genmod data= _sim1 ;
/* class group; */
model score = group;
bayes seed= 1 coeffprior=normal nbi= 1000 nmc= 100000 thin= 10 seed= 1
out= posterior diagnostics=all;
run;
I think the diagnostics look good?
The final analysis seems reasonable. Using HPD the impact on score of being in the control group is likely somewhere between -3.0138 and 1.55543 with a maximum liklihood point estimate of -0.7640.
I tried to recreate something similar using PROC MCMC but it doesn't seem to be working as well. Can you see what I am doing wrong?
/**
MCMC approach
**/
proc mcmc data= _sim1 seed=1 nbi=1000 nmc=10000 outpost= simout thin=10;
parms b0 0 b1 0 s2 1;
prior b: ~ normal(0, var= 100);
prior s2 ~ igamma(1, scale= 1);
mu = b0 +( b1 * group);
model score ~ normal(mu, var= s2);
run;
The effective sample size if very low and I seem to be getting almost a reversed impact of being in the test group.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
But if you want compare mean between groups, I think you need LSMEAN statement.
lsmeans status/diff cl;