BookmarkSubscribeRSS Feed
supp
Pyrite | Level 9

Is there a way to use a bayesian approach to compare the means of two independent samples? Can proc mcmc handle something like this.

 

For example, I have two independent samples from the same underlying population:
Sample 1: n = 500, mean = 83.8, standard deviation = 24.7

Sample 2: n = 1000, mean = 86.1, standard deviation = 25.8

 

Is there a way I can model the difference between the means using proc mcmc without the underlying data? 

 

My only thought is to create a posterior distribution like this:

/* Compute the posterior distributions for the population means */
data post;
	do i = 1 to 10000;
		do x= 0 to 100 by .1;
			sample1  = pdf('Normal', x, 83.8, 24.7);
			sample2  = pdf('Normal', x, 86.1, 25.8);
		output;
		end;
	end;
run;

@Rick_SAS 

 

4 REPLIES 4
Rick_SAS
SAS Super FREQ

Yes. See this StackExchange article for references and a discussion that shows how to define the likelihood and priors: "Bayesian equivalent of two sample t-test." The article uses R and someone reproduced it in Python.

 

Ksharp
Super User

You could try BAYES statement of PROC GENMOD .

Or calling @StatDave 

 

proc genmod data=sashelp.heart;
class status;
model weight = status / dist=normal link=identity;
lsmeans status/diff cl;
bayes seed=1 coeffprior=normal;
run;
supp
Pyrite | Level 9

To figure this out here is simulated data. The outcome is score. We will try to estimate the coefficient paramater value for group using a linear regression

 

/**
Simulate data
**/
data _sim1;
	array test [4] _temporary_ (.03184 .08917 .19745 .68152);
	array control [4] _temporary_ (.02376 .10239 .20877 .66508);

		do i= 1 to 500;
			test_dist= rand('tabled', of test[*]);
			group = 1;
				if test_dist= 1 then score= 0;
				if test_dist= 2 then score= 33.33;
				if test_dist= 3 then score= 66.67;
				if test_dist= 4 then score= 100;
				output;
		end;

		do i= 1 to 10000;
			group = 2;
			control_dist= rand('tabled', of control[*]);
				if control_dist= 1 then score= 0;
				if control_dist= 2 then score= 33.33;
				if control_dist= 3 then score= 66.67;
				if control_dist= 4 then score= 100;
				output;
		end;
run;

Using proc genmode to esitmate the paramater values. Not sure if it is better to treat group as a class variable or just numeric.

proc genmod data= _sim1 ;
/* 	class group; */
	model score = group;
	bayes seed= 1 coeffprior=normal nbi= 1000 nmc= 100000 thin= 10 seed= 1
		out= posterior diagnostics=all;
run;

I think the diagnostics look good?

supp_0-1671807122314.png

 

The final analysis seems reasonable. Using HPD the impact on score of being in the control group is likely somewhere between -3.0138 and 1.55543 with a maximum liklihood point estimate of -0.7640.

supp_1-1671807411933.png

 

 

I tried to recreate something similar using PROC MCMC but it doesn't seem to be working as well. Can you see what I am doing wrong?

/**
MCMC approach
**/

proc mcmc data= _sim1 seed=1 nbi=1000 nmc=10000 outpost= simout thin=10;
	parms b0 0 b1 0 s2 1;
	prior b: ~ normal(0, var= 100);
	prior s2 ~ igamma(1, scale= 1);
	
	mu = b0 +( b1 * group);
	
	model score ~ normal(mu, var= s2);
run;

The effective sample size if very low and I seem to be getting almost a reversed impact of being in the test group.

supp_2-1671807586316.png

 

 

Ksharp
Super User
Sorry . I am not familiar with PROC MCMC. Maybe @StatDave know more things.
But if you want compare mean between groups, I think you need LSMEAN statement.

lsmeans status/diff cl;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 1684 views
  • 2 likes
  • 3 in conversation