BookmarkSubscribeRSS Feed
supp
Pyrite | Level 9

Is there a way to use a bayesian approach to compare the means of two independent samples? Can proc mcmc handle something like this.

 

For example, I have two independent samples from the same underlying population:
Sample 1: n = 500, mean = 83.8, standard deviation = 24.7

Sample 2: n = 1000, mean = 86.1, standard deviation = 25.8

 

Is there a way I can model the difference between the means using proc mcmc without the underlying data? 

 

My only thought is to create a posterior distribution like this:

/* Compute the posterior distributions for the population means */
data post;
	do i = 1 to 10000;
		do x= 0 to 100 by .1;
			sample1  = pdf('Normal', x, 83.8, 24.7);
			sample2  = pdf('Normal', x, 86.1, 25.8);
		output;
		end;
	end;
run;

@Rick_SAS 

 

4 REPLIES 4
Rick_SAS
SAS Super FREQ

Yes. See this StackExchange article for references and a discussion that shows how to define the likelihood and priors: "Bayesian equivalent of two sample t-test." The article uses R and someone reproduced it in Python.

 

Ksharp
Super User

You could try BAYES statement of PROC GENMOD .

Or calling @StatDave 

 

proc genmod data=sashelp.heart;
class status;
model weight = status / dist=normal link=identity;
lsmeans status/diff cl;
bayes seed=1 coeffprior=normal;
run;
supp
Pyrite | Level 9

To figure this out here is simulated data. The outcome is score. We will try to estimate the coefficient paramater value for group using a linear regression

 

/**
Simulate data
**/
data _sim1;
	array test [4] _temporary_ (.03184 .08917 .19745 .68152);
	array control [4] _temporary_ (.02376 .10239 .20877 .66508);

		do i= 1 to 500;
			test_dist= rand('tabled', of test[*]);
			group = 1;
				if test_dist= 1 then score= 0;
				if test_dist= 2 then score= 33.33;
				if test_dist= 3 then score= 66.67;
				if test_dist= 4 then score= 100;
				output;
		end;

		do i= 1 to 10000;
			group = 2;
			control_dist= rand('tabled', of control[*]);
				if control_dist= 1 then score= 0;
				if control_dist= 2 then score= 33.33;
				if control_dist= 3 then score= 66.67;
				if control_dist= 4 then score= 100;
				output;
		end;
run;

Using proc genmode to esitmate the paramater values. Not sure if it is better to treat group as a class variable or just numeric.

proc genmod data= _sim1 ;
/* 	class group; */
	model score = group;
	bayes seed= 1 coeffprior=normal nbi= 1000 nmc= 100000 thin= 10 seed= 1
		out= posterior diagnostics=all;
run;

I think the diagnostics look good?

supp_0-1671807122314.png

 

The final analysis seems reasonable. Using HPD the impact on score of being in the control group is likely somewhere between -3.0138 and 1.55543 with a maximum liklihood point estimate of -0.7640.

supp_1-1671807411933.png

 

 

I tried to recreate something similar using PROC MCMC but it doesn't seem to be working as well. Can you see what I am doing wrong?

/**
MCMC approach
**/

proc mcmc data= _sim1 seed=1 nbi=1000 nmc=10000 outpost= simout thin=10;
	parms b0 0 b1 0 s2 1;
	prior b: ~ normal(0, var= 100);
	prior s2 ~ igamma(1, scale= 1);
	
	mu = b0 +( b1 * group);
	
	model score ~ normal(mu, var= s2);
run;

The effective sample size if very low and I seem to be getting almost a reversed impact of being in the test group.

supp_2-1671807586316.png

 

 

Ksharp
Super User
Sorry . I am not familiar with PROC MCMC. Maybe @StatDave know more things.
But if you want compare mean between groups, I think you need LSMEAN statement.

lsmeans status/diff cl;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 1112 views
  • 2 likes
  • 3 in conversation