BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
supp
Pyrite | Level 9

I am working through the book "Statistical Rethinking" by Richard McElreath. The following is a toy example from the book that uses grid approximation to get a posterior distribution then samples from the posterior. My question is what is an equivalent way to obtain the same posterior and  similar sample in SAS?

 

Since I am more familiar with SAS I am doing this for understanding and fun.

 

The set up is we toss a globe and catch it 9 times. The tip of our right index finger will land on either land (L) or water (W). We observe L= 3 and W= 6. We are trying to determine the proportion of the globe that is water using a bayesian approach.

 

The R code:

 

# create a grid of possible parameter values (proportion of globe that is W)
p_grid <- seq(from= 0, to= 1, length.out= 1000)
p_grid[1000]

# Set prior
prob_p <- rep(1, l)

# Liklihood function for grid of possible parameter values
prob_data <- dbinom(6, size = 9, p_grid)

# combine prior with liklihood to get posterior
posterior <- prob_data * prob_p
sum(posterior)

#Standard posterior
posterior <- posterior/sum(posterior)

# Sample from posterior
samples <- sample(p_grid, prob = posterior, size = 1e4, replace = TRUE)

#plot samples
plot(samples)
plot(p_grid, posterior)

 

 

Here is what the resulting posterior distribution looks like from the R script.

supp_0-1656247249542.png

 

 

 

Here is my attempt to recreate the process in SAS. The tricky part seems to be is how to sample (or run simulations) based on the posterior. I used PROC SURVEYSELECT.

 

/**
Bayes practice
**/

data _test1;
	p_grid= 0;											/** Set initial parameter value **/
	
	do i= 1 to 1000;
		
/*		p_grid = rand('uniform') ;	*/					/** Alternative approach is to just randomly generate a bunch of parameter values **/
		prob_p = 1 ; 									/** Set a prior **/
		prob_data = pdf('binomial', 6, p_grid, 9); 		/** Liklihood function, liklihood of data given parameter value **/
		posterior = prob_data * prob_p;
		output;
		p_grid + .001;									/** update p_grid **/
	end;

drop i; run; proc sql; select sum(posterior) into: sum_posterior from _test1; /* Sum posterior to standardize */ create table _test2 as select *, posterior/(&sum_posterior.) as posterior_s from _test1 order by p_grid; quit; # create macro to sample from dataset using standard posterior as the size. %macro sample(i); proc surveyselect data= _test2 method= pps_wr out= _sample1 n= 1 outhits noprint; size posterior_s; run; proc append base= all_simulations data= _sample1; run; %mend sample; proc delete data= all_simulations; run; data _null_; %let sim_size = 10000; do i = 1 to &sim_size; call execute("%sample("||i||");"); end; run; proc sql; create table _sim_sum1 as select p_grid, sum(numberhits) as freq, sum(numberhits) / &sim_size. as proportion from all_simulations group by p_grid; select sum(proportion) as sum_proportion from _sim_sum1; quit; proc sgplot data= _sim_sum1; series x= p_grid y= proportion; run;

Here is the resulting distribution from the plot: (It doesn't look right)

supp_1-1656247750962.png

 

I acknowledge I may be way off base with this approach. I am trying to understand this material and would appreciate someone showing me a proper way to do this in SAS.

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

Here are some sas base code if you don't have iml module.

 

%let size=1000;
data have;
call streaminit(123);
 do p_grid=0 to 1 by 1/(&Size.-1);
   prob_data=pdf('binomial', 6, p_grid,9);
   output;
 end;
run;
proc sql;
create table have2 as
 select *,prob_data/sum(prob_data) as posterior
  from have;
quit;
proc sgplot data=have2;
series x=p_grid y=posterior;
run;


proc surveyselect data=have2 out=want seed=123 sampsize=10000 method=pps_wr outhits;
size posterior;
run;
proc sgplot data=want;
histogram p_grid;
run;

View solution in original post

9 REPLIES 9
Ksharp
Super User

In SAS, if you want to do some data simulation by bayesian method, you could try PROC MCMC .

Or try to use SAS/IML , @Rick_SAS  could help you something .

supp
Pyrite | Level 9
I looked into PROC MCMC. I can't figure out what the MODEL statement would be. Maybe this is too simple of an example?
Ksharp
Super User

You want this ?

 

%let size=1000;
proc iml;
call randseed(123);
p_grid=randfun(&size.,'uniform');
prob_data=pdf('binomial', 6,p_grid,9);

prob_p=j(&size.,1);
posterior= prob_data # prob_p;
posterior=posterior/sum(posterior);

samples = sample(p_grid, 1e4,'replace',posterior) ;

create want var{ p_grid posterior };
append;
close;
quit;

proc sort data=want; by p_grid;run;
proc sgplot data=want;
 series x=p_grid y=posterior ;
run;

Ksharp_0-1656324449326.png

 

Rick_SAS
SAS Super FREQ

To mimic the R script, use KSharp's idea but use

p_grid=T( do(0,1, 1/(&Size.-1)) );

With this change, you can skip the PROC SORT step.  I also inserted a histogram for the sample:

%let size=1000;
proc iml;
call randseed(123);
p_grid=T( do(0,1, 1/(&Size.-1)) ); 
prob_data=pdf('binomial', 6, p_grid,9);

prob_p=j(&size.,1);
posterior= prob_data # prob_p;
posterior=posterior/sum(posterior);
call series (p_grid, posterior);

samples = sample(p_grid, 1e4,'replace',posterior) ;
call histogram(samples);
supp
Pyrite | Level 9
Thanks for the reply. When I try running PROC IML I get a message suggesting my shop does not have a license for it. It seems PROC IML lets us approach the problem very similar to R using vectors.
Rick_SAS
SAS Super FREQ

Yes. The IML language is similar to R in many ways. Both are high-level languages that support matrix-vector computations, a large library of computational routines, and enable you to extend the language by defining your own functions.  As this example shows, you can often translate an R program to PROC IML and vice versa.

 

The IML language is also syntactically similar to MATLAB.

supp
Pyrite | Level 9

I spent some time summarizing the results from the SAS method. I now think the approach I used and results are valid. I still think there is a more elegant way to achieve the same results. 

 

Here are some summaries I made from sampling the posterior. 


/** Add a cumulative proportion column to find confidence intervals **/
data _sample3;
	set _sample2;
	
		retain cumulative_prop;
		if _n_ = 1 then cumulative_prop = proportion;
		else cumulative_prop = cumulative_prop + proportion;
run;



proc sgplot data= _sample3;
	series x= p_grid y= proportion;
run;


proc sql;
/** Add up posterior probability where p < 0.5 */
	select sum(posterior_s) as sum_standard_post from _sample3 where p_grid < 0.5;
	select sum(proportion) as sum_proportion from _sample3 where p_grid < 0.5;

/** What proportion of water is compatible with greater than 0.5 and less than .75 probability? **/
	select sum(proportion) as sum_proportion from _sample3 where 0.5 < p_grid < 0.75;

/** What proportion of water is compatible with greater less than 80% probability? **/
	select max(p_grid) as p_grid from _sample3 where cumulative_prop < .8;

/** What proportion of water is compatible with middle 80% probability? **/
	select min(p_grid) as lower_parameter, max(p_grid) as upper_parameter from _sample3 where 0.1 < cumulative_prop < .9;

/** Point estimate of highest probabilty  **/
	select p_grid as point_estimate, proportion, cumulative_prop from _sample3 having max(proportion) = proportion;

quit;
/**
Proportion of water less than .5
	Posterior --> 0.166 probability
	Sample --> 0.178 probability

Proportion of water greater than .5 and less than .75
	sample --> .596 probability

Lower 80% posterior probablity when proprotion of water is .76

Middle 80% posterior probablity when proprotion of water is between .445 and .812

point estimate for most likely proportion of water --> .625 (about .4% probability)
**/
Ksharp
Super User

Here are some sas base code if you don't have iml module.

 

%let size=1000;
data have;
call streaminit(123);
 do p_grid=0 to 1 by 1/(&Size.-1);
   prob_data=pdf('binomial', 6, p_grid,9);
   output;
 end;
run;
proc sql;
create table have2 as
 select *,prob_data/sum(prob_data) as posterior
  from have;
quit;
proc sgplot data=have2;
series x=p_grid y=posterior;
run;


proc surveyselect data=have2 out=want seed=123 sampsize=10000 method=pps_wr outhits;
size posterior;
run;
proc sgplot data=want;
histogram p_grid;
run;
supp
Pyrite | Level 9
Thanks @Ksharp, the code you provided gets a posterior distribution without using PROC IML. It is also a little cleaner than my code in the original post.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 1001 views
  • 9 likes
  • 3 in conversation