BookmarkSubscribeRSS Feed
H
Pyrite | Level 9 H
Pyrite | Level 9

Hello,

 

I have built the below model in PROC GENMOD and would like to rewrite it using PROC MCMC. I wanted to play around with the MCMC procedure and some of its additional options (e.g., Hamiltonian, etc.), but first I wanted to make sure I rewrote my prior model correctly.

 

Current model:

 

 

data normalPrior_sen;
	input _type_ $ Intercept Teaching_Service0;
	datalines;
	Var  0.5 0.5
	Mean 0.0 0.15
	;
run;
ods graphics on;
title "Bayes with skeptical prior PP";
proc genmod data = post_fin descending ;
	where 	discord GE 0 and discord LT 1;
        class Teaching_service;
        model Trans_ppi = Teaching_service / dist=binomial link = log;
        estimate 'Beta' Teaching_service 1 -1/ exp;
        lsmeans Teaching_service / e exp diff cl; 
	bayes 
	seed=10231995 
	nbi=50000 
	nmc=50000  
	thinning=5 
	diagnostics=all    
	statistics=all
	coeffprior=normal(input=NormalPrior_sen)
	outpost=post_PP_W;
	store logit_bayes_PP_W;
run;

Of note, notice the attempt to use "dist=binomial" and "link = log" to get at the rate difference estimates in the GENMOD code. My attempt to rewrite the code is below. I would also be interested in how to set the reference group for the covariate within PROC MCMC, so switching the binary variable Teaching_service 's reference group to 1 instead of 0.

 

 


ods graphics on;
PROC MCMC DATA=post_fin 
	seed=10231995
	outpost=logit_bayes_sen_mcmc
	nbi=50000
	nmc=50000
	nthin=5
	DIC
	statistics=all
	diagnostics=all;
	where 	discord GE 0 and discord LT 1;
	parms (beta0 beta1);
	prior beta0 ~ normal(0, var=0.5);
	prior beta1  ~ normal(0.15, var=0.5); 
/*	prior beta0 ~ normal(0, var=1000);
	prior beta1  ~ normal(0, var=1000); */
	p = logistic(beta0 + beta1*Teaching_Service);
	model Trans_ppi ~ binomial(1,p);  /*n=number of parameters from input dataset, (e.g., 1)*/
	ods select PostSummaries PostIntervals;
run;
ods graphics off;

 

 

Thank you in advance for your time and help.

9 REPLIES 9
Garnett
Obsidian | Level 7

Hi there,

You are using a log link function in the GENMOD syntax and a logistic link function in the MCMC syntax. It isn't clear what priors you are using in GENMOD, but b/c the link functions are different, priors on the parameters can't be identical if you want to get the same result.

 

One HUGE benefit of MCMC in Bayesian analysis is that it's trivial to make inferences about any function of the parameter. If you want the posterior on the difference in probabilities between "teaching_service" = 1 and "teaching_service" = -1, then add the following to your MCMC syntax:

 

Rate_Difference = logistic(beta0 + beta1) - logistic(beta0 - beta1);

 

Also, somewhere in the MCMC statement add "monitor=(_PARMS_ Rate_Difference)" and that'll get you what you want.

 

Good luck!

Garnett

 

H
Pyrite | Level 9 H
Pyrite | Level 9

Garnett,

 

Thanks for the reply. I am away from my computer, but thinking about you response. Can you think of a way to add the log link, or were you implying the rate difference would get at this. I am guessing not the latter. 

 

Side comment, the values for priors in the GENMOD were defined in the data statement prior to the procedure and referenced in GENMOD along with NORMAL, so in my mind I'm referencing the same va!ues in both procedures

 

Thanks again, and I would appreciate any feedback,

 

H

Garnett
Obsidian | Level 7

Hello,

I missed the datastep specification of the priors in GENMOD. sorry about that. In any event, the prior information about the event probabilities are different because the link functions are different. As long as you use the logistic link in PROC MCMC and log link in GENMOD, you can expect different inferences about the regression coefficients.

 

I wouldn't use a log link for probabilities. If you want to stick with it, just replace the "logistic" function in MCMC with "exp". You'll get errors as this link function allows binomial probabilities outside (0,1).  Personally, I wouldn't bother with that approach, and just stick with the logit link. Then you can compute the rate difference within the MCMC iterations.

 

Garnett

H
Pyrite | Level 9 H
Pyrite | Level 9

@Garnett 

 

Thanks for the feedback. I buried myself in work and I now have time to revisit this question.  Below I incorporated your suggests into a comparable project model. I am trying to get rate differences as mentioned before. However, I am unsure how to create my priors. I imagine the intercept (beta0) has a rate of 0.05 with variance of 0.025 and the beta1 has a rate of 0.80 with variance 0.025. I put these into the model below and they seem wrong. I am basing this on shrinking variance down to say 0.000000001 for the priors and the estimated rate difference is around 0.42. In my mind, shrinking the variance down that much should regularize the rate difference down to exact the difference of priors (e.g., 0.75), but this doesn't happen. I played around blindly with beta priors to not avail. Any help you or others can provide would be appreciated.

ods graphics on;
PROC MCMC DATA=dataset
	seed=10231995
	outpost=logit_bayes_sen_mcmc
	 monitor=(Rate_Difference)
	nbi=50000
	nmc=50000
	nthin=5
	DIC
	statistics=all
	diagnostics=all;
	parms (beta0 beta1);
	prior beta0 ~ normal(0.10, var=0.025);
	prior beta1  ~ normal(0.80, var=0.025); 
	Rate_Difference = logistic(beta0 + beta1*age_bin) - logistic(beta0 - beta1*age_bin);
	model Y ~ binomial(1,Rate_Difference);  /*n=number of parameters from input dataset, (e.g., 1)*/
	ods select PostSummaries PostIntervals;
run;
ods graphics off;

 

 

  

Garnett
Obsidian | Level 7

I don't think you want "Rate_difference" as the binomial parameter. In theory, it could be negative.

Instead, replace the model code (from "Parms" to "Model") with

parms (beta0 beta1);
	prior beta0 ~ normal(0.10, var=0.025);
	prior beta1  ~ normal(0.80, var=0.025); 
	*Rate_Difference = logistic(beta0 + beta1*age_bin) - logistic(beta0 - beta1*age_bin);
prob=logistic(beta0 + beta1*age_bin); model Y ~ binomial(1,prob); /*n=number of parameters from input dataset, (e.g., 1)*/

       Rate_Difference=logistic(beta0 + beta1*1) - logistic(beta0 + beta1*-1);

This assumes that "Age_bin" is a binary indicator taking values of 1 and -1.

H
Pyrite | Level 9 H
Pyrite | Level 9

@Garnett 

 

Thank you once again for providing feed back. I am actually looking for rate differences. I am in the medical field and we traditionally have lots of binary outcomes (e.g., death, adverse event, reoccurrence, etc.). Historically we used logistic regression all off the time because that is what was available to us when performing regression models. However, odds, log odds, odds ratios are difficult for the normal clinician to understand. So we always strive to use relative rates (AKA, relative risks) or even more interpretable are rate differences. I used the Proc Genmod to get at these rate differences. I was then able to use priors given the BAYES option in that procedure. https://academic.oup.com/aje/article/162/3/199/171116

 

However, I wanted to try this in PROC MCMC, so I can use the additional options available in the procedure. Side note: rate differences can be negative, though rates would be bounded within 0-1..

 

But I am getting confused which priors to use. In my example, I believe the reference group has a rate of 10% and non-reference group a rate of 90% (or 0.8 higher than the reference (base rate)). If I take LN(0.1 +0.8) = 2.2 and if I use expit I get exp(LN Odds) / 1 + exp(LN odds) = prob = 0.9, which seems great. But when plugging these priors into PROC MCMC with infinitesimally small variance, I am getting a rate_difference  not equal to 0.90. Which as I noted, I felt should be the estimate 0.9 given the very small variance value.

 

I will try to create a toy example to work from.

H
Pyrite | Level 9 H
Pyrite | Level 9

@Garnett 

 

EDITED:

 

Below you can see in this Toy example that the Risk Difference is 0.156 in the 2x2 table, GENMOD, and GENMOD with flat priors. You can also see that with very small variance values for priors, the estimate converges to the prior values. Any feedback would be of interest in trying to get this to also execute in PROC MCMC.

DATA Hearty;
	set Sashelp.heart;
	IF Sex = 'Female' THEN Sex_bin = 0;
	IF Sex = 'Male' THEN Sex_bin = 1;
	IF Status = "Dead" THEN Y = 1;
	IF Status = "Alive" THEN Y = 0;
	KEEP Sex_bin Y;
RUN;
PROC SORT DATA=Hearty;
	BY DESCENDING Sex_bin DESCENDING Y;
RUN;
PROC FREQ DATA=Hearty ORDER=DATA;
	TABLE Sex_bin*Y / RISKDIFF;
RUN;

/*Risk Model*/
PROC GENMOD DATA = hearty DESCENDING;
	MODEL Y = Sex_bin / DIST=bin LINK = identity LRCI;
	ESTIMATE 'RD: Sex_bin = 1' Sex_bin 1 -1;
RUN;

/*With Priors*/
DATA normalPrior_Informed;
	INPUT _type_ $ Intercept sex_bin1;
	DATALINES;
	Var  0.025 0.025
	Mean 0.10 0.80
	;
RUN;
ODS GRAPHICS ON;
TITLE "Bayes with Informative Priors";
PROC GENMOD DATA = Hearty DESCENDING;
 	CLASS Sex_bin (REF='0');
 	MODEL Y = Sex_bin / DIST=BIN LINK = IDENTITY;
	ESTIMATE 'Beta' Sex_bin -1 1;
 	LSMEANS Sex_bin / DIFF CL; 
	BAYES 
	SEED=10231995
	stats(PERCENT=2.5 50 97.5) 
	NBI=500 
	NMC=500  
	THINNING=5 
	DIAGNOSTICS=ALL    
	STATISTICS=ALL
	COEFFPRIOR=NORMAL(INPUT=NormalPrior_Informed)
	OUTPOST=post_OD;
	STORE logit_bayes_Hearty;
RUN;
ODS GRAPHICS OFF;


/*Flat Priors*/
DATA normalPrior_Flat;
	INPUT _type_ $ Intercept sex_bin1;
	DATALINES;
	Var  1000 1000
	Mean 0.10 0.80
	;
RUN;
ODS GRAPHICS ON;
TITLE "Bayes with Informative Priors";
PROC GENMOD DATA = Hearty descending;
 	CLASS Sex_bin (REF='0');
 	MODEL Y = Sex_bin / DIST=BIN LINK = IDENTITY;
	ESTIMATE 'Beta' Sex_bin -1 1;
 	LSMEANS Sex_bin / DIFF CL; 
	CONTRAST 'Sex_bin' Sex_bin -1 1;
	BAYES 
	SEED=10231995
	stats(PERCENT=2.5 50 97.5) 
	NBI=500 
	NMC=500  
	THINNING=5 
	DIAGNOSTICS=ALL    
	STATISTICS=ALL
	COEFFPRIOR=NORMAL(INPUT=NormalPrior_Flat)
	OUTPOST=post_OD;
	STORE logit_bayes_Hearty;
RUN;
ODS GRAPHICS OFF;


/*Same thing with very small variance Priors*/
DATA normalPrior_OverInformed;
	INPUT _type_ $ Intercept sex_bin1;
	DATALINES;
	Var  0.0000025 0.00000025
	Mean 0.10 0.80
	;
RUN;
ODS GRAPHICS ON;
TITLE "Bayes with Informative Priors";
PROC GENMOD DATA = Hearty DESCENDING;
 	CLASS Sex_bin (REF='0');
 	MODEL Y = Sex_bin / DIST=BIN LINK = IDENTITY;
	ESTIMATE 'Beta' Sex_bin -1 1;
 	LSMEANS Sex_bin / DIFF CL; 
	BAYES 
	SEED=10231995
	stats(PERCENT=2.5 50 97.5) 
	NBI=500 
	NMC=500  
	THINNING=5 
	DIAGNOSTICS=ALL    
	STATISTICS=ALL
	COEFFPRIOR=NORMAL(INPUT=NormalPrior_OverInformed)
	OUTPOST=post_OD;
	STORE logit_bayes_Hearty;
RUN;
ODS GRAPHICS OFF;

 

H
Pyrite | Level 9 H
Pyrite | Level 9

@Garnett et al.,

 

When I use the model you suggested in the toy example, I get an estimate of 0.165, which seems reasonable given there is a lot of data in this example. When I use totally flat priors I get about 0.157, so very close to the frequentist value. However, when I do my little trick of using overly informative priors (small variance), it does not provide back the value of the prior (e.g., 0.80), but produces 0.37. This makes me think I am doing something wrong with the priors. Hopefully this is something simple 🙂

DATA Hearty;
	set Sashelp.heart;
	IF Sex = 'Female' THEN Sex_bin = 0;
	IF Sex = 'Male' THEN Sex_bin = 1;
	IF Status = "Dead" THEN Y = 1;
	IF Status = "Alive" THEN Y = 0;
	KEEP Sex_bin Y;
RUN;
data heartless;
	set hearty;
	if sex_bin = '0' the sex_bin = -1;
run;
ods graphics on;
PROC MCMC DATA=Heartless
	seed=10231995
	outpost=logit_bayes_sen_mcmc
	monitor=(Rate_Difference)  
	nbi=500
	nmc=500
	nthin=5
	DIC
	statistics=all
	diagnostics=all;
 	parms (beta0 beta1);
	prior beta0 ~ normal(0.1, var=0.025);
	prior beta1  ~ normal(0.8, var=0.025); 
        prob=logistic(beta0 + beta1*sex_bin);
	model Y ~ binomial(1,prob); 
    Rate_Difference=logistic(beta0 + beta1*1) - logistic(beta0 + beta1*-1);
run;
ods graphics off;

 

Garnett
Obsidian | Level 7

Hi,

I've lost track of the problem in this thread, but here a couple of points:

-The rate difference can be negative, of course, but your original code had the rate difference as the binomial mean parameter. By definition, the binomial parameter has to be between 0 and 1.

-Given data, the posterior on beta0 or beta1 will be different from the prior unless the prior has zero variance. I'm not sure why it is important to show that the posterior and the prior are the same....

-If the "Sex_Bin" variable is (0,1), then the rate difference is given by 

 Rate_Difference=logistic(beta0 + beta1*1) - logistic(beta0+beta1*0);

 

I don't know if this is what you are looking for.... 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 1442 views
  • 2 likes
  • 2 in conversation