BookmarkSubscribeRSS Feed
cws
Fluorite | Level 6 cws
Fluorite | Level 6

In PROC GENMOD (v9.4 TS 1M3, x64 for Windows), why is there a discrepancy between the exponentiated estimates as output using the EXP option in the LSMEANS statement and the calculated values (i.e. doing the exponentiation calculation in a data step) when the BAYES statement is used, but not when a frequentist analysis is done? The values differ very slightly in the Bayesian analysis. Am I missing something?

 

Here is some code, run on an example in the book by Stokes, Davis and Koch (p. 354).

data respire;
do region=1,2;
do agegp=2,3,4,5,6,1;
input cases total;
ltotal=log(total);
output;
end;
end;
datalines;
75 220407
68 198119
63 134084
45 70708
27 34233
64 1074246
76 564535
98 592983
104 450740
63 270908
80 161850
61 2880262
run;
proc format;
value agfmt 1='<35'
2='35-44'
3='45-54'
4='55-64'
5='65-74'
6='>75'
;
value regfmt 1='South'
2='North'
;
run;
proc print data=respire;
format agegp agfmt. region regfmt.;
run;
/* Frequentist*/
ods output diffs=frediffs_all ;

proc genmod data=respire;
class agegp region;
model cases=agegp region/dist=poisson link=log offset=ltotal;
lsmeans agegp /diff=control("1") exp;
lsmeans region /diff=control("1") exp;
title 'Koch p 354 Frequentist Poisson analysis';
run;
ods output close;
/* Bayesian*/
ods output diffs=baydiffs_all ;

proc genmod data=respire;
class agegp region;
model cases=agegp region/dist=poisson link=log offset=ltotal;
bayes seed=555 STATS(alpha=0.05 percent=2.5 25 50 75 97.5 )=all nbi=2000 nmc=300000 diagnostics=(autocorr ess gelman(nchain=3)) ;
lsmeans agegp /diff=control("1") exp;
lsmeans region /diff=control("1") exp;
title 'Koch p 354 Bayesian Poisson analysis';
run;
quit;
ods output close;
proc sort data=frediffs_all;
by agegp region;
run;
proc sort data=baydiffs_all;
by agegp region;
run;
/* Compare exponentiated (as calculated) vs. printed*/
data allestdiff;
set frediffs_all(in=f)
baydiffs_all(in=b);
by agegp region;
if f then antype='Frequentist';
else if b then antype='Bayesian';
expest=exp(estimate);
label expest='Expomnentiated estimate (calculated)';
run;
options ls=132;
proc compare data=allestdiff ;
id antype agegp region;
var expestimate ;
with expest;
title 'Comparison of exponentiated estimates as output, and calculated';
run;
proc print data=allestdiff label;
format region regfmt. agegp agfmt. estimate expest expestimate 8.6;
id antype;
var agegp region estimate expest expestimate;
title 'Estimates from GENMOD using frequentist and Bayesian analyses : exponentiated as output and as calculated';
run;

6 REPLIES 6
ehmst3
Calcite | Level 5

Closed form solution (frequentist) vs. monte carlo estimate from random sampling (Bayesian). Change the parameters of your sampling (seed, number of samples, number of chains, etc.) and your monte carlo estimate will change.

 

Even if the frequentist solution is not closed form (eg requires Newton-Raphson), the algorithm for solving is deterministic and thus will provide the same solution each time. 

cws
Fluorite | Level 6 cws
Fluorite | Level 6

No, this is not the point. I am not querying the difference between the results for the Bayeiand and frequentist analyses: I accept that they will not match exactly. I am questioning why the results you get when using the EXP option in LSMEANS differs from a calculated value when conducting a Byesian analysis, but not when using the frequentist analysis. For the frequentist analysis (i.e. normal Genmod without the BAYES statement), the estimates that are output with the EXP option are exactly the same as if you calculate the exponentiated values in a data step. If you use the BAYES statement this is not the case. Why?

ehmst3
Calcite | Level 5

Ahh, I see. Apologies for the lazy reply.

 

Well, with these issues I usually first consider loss due to floating point representation. In this case, is the Bayesian estimate of the odds ratio computed at each iteration? If so, then the procedure does not simply exponentiate the estimate at the end of the sampling, but rather uses a running average. As the average is computed, loss due to floating point representation may cause the observed difference. For the frequentist approach, the odds ratio is exactly exp(estimate). 

 

Just a guess though.  

cws
Fluorite | Level 6 cws
Fluorite | Level 6

Apology accepted! That's an interesting theory of yours and a possible explanation. I guess we shall have to wait for the guys at SAS to tell us whether that is correct. If it is, then it answers the question but still makes it rather difficult when it comes to presenting results: which should one use?!

 

If it's not the answer, then what is going on? I agree that floating point issues are often the cause of such things, which is why I did the exponentiation in a data step. I had been using my calculator but then wondered if the use of 4 DP was affecting things, and was interested to see that the discrepancy remained when I used the data step.

.

Crepino
Calcite | Level 5
Hello my dears,
Can someone provide me with the syntax of lsmeans option showing mean separations in letter using Process genmod

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 2185 views
  • 0 likes
  • 3 in conversation