BookmarkSubscribeRSS Feed
BjoernHolzhauer
Obsidian | Level 7

Normally, for a standard maximum likelihood GLM we specify a reference category for each categorical factor in the model. However, if we are fitting a Bayesian model with informative priors this is not necessary and it may easiest to elicitate priors for the overparameterized model. Thus, it would be convenient to run a parameterize my Bayesian Poisson regression so that there is no reference group (at least for some variables).

 

Below is an example, where I assume that for the drug effect I may want to specify a reference group (0=no drug treatment) and that I have some overall prior for the average log-rate and priors about by how much each category may from this overall prior. However, what happens is that the prior for cat=5 is not used and this category is assumed to correspond to a zero effect (=the effect of the intercept + potential drug treatment effect is assumed). I tried changing the parameterization using the param=GLM/EFFECT option in the CLASS statement, but that did not get me the result. 

 

Is there some way to get PROC GENMOD to accept a different parameterization for a Bayesian analysis? It would be fine, if I cannot have a reference parameterization for the treatment effect, since I assume I could always effectively get that with a near-point-prior (i.e. with really low variance).

data test;
  do cat=1 to 5;
  do drug=0 to 1;
    t=100+100*(cat>5);
	logt=log(t);
	y=ranbin(1234,t,0.1*(3**drug));
	output;
  end;
  end;
run;

data NormalPrior;
  input _type_ $ Intercept drug1 cat1-cat5;
  datalines;
Var  100    100   100 100 100 100 100
Mean -2.3     0     0   0   0   0   0
;
run;

ods graphics on;
proc genmod data=test;
   class cat drug(ref='0');
   model y = drug cat / dist=Poisson link=log offset=logt;
   bayes seed=1 plots=none coeffprior=normal(input=NormalPrior) nbi=10000 nmc=100000 thin=10;
run;
8 REPLIES 8
SteveDenham
Jade | Level 19

Not sure if this will work, but try adding the NOINT option to the model statement.  Under this parameterization, estimates aren't deviations from the intercept (resulting in the reference level having a value of zero), and so you might be able to get this to work for the independent variable listed first in the model statement.

 

Steve Denham

BjoernHolzhauer
Obsidian | Level 7

That would help in terms of ensuring that I can have identical priors for each category, but would not match up with the prior elicitation (and integrating out the prior for the intercept would result in non-independent non-normal priors that I do not think genmod let's me specify). I guess I could hand-code what I want in PROC MCMC, but it would seem like a shame to have to do that, because I am not sure I can get an equally efficient sampler as in PROC GENMOD. Perhaps time to switch to R and rjags?

Rick_SAS
SAS Super FREQ

I don't fully understand the question, but did you try the REFERENCE parameterization?

   class cat drug(ref='0') / param=reference;

The REFERENCE encoding is similar to the GLM encoding, but with the (redundant) last column dropped.

SteveDenham
Jade | Level 19

Bjoern,

 

I had good luck with this on your dataset:

 

data NormalPrior;
  input _type_ $ drug0 drug1 cat1-cat5;
  datalines;
Var  100 100   100   100 100 100 100 100
Mean -2.5	-1.4     0   0   0   0   0 0
;
run;

ods graphics on;
proc genmod data=test;
   class cat drug(ref='0');
   model y = cat drug / dist=Poisson link=log offset=logt noint;
   bayes seed=1 plots=none coeffprior=normal(input=NormalPrior) nbi=10000 nmc=100000 thin=10;
run;

I grant that the priors on the drug effects were stolen from the ML estimates, but it looks like I am getting what looks like reasonable results for all 5 levels of cat, and one for drug (drug1).  I think I am missing something here that is critical to answering your question, so this would be a great opportunity for me to learn more about the Bayesian approach in GENMOD. Note:Using @Rick_SAS's suggestion of param=reference reduces the number of estimates to 4 levels of cat (with cat5 being the reference).

 

With my approach, and plugging in the posterior means and the offset, I get very close to the y values, so I must be missing something.

 

Steve Denham

 

 

 

lvm
Rhodochrosite | Level 12 lvm
Rhodochrosite | Level 12

I also do not see the problem. It is true that the last factor level is forced to be zero with GLM parameterization if there is an intercept. But then the intercept is the last factor level. Just put your prior for the last factor level effect on the intercept, and this should work as intended. I get the sense that you want to use elicited priors for the intercept and all five factor level effects (hence the problem). But this cannot be done. If you have a model with five factor levels, you can have five parameters, not six. This has nothing to do with Bayesian vs frequentist analysis.

 

The noint option will get rid of the intercept and hence the five factor level parameters will have direct interpretation. There is no concern about independence and integrating out the intercept. This rasises a different point: are your priors intended to be on the expected values for each level of the factor, or on the effect of the factor relative to one level of the factor. If you use an intercept, then the parameters are automatically effects relative to another level, and the priors should be for the effects (Not for means). If you use a nonint, then the parameters are directly interpreted as expected values, and the priors should be for these (not for effects relative to a reference).

 

If you use another parameterization, you are essentially still working with effects relative to some other level (or possibly relative to an overall mean). Each would require different choices for priors, if you wanted informative priors.

BjoernHolzhauer
Obsidian | Level 7

What I would like is to get estimates for the intercept, categories 1 to 5 and drug (=1). As mentioned, if this were maximum likelihood estimation, the model would not be identifiable, but if I have informative priors on each of these that is not an issue and should presumably be possible.

lvm
Rhodochrosite | Level 12 lvm
Rhodochrosite | Level 12

I am not convinced that six parameters for five factor levels is identifiable for Bayesian analysis either. You would need an extremely informative prior at 0 for one of the parameters, or informative priors for all the parameters (or several of them). I think you would have to do this with PROC MCMC.  I tried to trick GENMOD into doing this by creating dummy variables iwth PROC GLMMOD, using these dummy variables as covariables (no class statement), and then fitting this with GENMOD. But the program recognizes the overparameterization and zeros out two of the dummy variable parameters (one for drug and one for cat). So you are back to where you started. Very sophisticated algorithm. 

I am not aware of any GENMOD options to override this (and I am not convinced it is a good idea).

lvm
Rhodochrosite | Level 12 lvm
Rhodochrosite | Level 12

Thinking more about your problem. I wrote the following code to fit the GENMOD model with PROC MCMC, and the model you want to fit (intercept and parameters for all five levels of cat -- overparameterized). See the titles and comments to find out what happens.

proc glmmod data=test outdesign=testout;
title2 'get design matrix in file (all 0s and 1s)';
class cat drug;
model y = cat drug ;
run;
proc print data=testout;run;

data test2; merge test testout;	*-combined;
run;

proc genmod data=test2;
title2 'using dummy variables -- same result as with class statement';
   model y = col1-col8 / dist=Poisson link=log offset=logt noint;
   bayes seed=1  nbi=10000 nmc=100000 thin=10;
run;


data NormalPrior2;
  input _type_ $ col1-col8 ;
  datalines;
Var  100 100   100   100 100 100 100 100
Mean -2.5	-1.4     0   0   0   0   0 0
;
run;
proc genmod data=test2;
title2 'using dummy variables -- same result as with class statement';
   model y = col1-col8 / dist=Poisson link=log offset=logt noint;
   bayes seed=1 coeffprior=normal(input=NormalPrior2) nbi=10000 nmc=100000 thin=10;
run;


*-good approach to fit the model fitted by GENMOD;
proc mcmc data=test2 nmc=50000 seed=1 thin=5 outpost=outpost;
title2 'with intercept, no cat5 level (i.e., no col6 dummy var.)';
title3 'noninformative priors -- analogous to GENMOD approach';
parms b1-b5 c1;	*-b1 is intercept, b2-b6 for cat, c1 for drug1;
prior b: ~ normal(0,var=100);
prior c1 ~ normal(0,var=100);
eta = b1 + b2*col2 + b3*col3 +b4*col4 + b5*col5 +  c1*col7 + logt;
f = exp(eta);
model y ~ Poisson(f);
run;


*-effort to use overparameterized model desired by the original poster. Terrible results.;
proc mcmc data=test2 nmc=50000 seed=1 thin=5outpost=outpost;
title2 'with intercept, and all FIVE levels of cat (col2-col6), overparameterized';
title3 'noninformative priors -- meaningless results, right now';
parms b1-b6 c1;
prior b: ~ normal(0,var=100);
prior c1 ~ normal(0,var=100);
eta = b1 + b2*col2 + b3*col3 +b4*col4 + b5*col5 + b6*col6 + c1*col7 + logt;
f = exp(eta);
model y ~ Poisson(f);
run;

*-attempt to use overparameterized model, but with highly informative prior on last parm;
proc mcmc data=test2 nmc=50000 seed=1 thin=5 outpost=outpost;
title2 'with intercept, and all five levels of cat (col2-col6), overparameterized';
title3 'but with highly informative b6 prior (for cat5 [col6] level) -- MAY be reasonable results';
parms b1-b6 c1;
prior b1 b2 b3 b4 b5 ~ normal(0,var=1000);
prior b6 ~ normal(1,var=.001);	*-arbitrary highly informative prior on b6 ;
prior c1 ~ normal(0,var=1000);
eta = b1 + b2*col2 + b3*col3 +b4*col4 + b5*col5 + b6*col6 + c1*col7 + logt;
f = exp(eta);
model y ~ Poisson(f);
run;
*^above "works", but I can get any b6 estimate I want by changing the mean 
of the prior on b6. Data is not influencing the posterior for b6 with this
prior. Moreover, posteriors for the other parameters are totally different from
that obtained in with the more reasonable model without a b6 parameter. 
So, I am very skeptical of this.;

Good luck. Based on this quick-and-dirty programming effort, I am still skeptical of what you want to do. But there are many folks who are better Bayesians than I am.

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 8 replies
  • 2043 views
  • 7 likes
  • 4 in conversation