Cboath

Hello,

I'm currently experimenting with the Bayesian methods that SAS made available in 9.2. In particular, I'm using PROC MCMC to fit a logistic model. However, unlike PROC LOGISTIC, it provides no criterion for deciding whether a variable is significant (e.g., p < 0.05). From reading the documentation, it appears that SAS only provides the deviance information criterion (DIC). So the only option would seem to be fitting all 2^n - 1 candidate models (for n candidate variables), storing the DIC of each, and choosing the model with the smallest DIC. Is this the only way SAS offers to determine which variables should be in the model?
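For reference, the standard definition is DIC = Dbar + pD, where D(theta) = -2 log L(theta) is the deviance, Dbar is its posterior mean, and pD = Dbar - D(theta_bar) is the effective number of parameters. The Python sketch below is a minimal illustration under assumptions: the posterior draws for a logistic model are already available as a NumPy array (one row per draw, one column per coefficient, intercept included as a column of ones in X), and the function names are hypothetical, not PROC MCMC output. It also shows where the 2^n - 1 count of candidate models comes from.

```python
import itertools

import numpy as np


def candidate_models(variables):
    """All non-empty subsets of the candidate variables: 2**n - 1 models."""
    return [combo
            for k in range(1, len(variables) + 1)
            for combo in itertools.combinations(variables, k)]


def logistic_deviance(beta, X, y):
    """Deviance D(beta) = -2 * log-likelihood of a Bernoulli model with logit link."""
    eta = X @ beta
    # log-likelihood = sum(y*eta - log(1 + exp(eta))); logaddexp keeps it stable
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    return -2.0 * loglik


def dic(draws, X, y):
    """DIC = Dbar + pD, with pD = Dbar - D(posterior mean of beta)."""
    d_bar = np.mean([logistic_deviance(b, X, y) for b in draws])
    d_at_mean = logistic_deviance(draws.mean(axis=0), X, y)
    p_d = d_bar - d_at_mean
    return d_bar + p_d
```

In practice each subset would be refit (for example, in PROC MCMC) and the reported DIC values compared; the sketch only makes the definition and the combinatorial growth concrete.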

SteveDenham

Well, to start with, most Bayesians would be appalled by significance testing, so PROC MCMC provides credible intervals for decision making instead. Check your output for the posterior intervals. Which variables have intervals that are bounded away from zero? Those are the variables that contribute "significantly" to the fit.
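As a minimal sketch of the check described above, assuming you have exported the posterior draws for one coefficient as a NumPy array, the Python below computes an equal-tail credible interval and reports whether it is bounded away from zero. The function names and the 95% default are illustrative assumptions; PROC MCMC reports its own interval tables directly.

```python
import numpy as np


def equal_tail_interval(samples, level=0.95):
    """Equal-tail credible interval from posterior draws of one parameter."""
    alpha = 1.0 - level
    lo, hi = np.quantile(samples, [alpha / 2, 1 - alpha / 2])
    return lo, hi


def bounded_away_from_zero(samples, level=0.95):
    """True if the whole credible interval lies on one side of zero."""
    lo, hi = equal_tail_interval(samples, level)
    return lo > 0 or hi < 0
```

Applied to each coefficient's draws in turn, this flags the variables whose intervals exclude zero.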

However, if you are trying to do variable selection, this approach runs into the same problems as sequential test-based methods (e.g., stepwise, forward, backward, or all possible subsets): the intervals will be biased towards zero and too narrow compared to those from the "true" model.

Steve Denham

Cboath

Doesn't keeping only the variables whose intervals exclude zero constitute a hypothesis test in itself?

The frequentist within me remembers that a confidence interval that excludes zero is equivalent to rejecting H_0: θ = 0 against H_a: θ ≠ 0 in a two-tailed test.
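The duality recalled above can be checked numerically for a Wald (normal-approximation) interval. The Python sketch below assumes an estimate and its standard error are given and uses hypothetical helper names; it shows that a 95% interval excludes zero exactly when the two-sided p-value is below 0.05. The link to Bayesian credible intervals is only by analogy, since those come from the posterior rather than a sampling distribution.

```python
import math

# 97.5th percentile of the standard normal distribution
Z_975 = 1.959963984540054


def wald_interval(estimate, se, z=Z_975):
    """Normal-approximation confidence interval: estimate +/- z * se."""
    return estimate - z * se, estimate + z * se


def two_sided_p(estimate, se):
    """Two-sided p-value for H_0: theta = 0, i.e. 2 * (1 - Phi(|z|))."""
    z = abs(estimate / se)
    # 2 * (1 - Phi(z)) equals erfc(z / sqrt(2))
    return math.erfc(z / math.sqrt(2.0))
```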

So, if traditional variable-selection approaches bias the model, is there a method other than posterior intervals that I should consider for reducing the number of variables in a model?

SteveDenham

Subject matter expertise.

Really: statistically, it has less bias than the other methods, at least for variable selection. There have been some substantial discussions about variable selection both here and on the SAS-L listserv. The "Model Selection Issues" section of the PROC GLMSELECT documentation discusses many of the drawbacks.
PROC GLMSELECT also implements least angle regression (LAR) and LASSO methods, and these methods have been shown to extend to logistic regression. And of course, a lot depends on what the model is for: explanatory hypothesis testing or predictive ability.
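To show what the L1 (LASSO) extension to logistic regression amounts to, here is a minimal Python sketch using proximal gradient descent (ISTA). It is not PROC GLMSELECT's algorithm (GLMSELECT fits linear models). The objective (mean log-loss plus lam times the L1 norm of the slopes, with an unpenalized intercept), the assumption that predictors are standardized, the function names, and the iteration count are all illustrative choices.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_logistic(X, y, lam, n_iter=5000):
    """Minimize mean logistic loss + lam * sum(|beta_j|) by ISTA.

    Assumes the columns of X are standardized; the intercept is not penalized.
    """
    n, p = X.shape
    Xa = np.column_stack([np.ones(n), X])  # column 0 is the intercept
    # step = 1/L, where L = ||Xa||_2^2 / (4n) bounds the Lipschitz constant
    # of the gradient of the mean log-loss
    step = 4.0 * n / np.linalg.norm(Xa, 2) ** 2
    theta = np.zeros(p + 1)
    for _ in range(n_iter):
        eta = Xa @ theta
        prob = np.exp(-np.logaddexp(0.0, -eta))  # stable sigmoid(eta)
        grad = Xa.T @ (prob - y) / n
        theta = theta - step * grad
        # L1 shrinkage on the slopes only; coefficients can become exactly zero
        theta[1:] = soft_threshold(theta[1:], step * lam)
    return theta[0], theta[1:]
```

Larger values of lam drive more coefficients to exactly zero, which is what makes the LASSO usable for selection; in practice lam would be chosen by cross-validation or an information criterion.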

Get a copy of Frank Harrell's Regression Modeling Strategies for a thorough look at these methods.

Steve Denham
