06-14-2013 01:34 AM
I'm currently experimenting with the Bayesian methodology that SAS made available in 9.2. In particular, I'm using PROC MCMC to fit a logistic model. However, unlike PROC LOGISTIC, there is no criterion for deciding whether variables are significant (e.g., a p-value below alpha = 0.05). From reading the documentation, it appears that SAS only makes the deviance information criterion (DIC) available. So the only option would seem to be fitting all 2^n - 1 candidate models, storing each DIC, and choosing the model with the smallest value. Is this the only viable way SAS provides to determine which variables should be in the model?
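The all-subsets search described above can be sketched as follows. This is an illustrative Python outline, not SAS code: `dic_for` is a hypothetical callback that stands in for fitting one PROC MCMC model on a given subset of predictors and returning its DIC.

```python
from itertools import combinations

def best_subset_by_dic(predictors, dic_for):
    """Enumerate all 2^n - 1 non-empty predictor subsets and return the
    subset with the smallest DIC.  dic_for(subset) is assumed to fit the
    corresponding model (e.g. one PROC MCMC run) and return its DIC."""
    best, best_dic = None, float("inf")
    for k in range(1, len(predictors) + 1):
        for subset in combinations(predictors, k):
            d = dic_for(subset)
            if d < best_dic:
                best, best_dic = subset, d
    return best, best_dic
```

Note that this search is exponential in the number of candidate variables, which is exactly why it becomes impractical beyond a handful of predictors.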
06-14-2013 08:42 AM
Well, to start with, most Bayesians would be appalled by hypothesis testing, so PROC MCMC instead provides credible intervals to use in decision making. Check your output for the posterior intervals. Which variables have intervals that are bounded away from zero? Those are the variables that contribute "significantly" to the fit.
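The "bounded away from zero" check can be made concrete. A minimal sketch, assuming you have exported the posterior draws for one coefficient (the function names here are illustrative, not part of any SAS output):

```python
def credible_interval(draws, level=0.95):
    """Equal-tailed credible interval from a list of posterior draws."""
    s = sorted(draws)
    lo = s[int((1 - level) / 2 * len(s))]
    hi = s[int((1 + level) / 2 * len(s)) - 1]
    return lo, hi

def excludes_zero(draws, level=0.95):
    """True when the credible interval lies entirely on one side of zero,
    i.e. the coefficient is 'bounded away from zero' at that level."""
    lo, hi = credible_interval(draws, level)
    return lo > 0 or hi < 0
```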
However, if you are using this for variable selection, it runs into the same problems as sequential test-based methods (i.e. stepwise/forward/backward/all possible subsets): the estimates for the selected variables will be biased and the intervals too narrow compared to the "true" model.
06-14-2013 09:10 AM
Doesn't keeping only the variables whose intervals exclude zero constitute a hypothesis test in itself?
The frequentist within me remembers that a confidence interval that excludes zero mimics a two-tailed hypothesis test of H_0: θ = 0 against H_a: θ ≠ 0.
So, if traditional variable-selection approaches would bias the model, is there a particular method, aside from inspecting posterior intervals, for reducing the number of variables in a model?
06-14-2013 10:59 AM
Subject matter expertise.
Really--it introduces less statistical bias than the other methods, at least for variable selection. There have been some significant discussions about variable selection both here and on the SAS-L listserv. The PROC GLMSELECT documentation, under "Model Selection Issues," discusses many of the drawbacks.
PROC GLMSELECT also implements least angle regression (LAR) and the LASSO. It has been shown that these methods can be extended to logistic regression. And of course, a lot depends on what the model is for--explanatory hypothesis testing or predictive ability.
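For intuition about why the LASSO performs variable selection at all: its L1 penalty acts on coefficients through the soft-thresholding operator, which can set an estimate exactly to zero rather than merely shrinking it. A minimal pure-Python sketch of that operator (the function name is mine, for illustration only):

```python
def soft_threshold(z, lam):
    """LASSO's soft-thresholding operator: shrinks z toward zero by lam,
    and returns exactly 0.0 when |z| <= lam.  Coefficients hitting exactly
    zero is what makes the LASSO a variable-selection method."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0
```

Coordinate-descent LASSO solvers apply this operator repeatedly to each coefficient; variables whose (penalized) contribution never exceeds the threshold simply drop out of the model.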
Get a copy of Frank Harrell's Regression Modeling Strategies for a good overview of these methodologies.