I am trying to run a PROC LOGISTIC model on around 1.3 million rows. I have been able to reduce the variables to 30.
If I run an all-subsets model with no interactions, that is around 2^30 = 1,073,741,824 runs.
1> Is there a way to find out how much time it will take?
2> Is there any technique to perform this many runs more quickly?
Currently I am using stepwise, backward, etc. to run the model. Also, I have PC SAS 9.2 (TS2M0) on the X64_ESRV platform.
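For question 1, a rough answer is just (number of subsets) × (time per fit). A minimal Python sketch of that back-of-envelope arithmetic; the 0.5-second per-fit figure is a hypothetical placeholder you would replace by timing a single PROC LOGISTIC run on your own data:

```python
# Back-of-envelope runtime estimate for an all-subsets search.
# Time ONE fit on your data, then scale by the subset count.

n_vars = 30
n_subsets = 2 ** n_vars          # 1,073,741,824 (includes the empty model)
seconds_per_fit = 0.5            # ASSUMPTION: replace with a measured value

total_seconds = n_subsets * seconds_per_fit
total_years = total_seconds / (60 * 60 * 24 * 365)

print(f"{n_subsets:,} subsets")
print(f"~{total_years:.1f} years at {seconds_per_fit} s per fit")
```

Even at an optimistic half second per fit on 1.3M rows, 2^30 fits works out to roughly 17 years of compute, which is why exhaustive all-subsets search is replaced in practice by branch-and-bound or heuristic selection methods.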
There is a section on computational resources in the SAS/STAT reference manual chapter for PROC LOGISTIC.
I think that you are on thin ice with this all-possible-regressions approach. You will get estimates that fit your data best, but that are unlikely to be reproducible. With 1.3M observations, you are also likely to find predictors that are statistically significant without having any business value.
"I gather that logistics regression will fail to provide good predictive models." <-- that is not what I said at all. A good predictive model may have variables in it that predict well in a statistical sense (e.g. significant p-value), but are not useful for business decision making.
My other comment ("thin ice") referred to the classic problem in statistics of "multiple comparisons". If you do a billion analyses on 1.3 million observations, as you described in your initial post, you are going to get some models that predict well but are wrong in a business sense. It is not a problem with logistic regression; it is a problem with misapplication.
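The multiple-comparisons point is easy to demonstrate by simulation. The sketch below screens many pure-noise predictors against a coin-flip outcome and counts how many look "significant" by chance alone; a simple two-sample z-test stands in for the logistic fits, and all the sizes (2,000 rows, 1,000 predictors) are illustrative stand-ins, not your data:

```python
# Multiple-comparisons illustration: with enough candidate variables,
# some pure-noise predictors will pass p < 0.05 by chance alone.

import math
import random

random.seed(1)

n_obs = 2000          # observations (small stand-in for 1.3M rows)
n_predictors = 1000   # candidate variables, all pure noise

# Coin-flip binary outcome, unrelated to every predictor.
y = [random.random() < 0.5 for _ in range(n_obs)]

def z_test_p(x, outcome):
    """Two-sided large-sample z-test p-value comparing mean(x) between groups."""
    g1 = [v for v, flag in zip(x, outcome) if flag]
    g0 = [v for v, flag in zip(x, outcome) if not flag]
    m1, m0 = sum(g1) / len(g1), sum(g0) / len(g0)
    v1 = sum((v - m1) ** 2 for v in g1) / (len(g1) - 1)
    v0 = sum((v - m0) ** 2 for v in g0) / (len(g0) - 1)
    z = (m1 - m0) / math.sqrt(v1 / len(g1) + v0 / len(g0))
    return math.erfc(abs(z) / math.sqrt(2))

false_hits = 0
for _ in range(n_predictors):
    x = [random.gauss(0, 1) for _ in range(n_obs)]  # pure-noise predictor
    if z_test_p(x, y) < 0.05:
        false_hits += 1

print(f"{false_hits} of {n_predictors} noise predictors 'significant' at p < 0.05")
```

Roughly 5% of the noise predictors come out "significant" even though none has any real relationship to the outcome; scale that up to a billion candidate models and spurious winners are guaranteed.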