I tried to run PROC LOGISTIC on a data set containing 165,265 observations and 70 variables. The process took two and a half days, which seems extremely long (please tell me this is not normal). I added the following options to the MODEL statement:
SELECTION=STEPWISE SLENTRY=0.99 SLSTAY=0.995 LACKFIT FAST NOCHECK BEST=3 START=2 STOP=4 MAXSTEP=2 --> for the training sample
SELECTION=STEPWISE SLENTRY=0.99 SLSTAY=0.995 LACKFIT FAST NOCHECK BEST=3 START=2 STOP=4 MAXSTEP=2 MAXITER=0 --> for the test (validation) sample
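For reference, the training-sample run described above might look roughly like the following. This is only a sketch: the data set name `work.train`, the binary response `y` with event level `'1'`, and the predictor names `x1-x70` are hypothetical placeholders, since the question does not give them.

```sas
/* Hypothetical names: work.train, y and x1-x70 are placeholders. */
proc logistic data=work.train;
   /* Binary response y; model the probability that y = '1'. */
   /* Options below are the ones listed in the question.      */
   model y(event='1') = x1-x70
         / selection=stepwise slentry=0.99 slstay=0.995
           lackfit fast nocheck
           best=3 start=2 stop=4 maxstep=2;
run;
```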
Can someone tell me why this takes so long and how I can reduce the processing time?
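Because PROC LOGISTIC uses an iterative algorithm, it makes many passes through the data, and when the data do not fit in memory each pass means reading and writing the utility file on disk. Most of that time was spent on that disk I/O. There are several things you can do to speed it up: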
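One way to check whether disk I/O really dominates the run time is to turn on the FULLSTIMER system option, which writes detailed real-time, CPU-time, and memory statistics to the log for each step. A sketch, again with hypothetical data set and variable names (`work.train`, `y`, `x1-x70`):

```sas
/* Write detailed resource statistics (real time, CPU time, memory) */
/* to the log for every subsequent step.                            */
options fullstimer;

/* Hypothetical names: work.train, y and x1-x70 are placeholders. */
proc logistic data=work.train;
   model y(event='1') = x1-x70
         / selection=stepwise slentry=0.99 slstay=0.995;
run;
```

If the real time reported in the log is far larger than the user plus system CPU time, the step was probably waiting on something other than computation, such as the utility file on disk.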
-- Move to a 64-bit operating platform and install lots of memory.
-- Use a faster disk drive for the WORK library; a solid-state drive might be optimal.
-- Reduce the dimension of your problem (e.g., use fewer candidate variables).
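Before changing hardware, it can help to see what SAS is currently configured to use. The sketch below prints the MEMSIZE setting and the physical path of the WORK library, then refits on a reduced candidate list. Note that MEMSIZE and the WORK location are set when SAS starts (in the configuration file or on the command line), not with an OPTIONS statement in a running session. The subset `x1-x20` is a hypothetical placeholder for whatever variables survive your own screening, and `work.train` and `y` are likewise assumed names.

```sas
/* Show the MEMSIZE setting: the upper limit on memory this session may use. */
proc options option=memsize;
run;

/* Write the physical directory of the WORK library to the log, */
/* so you can see which disk it lives on.                        */
%put NOTE: WORK library location: %sysfunc(pathname(work));

/* Refit with fewer candidate effects.             */
/* x1-x20 is a hypothetical screened subset.       */
proc logistic data=work.train;
   model y(event='1') = x1-x20
         / selection=stepwise slentry=0.99 slstay=0.995;
run;
```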
See the "Computational Resources" section of the PROC LOGISTIC documentation for details on memory requirements.