06-13-2014 08:51 AM
This is going to sound really weird, but the only PROC I found with both model selection and the ability to specify an offset variable was PHREG, and I don't particularly like any of the selection methods available there (no LAR or LASSO). I suppose you could divide your response variable by the exposure (the quantity whose log you would use as the offset) and model the resulting rate with GLMSELECT.
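For reference, here is a short sketch of why an offset corresponds to modeling a rate under a log link. The symbols are assumptions introduced for illustration: $t_i$ is the exposure, $\log t_i$ is the offset, and $\mu_i$ is the expected count.

```latex
\begin{align*}
\log \mu_i &= \log t_i + x_i^{\top}\beta \\
\Longleftrightarrow\quad \log\!\left(\frac{\mu_i}{t_i}\right) &= x_i^{\top}\beta \\
\Longleftrightarrow\quad \mathrm{E}\!\left[\frac{Y_i}{t_i}\right] &= \exp\!\left(x_i^{\top}\beta\right)
\end{align*}
```

So the rate $Y_i / t_i$ is the count divided by the exposure itself, not by its logarithm. Fitting that rate with a normal-error linear model is only a rough stand-in for a Poisson model with an offset, since the variance assumptions differ.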
06-16-2014 11:35 AM
Try the HPGENSELECT procedure; its MODEL statement has an OFFSET= option.
Below is some information about the procedure, along with links to a short YouTube video and the documentation.
The new HPGENSELECT procedure, available with SAS/STAT 12.3 (which runs on Base SAS 9.4), performs model selection for generalized linear models (GLMs) such as Poisson regression and negative binomial regression. Designed for the distributed computing environment of SAS High-Performance Statistics, PROC HPGENSELECT also works in single-machine mode. It provides forward, backward, and stepwise selection (LASSO-type methods are still in progress) and includes the AIC, SBC, and AICC selection criteria.
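As a rough illustration of the OFFSET= option mentioned above, here is a minimal sketch of a Poisson model with an offset and stepwise selection. The data set and all variable names (work.claims, n_claims, policy_years, age, income, noise) are hypothetical and simulated only to make the example self-contained; the selection settings are one possible choice, not a recommendation.

```sas
/* Simulated, hypothetical data: claim counts with varying exposure. */
data work.claims;
   call streaminit(2014);
   do id = 1 to 500;
      policy_years = 0.5 + 4.5*rand('uniform');   /* exposure */
      age          = 20 + 50*rand('uniform');
      income       = rand('normal', 50, 10);
      noise        = rand('normal');               /* irrelevant predictor */
      /* Expected count = exposure * rate; the rate depends on age only */
      n_claims     = rand('poisson', policy_years * exp(-2 + 0.02*age));
      log_exposure = log(policy_years);            /* offset on the log scale */
      output;
   end;
run;

/* Poisson regression with an offset; stepwise selection, */
/* with the final model chosen by AIC.                   */
proc hpgenselect data=work.claims;
   model n_claims = age income noise
         / dist=poisson link=log offset=log_exposure;
   selection method=stepwise(choose=aic);
run;
```

The offset variable is supplied on the log scale, so its coefficient is fixed at 1 and the remaining coefficients describe the claim rate per policy year.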
06-16-2014 01:08 PM
For all of the other reasons not to use forward, backward, or stepwise methods, please read "Stopping Stepwise" by Peter Flom and David Cassell. It is available at http://www.nesug.org/proceedings/nesug07/sa/sa07.pdf. Whatever problems arise with normally distributed data carry over to non-normally distributed data, and are worse for skewed distributions such as the Poisson or negative binomial.
06-17-2014 12:11 PM
Thanks, Funda, that's good news. My sincere thanks to Steve as well; I'll read the paper carefully.
But I have to mention that the situation has changed. Building a model used to be like cooking dinner: done patiently, strictly following statistical practice. Now, facing big data and thousands of product categories, I have to build thousands of models at the same time, with no time to check each one. I need a simple and efficient method, and at present I can't find a better replacement for STEPWISE.
06-17-2014 01:07 PM
Read David and Peter's paper carefully. It will help you explain why the predictive model you came up with using STEPWISE performed so badly when presented with new data.
In big data, you would be as wise to ask a five-year-old to pick out important variables based on how cool the name sounded as to use STEPWISE methods. Between the bias of the estimators and the poor control of multiple testing, you have a recipe for a poor predictive model. Clustering and classification trees will do much better.