qkaiwei
Calcite | Level 5

Hi:

How do I set an offset variable in the GLMSELECT procedure?

For example, the GENMOD procedure has an OFFSET= option for Poisson regression.
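For reference, a minimal sketch of the GENMOD usage the question refers to. The data set work.claims and its variables (n_claims, exposure, age) are hypothetical and simulated here only so the step runs on its own; the offset is assumed to be the log of the exposure.

```sas
/* Hypothetical data: one row per policy with a claim count and exposure in years */
data work.claims;
   call streaminit(2024);
   do policy_id = 1 to 500;
      age          = 18 + floor(60 * rand('uniform'));
      exposure     = 0.25 + 1.75 * rand('uniform');
      n_claims     = rand('poisson', exposure * exp(-2 + 0.01 * age));
      log_exposure = log(exposure);   /* the offset must be on the log scale */
      output;
   end;
run;

/* Poisson regression with a log link; OFFSET= names the log-exposure variable */
proc genmod data=work.claims;
   model n_claims = age / dist=poisson link=log offset=log_exposure;
run;
```

With a log link, the offset enters the linear predictor with its coefficient fixed at 1, so the model describes claims per unit of exposure.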

Thank you!

SteveDenham
Jade | Level 19

This is going to sound really weird, but the only PROC I found with both model selection ability and the capability of specifying an offset variable was PHREG, and I don't particularly like any of the methods available (no LAR or LASSO).  I suppose you could divide your response variable by the exposure (the offset is the log of the exposure, so this turns the count into a rate) and use GLMSELECT that way.
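A rough sketch of that workaround, under stated assumptions: the offset in a Poisson model is log(exposure), so the rate is taken as count / exposure. The data set work.claims and all variable names are hypothetical and simulated only so the example is self-contained. Note that GLMSELECT fits an ordinary linear model to the rate, so this is an approximation, not a Poisson regression.

```sas
/* Hypothetical data: claim counts with exposure and two candidate predictors */
data work.claims;
   call streaminit(2024);
   do policy_id = 1 to 500;
      age           = 18 + floor(60 * rand('uniform'));
      vehicle_value = 5 + 45 * rand('uniform');
      exposure      = 0.25 + 1.75 * rand('uniform');
      n_claims      = rand('poisson', exposure * exp(-2 + 0.01 * age));
      rate          = n_claims / exposure;   /* count per unit of exposure */
      output;
   end;
run;

/* LASSO selection on the rate; CHOOSE=AICC picks the step with the lowest AICC */
proc glmselect data=work.claims;
   model rate = age vehicle_value / selection=lasso(choose=aicc stop=none);
run;
```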

Steve Denham

Funda_SAS
SAS Employee

Try using the HPGENSELECT procedure, the MODEL statement has an OFFSET option.

Below is some information about the procedure, along with links to a short YouTube video and the documentation.

The new HPGENSELECT procedure, available with SAS/STAT 12.3 (which runs on Base SAS 9.4), performs model selection for generalized linear models (GLMs) such as Poisson regression, negative binomial regression, and any other GLM. Designed for the distributed computing environment of SAS High-Performance Statistics, PROC HPGENSELECT also works in single-machine mode. It provides forward, backward, and stepwise selection (LASSO-type methods are still in progress) and includes the AIC, SBC, and AICC selection criteria.
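As an illustration, a minimal sketch of a Poisson model with an offset and stepwise selection in PROC HPGENSELECT. The data set work.claims and its variables are hypothetical and simulated here so the step runs on its own; check the option names against the documentation for your release.

```sas
/* Hypothetical data: claim counts, exposure, and a few candidate predictors */
data work.claims;
   call streaminit(2024);
   do policy_id = 1 to 1000;
      age           = 18 + floor(60 * rand('uniform'));
      vehicle_value = 5 + 45 * rand('uniform');
      region        = ceil(4 * rand('uniform'));
      exposure      = 0.25 + 1.75 * rand('uniform');
      n_claims      = rand('poisson', exposure * exp(-2 + 0.01 * age));
      log_exposure  = log(exposure);   /* offset on the log scale */
      output;
   end;
run;

/* Stepwise selection for a Poisson GLM; the offset is never a selection candidate */
proc hpgenselect data=work.claims;
   class region;
   model n_claims = age vehicle_value region
         / distribution=poisson link=log offset=log_exposure;
   selection method=stepwise(choose=aic);
run;
```

The CHOOSE=AIC suboption asks the procedure to report the model in the selection sequence with the best AIC; other criteria mentioned above could be substituted.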

http://www.youtube.com/watch?v=RV6lkXNpoKA

http://support.sas.com/documentation/cdl/en/stathpug/66410/HTML/default/viewer.htm#stathpug_hpgensel...

Funda

SteveDenham
Jade | Level 19

For all of the other reasons for not using forward, backward, or stepwise methods, please read "Stopping Stepwise" by Peter Flom and David Cassell.  It is available at http://www.nesug.org/proceedings/nesug07/sa/sa07.pdf.  Whatever problems are seen with normally distributed data are extended with non-normally distributed data, especially skewed distributions such as the Poisson or negative binomial.

Steve Denham


qkaiwei
Calcite | Level 5

Thanks, Funda, that's good news. My sincere thanks to Steve as well; I'll read the paper carefully.

But I have to mention that the situation has changed. Building a model used to be like cooking dinner: done patiently and strictly following statistical practice. Now, facing big data and thousands of product categories, I have to build thousands of models at the same time, and I am given no time to check each of them. I need a simple and efficient method, and at present I can't find a better replacement for STEPWISE.

SteveDenham
Jade | Level 19

Read David and Peter's paper carefully--and it will help you be able to explain why the predictive model you came up with using STEPWISE performed so badly when presented with new data.

In big data, you would be as wise to ask a five-year-old to pick out important variables based on how cool the name sounded as to use STEPWISE methods.  Between the bias of the estimators and the poor control of multiple testing, you have a recipe for a poor predictive model.  Clustering and classification trees will do much better.

Steve Denham
