MJHUS
Calcite | Level 5

Does anyone know if there is an option for model selection using PROC GENMOD? I am building a model with 30+ covariates and need a means to select the best-fitting model.

1 ACCEPTED SOLUTION

Accepted Solutions
StatDave
SAS Super FREQ

Model effect selection for generalized linear models is available beginning in the current release - SAS 9.4 TS1M0 - using PROC HPGENSELECT.  For count models, it can also be done using the SELECTVAR= option in PROC COUNTREG in SAS/ETS software.  See the documentation:

http://support.sas.com/documentation/cdl/en/stathpug/66410/HTML/default/stathpug_hpgenselect_toc.htm

http://support.sas.com/documentation/cdl/en/etsug/66100/HTML/default/etsug_countreg_toc.htm
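
For readers who want a starting point, here is a minimal sketch of effect selection for a Poisson model with PROC HPGENSELECT. The data set name (work.counts) and variable names (y, x1-x30) are hypothetical placeholders, and stepwise is only one of the available methods; check the documentation above for the selection options in your release. Note that this sketch has no repeated-measures component and treats observations as independent.

```sas
/* Hypothetical names: work.counts, count outcome y, candidate predictors x1-x30. */
proc hpgenselect data=work.counts;
   /* Poisson regression with a log link */
   model y = x1-x30 / distribution=poisson link=log;
   /* Stepwise effect selection with the procedure's default settings */
   selection method=stepwise;
run;
```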


7 REPLIES
AncaTilea
Pyrite | Level 9

Hi,

From my limited knowledge of GENMOD, I don't think there is an automatic selection method.

Here is a somewhat relevant post:

https://communities.sas.com/message/100613#100613

...Good luck!

SteveDenham
Jade | Level 19

In particular look at Dale's response regarding the controversial nature of automatic selection methods.  What does the response variable look like--is PROC LOGISTIC a viable alternative?

Steve Denham

MJHUS
Calcite | Level 5

Thanks to you both, Anca and Steve. I have seen the post you refer to, and no, PROC LOGISTIC will not work in my situation, since I am modelling a Poisson outcome with repeated measures.

AA1973
Calcite | Level 5

A selection algorithm would be a great feature to have in GENMOD. Although automatic selection methods are controversial, in some cases all one needs is a reasonable, good-enough model with some of the noise removed. It would also be great to be able to obtain such a model within a reasonable time and without too much programming.

In the absence of repeated measures, you could conduct the analysis in R using the step() function. This function searches for a model that minimizes AIC (the default) or BIC, using a backward, forward, or stepwise (both backward and forward) search. It works with glm() fits from likelihood-based families such as binomial, gaussian, Gamma, inverse.gaussian, and poisson. The quasibinomial and quasipoisson families are the over-dispersed versions of the binomial and poisson, respectively, but AIC is not defined for them, so step() may not be usable with those fits.

However, the situation is even more complex when you have repeated measures. As far as I know there are no readily available selection algorithms for generalized linear models with repeated measures. A couple of months ago, I was working on a similar problem, and all I could find was a couple of experimental R packages, and that's about it.

Aside from traditional statistical methodology, there are some convoluted ways to approach the problem using data mining techniques. For instance, assuming that all subjects have a similar number and timing of repeated measures, you could run a cluster analysis on the outcome and transform it into a categorical variable for trajectories (the clusters). Time-dependent predictors can also be transformed into trajectories. The transformed outcome, a nominal variable, can then be used as the dependent variable of a non-linear model such as a tree (a classification tree, since the outcome is now nominal); predictor selection is implicit in the tree-building algorithm. This is likely not implementable in SAS/STAT alone: the clustering algorithms are there but, as far as I know, tree models are not part of SAS/STAT; they are included in the SAS Enterprise Miner product. The approach can be attempted in R. Regardless of the software, however, there remain the issues of how many trajectories (clusters) to select, which is not a simple problem, and what type of tree model to use, as there are many varieties (I am not sure which are available in SAS Enterprise Miner).

For lack of simpler alternatives, I would suggest a quick-and-dirty approach, albeit imperfect and with a risk of bias. In GENMOD, begin by fixing the working correlation structure to exchangeable. Then try a humble manual backward selection: drop one predictor at a time based on p-values, and record the information criterion (QIC for GEE in GENMOD) at each step. Select the set of predictors at the step where QIC is minimized.
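
A rough sketch of the first two steps of that manual backward sequence might look like the following. All names (work.counts, id, y, x1-x5) are hypothetical, and the choice of x4 as the first variable to drop is purely for illustration; in practice it would be whichever predictor has the largest p-value in the previous fit.

```sas
/* Hypothetical names: work.counts has one row per subject-visit,
   id = subject identifier, y = count outcome, x1-x5 = candidate predictors. */

/* Step 0: full model, exchangeable working correlation held fixed. */
proc genmod data=work.counts;
   class id;
   model y = x1 x2 x3 x4 x5 / dist=poisson link=log;
   repeated subject=id / type=exch;
   ods output GEEEmpPEst=est_step0      /* GEE estimates with empirical SEs and p-values */
              GEEFitCriteria=qic_step0; /* QIC fit criteria for this fit */
run;

/* Step 1: suppose x4 had the largest p-value in est_step0; refit without it. */
proc genmod data=work.counts;
   class id;
   model y = x1 x2 x3 x5 / dist=poisson link=log;
   repeated subject=id / type=exch;
   ods output GEEEmpPEst=est_step1
              GEEFitCriteria=qic_step1;
run;
```

Repeating the refit until no predictors remain, and comparing the saved qic_step tables, gives the sequence from which to pick the step with the smallest QIC.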

Just an idea.

MJHUS
Calcite | Level 5

Thanks very much, AA1973, these are very helpful suggestions!

1zmm
Quartz | Level 8

Steve Denham's suggestion about using the "best-subset" selection algorithm for independent variables in PROC LOGISTIC would give you a good clue about "important" independent variables.  Also consider PROC GLMSELECT, which selects "good" sets of independent variables using methods that are less affected by the biases of the usual forward and backward stepwise selection.  With 30 or more independent variables, there are more than one billion (2^30) possible models; thus, exhaustive searches through macros that successively select sets of independent variables are probably less feasible than the above two alternatives.  You may consider reducing the number of independent variables by using a method like PROC VARCLUS to "cluster" the independent variables and then selecting one or a few variables to represent each cluster.  Finally, you have the problem of selecting an appropriate variance-covariance/correlation matrix among the repeated measures.  This compounds the selection problem you have.
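
As an illustration of the model count and the variable-clustering step, here is a minimal sketch. The names work.counts and x1-x30 are hypothetical, and the MAXEIGEN= threshold of 0.7 is an arbitrary value chosen for the example, not a recommendation. With 30 in-or-out predictors the count is 2^30 = 1,073,741,824.

```sas
/* Number of candidate models when each of 30 predictors is either in or out. */
data _null_;
   n_models = 2**30;
   put n_models= comma15.;
run;

/* Hypothetical names: work.counts with candidate predictors x1-x30.
   Clusters keep splitting while a cluster's second eigenvalue exceeds MAXEIGEN=;
   SHORT suppresses some of the printed output. */
proc varclus data=work.counts maxeigen=0.7 short;
   var x1-x30;
run;
```

One or a few variables from each resulting cluster can then be carried forward as candidate predictors.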

