08-28-2013 10:18 PM
From my limited knowledge of Genmod, I don't think there is an automatic selection method.
Here is a somewhat relevant post.
08-29-2013 09:48 AM
In particular look at Dale's response regarding the controversial nature of automatic selection methods. What does the response variable look like--is PROC LOGISTIC a viable alternative?
08-29-2013 12:39 PM
Thanks to you both, Anca and Steve. I have seen the post you refer to, and no, PROC LOGISTIC will not work in my situation, since I am modelling a Poisson outcome with repeated measures.
08-29-2013 12:48 PM
A selection algorithm would be a great feature to have in GENMOD. Although automatic selection methods are controversial, in some cases all one needs is a reasonably good model with some of the noise removed. It would also be great to be able to obtain such a model within a reasonable time and without too much programming.
In the absence of repeated measures, you could conduct the analysis in R using the step() function. This function searches for a model that minimizes AIC (the default) or BIC (by setting k = log(n)), using a backward, forward, or stepwise (both backward and forward) search. It should work with glm() models of the following families: binomial, gaussian, Gamma, inverse.gaussian, poisson. The quasibinomial and quasipoisson families, the over-dispersed versions of the binomial and poisson, are a different matter: they have no true likelihood, so AIC is not defined for them and, as far as I know, step() cannot be applied to them directly.
However, the situation is even more complex when you have repeated measures. As far as I know there are no readily available selection algorithms for generalized linear models with repeated measures. A couple of months ago, I was working on a similar problem, and all I could find was a couple of experimental R packages, and that's about it.
Aside from the traditional stats methodology, there are some convoluted ways to approach the problem using data mining techniques. For instance, assuming that all subjects have a similar number and timing of repeated measures, you could run a cluster analysis on the outcome and transform it into a categorical variable of trajectories (the clusters). Time-dependent predictors can be transformed into trajectories in the same way. The transformed outcome, now a nominal variable, can then be used as the dependent variable of a non-linear model such as a classification tree; predictor selection is implicit in the tree-building algorithm. This is likely not implementable in SAS/STAT alone: the clustering algorithms are there but, as far as I know, tree models are not part of SAS/STAT; they are included in the SAS Enterprise Miner product. The approach can be attempted in R. Regardless of the software, however, there are the issues of how many trajectories (clusters) to select, which is not a simple problem, and of what type of tree model to use, as there are many varieties (I am not sure which are available in SAS Enterprise Miner).
For lack of simpler alternatives, I would suggest a quick-and-dirty approach, albeit imperfect and with a risk of bias. In GENMOD, begin by fixing the correlation structure to exchangeable. Then try a humble manual backward selection: drop one predictor at a time based on p-values, and record the information criterion (QIC for GEE in GENMOD) at each step. Select the set of predictors at the point in the backward sequence where QIC is smallest.
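To make the bookkeeping of that manual procedure concrete, here is a minimal Python sketch of the loop. It is only an illustration, not a GENMOD feature: `backward_by_qic` and the `fit` callback are hypothetical names, and `fit` is assumed to refit the GEE model (for example by rerunning PROC GENMOD with the exchangeable structure) and return the p-value of each predictor together with the model's QIC.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical callback (an assumption, not a real API): fits the GEE model
# on the given predictors and returns (p-value per predictor, QIC).
FitFn = Callable[[List[str]], Tuple[Dict[str, float], float]]


def backward_by_qic(predictors: List[str], fit: FitFn) -> Tuple[List[str], float]:
    """Drop the least significant predictor one at a time and return the
    subset along that elimination path with the smallest QIC.

    The intercept-only model is not evaluated in this sketch.
    """
    current = list(predictors)
    best_subset: List[str] = []
    best_qic = float("inf")
    while current:
        pvalues, qic = fit(current)
        # Remember the subset if it improves on the best QIC seen so far.
        if qic < best_qic:
            best_subset, best_qic = list(current), qic
        # Remove the predictor with the largest p-value and refit.
        worst = max(current, key=lambda name: pvalues[name])
        current.remove(worst)
    return best_subset, best_qic
```

Note that this only follows a single elimination path, so it inherits the same risk of bias as doing the steps by hand.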
Just an idea.
08-31-2013 06:00 PM
Steve Denham's suggestion about using the "best-subset" selection algorithm for independent variables in PROC LOGISTIC would give you a good clue about "important" independent variables. Also consider PROC GLMSELECT that selects "good" sets of independent variables for models that are less affected by the biases in the usual forward and backwards stepwise selection methods. Given that you have more than 30 independent variables, this implies more than one billion possible models; thus, using exhaustive searches through macros that successively select sets of independent variables is probably less feasible than the above two alternatives. You may consider reducing the number of independent variables by using a method like PROC VARCLUS to "cluster" the independent variables and by then selecting one or a few of these variables to represent a given variable cluster. Finally, you have the problem of selecting an appropriate variance-covariance/correlation matrix among the repeated measures. This compounds the selection problem you have.
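The "more than one billion" figure can be checked quickly. Each independent variable is either in or out of a model, so with p variables there are 2^p main-effects subsets (counting the empty model). A small Python check, taking p = 30 as the lower bound mentioned above:

```python
# Each predictor is either included or excluded, giving 2**p candidate subsets.
n_predictors = 30  # lower bound on the number of independent variables
n_models = 2 ** n_predictors
print(f"{n_models:,}")  # 1,073,741,824
```

Interactions and the choice of correlation structure would push the count higher still.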
09-03-2013 01:48 PM
Model effect selection for generalized linear models is available beginning in the current release - SAS 9.4 TS1M0 - using PROC HPGENSELECT. For count models, it can also be done using the SELECTVAR= option in PROC COUNTREG in SAS/ETS software. See the documentation: