Hello everyone! I'm having a problem I hope some of you may be able to help with.
We are working on a project in which we want to do variable selection for an MLE statistical model using PROC GENMOD. Our plan is to run every combination of variables through GENMOD and retrieve our determining statistic for each one. To accomplish this, I want to create a table in which one column holds the concatenated variable names for each combination, and then use that table to drive a table-based loop.
Using ALLCOMB I was able to borrow/modify code that creates the table I need (see code below). However, we will have more than 33 variables and I have not been able to replicate my results using ALLCOMBI. Can someone offer suggestions on how to modify my code to use ALLCOMBI? Thanks very much in advance for any help you may be able to provide!
(BTW, I am also open to suggestions on running the GENMOD directly off the ALLCOMBI instead of using the table loop if that's possible, but I didn't see how that could be done.)
data have;
  input prog $ mesr $ x1 $ x2 $ x3 $ x4 $ x5 $;
  datalines;
adt q2r v001 v002 v003 v004 v005
;
run;

data want;
  set have;
  array _a x1--x5;
  do i = 2 to 4;                      /* combination size */
    do j = 1 to comb(dim(_a), i);     /* j-th combination of size i */
      /* ALLCOMB permutes _a so that its first i elements are the combination */
      call allcomb(j, i, of _a{*});
      length comb $500;
      call missing(comb);
      do k = 1 to i;
        if missing(_a{k}) then leave;
        comb = catx(" ", comb, _a{k});
        cnt1 = i;
        cnt2 = j;
      end;
      if k > i then output;           /* write the row only if no name was missing */
    end;
  end;
  keep cnt1 cnt2 prog mesr comb;
run;
There are papers that show how to do all-possible regressions using PROC REG. They should be easy to modify for GENMOD.
https://support.sas.com/kb/24/986.html
Thank you @PaigeMiller for the suggestion! We'll review and see if it will work for us.
33 variables taken how many at a time?
If you mean 33 taken one at a time, 33 taken two at a time, and so on up to 33 taken thirty-three at a time, you are looking at 2^33 - 1 = 8,589,934,591 combinations.
If you run one model per second (optimistic), that will take about 272 years to complete.
So you may want to consider stating a bit more clearly what you want to attempt.
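For anyone who wants to check those figures, here is a minimal DATA step sketch of the arithmetic. The one-model-per-second rate is the assumption stated above, the 365.25-day year is my own assumption, and the variable names are purely illustrative.

```sas
data _null_;
  n = 33;                                   /* number of candidate variables */
  total = 2**n - 1;                         /* every non-empty subset of n variables */
  seconds_per_year = 60 * 60 * 24 * 365.25; /* assumed length of a year */
  years = total / seconds_per_year;         /* assumed rate: one model per second */
  put total= comma20. years= 8.1;
run;
```

The values written to the log should agree with the combination count and the roughly 272 years quoted above.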
Yes, good point, @ballardw. People sometimes don't think of these things when they decide they want a "brute force" answer, rather than trying to come up with a solution that is actually doable. I didn't even think of it.
Another possibility (although I despise stepwise methods) is to find the paper on the stepwise algorithm for generalized linear models (I think it uses PROC GLIMMIX), which could do the calculations in a reasonable amount of time. It's not all-possible regressions, but nothing is.
Which, as it always does, brings me to Partial Least Squares (PROC PLS). PLS greatly reduces the issue of multi-collinearity, such that 33 correlated predictor variables are not really a problem. In fact, Randy Tobias (of SAS Institute) gives an example of PLS with 1,000 highly correlated x-variables, and it produces a usable model which is robust to multi-collinearity, with no variable selection step. PLS does not do some of the things that GENMOD does (no link function in PLS, no generalized linear models in PLS), but it may still be useful in some way in coming up with a model. Or it may not be useful, depending on why you chose GENMOD in the first place. There's also a generalized Partial Least Squares, but this isn't programmed in SAS (although it is programmed in R).
Thank you for the response @ballardw.
We were certainly concerned about processing time and resources, but your point is both well and starkly made.
We have around 80 possible variables to choose from but we wanted to find combinations of between 25 and 45 variables. We are also looking at ways to reduce the total number further. Even with these limitations, this method may not be the ideal way as others have suggested.
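To put a number on that range, the sketch below adds up the combinations of 80 variables taken 25 through 45 at a time with the COMB function. The value 80 stands in for "around 80", and the variable names are illustrative only.

```sas
data _null_;
  n = 80;                          /* "around 80" candidate variables (assumed exact here) */
  total = 0;
  do k = 25 to 45;                 /* combination sizes under consideration */
    total = total + comb(n, k);    /* number of ways to choose k of the n variables */
  end;
  put total= e12.;
run;
```

The term for k = 40 alone is larger than 10^23, so the total is many orders of magnitude beyond the 33-variable case discussed earlier in the thread.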
Nonetheless, even if it is for my own edification and knowledge, if the posted code could be done instead with ALLCOMBI, I would be interested in seeing how.
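One possible way to adapt the posted code to CALL ALLCOMBI is sketched below. It is untested and offered under several assumptions: the combination size is fixed by a hypothetical macro variable `k`, the index variables `idx1`-`idx3` and the output name `want_k3` are made up for illustration, and none of the x variables are missing (the missing-value check from the original is dropped). Unlike ALLCOMB, ALLCOMBI does not rearrange the values themselves. It works on an array of K index values, which are then used to look up the names in `_a`.

```sas
data have;
  input prog $ mesr $ x1 $ x2 $ x3 $ x4 $ x5 $;
  datalines;
adt q2r v001 v002 v003 v004 v005
;
run;

%let k = 3;   /* combination size; hypothetical macro variable for this sketch */

data want_k&k;
  set have;
  array _a {*} x1--x5;               /* candidate variable names */
  array _i {&k} idx1-idx&k;          /* index variables that ALLCOMBI updates */
  length comb $500;
  n = dim(_a);
  _i[1] = 0;                         /* asks ALLCOMBI to initialize the indices */
  do j = 1 to comb(n, &k);
    call allcombi(n, &k, of _i[*]);  /* move to the next combination of indices */
    call missing(comb);
    do h = 1 to &k;
      comb = catx(" ", comb, _a[_i[h]]);  /* look up each name by its index */
    end;
    cnt1 = &k;
    cnt2 = j;
    output;
  end;
  keep cnt1 cnt2 prog mesr comb;
run;
```

Because the number of index variables has to be known when the step compiles, covering sizes 2 through 4 as in the original would mean running this step once per size, for example from a macro %DO loop, and appending the results. ALLCOMBI returns combinations in minimal-change order, so the rows will not necessarily appear in the same order as with ALLCOMB.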