I am reading mixed things about whether it is appropriate to use stepwise selection when building a predictive ordinal logistic regression model. Does anyone have any input on this they would be willing to share?
Better, more modern selection methods include LASSO, least angle regression (LAR), and elastic net. These methods are available in various SAS procedures as mentioned in this list of frequently asked-for statistics. For logistic models, the LASSO method is available in PROC HPGENSELECT. For more information see this note and the link there to an article by Gunes.
The statistical literature is not mixed regarding the appropriateness of stepwise methods: the consensus, after literally decades of study, is "don't use them." You can review the literature for the reasons. The primary disadvantage is inflated Type I error, because the p-values are computed as if the final model had been specified in advance rather than chosen by searching through the data, but there are other disadvantages as well.
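To make the inflated Type I error concrete, here is a minimal simulation sketch in Python (not SAS, and not taken from any of the references in this thread). It is a deliberately simplified stand-in for stepwise selection: the response is continuous rather than ordinal, every candidate predictor is pure noise, and only the first forward-selection step is mimicked by picking the predictor most correlated with the response. The function names, sample sizes, and the Fisher z approximation used for the test are all my own assumptions.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def naive_false_positive_rate(n_obs=50, n_pred=10, n_sims=1000, seed=1):
    """Share of simulations in which the best of n_pred pure-noise
    predictors looks 'significant' at the nominal 5% level when it is
    tested as if it had been chosen in advance."""
    rng = random.Random(seed)
    crit = 1.959964  # two-sided 5% critical value of the standard normal
    hits = 0
    for _ in range(n_sims):
        y = [rng.gauss(0, 1) for _ in range(n_obs)]
        best = 0.0
        for _ in range(n_pred):
            x = [rng.gauss(0, 1) for _ in range(n_obs)]  # unrelated to y
            best = max(best, abs(pearson_r(x, y)))
        # Fisher z approximation to the test of zero correlation
        z = math.atanh(best) * math.sqrt(n_obs - 3)
        if z > crit:
            hits += 1
    return hits / n_sims


rate = naive_false_positive_rate()
print(f"Nominal level: 0.05, observed false positive rate: {rate:.2f}")
```

With ten independent noise predictors, the chance that at least one passes a 5% test is roughly 1 - 0.95^10, or about 40%, so the selected variable looks "significant" far more often than the nominal 5% suggests. A full stepwise procedure repeats this kind of search at every step.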
My sense is that practitioners (in other words, non-statisticians) promote stepwise because it is SO EASY, you hardly have to give any thought to it at all. But easy is not the same thing as good or appropriate.
@sld wrote:
My sense is that practitioners (in other words, non-statisticians) promote stepwise because it is SO EASY, you hardly have to give any thought to it at all. But easy is not the same thing as good or appropriate.
In fact, if you take a class from SAS Institute about logistic regression, you hear this. The specific instructor that I heard made a point to mention that stepwise has all these drawbacks, but I was left with the impression that the instructor was advising the class to go ahead and use it anyway.
I believe a better solution is Logistic Partial Least Squares regression. Partial Least Squares is usually much more effective than stepwise selection when the predictors are collinear. It is available in SAS for continuous responses (PROC PLS), but there is no logistic version available in SAS. You could probably program your own version of Logistic PLS using this paper: https://cedric.cnam.fr/fichiers/RC906.pdf
Is LASSO appropriate for only really large datasets?
No. As shown in the note I referred to, LASSO just adds a penalty to the log likelihood that is being maximized, so nothing about the method requires a large data set.
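For readers who want to see the penalty written out, the LASSO estimate can be sketched as below. The notation is my own assumption rather than the note's: \ell(\beta) is the model's log likelihood, p is the number of candidate effects, and \lambda is the tuning parameter. The exact scaling of the penalty can differ between implementations.

```latex
\hat{\beta}_{\text{LASSO}}
  = \arg\max_{\beta}
    \left\{ \ell(\beta) - \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right\},
  \qquad \lambda \ge 0
```

Setting \lambda = 0 gives back ordinary maximum likelihood, and larger values of \lambda shrink more coefficients to exactly zero, which is how the method performs selection. The sample size enters only through \ell(\beta).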