I am reading mixed things about whether it is appropriate to use stepwise selection for a predictive ordinal logistic regression model. Does anyone have any input on this they would be willing to share?
Better, more modern selection methods include LASSO, least angle regression (LAR), and elastic net. These methods are available in various SAS procedures as mentioned in this list of frequently asked-for statistics. For logistic models, the LASSO method is available in PROC HPGENSELECT. For more information see this note and the link there to an article by Gunes.
The statistical literature is not mixed regarding the appropriateness of stepwise methods: the consensus (over literally decades of study) is, don't use them. You can review the literature for the reasons; the primary disadvantage is inflated Type I error, but there are other disadvantages as well.
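To make the Type I error point concrete, here is a rough simulation sketch in Python (not SAS). It is a deliberate simplification and not an actual stepwise run: it assumes a continuous response rather than an ordinal one, 20 pure-noise candidate predictors, and it mimics only the first forward step by picking the candidate most correlated with the response and then testing that one at the nominal 5% level. All function names and settings are made up for illustration.

```python
import math
import random

def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def best_of_k_is_significant(rng, n=50, k=20, z_crit=1.96):
    """Generate k noise predictors, keep the one most correlated with y,
    and test it as if it had been chosen in advance."""
    y = [rng.gauss(0, 1) for _ in range(n)]
    best = 0.0
    for _ in range(k):
        x = [rng.gauss(0, 1) for _ in range(n)]
        best = max(best, abs(pearson_r(x, y)))
    # Fisher z-transform: atanh(r) * sqrt(n - 3) is approximately N(0, 1)
    # when the true correlation is zero.
    z = math.atanh(best) * math.sqrt(n - 3)
    return z > z_crit

def false_positive_rate(trials=500, seed=1):
    """Share of simulated data sets where the selected noise variable looks significant."""
    rng = random.Random(seed)
    hits = sum(best_of_k_is_significant(rng) for _ in range(trials))
    return hits / trials

rate = false_positive_rate()
print(f"nominal rate 0.05, observed rate after selection {rate:.2f}")
```

None of the predictors is related to the response, yet the selected one passes the 5% test far more often than 5% of the time. With 20 independent tests the chance of at least one false positive is 1 - 0.95^20, roughly 0.64, and the p-value reported after selection does not account for that search.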
My sense is that practitioners (in other words, non-statisticians) promote stepwise because it is SO EASY, you hardly have to give any thought to it at all. But easy is not the same thing as good or appropriate.
@sld wrote:
My sense is that practitioners (in other words, non-statisticians) promote stepwise because it is SO EASY, you hardly have to give any thought to it at all. But easy is not the same thing as good or appropriate.
In fact, if you take a class from SAS Institute about logistic regression, you hear this. The specific instructor that I heard made a point to mention that stepwise has all these drawbacks, but I was left with the impression that the instructor was advising the class to go ahead and use it anyway.
I believe a better solution is Logistic Partial Least Squares regression. Partial Least Squares is usually much more effective in the case of collinearity, and is available in SAS for continuous responses, but there is no logistic version available in SAS. You could probably program your own version of Logistic PLS using this paper: https://cedric.cnam.fr/fichiers/RC906.pdf
Is LASSO appropriate for only really large datasets?
No. As shown in the note I referred to, it just adds a penalty to the log likelihood to be maximized.
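For readers who want to see the idea written out, the standard textbook form of the LASSO criterion is below. The notation is mine, not taken from the note, and the exact scaling of the penalty used by any particular SAS procedure may differ. Here \ell(\beta) is the log likelihood, \beta_1, ..., \beta_p are the slope coefficients, and \lambda \ge 0 is the tuning parameter.

```latex
\hat{\beta}_{\text{LASSO}}
  = \arg\max_{\beta}
    \left\{ \ell(\beta) - \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right\},
  \qquad \lambda \ge 0
```

With \lambda = 0 this is ordinary maximum likelihood. As \lambda grows, more coefficients are shrunk to exactly zero, which is how the method performs selection. Nothing in the criterion depends on the data set being large.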