Programming the statistical procedures from SAS

Criterion for "selection=stepwise" in "proc logistic"

Contributor
Posts: 29

Hello,

I used "proc logistic" to model an ordinal variable. To find significant predictors, I used the option "selection=stepwise".

I have the problem that I don't know which criterion is used to add or drop variables. I found several approaches in the literature, for example using the p-value of the likelihood-ratio statistic.lec

Which criterion does SAS use to decide?


Thank you for the support!

Daniel

Frequent Contributor
Posts: 140

Re: Criterion for "selection=stepwise" in "proc logistic"

Stepwise selection is not recommended. The p-values from the final model will be too low, the standard errors too small, the confidence intervals too narrow, the model too complex, and the parameter estimates biased away from 0. For a simple demonstration of this (in GLM rather than LOGISTIC, but the same thing applies), see my paper with David Cassell, "Stopping Stepwise"; here is the version we gave at NESUG a while back. For more proof, see Frank Harrell's book Regression Modeling Strategies (he uses R, but it's still a really good book :-)).

If you must use an automated method, I suggest LASSO or LARS in PROC GLMSELECT. Unfortunately, SAS does not yet have these for LOGISTIC, but I have found that getting several recommended models from GLMSELECT and then examining them in LOGISTIC works well. This may be even more true with an ordinal DV.
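A minimal sketch of the GLMSELECT-then-LOGISTIC workflow suggested above. The data set mydata, the response y, the predictors x1-x10, and the subset x2 x5 x7 are hypothetical names used only for illustration. GLMSELECT treats y as a continuous response here, so the selected effects are only candidates to re-examine in LOGISTIC.

```sas
/* Hypothetical data set and variable names, for illustration only */
proc glmselect data=mydata;
   /* Trace the full LASSO path (STOP=NONE) and pick the step with the best SBC */
   model y = x1-x10 / selection=lasso(choose=sbc stop=none);
run;

/* Refit one candidate subset in LOGISTIC; x2, x5, x7 are assumed picks.  */
/* With more than two ordered response levels, LOGISTIC fits a cumulative */
/* logit model by default.                                                */
proc logistic data=mydata;
   model y = x2 x5 x7;
run;
```

Repeating the first step with a different CHOOSE= criterion is one way to obtain several candidate models to compare.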

Also, the last sentence in your paragraph starting "I have the problem...." seems to be cut off.

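As to your specific question, see the documentation. At each step, PROC LOGISTIC computes the score chi-square statistic for each effect not in the model and enters the most significant one if its p-value is below the SLENTRY= level. Effects already in the model are then removed if the p-value of their Wald chi-square statistic exceeds the SLSTAY= level.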
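For illustration, the entry and removal thresholds can be written out explicitly on the MODEL statement. This is a sketch with hypothetical names (mydata, y, x1-x10); the values shown are the documented defaults for stepwise selection as far as I know, so check them against the documentation for your release.

```sas
/* Hypothetical data set and variable names, for illustration only */
proc logistic data=mydata;
   model y = x1-x10 / selection=stepwise
                      slentry=0.05   /* score chi-square p-value required to enter */
                      slstay=0.05    /* Wald chi-square p-value required to stay   */
                      details;       /* print the statistics behind each step      */
run;
```

With DETAILS, the output lists the score chi-square for every effect still eligible for entry at each step, so you can see exactly why an effect was added or dropped.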

SAS Employee
Posts: 122

Re: Criterion for "selection=stepwise" in "proc logistic"

In SAS EM (Enterprise Miner) 12.1 or later, the LASSO and LARS options are available under the HPREGRESSION node. In the HP STAT product 12.1 or later, the options are available in the HPLOGISTIC procedure. The difference is that in HPLOGISTIC they are no longer options on the MODEL statement; they are specified in a stand-alone SELECTION statement.

The HP STAT package does not require a huge investment in hardware to run; it is available as an upgrade to your regular STAT. Computation-wise, all the HP procedures are designed to run multi-threaded. The computer I am typing this reply on has 4 cores and 8 threads in total, and a logistic regression model that used to take ~74 minutes with PROC LOGISTIC is down to 8 minutes.

Generally, if at all possible, one may want to bring in hold-out data early in model estimation and selection. HPREG already has a separate PARTITION statement built in. The equivalent for HPLOGISTIC is under construction, so I have heard.
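A rough sketch of the syntax difference mentioned above, using the stepwise method from the original question. The names mydata, y, and x1-x10 are hypothetical, and the options accepted by the SELECTION statement vary by release, so treat this as an outline rather than tested code.

```sas
/* Hypothetical data set and variable names, for illustration only */

/* PROC LOGISTIC: selection is an option on the MODEL statement */
proc logistic data=mydata;
   model y = x1-x10 / selection=stepwise;
run;

/* PROC HPLOGISTIC: selection is requested in its own statement */
proc hplogistic data=mydata;
   model y = x1-x10;
   selection method=stepwise;
run;
```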

Jason Xin
