DaLack
Calcite | Level 5

Hello,

I used "proc logistic" to model an ordinal variable. To find significant predictors, I used the option "selection=stepwise".

I have the problem that I don't know which criterion is used to add or drop variables. I found several approaches in the literature, for example using the p-value of the likelihood-ratio statistic.lec

Which criterion does SAS use to decide?


Thank you for the support!

Daniel

2 REPLIES
plf515
Lapis Lazuli | Level 10

Stepwise selection is not recommended. The p-values from the final model will be too low, the standard errors too small, the confidence intervals too narrow, the model too complex, and the parameter estimates biased away from 0.

For a simple demonstration of this (in GLM rather than LOGISTIC, but the same thing applies), see my paper with David Cassell, "Stopping Stepwise", which we presented at NESUG a while back. For more proof, see Frank Harrell's book Regression Modeling Strategies (he uses R, but it's still a really good book :-)).

If you must use an automated method, I suggest LASSO or LARS in PROC GLMSELECT. Unfortunately, SAS does not yet have this for LOGISTIC, but I have found that getting several recommended models from GLMSELECT and then examining them in LOGISTIC works well. This may be even more so with an ordinal DV.
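A minimal sketch of that two-step workflow, under stated assumptions: the data set work.survey, the numerically coded ordinal response rating, and the predictors x1 to x5 are hypothetical names, not taken from the question. GLMSELECT treats the response as continuous, so the LASSO step only serves as a rough screen; the candidate model is then refit as an ordinal model in LOGISTIC.

```sas
/* Step 1: LASSO screen in GLMSELECT (response treated as continuous).  */
/* work.survey, rating and x1-x5 are hypothetical placeholder names.    */
proc glmselect data=work.survey;
   model rating = x1 x2 x3 x4 x5
         / selection=lasso(choose=cv stop=none) cvmethod=random(5);
run;

/* Step 2: refit one candidate model (here, say, x1 x3 x4) as an        */
/* ordinal logistic model and examine it there.                         */
proc logistic data=work.survey;
   model rating = x1 x3 x4;
run;
```

The predictors kept in step 2 are purely illustrative; in practice you would carry over whichever effects the GLMSELECT output suggests and compare several such candidates.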

Also, the last sentence in your paragraph starting "I have the problem...." seems to be cut off.

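As to your specific question, see the documentation for the SELECTION= option of PROC LOGISTIC. At each step, SAS computes the score chi-square statistic for each effect not in the model and adds the most significant one if its p-value meets the SLENTRY= level; effects already in the model are then dropped if the p-value of their Wald chi-square statistic fails the SLSTAY= level.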
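To make the entry and removal thresholds visible, they can be set explicitly on the MODEL statement. This is only a sketch: the data set work.survey, the response rating, and the predictors x1 to x5 are hypothetical names, and 0.05 is used for both levels because, as far as I know, that is the default in PROC LOGISTIC.

```sas
/* Stepwise selection with explicit thresholds.                        */
/* SLENTRY= : significance level of the score chi-square for entry.    */
/* SLSTAY=  : significance level of the Wald chi-square for staying.   */
/* work.survey, rating and x1-x5 are hypothetical placeholder names.   */
proc logistic data=work.survey;
   model rating = x1 x2 x3 x4 x5
         / selection=stepwise slentry=0.05 slstay=0.05 details;
run;
```

The DETAILS option prints the statistics for each step, so you can see which chi-square value drove each entry or removal decision.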

JasonXin
SAS Employee

In SAS EM version 12.1 or later, the LASSO and LARS options are available under the HPREGRESSION node. In the HP STAT product 12.1 or later, the options are available in the HPLOGISTIC procedure. The difference is that in HPLOGISTIC they are no longer options on the MODEL statement; they are specified in a stand-alone SELECTION statement.

The HP STAT package does not require a huge investment in hardware to run. It is available as an upgrade to your regular STAT. Computation-wise, all the HP procedures are designed to run multi-threaded. The computer I am typing this reply on has 4 cores and 8 threads in total, and a logistic regression model that used to take ~74 minutes with PROC LOGISTIC is down to 8 minutes.

Generally, if at all possible, one may want to bring hold-out data into model estimation and selection early. In HPREG, a separate PARTITION statement is already built in. The HPLOGISTIC counterpart is under construction, so I heard.
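As a rough illustration of the stand-alone SELECTION statement, here is a sketch using stepwise selection. The data set work.survey, the response rating, and the predictors x1 to x5 are hypothetical names, and which METHOD= values are available depends on your release, so check the documentation for your version.

```sas
/* Selection is requested in its own statement, not on MODEL.          */
/* work.survey, rating and x1-x5 are hypothetical placeholder names.   */
proc hplogistic data=work.survey;
   model rating = x1 x2 x3 x4 x5;
   selection method=stepwise(slentry=0.05 slstay=0.05);
run;
```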

Jason Xin
