8378
Calcite | Level 5
I’m running a logistic regression with PROC LOGISTIC on a sample of approximately 150,000 people described by 1,500 variables. The analysis takes about 8 hours. Is there a methodological way to speed it up, or is this more of a software/hardware problem?

Thanks a lot.

Regards
Iryna
4 REPLIES
Olivier
Pyrite | Level 9
Hi Iryna.
I don't think you really need all 1,500 variables in the model, do you?
I would use SELECTION=FORWARD together with STOP=50 to find the (at most) fifty variables that contribute most to the model, and then rerun the model with only those.
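
A minimal sketch of what that could look like, assuming a data set called work.sample, a binary response y coded 0/1, and predictors named x1-x1500 (all of these names are hypothetical; substitute your own):

```sas
/* Hypothetical names: work.sample (input data), y (0/1 response), */
/* x1-x1500 (candidate predictors).                                */
proc logistic data=work.sample;
   model y(event='1') = x1-x1500
         / selection=forward   /* add one effect at a time            */
           slentry=0.05        /* significance level required to enter */
           stop=50;            /* stop once 50 effects are in the model */
run;
```

The "Summary of Forward Selection" table in the output lists the effects that entered. A second PROC LOGISTIC step with only those variables on the MODEL statement (and no SELECTION= option) would then fit the reduced model.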

Regards
Olivier
1162
Calcite | Level 5
Are any of these 1,500 variables highly correlated? If so, you might be able to select one among a group of highly correlated variables or use a small number of principal components (from a Principal Components Analysis) for your logistic regression.
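
A rough sketch of the principal components route, assuming the same hypothetical names as elsewhere (work.sample, y, x1-x1500) and an arbitrary choice of 20 components; how many components to keep would have to be decided from the eigenvalue output:

```sas
/* Hypothetical names: work.sample, y, x1-x1500.                       */
/* N=20 is an arbitrary illustration, not a recommendation.            */
proc princomp data=work.sample
              out=work.scores   /* input data plus component scores    */
              n=20              /* number of components to compute     */
              prefix=pc;        /* score variables are named pc1-pc20  */
   var x1-x1500;
run;

/* Fit the logistic model on the component scores instead of the raw variables */
proc logistic data=work.scores;
   model y(event='1') = pc1-pc20;
run;
```

PROC PRINCOMP works from the correlation matrix by default, so the variables do not need to be standardized first. The trade-off is interpretability: each component is a weighted combination of all 1,500 original variables.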
datalligence
Fluorite | Level 6
Iryna, I think you don't need all 150,000 records/observations either.

For example, if you are interested in variables that capture respondents' ratings of certain job attributes, you may want to use the data for employed respondents only.
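
As a sketch, restricting the analysis to a subset can be done with a WHERE statement inside the procedure. The names below (work.sample, y, x1-x50, and an indicator employed with 1 = employed) are hypothetical:

```sas
/* Hypothetical names: work.sample, y, x1-x50, employed (1 = employed) */
proc logistic data=work.sample;
   where employed = 1;            /* keep only the relevant respondents */
   model y(event='1') = x1-x50;
run;

/* Alternative (an assumption, not from the thread): draw a 20% simple */
/* random sample and fit the model on work.subsample instead.          */
proc surveyselect data=work.sample out=work.subsample
                  method=srs samprate=0.2 seed=12345;
run;
```

The WHERE statement filters observations before the model is fit, so no separate DATA step is needed. The sampling rate and seed are placeholders.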
Matthew_Galati
SAS Employee
Is this question related to Mathematical Optimization and Operations Research with SAS? If not, this is the wrong forum.
