8378
Calcite | Level 5
I'm running a logistic regression with PROC LOGISTIC on a sample of approximately 150,000 people described by 1,500 variables. The analysis takes about 8 hours. Is there a methodological way to speed it up, or is this more of a software/hardware problem?

Thanks a lot.

Regards
Iryna
4 REPLIES
Olivier
Pyrite | Level 9
Hi Iryna.
I don't think you really need all 1,500 variables in the model, do you?
I would use SELECTION=FORWARD together with STOP=50 to find the (at most) fifty variables that contribute most to your model, and then rerun the model with only those.
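
A minimal sketch of that two-step approach might look like the following. The data set name work.sample, the response y (coded 0/1), and the predictor names x1-x1500 are assumptions made for illustration, as are the three variables named in the second step; substitute your own.

```sas
/* Step 1: forward selection, stopping once 50 effects are in the model. */
/* work.sample, y, and x1-x1500 are hypothetical names.                  */
proc logistic data=work.sample;
   model y(event='1') = x1-x1500
         / selection=forward stop=50 slentry=0.05;
run;

/* Step 2: refit using only the variables that step 1 selected.          */
/* x3, x17, and x42 are placeholders for whatever step 1 actually picks. */
proc logistic data=work.sample;
   model y(event='1') = x3 x17 x42;
run;
```

SLENTRY=0.05 is the default entry significance level and is written out here only to make it visible; a stricter value will usually stop the selection earlier.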

Regards
Olivier
1162
Calcite | Level 5
Are any of these 1,500 variables highly correlated? If so, you might be able to select one among a group of highly correlated variables or use a small number of principal components (from a Principal Components Analysis) for your logistic regression.
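
As a rough sketch of the principal components route, assuming the same hypothetical names as above (work.sample, y, x1-x1500) and an arbitrary choice of 20 components:

```sas
/* Compute the first 20 principal components of the predictors.         */
/* OUT= writes the input data plus score variables Prin1-Prin20.        */
/* The number 20 is an assumption; choose it from the eigenvalues.      */
proc princomp data=work.sample out=work.scores n=20 noprint;
   var x1-x1500;
run;

/* Fit the logistic model on the component scores instead of the        */
/* original 1,500 variables.                                            */
proc logistic data=work.scores;
   model y(event='1') = prin1-prin20;
run;
```

Dropping NOPRINT on a first run shows the eigenvalue table, which helps in deciding how many components are worth keeping. The trade-off is that coefficients on components are harder to interpret than coefficients on the original variables.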
datalligence
Fluorite | Level 6
Iryna, I think you don't need all 150,000 records/observations either.

For example, if you are interested in variables that capture respondents' ratings of certain job attributes, you may want to use the data for employed respondents only.
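
One way to restrict the analysis to such a subset without creating a new data set is a WHERE statement inside the procedure. This is only a sketch: the indicator variable employed (1 = employed) is hypothetical, as are work.sample, y, and x1-x1500.

```sas
/* Fit the model on employed respondents only.                          */
/* employed is a hypothetical 0/1 indicator in the input data.          */
proc logistic data=work.sample;
   where employed = 1;
   model y(event='1') = x1-x1500 / selection=forward stop=50;
run;
```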
Matthew_Galati
SAS Employee
Is this question related to Mathematical Optimization and Operations Research with SAS? If not, this is the wrong forum.
