I'm running a logistic regression with PROC LOGISTIC on a sample of approximately 150,000 people described by 1,500 variables. The analysis takes about 8 hours. Do you know of any methodical way to speed it up? Or is it rather a software/hardware problem?
Hi Iryna.
I don't think you really need all 1,500 of these variables in the model, do you?
So I'd rather use both SELECTION=FORWARD and STOP=50 to find the (at most) fifty variables that contribute most to your model, and then rerun the model with only those...
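A minimal sketch of that suggestion follows. The data set name work.people, the 0/1 response y, and the predictor names x1-x1500 are placeholders, not taken from the original post, and the SLENTRY= value is only an illustrative choice.

```sas
/* Hypothetical names: work.people, response y (coded 0/1), predictors x1-x1500 */
proc logistic data=work.people;
   model y(event='1') = x1-x1500
         / selection=forward   /* add one effect at a time                 */
           slentry=0.05        /* significance level needed to enter (assumed value) */
           stop=50;            /* stop once 50 effects are in the model    */
run;
```

The selected effects can then be listed explicitly in a second MODEL statement without SELECTION=, which refits the much smaller model directly.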
Are any of these 1,500 variables highly correlated? If so, you might be able to select one among a group of highly correlated variables or use a small number of principal components (from a Principal Components Analysis) for your logistic regression.
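As a rough sketch of the principal components route, assuming the same placeholder names (work.people, y, x1-x1500) and an arbitrary choice of 20 components that would need to be checked against the explained variance:

```sas
/* Hypothetical names: work.people, work.pc_scores, y, x1-x1500 */
/* Step 1: compute component scores; OUT= keeps the original variables
   and adds the scores as Prin1, Prin2, ... (the default prefix). */
proc princomp data=work.people out=work.pc_scores n=20 noprint;
   var x1-x1500;
run;

/* Step 2: fit the logistic model on the 20 components instead of
   the 1,500 original variables. */
proc logistic data=work.pc_scores;
   model y(event='1') = prin1-prin20;
run;
```

The trade-off is interpretability: each component is a weighted combination of all the input variables, so the coefficients no longer refer to individual predictors.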
Iryna, I think you don't need all 150,000 records/observations either.
For example, if you are interested in variables that capture respondents' ratings of certain job attributes, you may want to use the data for employed respondents only.
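One way to restrict the analysis to such a subset is a WHERE statement inside the procedure, so no separate copy of the data is needed. The flag employed and the predictor names job_attr1-job_attr10 below are hypothetical and stand in for whatever the real data set contains.

```sas
/* Hypothetical names: work.people, y, employed (1 = employed), job_attr1-job_attr10 */
proc logistic data=work.people;
   where employed = 1;                        /* only employed respondents are read */
   model y(event='1') = job_attr1-job_attr10;
run;
```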
Is this question related to Mathematical Optimization and Operations Research with SAS? If not, this is the wrong forum.