bncoxuk
Obsidian | Level 7

I plan to use PROC CATMOD to fit a multinomial logit model. But it seems that this procedure does not have an automatic variable selection method (e.g. stepwise, forward), so variables have to be excluded manually, which takes time. The problem is that the data set has over 5 million cases. Each run of the model takes over 3 hours, and I just have to wait :( So finishing the model will probably take a few days of waiting.

Please advise.

1 ACCEPTED SOLUTION

lvm
Rhodochrosite | Level 12

The intro to the LOGISTIC procedure (User's Guide) and example 51.4 (v9.2) deal with generalized logits (not ordinal) and the multinomial problem. But I am guessing that you have a different modeling approach in mind. In that case, there is no variable selection method. But as I wrote in my earlier reply, I would be cautious of any automatic variable-selection method.

4 REPLIES
lvm
Rhodochrosite | Level 12

Instead of CATMOD, you could use PROC LOGISTIC for stepwise variable selection. I assume the selection= option on the model statement works for multinomial logit models in LOGISTIC. Note: automatic selection methods are controversial, and can be very misleading. At best, you should use the results as an exploratory guide -- never just accept the final (automatic) variables that are selected.

For some exploratory work, you could make faster progress by considering a random sample of your 5 million observations, and using the sample for some modeling (at least initially). I am guessing that fits will occur much faster with 1 million or 500,000 observations.
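
A minimal sketch of drawing such a sample with PROC SURVEYSELECT. The data set names work.full and work.sample are placeholders, and the sample size and seed are arbitrary choices, not anything taken from the original data:

```sas
/* Simple random sample (without replacement) of 500,000 rows.   */
/* work.full and work.sample are hypothetical data set names.    */
proc surveyselect data=work.full out=work.sample
                  method=srs        /* simple random sampling      */
                  sampsize=500000   /* number of rows to keep      */
                  seed=12345;       /* fixed seed, repeatable draw */
run;
```

Fixing the seed means the same subset is drawn each time, so exploratory fits stay comparable from run to run.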

bncoxuk
Obsidian | Level 7

Hi lvm,

LOGISTIC cannot fit multinomial logit models. If it does accept a multi-level response, the model it fits is a cumulative logit model, which is different from the multinomial one.

trekvana
Calcite | Level 5

bncoxuk-

proc logistic can run multinomial models. Use link=glogit in the model statement options.
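
A rough sketch of what that could look like. The response y, the predictors x1-x3, and the data set work.sample are all placeholder names, and x1 is assumed to be categorical. Whether selection= can be combined with link=glogit may depend on your SAS/STAT release, so check the PROC LOGISTIC documentation for your version before relying on it:

```sas
/* Generalized (multinomial) logit fit with PROC LOGISTIC.        */
/* All variable and data set names here are hypothetical.         */
proc logistic data=work.sample;
   class x1 / param=ref;                /* x1 assumed categorical  */
   model y(ref=first) = x1 x2 x3
         / link=glogit                  /* generalized logit link  */
           selection=stepwise           /* exploratory use only    */
           slentry=0.05 slstay=0.05;    /* entry/stay thresholds   */
run;
```

Without link=glogit, LOGISTIC fits a cumulative logit model when the response has more than two levels, which is the behavior described in the previous post.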

