Dear all, I'm having some trouble using CATMOD instead of PROC LOGISTIC. Normally I would use PROC LOGISTIC, but it appears to crash when the number of variables grows too large, while CATMOD remains stable and fast. So I must use CATMOD instead.
But running both of these logistic regressions gives different estimates for my parameters, and I don't understand why. The variables are exactly the same, and the link functions should match, yet they still give me different results. Why?
This is a guess. Repeat, this is only a guess. (Actually, three guesses).
1. PROC LOGISTIC and PROC CATMOD can use very different parameterizations for class variables. Could this be the source of your differences?
2. A second guess is that PROC LOGISTIC uses maximum likelihood estimation and CATMOD uses weighted least squares. That could also lead to differences.
3. Finally, it may be that you need to specify the class variables in a DIRECT statement in CATMOD. We are now beyond my experience level.
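The first guess can be illustrated outside SAS. The pure-Python sketch below (not SAS code) uses a hypothetical three-level class variable with made-up event proportions. Because that model is saturated, each coding's estimates follow directly from the level logits without any fitting. Reference coding and effect (deviation-from-the-mean) coding report different intercepts and level effects, yet both reproduce the same fitted logit for every level. So parameter estimates from two procedures are only comparable when their codings match. In PROC LOGISTIC the coding is chosen with the PARAM= option of the CLASS statement.

```python
import math


def logit(p):
    """Log-odds of a probability p."""
    return math.log(p / (1.0 - p))


# Hypothetical observed event proportions for a class variable with levels A, B, C.
props = {"A": 0.20, "B": 0.50, "C": 0.70}
levels = ["A", "B", "C"]  # the last level, C, is the omitted level in both codings
eta = {lvl: logit(p) for lvl, p in props.items()}


def reference_coding(eta, levels):
    """Intercept = logit of the last level; each effect = level logit minus that logit."""
    intercept = eta[levels[-1]]
    betas = {lvl: eta[lvl] - intercept for lvl in levels[:-1]}
    return intercept, betas


def effect_coding(eta, levels):
    """Intercept = mean of the level logits; each effect = deviation from that mean.

    The last level's effect is not estimated; it equals minus the sum of the others.
    """
    intercept = sum(eta[lvl] for lvl in levels) / len(levels)
    betas = {lvl: eta[lvl] - intercept for lvl in levels[:-1]}
    return intercept, betas


def predict(intercept, betas, levels, coding):
    """Rebuild the fitted logit for every level from one set of estimates."""
    fitted = {}
    for lvl in levels:
        if lvl in betas:
            fitted[lvl] = intercept + betas[lvl]
        elif coding == "reference":
            fitted[lvl] = intercept
        else:  # effect coding: omitted level gets minus the sum of the other effects
            fitted[lvl] = intercept - sum(betas.values())
    return fitted


ref_int, ref_betas = reference_coding(eta, levels)
eff_int, eff_betas = effect_coding(eta, levels)
print("reference:", ref_int, ref_betas)
print("effect:   ", eff_int, eff_betas)
```

Both sets of estimates describe the same model; only the meaning of each coefficient changes.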
I hope at least one of these leads you to some resolution.
However, I fixed the problem myself. It turns out that PROC LOGISTIC and PROC CATMOD are parameterized the same way in my code, they both use maximum likelihood, and the DIRECT statement in CATMOD is only for continuous variables. BUT they optimize differently, and they treat "bad variables" (variables whose estimates go to infinity, as happens with complete or quasi-complete separation) very differently. Once I removed all these bad variables, the estimates matched to very close numerical precision. It also appears that the numerical differences between CATMOD and LOGISTIC approach zero as the number of distinct cases for each class variable increases.
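A small pure-Python sketch of why such "bad variables" make the two procedures disagree. It is not SAS code and not how either procedure is implemented. It assumes a one-predictor model, made-up data, and plain Newton-Raphson run for a fixed number of steps, where the two hypothetical iteration caps (10 and 25) stand in for each procedure's own stopping rules. With completely separated data the maximum likelihood estimate does not exist, so the slope keeps growing with every iteration and the reported value depends on when the optimizer stops. With overlapping data, both caps land on the same estimate.

```python
import math


def newton_logit(xs, ys, max_iter):
    """Fit logit(p) = b0 + b1*x by plain Newton-Raphson for exactly max_iter steps.

    There is deliberately no convergence test and no separation check, so the
    function shows what the raw iterations do.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            eta = b0 + b1 * x
            p = 1.0 / (1.0 + math.exp(-eta))  # P(y = 1)
            q = 1.0 / (1.0 + math.exp(eta))   # P(y = 0), computed directly so 1 - p never rounds to 0
            resid = q if y == 1 else -p       # y - p
            w = p * q                         # binomial variance weight
            g0 += resid                       # score for the intercept
            g1 += resid * x                   # score for the slope
            h00 += w                          # information matrix X'WX
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve (information matrix) * delta = score for the 2x2 case.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


xs = [1, 2, 3, 4, 5, 6]
separated_y = [0, 0, 0, 1, 1, 1]    # y = 1 exactly when x > 3.5: complete separation
overlapping_y = [0, 0, 1, 0, 1, 1]  # the two outcomes overlap: finite MLE exists

sep_10 = newton_logit(xs, separated_y, 10)
sep_25 = newton_logit(xs, separated_y, 25)
ovl_25 = newton_logit(xs, overlapping_y, 25)
ovl_50 = newton_logit(xs, overlapping_y, 50)
print("separated, 10 vs 25 steps:  ", sep_10, sep_25)
print("overlapping, 25 vs 50 steps:", ovl_25, ovl_50)
```

Removing (or recoding) the separated variables, as described above, puts every parameter back in the second situation, where any reasonable optimizer converges to the same answer.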
I've posted my code again in case others run into the same problems:
PROC LOGISTIC DATA=SASUSER.FILTER_FOR_CON_DATA_FORMATT_0005