Logistic Regression using CATMOD

Dear all, I'm having some trouble using CATMOD instead of PROC LOGISTIC. Normally I would use PROC LOGISTIC, but it appears to crash when the number of variables grows too large, whereas CATMOD remains stable and fast. So I must use CATMOD instead.

Now I have the following code:
---

PROC LOGISTIC DATA=My_Data PLOTS(ONLY)=ALL;
  CLASS GNDR (PARAM=EFFECT) BLGETMG (PARAM=EFFECT) EDULVL (PARAM=EFFECT)
        EDULVLM (PARAM=EFFECT) EDULVLF (PARAM=EFFECT) EDCTN (PARAM=EFFECT)
        Head_Unemployed (PARAM=EFFECT) DSBLD (PARAM=EFFECT) RTRD (PARAM=EFFECT)
        Number_Of_Children_SEC (PARAM=EFFECT) HHMMB (PARAM=EFFECT);
  WEIGHT Weight_Household_2009;
  MODEL In_Poverty (EVENT='1') = GNDR BLGETMG EDULVL EDULVLM EDULVLF EDCTN
        Head_Unemployed DSBLD RTRD Number_Of_Children_SEC HHMMB
        / LINK=LOGIT ALPHA=0.10;
RUN;

proc catmod data=My_Data;
response clogits;
model In_Poverty (Event = '1') = GNDR BLGETMG EDULVL EDULVLM EDULVLF EDCTN Head_Unemployed DSBLD RTRD Number_Of_Children_SEC HHMMB / alpha=0.10;
weight Weight_Household_2009;
run;
---

But running these two logistic regressions gives different estimates for my parameters, and I don't understand why. The variables are exactly the same, and the link functions should match, yet the results differ. Why?

- Julian.

Re: Logistic Regression using CATMOD

Correction to the CATMOD code:
---
proc catmod data=My_Data;
response clogits;
model In_Poverty = GNDR BLGETMG EDULVL EDULVLM EDULVLF EDCTN Head_Unemployed DSBLD RTRD Number_Of_Children_SEC HHMMB / alpha=0.10;
weight Weight_Household_2009;
run;
---

- Julian.
Respected Advisor
Posts: 2,655

Re: Logistic Regression using CATMOD

This is a guess. Repeat, this is only a guess. (Actually, three guesses).
1. PROC LOGISTIC and PROC CATMOD use very different parameterizations for class variables. Could this be the source of your differences?
2. A second guess is that PROC LOGISTIC uses a maximum likelihood algorithm while CATMOD uses weighted least squares. That could also lead to differences.
3. Finally, it may be that you need to specify the class variables in a DIRECT statement in CATMOD. We are now beyond my experience level.

I hope at least one of these leads you to some resolution.
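
One way to rule out the second guess is to request maximum likelihood explicitly in CATMOD. The following is only a sketch using the data set and variable names from the original post. It assumes the response is binary, so the default generalized logit is the same function as the cumulative logit and the RESPONSE statement can be left out; the ML option on the MODEL statement then states the estimation method explicitly.

```sas
/* Sketch only: request maximum likelihood estimation explicitly so that
   CATMOD and LOGISTIC are compared on the same estimation method.
   Assumes In_Poverty has exactly two levels, so the default response
   function (generalized logits) is used and no RESPONSE statement is
   needed. Data set and variable names come from the original post. */
proc catmod data=My_Data;
   weight Weight_Household_2009;
   model In_Poverty = GNDR BLGETMG EDULVL EDULVLM EDULVLF EDCTN
                      Head_Unemployed DSBLD RTRD
                      Number_Of_Children_SEC HHMMB
         / ml alpha=0.10;
run;
quit;
```

If the estimates agree in magnitude but have the opposite sign from PROC LOGISTIC, it may be worth checking which response level each procedure treats as the event, since the two procedures do not necessarily pick the same one by default.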

Steve Denham

Re: Logistic Regression using CATMOD

Thank you for the fast reply Steve.

However, I fixed the problem myself. It turns out that PROC LOGISTIC and PROC CATMOD are parameterized the same way in my code, that both use maximum likelihood, and that the DIRECT statement in CATMOD is only for continuous variables. BUT they optimize differently and treat "bad variables" (those whose estimates go to infinity, etc.) very differently. Once I removed all these bad variables, the estimates matched to a very close numerical precision. It also appears that the numerical differences between CATMOD and LOGISTIC approach zero as the number of distinct cases for each class variable increases.
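
For anyone hunting for such "bad variables", a possible starting point is to cross-tabulate every class variable against the response and look for empty or nearly empty cells, since those are presumably the levels whose estimates drift toward infinity. This is a sketch, not the check that was actually run here; it uses the data set and variable names from the first post.

```sas
/* Sketch: cross-tabulate each class variable against the response.
   A level with a zero (or very small) count for one of the two
   In_Poverty values is a candidate for an estimate that diverges.
   Names are taken from the first post in this thread. */
proc freq data=My_Data;
   tables (GNDR BLGETMG EDULVL EDULVLM EDULVLF EDCTN Head_Unemployed
           DSBLD RTRD Number_Of_Children_SEC HHMMB) * In_Poverty
          / norow nocol nopercent;
run;
```

PROC LOGISTIC also writes a note or warning to the log when it detects complete or quasi-complete separation, which can help confirm which model is affected.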

I've posted my code again in case others run into the same problems:
---
PROC LOGISTIC DATA=SASUSER.FILTER_FOR_CON_DATA_FORMATT_0005 PLOTS(ONLY)=ALL;
  CLASS GNDR (PARAM=EFFECT) Head_Unemployed (PARAM=EFFECT)
        Number_Of_Children_SEC (PARAM=EFFECT) HHMMB (PARAM=EFFECT);
  WEIGHT Weight_Household_2009;
  MODEL In_Poverty (EVENT='1') = GNDR Head_Unemployed Number_Of_Children_SEC HHMMB
        / LINK=LOGIT ALPHA=0.10;
RUN;

proc catmod data=SASUSER.FILTER_FOR_CON_DATA_FORMATT_0005;
response clogits;
model In_Poverty = GNDR Head_Unemployed Number_Of_Children_SEC HHMMB / alpha=0.10;
weight Weight_Household_2009;
run;
---