Logistic Regression using CATMOD

02-16-2011 07:03 AM

Dear all, I'm having some trouble using CATMOD instead of PROC LOGISTIC. Normally I would use PROC LOGISTIC, but it appears to crash when the number of variables grows too large, whereas CATMOD remains stable and fast. So I must use CATMOD instead.

Now I have the following code:

---

PROC LOGISTIC DATA=My_Data PLOTS(ONLY)=ALL;
  CLASS GNDR (PARAM=EFFECT) BLGETMG (PARAM=EFFECT) EDULVL (PARAM=EFFECT) EDULVLM (PARAM=EFFECT) EDULVLF (PARAM=EFFECT) EDCTN (PARAM=EFFECT) Head_Unemployed (PARAM=EFFECT) DSBLD (PARAM=EFFECT) RTRD (PARAM=EFFECT) Number_Of_Children_SEC (PARAM=EFFECT) HHMMB (PARAM=EFFECT);
  WEIGHT Weight_Household_2009;
  MODEL In_Poverty (Event = '1') = GNDR BLGETMG EDULVL EDULVLM EDULVLF EDCTN Head_Unemployed DSBLD RTRD Number_Of_Children_SEC HHMMB
        / LINK=LOGIT ALPHA=0.10;
RUN;

proc catmod data=My_Data;
  response clogits;
  model In_Poverty (Event = '1') = GNDR BLGETMG EDULVL EDULVLM EDULVLF EDCTN Head_Unemployed DSBLD RTRD Number_Of_Children_SEC HHMMB / alpha=0.10;
  weight Weight_Household_2009;
run;

---

But running these two logistic regressions gives different estimates for my parameters, and I don't understand why. The variables are exactly the same, and the link functions should match. Still, they give me different results. Why?

- Julian.


Posted in reply to deleted_user

02-16-2011 07:04 AM

Correction to the CATMOD code:

---

proc catmod data=My_Data;
  response clogits;
  model In_Poverty = GNDR BLGETMG EDULVL EDULVLM EDULVLF EDCTN Head_Unemployed DSBLD RTRD Number_Of_Children_SEC HHMMB / alpha=0.10;
  weight Weight_Household_2009;
run;

---

- Julian.


Posted in reply to deleted_user

02-16-2011 07:51 AM

This is a guess. Repeat, this is only a guess. (Actually, three guesses).

1. PROC LOGISTIC and PROC CATMOD use very different parameterizations for class variables. Could this be the source of your differences?

2. A second guess is that PROC LOGISTIC uses a maximum likelihood algorithm and CATMOD a weighted least squares. That could also lead to differences.

3. Finally, it may be that you need to specify the class variables in a DIRECT statement in CATMOD. We are now beyond my experience level.

I hope at least one of these leads you to some resolution.

Steve Denham
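
For anyone wanting to test guesses 2 and 3 directly, here is a minimal sketch, not code from this thread. It asks CATMOD for maximum likelihood explicitly through the ML option on the MODEL statement, and shows where a DIRECT statement would go. `Household_Income` is a hypothetical continuous covariate added purely for illustration; it is not in the original model. My understanding of the SAS documentation is that ML is accepted with cumulative logits only when the response has two levels, so treat that combination as an assumption to verify on your release.

```sas
/* Sketch only: Household_Income is a hypothetical continuous covariate. */
proc catmod data=My_Data;
  weight Weight_Household_2009;
  /* DIRECT must come before MODEL; it marks variables to be treated
     as quantitative rather than as classification variables. */
  direct Household_Income;
  response clogits;
  /* ML requests maximum likelihood instead of weighted least squares. */
  model In_Poverty = GNDR Head_Unemployed Household_Income / ml alpha=0.10;
run;
quit;
```

The table titles in the CATMOD output should indicate whether maximum likelihood or weighted least squares was used, which makes it easy to confirm that both procedures are being compared on the same estimation method.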


Posted in reply to SteveDenham

02-16-2011 12:15 PM

Thank you for the fast reply Steve.

However, I fixed the problem myself. It turns out that PROC LOGISTIC and PROC CATMOD are parameterized the same way in my code, they both use maximum likelihood, and the DIRECT statement in CATMOD is only for continuous variables. But they optimize differently and treat "bad variables" (those whose estimates go to infinity, etc.) very differently. Once I removed all of these bad variables, the estimates matched to very close numerical precision. It also appears that the numerical differences between CATMOD and LOGISTIC approach zero as the number of distinct cases for each class variable increases.

I've posted my code again in case others run into the same problems:

---

PROC LOGISTIC DATA=SASUSER.FILTER_FOR_CON_DATA_FORMATT_0005 PLOTS(ONLY)=ALL;
  CLASS GNDR (PARAM=EFFECT) Head_Unemployed (PARAM=EFFECT) Number_Of_Children_SEC (PARAM=EFFECT) HHMMB (PARAM=EFFECT);
  WEIGHT Weight_Household_2009;
  MODEL In_Poverty (Event = '1') = GNDR Head_Unemployed Number_Of_Children_SEC HHMMB
        / LINK=logit ALPHA=0.10;
RUN;

proc catmod data=SASUSER.FILTER_FOR_CON_DATA_FORMATT_0005;
  response clogits;
  model In_Poverty = GNDR Head_Unemployed Number_Of_Children_SEC HHMMB / alpha=0.10;
  weight Weight_Household_2009;
run;

---
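
One way to look for the "bad variables" described above before fitting anything is to cross-tabulate each class variable against the response and check for empty cells. This is a rough sketch under my own assumptions, not the method used in the thread: it reuses the dataset and variable names from the first post, and it assumes that a level with a zero count in one response category is what drives an estimate toward infinity.

```sas
/* Sketch: one two-way table per class variable against In_Poverty.    */
/* A level with a zero count in either response column is a candidate */
/* for an estimate that does not converge to a finite value.          */
proc freq data=My_Data;
  tables (GNDR BLGETMG EDULVL EDULVLM EDULVLF EDCTN Head_Unemployed
          DSBLD RTRD Number_Of_Children_SEC HHMMB) * In_Poverty
         / norow nocol nopercent;
  weight Weight_Household_2009;
run;
```

Variables that show empty cells can then be dropped or have sparse levels combined before comparing the two procedures again.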
