KlaasFrankena
Fluorite | Level 6

I wonder why the odds ratio estimated by proc logistic differs from the one obtained from proc freq and proc genmod in the example below (SAS 9.4 TS Level 1M5). The ORs become identical when the low-frequency cell event=0/factor=1 is raised above 20. Logistic regression in Stata on these data gives the same OR as proc freq and proc genmod. I would like to know why proc logistic gives a slightly different OR (I know exact logistic regression should be used on sparse data).

 

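data check;
  /* Build the 2x2 table of event by factor, one row per observation */
  do i=1 to 52;  event=1; factor=1; output; end;
  do i=1 to 287; event=1; factor=0; output; end;
  do i=1 to 1;   event=0; factor=1; output; end;  /* the sparse cell */
  do i=1 to 385; event=0; factor=0; output; end;
run;

/* DESCENDING models the probability that event=1 */
proc logistic data=check descending;
  class factor / param=ref ref=first;
  model event=factor;
run;

/* CMH reports odds ratio estimates for the 2x2 table */
proc freq data=check;
  tables event*factor / cmh;
run;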

1 ACCEPTED SOLUTION

Ksharp
Super User
Very interesting. When you use PROC HPLOGISTIC, you get the right coefficient estimate, 4.245.

proc hplogistic data=check ;
model event(event='1')=factor ;
run;

I think the reason is that proc hplogistic uses the optimization technique Newton-Raphson with Ridging,

while proc logistic

proc logistic data=check ;
model event(event='1')=factor ;
run;

uses the optimization technique Fisher's scoring.
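
As a check on the 4.245 figure: with a single binary predictor, the maximum-likelihood slope equals the log of the sample odds ratio, so it can be computed by hand from the four cell counts in the question's data step (52, 287, 1, 385). This is only the arithmetic, not output from any procedure:

```latex
\[
\widehat{\mathrm{OR}} = \frac{52 \times 385}{1 \times 287} = \frac{20020}{287} \approx 69.76,
\qquad
\hat{\beta} = \ln \widehat{\mathrm{OR}} \approx 4.245
\]
```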


6 REPLIES
KlaasFrankena
Fluorite | Level 6
And this is the proc genmod code:

proc genmod data=check descending;
class factor/param=ref ref=first;
model event=factor/dist=bin link=logit;
estimate 'OR' factor 1/exp;
run;
KlaasFrankena
Fluorite | Level 6

Thanks! One can specify TECH=Newton as an option in proc logistic's model statement, but it gives the same output as Fisher scoring. So you really need 'Newton with ridging' to arrive at identical estimates. Good to know when small frequencies are involved!

Ksharp
Super User
Yeah, you are right. That is why you need PROC HPLOGISTIC to get the OR you are after.
StatDave
SAS Super FREQ

If you add the ITPRINT option, you will see that the gradients are small but not extremely close to zero, as they would be if the fit had properly converged. The somewhat large parameter estimate is another reason to be a little suspicious of the convergence. If you then use a different convergence criterion by specifying GCONV=0 XCONV=1e-8, the gradients become extremely small and the odds ratio estimate is the same as from PROC FREQ. Of course, using a different procedure similarly changes the iterative algorithm and could also provide more complete convergence.
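
A minimal sketch of this suggestion, applied to the check data set from the original question. The option values are the ones named above; putting them on the MODEL statement and keeping the CLASS statement from the question are assumptions about how they would be combined. No output is shown because none was posted in the thread.

```sas
/* Refit with the iteration history printed and a tighter convergence rule */
proc logistic data=check;
  class factor / param=ref ref=first;
  /* ITPRINT shows gradients per iteration; GCONV=0 XCONV=1e-8 change the stopping criterion */
  model event(event='1') = factor / itprint gconv=0 xconv=1e-8;
run;
```

Comparing the last-iteration gradients with and without the GCONV= and XCONV= options should show whether the default fit stopped early.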

KlaasFrankena
Fluorite | Level 6

Thanks, very helpful solution as well.
