Statistical Procedures

jojo · Posted 03-09-2025 01:39 PM

I used the FIRTH option in PROC LOGISTIC and PROC PHREG because one of the two groups has zero events and the other group has more than 60% events. I got extreme values of the OR and HR (greater than 100), very wide CIs, and p-values <0.0001. Can I interpret the results the same way as for regular PROC LOGISTIC and PROC PHREG? Any input or opinion based on your experience and knowledge would be appreciated.

Ksharp · Posted 03-09-2025 10:24 PM

It would be better if you could post your data, code, and output.

For PROC LOGISTIC, you could check the UNITS statement and set a larger unit to suppress the extreme CI and p-value.

The same applies to PROC PHREG.

jojo · Posted 03-10-2025 01:53 AM

Thanks Eric. I didn't use the ODDSRATIO or HAZARDRATIO statement; I calculated the OR and HR from the model estimates. Below is the code I used.

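proc logistic data=xx;
   /* PBO is the reference treatment arm */
   class treatment(ref="PBO") / param=ref;
   /* Firth-penalized logistic regression with the logit link */
   model aval(event="1") = treatment base / link=logit firth;
run;

proc phreg data=xx;
   class treatment(ref="PBO");
   /* cnsr=1 marks censored observations; FIRTH requests the penalized fit */
   model aval*cnsr(1) = treatment base / firth;
run;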

Ksharp · Posted 03-10-2025 02:07 AM

Then why not try the UNITS statement in PROC LOGISTIC and the UNITS= option in the HAZARDRATIO statement of PROC PHREG?
Set it as large as you can.
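As a rough sketch of what this suggestion looks like in code, the two procedures could be written as below. The unit of 10 is a hypothetical value chosen only for illustration, and the data set and variable names are taken from the code posted above. Note that UNITS only rescales the reported ratio for a continuous covariate such as base; it does not change the fitted model.

```sas
/* Hypothetical unit of 10 for the continuous covariate base */
proc logistic data=xx;
   class treatment(ref="PBO") / param=ref;
   model aval(event="1") = treatment base / link=logit firth;
   /* Report the odds ratio for a 10-unit change in base */
   units base=10;
run;

proc phreg data=xx;
   class treatment(ref="PBO");
   model aval*cnsr(1) = treatment base / firth;
   /* Report the hazard ratio for a 10-unit change in base */
   hazardratio base / units=10;
run;
```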

jojo · Posted 03-10-2025 08:25 AM

I need to use PROC MIANALYZE to get the pooled estimate, based on the estimates and standard errors from 200 imputed datasets, and then calculate the OR/HR.
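A minimal sketch of this workflow for the logistic model might look like the following. It assumes the 200 imputations are stacked in one data set, here given the hypothetical name xx_mi, sorted by an _Imputation_ index variable as PROC MI would produce; the output data set names lgparms, pooled, and pooled_or are also made up for illustration.

```sas
/* Fit the Firth model once per imputation and save the estimates */
proc logistic data=xx_mi;
   by _Imputation_;
   class treatment(ref="PBO") / param=ref;
   model aval(event="1") = treatment base / link=logit firth;
   ods output ParameterEstimates=lgparms;
run;

/* Pool the per-imputation estimates and standard errors (Rubin's rules) */
proc mianalyze parms(classvar=classval)=lgparms;
   class treatment;
   modeleffects treatment;
   ods output ParameterEstimates=pooled;
run;

/* Back-transform the pooled log odds ratio and its confidence limits */
data pooled_or;
   set pooled;
   OR     = exp(Estimate);
   OR_LCL = exp(LCLMean);
   OR_UCL = exp(UCLMean);
run;
```

The same pattern should carry over to PROC PHREG by saving its ParameterEstimates table per imputation and exponentiating the pooled log hazard ratio.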

jojo · Posted 03-10-2025 09:06 AM

Also, the OR and HR are for the treatment group, which has only two values. UNITS is for continuous independent variables.

StatDave · Posted 03-10-2025 05:06 PM

Assuming that your data set is small so that exact methods are feasible, try using the EXACT statement to see if you can get a better estimate of the odds ratio. Add this statement:

exact treatment / estimate=both;
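For context, a sketch of where that statement would sit in the logistic code posted earlier in the thread is shown below. This is only an illustration using the names from that code; whether the exact analysis is feasible with the continuous covariate base in the model depends on the data.

```sas
proc logistic data=xx;
   class treatment(ref="PBO") / param=ref;
   model aval(event="1") = treatment base / link=logit;
   /* Exact conditional analysis of treatment; ESTIMATE=BOTH requests
      both the parameter estimate and the odds ratio */
   exact treatment / estimate=both;
run;
```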

Season · Posted 03-12-2025 03:14 AM

I am curious about a more technical issue: what is the difference between Firth's penalization and the exact method? To the best of my knowledge, the former is suitable when (quasi-)complete separation exists, while the latter is suitable for small samples, which can also lead to (quasi-)complete separation when empty cells exist. So what is the difference between them? Or are they just competing alternatives, with neither method clearly superior to the other?

StatDave · Posted 03-12-2025 10:12 AM

The two methods are very different. The Firth method is still an iterative maximum likelihood estimation method, with just a small tweak to add a penalty to the likelihood function. See the details in the "Details: Iterative Algorithms for Model Fitting" section of the LOGISTIC documentation. The exact method makes use of conditional methods through the generation of a conditional distribution and likelihood function. See the "Details: Exact Conditional Logistic Regression" section. The exact method can be very computationally intensive and is generally feasible only with smaller, simpler data sets. But practically speaking, yes, they are both methods that are frequently used to deal with the problems occurring with sparse data. And the Firth method is less computationally challenging, so it is often more feasible. However, as with all iterative methods, both of these methods can fail depending on the data and model.
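To make the contrast concrete, the two approaches can be written out in their standard textbook forms. The notation here is an assumption introduced for illustration: \ell is the ordinary log-likelihood, I(\beta) the Fisher information matrix, T_0 and T_1 the sufficient statistics for the nuisance parameters and for the parameters of interest \beta_1, and C(t_0, t_1) the number of response vectors that produce those sufficient statistics.

```latex
% Firth: maximize the log-likelihood plus a penalty based on the information matrix
\ell^{*}(\beta) = \ell(\beta) + \tfrac{1}{2}\,\log\bigl|I(\beta)\bigr|

% Exact: base inference for \beta_1 on the conditional distribution of T_1 given T_0 = t_0,
% which no longer involves the nuisance parameters
f(t_1 \mid T_0 = t_0) =
  \frac{C(t_0, t_1)\,\exp\bigl(t_1^{\top}\beta_1\bigr)}
       {\sum_{u} C(t_0, u)\,\exp\bigl(u^{\top}\beta_1\bigr)}
```

The sum in the denominator runs over every attainable value u of T_1, which is why the exact method becomes expensive as the data set or model grows.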

Season · Posted 03-12-2025 12:07 PM

Thank you so much for your detailed and patient response! I tried searching online for papers comparing the methods but found little useful information. Your concise comparison is very informative.

Based on what I found online, I think an area for further research is the empirical (i.e., simulation-based) and theoretical (i.e., mathematics-based) comparison of the accuracy and/or efficiency of the estimators produced by the two methods. You mentioned that exact logistic regression is computationally intensive, yet this drawback can be (partially) overcome by supercomputers, which are becoming increasingly accessible to industry. Therefore, the comparison of the two methods is becoming an issue worth detailed investigation. I hope more statisticians will read this appeal and take it up.

Season · Posted 03-12-2025 11:54 AM

I think this phenomenon is largely caused by the small sample size and the resulting wide confidence interval. You can try the exact logistic regression method suggested by @StatDave to see whether it helps, but I am pessimistic about whether merely changing the method will have any impact at all. The best way to get a narrower confidence interval and a less extreme OR or HR is to increase your sample size. If that is impossible, then I think you should report the results as they are.

jojo · Posted 03-12-2025 04:35 PM

Thanks all for your input and comments. I will use the FIRTH method, since I need to run hundreds of models and combine the results.

Season · Posted 03-12-2025 09:04 PM

I saw in your post above that your dataset contains missing data and that you will repeat the modeling process on each imputed sample.

An informal approach I can think of is to apply both Firth's method and exact logistic regression to one (or several) of the imputed datasets and see how the results differ. If this tentative comparison does not find any big difference between them, then we might speculate that the differences in the remaining imputed datasets are not big either. In that case the combined estimate, which according to Rubin's rules is the arithmetic mean of the regression coefficients calculated from each imputed sample, is not substantially affected by the choice of estimation method (i.e., Firth's method or exact logistic regression).
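For reference, Rubin's rules for a single coefficient can be written as follows. The symbols are assumptions introduced for illustration: m is the number of imputations (200 in this thread), \hat{Q}_i is the coefficient estimate from imputed dataset i, and U_i is its squared standard error.

```latex
% Pooled point estimate: arithmetic mean of the per-imputation estimates
\bar{Q} = \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i

% Within-imputation and between-imputation variance
\bar{U} = \frac{1}{m}\sum_{i=1}^{m} U_i ,
\qquad
B = \frac{1}{m-1}\sum_{i=1}^{m}\bigl(\hat{Q}_i - \bar{Q}\bigr)^{2}

% Total variance of the pooled estimate
T = \bar{U} + \Bigl(1 + \frac{1}{m}\Bigr) B

% Pooled odds ratio (or hazard ratio) on the original scale
\widehat{\mathrm{OR}} = \exp\bigl(\bar{Q}\bigr)
```

Since pooling happens on the log scale, a few imputations with very large coefficients can still pull the pooled OR or HR toward extreme values.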

Extreme value of OR and HR and CI with FIRTH option in proc logistic and proc phreg
