I used the FIRTH option in PROC LOGISTIC and PROC PHREG because there are zero events in one of the two groups and more than 60% events in the other group. I got extreme OR and HR values (greater than 100), very wide CIs, and p-values <0.0001. Can I interpret these results the same way as results from a regular PROC LOGISTIC or PROC PHREG fit? Any input or opinion based on your experience and knowledge would be appreciated.
It would be better if you could post your data, code, and output.
For PROC LOGISTIC, you could check the UNITS statement and set it to a larger value to suppress the extreme CI and p-value.
The same applies to PROC PHREG.
Thanks, Eric. I didn't use the ODDSRATIO or HAZARDRATIO statement; I calculated the OR and HR from the model's parameter estimates. Below is the code I used.
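proc logistic data=xx;
   class treatment(ref="PBO") / param=ref;   /* reference coding, placebo as the reference level */
   /* LINK=LOGIT (not "logic"); FIRTH requests Firth's penalized maximum likelihood */
   model aval(event="1") = treatment base / link=logit firth;
run;

proc phreg data=xx;
   class treatment(ref="PBO");               /* placebo as the reference level */
   /* CNSR=1 marks censored observations; FIRTH requests Firth's penalized partial likelihood */
   model aval*cnsr(1) = treatment base / firth;
run;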
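Because the ORs and HRs here were computed by hand from the parameter estimates, here is a minimal sketch of that conversion (in Python rather than SAS, purely for illustration). It assumes a Wald-type interval, exp(estimate ± z·SE); the helper name `ratio_from_estimate` and the numbers in the usage lines are hypothetical. It shows how a large standard error on the log scale becomes an extremely wide interval on the ratio scale. Intervals reported by the procedures themselves (for example, profile-likelihood intervals) can differ from this.

```python
import math
from statistics import NormalDist

def ratio_from_estimate(beta, se, level=0.95):
    """Convert a log-scale estimate (log OR or log HR) and its standard error
    into the ratio and a Wald-type confidence interval (hypothetical helper)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical values: a large log-scale estimate with a large standard error
ratio, lower, upper = ratio_from_estimate(4.8, 1.6)
print(f"ratio={ratio:.1f}, 95% CI=({lower:.1f}, {upper:.1f})")
```

The interval is symmetric on the log scale, so on the ratio scale the upper limit is stretched far more than the lower one.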
Set it as large as you can.
Assuming your data set is small enough for exact methods to be feasible, try the EXACT statement to see whether you get a better estimate of the odds ratio. Add this statement:
exact treatment / estimate=both;
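To show what "exact conditional" means in the simplest case, here is a minimal Python sketch (not the PROC LOGISTIC implementation) for a model with a single binary treatment and no other covariates, i.e., a 2×2 table; the poster's model also includes `base`, which this sketch ignores. Conditioning on the total number of events m, the number of treated events T has a distribution that depends only on the log odds ratio ψ: P(T = t) ∝ C(n1, t)·C(n0, m − t)·exp(ψ·t). At ψ = 0 this is the hypergeometric distribution behind Fisher's exact test. When T sits at the edge of its range, as happens with zero events in one group, the conditional MLE is infinite; a common fallback, computed here, is the median unbiased estimate. The function names and data are hypothetical.

```python
import math

def cond_probs(n1, n0, m, psi):
    """Conditional distribution of T (events in the treated group) given m total
    events: P(T = t) proportional to C(n1, t) * C(n0, m - t) * exp(psi * t)."""
    support = range(max(0, m - n0), min(n1, m) + 1)
    logw = [math.log(math.comb(n1, t) * math.comb(n0, m - t)) + psi * t for t in support]
    top = max(logw)  # subtract the maximum before exponentiating to avoid overflow
    w = [math.exp(v - top) for v in logw]
    total = sum(w)
    return {t: wt / total for t, wt in zip(support, w)}

def exact_p_value(n1, n0, m, t_obs):
    """Two-sided exact p-value at psi = 0: total probability of all tables
    that are no more likely than the observed one."""
    probs = cond_probs(n1, n0, m, 0.0)
    cutoff = probs[t_obs] * (1 + 1e-9)  # small tolerance for floating-point ties
    return sum(p for p in probs.values() if p <= cutoff)

def median_unbiased_upper(n1, n0, m, lo=-50.0, hi=50.0, iters=200):
    """Median unbiased estimate of psi when T is at its maximum: solve
    P(T = t_max; psi) = 0.5 by bisection (this probability increases with psi)."""
    t_max = min(n1, m)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if cond_probs(n1, n0, m, mid)[t_max] < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical data: 7/10 events on treatment, 0/10 on placebo, so m = 7 and T = 7
p = exact_p_value(10, 10, 7, 7)
psi_mue = median_unbiased_upper(10, 10, 7)
print(f"exact p = {p:.4f}, median unbiased OR = {math.exp(psi_mue):.1f}")
```

Because the conditional likelihood only involves the sufficient statistic's distribution, every possible table with the same margins must be enumerated, which is why exact methods become expensive as data sets grow.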
I am curious about a more technical issue: what is the difference between Firth's penalization and the exact method? To the best of my knowledge, the former is suitable when (quasi-)complete separation exists, while the latter is suitable for small samples, which can also lead to (quasi-)complete separation when empty cells exist. So what is the difference between them? Or are they simply competing alternatives, with neither clearly superior to the other?
The two methods are very different. The Firth method is still an iterative maximum likelihood estimation method, with just a small tweak: a penalty is added to the likelihood function. See the "Details: Iterative Algorithms for Model Fitting" section of the PROC LOGISTIC documentation. The exact method uses conditional inference: it generates a conditional distribution and a conditional likelihood function. See "Details: Exact Conditional Logistic Regression". The exact method can be very computationally intensive and is generally feasible only with smaller, simpler data sets. But practically speaking, yes, both are frequently used to deal with the problems that arise with sparse data, and the Firth method is less computationally demanding, so it is often more feasible. However, as with all iterative methods, both of these methods can fail depending on the data and model.
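To see the "small tweak" concretely, here is a minimal Python sketch of Firth's penalized maximum likelihood for a logistic model with an intercept and one binary treatment indicator. It maximizes log L(β) + ½·log|I(β)|, where I(β) is the Fisher information, by iterating on Firth's modified score, in which each residual y − p gains an extra term h·(½ − p) based on the hat value h. This is an illustration only, not the PROC LOGISTIC code; the function name and data are hypothetical. The data have zero events in the placebo group, so the ordinary MLE of the treatment effect would be infinite. In this two-group case the penalized log OR equals the log OR after adding ½ to every cell of the 2×2 table, which gives a handy check.

```python
import math

def firth_logistic(x, y, max_iter=100, tol=1e-10):
    """Firth-penalized logistic regression for logit(p) = b0 + b1 * x
    (illustrative sketch using Newton-type steps on Firth's modified score)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Fisher information X'WX for the design columns (1, x)
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        v00, v01, v11 = i11 / det, -i01 / det, i00 / det  # inverse information
        # Hat values h_i = w_i * x_i' (X'WX)^{-1} x_i
        h = [wi * (v00 + 2 * v01 * xi + v11 * xi * xi) for wi, xi in zip(w, x)]
        # Firth's modified residuals: y - p + h * (1/2 - p)
        r = [yi - pi + hi * (0.5 - pi) for yi, pi, hi in zip(y, p, h)]
        u0 = sum(r)
        u1 = sum(ri * xi for ri, xi in zip(r, x))
        # Newton-type step using the inverse Fisher information
        d0 = v00 * u0 + v01 * u1
        d1 = v01 * u0 + v11 * u1
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1

# Hypothetical data: 7/10 events on treatment (x = 1), 0/10 on placebo (x = 0)
x = [1] * 10 + [0] * 10
y = [1] * 7 + [0] * 3 + [0] * 10
b0, b1 = firth_logistic(x, y)
print(f"Firth OR = {math.exp(b1):.2f}")  # 45.00 = (7.5 * 10.5) / (3.5 * 0.5)
```

Without the h·(½ − p) term the same loop would push b1 toward infinity, which is the separation problem the penalty is designed to remove.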
Thank you so much for your detailed and patient response! I tried searching online for papers comparing the methods but found little useful information. Your concise comparison is very informative.
Based on what I found online, I think one area for further research is the empirical (i.e., simulation-based) and theoretical (i.e., mathematics-based) comparison of the accuracy and/or efficiency of the estimators produced by the two methods. You mentioned that exact logistic regression is computationally intensive, yet this limitation can be (partially) overcome by supercomputers, which are becoming increasingly accessible to industry. Therefore, comparing the two methods is becoming an issue worth detailed investigation. I hope more statisticians read this appeal and take it up.
I think this phenomenon is largely caused by the small sample size and the resulting wide confidence interval. You can try the exact logistic regression method suggested by @StatDave to see whether it helps, but I am pessimistic that merely changing the method will have much impact. The best way to obtain a narrower confidence interval, and therefore a less extreme OR or HR, is to increase your sample size. If that is impossible, I think you should report the results as they are.
Thanks, all, for your input and comments. I will use the FIRTH method, as I need to run hundreds of models and combine the results.
I saw in your post above that your dataset contains missing data and that you plan to repeat the modeling process on each imputed dataset.
An informal approach I can think of is to apply both Firth's method and exact logistic regression to one (or several) of the imputed datasets and see how the results differ. If this tentative comparison finds no big difference between them, we might speculate that the differences in the remaining, unanalyzed imputed datasets are not big either. Therefore, the combined estimate, which under Rubin's rules is the arithmetic mean of the regression coefficients estimated from each imputed dataset, would not be materially affected by the choice of estimation method (i.e., Firth's method or exact logistic regression).
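Here is a minimal Python sketch of Rubin's rules as described above, assuming each imputed dataset yields a log OR (or log HR) estimate and its variance; the function name `pool_rubin` and all numbers are hypothetical. The pooled point estimate is the mean across the m imputations, and the total variance combines the average within-imputation variance W with the between-imputation variance B as T = W + (1 + 1/m)·B. Pooling is done on the log scale and the result is exponentiated at the end. In SAS, PROC MIANALYZE performs this kind of combination.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool per-imputation estimates with Rubin's rules.
    Returns the pooled estimate and its total variance."""
    m = len(estimates)
    q_bar = mean(estimates)      # pooled point estimate
    w = mean(variances)          # average within-imputation variance
    b = variance(estimates)      # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b      # total variance
    return q_bar, t

# Hypothetical log-OR estimates and variances from 5 imputed datasets
log_ors = [3.7, 3.9, 3.8, 3.6, 4.0]
vars_ = [2.5, 2.6, 2.4, 2.5, 2.7]
q, t = pool_rubin(log_ors, vars_)
print(f"pooled OR = {math.exp(q):.1f}, SE(log OR) = {math.sqrt(t):.3f}")
```

Note that the between-imputation term B also reflects how much the estimation method's results move from one imputed dataset to the next, so a quick Firth-versus-exact comparison on a few imputations gives a feel for whether the method choice matters relative to B.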