Statistical Procedures

jojo
Obsidian | Level 7

I used the FIRTH option in PROC LOGISTIC and PROC PHREG because there are zero events in one of the two groups and more than 60% events in the other group. I got extreme OR and HR values (greater than 100), very wide CIs, and p-values < 0.0001. Can I interpret these results the same way as results from regular (unpenalized) PROC LOGISTIC and PROC PHREG? Any input or opinion based on your experience and knowledge would be appreciated.

12 replies
Ksharp
Super User

It would help if you could post your data, code, and output.

For PROC LOGISTIC, you could check the UNITS statement and set the unit larger to suppress the extreme CI and p-value.

The same applies to PROC PHREG, through the UNITS= option of the HAZARDRATIO statement.

jojo
Obsidian | Level 7

Thanks Eric. I didn't use the ODDSRATIO or HAZARDRATIO statement; I calculated the OR and HR from the model estimates. Below is the code I used.

proc logistic data=xx;
  class treatment(ref="PBO") / param=ref;
  model aval(event="1") = treatment base / link=logit firth;
run;

proc phreg data=xx;
  class treatment(ref="PBO");
  model aval*cnsr(1) = treatment base / firth;
run;
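Since the OR is computed by hand from the estimates, here is a minimal sketch of that step for the logistic model. It reuses the variable names from the posted code; the output data set names PE_LOG and OR_LOG are hypothetical. With near-separation, Wald limits of the form exp(estimate ± 1.96·SE) can be very wide even under FIRTH, so CLPARM=PL is added to print profile-likelihood limits for comparison; these are generally considered more reliable with Firth's method. In PROC PHREG the analogous MODEL option should be RISKLIMITS=PL (worth confirming in the documentation for your release).

```sas
/* Sketch: save the Firth-penalized estimates and exponentiate them.  */
/* XX, AVAL, TREATMENT and BASE come from the posted code; PE_LOG and */
/* OR_LOG are hypothetical output data set names.                     */
ods output ParameterEstimates=pe_log;
proc logistic data=xx;
  class treatment(ref="PBO") / param=ref;
  /* CLPARM=PL also prints profile-likelihood limits for comparison */
  model aval(event="1") = treatment base / link=logit firth clparm=pl;
run;

data or_log;
  set pe_log;
  where upcase(Variable) = "TREATMENT";
  z       = probit(0.975);               /* about 1.96 for a 95% CI */
  OR      = exp(Estimate);
  OR_Low  = exp(Estimate - z*StdErr);    /* Wald-type lower limit   */
  OR_High = exp(Estimate + z*StdErr);    /* Wald-type upper limit   */
run;

proc print data=or_log noobs;
  var Variable ClassVal0 Estimate StdErr OR OR_Low OR_High;
run;
```

If the Wald and profile-likelihood limits differ a lot, that is itself a sign that the estimate is being driven by the sparse cell.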

 

Ksharp
Super User
Then why not try the UNITS statement in PROC LOGISTIC and the UNITS= option of the HAZARDRATIO statement in PROC PHREG?
Set it as large as you can.
jojo
Obsidian | Level 7
I need to use PROC MIANALYZE to pool the estimates and standard errors from 200 imputed datasets and then calculate the OR/HR.
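One common pattern for this workflow is to fit the Firth model BY _Imputation_, pool the coefficient and its standard error on the log-odds scale with PROC MIANALYZE, and only then exponentiate. The sketch below assumes the completed data sets are stacked in a hypothetical data set IMPUTED_XX carrying the _Imputation_ variable that PROC MI creates; PE_MI, POOLED and POOLED_OR are likewise illustrative names. Under Rubin's rules the pooled coefficient is the mean of the per-imputation coefficients, and the pooled variance adds the within- and between-imputation components. The same pattern should carry over to PROC PHREG, though the column names in its ParameterEstimates table are worth checking first.

```sas
/* Sketch: Firth model per imputation, pooled with Rubin's rules.      */
/* IMPUTED_XX (stacked imputations with _Imputation_) is hypothetical. */
ods output ParameterEstimates=pe_mi;
proc logistic data=imputed_xx;
  by _Imputation_;
  class treatment(ref="PBO") / param=ref;
  model aval(event="1") = treatment base / link=logit firth;
run;

/* CLASSVAR=CLASSVAL: class levels are stored in the ClassVal0 column */
ods output ParameterEstimates=pooled;
proc mianalyze parms(classvar=classval)=pe_mi;
  class treatment;
  modeleffects treatment base;
run;

/* Exponentiate only after pooling on the log-odds scale */
data pooled_or;
  set pooled;
  OR      = exp(Estimate);
  OR_Low  = exp(LCLMean);
  OR_High = exp(UCLMean);
run;
```

Pooling on the log scale and exponentiating afterwards keeps the interval consistent with the pooled standard error; averaging 200 ORs directly would not.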
jojo
Obsidian | Level 7
And the OR and HR are for the treatment variable, which has only two values. UNITS is for continuous independent variables.
StatDave
SAS Super FREQ

Assuming that your data set is small enough for exact methods to be feasible, try the EXACT statement to see whether you get a better estimate of the odds ratio. Add this statement:

  exact treatment / estimate=both;
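For context, here is a sketch of where that statement goes, reusing the posted variable names and dropping FIRTH. EXACTONLY suppresses the asymptotic analysis. Effects not listed in the EXACT statement (here the intercept and BASE) are treated as nuisance parameters and conditioned out; a continuous covariate like BASE can make this computation expensive, which is why the Monte Carlo option is shown commented out. When one group has zero events, the conditional MLE does not exist, and PROC LOGISTIC should then report a median unbiased estimate, typically with one infinite confidence limit.

```sas
/* Sketch: exact conditional logistic regression for the treatment effect. */
proc logistic data=xx exactonly;
  class treatment(ref="PBO") / param=ref;
  model aval(event="1") = treatment base / link=logit;
  /* Intercept and BASE are conditioned out as nuisance parameters */
  exact treatment / estimate=both;
  /* If the network algorithm runs too long, a Monte Carlo */
  /* approximation can be requested instead:               */
  /* exactoptions method=networkmc;                        */
run;
```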

Season
Barite | Level 11

I am curious about a more technical issue: what is the difference between Firth's penalization and the exact method? To the best of my knowledge, the former is suitable when (quasi-)complete separation exists, while the latter is suitable for small samples, which can also lead to (quasi-)complete separation when empty cells exist. So what is the difference between them? Or are they simply competing alternatives, with neither clearly superior to the other?

StatDave
SAS Super FREQ

The two methods are very different. The Firth method is still an iterative maximum likelihood estimation method, with just a small tweak that adds a penalty to the likelihood function; see the "Details: Iterative Algorithms for Model Fitting" section of the LOGISTIC documentation. The exact method uses conditional inference, generating a conditional distribution and likelihood function; see the "Details: Exact Conditional Logistic Regression" section. The exact method can be very computationally intensive and is generally feasible only with smaller, simpler data sets. But practically speaking, yes, both methods are frequently used to deal with the problems that occur with sparse data, and because the Firth method is less computationally demanding, it is often more feasible. However, both methods can fail depending on the data and model.
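To make the contrast concrete, the two objective functions can be written as follows (standard textbook forms, not taken from this thread). Firth's method maximizes the ordinary log likelihood plus a Jeffreys-prior penalty, where I(β) is the Fisher information matrix. The exact method instead bases inference for the treatment coefficient β₁ on the distribution of its sufficient statistic T₁ = Σᵢ yᵢxᵢ₁, conditional on the sufficient statistics T₀ for the nuisance parameters (intercept, BASE); c(t₁, t₀) counts the response vectors that produce those values, so the nuisance parameters cancel out.

```latex
\ell^{*}(\beta) = \ell(\beta) + \tfrac{1}{2}\log\det I(\beta)

\Pr(T_1 = t_1 \mid T_0 = t_0)
  = \frac{c(t_1, t_0)\,\exp(\beta_1 t_1)}
         {\sum_{u} c(u, t_0)\,\exp(\beta_1 u)}
```

The first is still maximized iteratively over all of β; the second requires enumerating (or sampling) the permissible response vectors, which is where the computational burden of the exact method comes from.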

Season
Barite | Level 11

Thank you so much for your detailed and patient response! I tried searching online for papers comparing the methods but found little useful information. Your concise comparison is very informative.

Based on what I found online, I think one area for further research is the empirical (simulation-based) and theoretical (mathematical) comparison of the accuracy and/or efficiency of the estimators produced by the two methods. You mentioned that exact logistic regression is computationally intensive, but this drawback can be partially overcome by high-performance computing, which is becoming increasingly accessible in industry. A detailed comparison of the two methods is therefore becoming worthwhile. I hope more statisticians read this appeal and take it up.

Season
Barite | Level 11

I think this phenomenon is largely caused by the small sample size and the resulting wide confidence interval. You can try the exact logistic regression method suggested by @StatDave to see whether it helps, but I am pessimistic about whether merely changing the method will have much impact. The best way to obtain a narrower confidence interval and a less extreme OR or HR is to increase the sample size. If that is impossible, I think you should report the results as they are.

jojo
Obsidian | Level 7

Thanks, all, for your input and comments. I will use the Firth method, as I need to run hundreds of models and combine the results.

Season
Barite | Level 11

I saw from your post above that your dataset contains missing data and that you plan to repeat the modeling process on each imputed sample.

An informal approach I can think of is to apply both Firth's method and exact logistic regression to one (or several) of the imputed datasets and see how the results differ. If this tentative comparison finds no big difference between them, we might speculate that the differences in the remaining imputed datasets are not big either. In that case the combined estimate, which under Rubin's rules is the arithmetic mean of the regression coefficients from the imputed datasets, is not substantially affected by the choice of estimation method (Firth's method or exact logistic regression).
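A minimal sketch of that informal check, run on the first imputation only. It assumes a hypothetical stacked data set IMPUTED_XX with the _Imputation_ variable from PROC MI; PE_FIRTH and PE_EXACT are illustrative names. The ODS table names ParameterEstimates and ExactParmEst are as we understand them from the LOGISTIC documentation; running ODS TRACE ON first will confirm the names in your release.

```sas
/* Sketch: compare Firth and exact estimates on one imputed data set. */
/* IMPUTED_XX, PE_FIRTH and PE_EXACT are hypothetical names.          */
ods output ParameterEstimates=pe_firth;
proc logistic data=imputed_xx(where=(_Imputation_=1));
  class treatment(ref="PBO") / param=ref;
  model aval(event="1") = treatment base / link=logit firth;
run;

ods output ExactParmEst=pe_exact;
proc logistic data=imputed_xx(where=(_Imputation_=1)) exactonly;
  class treatment(ref="PBO") / param=ref;
  model aval(event="1") = treatment base / link=logit;
  exact treatment / estimate=both;
run;

title "Imputation 1: Firth estimates";
proc print data=pe_firth noobs; run;
title "Imputation 1: exact conditional estimates";
proc print data=pe_exact noobs; run;
title;
```

Changing the WHERE= value (or using a short list such as _Imputation_ in (1, 50, 100)) extends the check to several imputations without running the exact analysis on all 200.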
