Obsidian | Level 7

## Does Event rate matter if you have enough Events volume- Logistic Reg

Hi All,

Another question regarding logistic regression.

If you have a large enough volume of events, does it matter how small your event rate is? I have data where the event rate is 1.3%, but I have around 16K events in around 1.2M observations. I understand my event rate is very low, but I think that is enough volume to test around 20 variables in the logistic regression?
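As a rough sanity check on those figures, here is a minimal Python sketch of the arithmetic. It leans on the commonly cited "events per variable" heuristic (often quoted as roughly 10 events per candidate predictor), which is not from this thread and is only a rule of thumb; the function name `events_per_variable` is made up for illustration.

```python
def events_per_variable(n_events: int, n_predictors: int) -> float:
    """Events available per candidate predictor (a heuristic, not a test)."""
    return n_events / n_predictors


n_obs = 1_200_000    # approximate observations quoted in the post
n_events = 16_000    # approximate events quoted in the post
n_predictors = 20    # candidate variables quoted in the post

event_rate = n_events / n_obs
epv = events_per_variable(n_events, n_predictors)

print(f"event rate: {event_rate:.2%}")
print(f"events per variable: {epv:.0f}")
```

With the approximate numbers quoted, this works out to an event rate of about 1.33% and 800 events per variable, far above the heuristic threshold assumed here.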

I have read a few articles, but honestly I am still confused about whether there is any need to do sampling. Many articles say that if you have less than a 10% event rate you should consider oversampling, but I think that is required only if you have a low volume of events? Any thoughts?
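On the sampling question, one well-known property of logistic regression may be useful context: if all events are kept and non-events are randomly subsampled at a fraction f (independently of the predictors), the odds in the sample equal the full-data odds divided by f, so only the intercept shifts, by -log(f). The sketch below is a minimal Python illustration of that arithmetic, assuming a hypothetical keep fraction of 10%; the function names are illustrative and not from any library, and it does not by itself say whether sampling is advisable.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Probability corresponding to a log-odds value."""
    return 1.0 / (1.0 + math.exp(-x))


def subsampled_rate(p: float, keep_fraction: float) -> float:
    """Event rate after keeping all events and a fraction of non-events."""
    events = p
    non_events = (1.0 - p) * keep_fraction
    return events / (events + non_events)


def corrected_probability(p_sample: float, keep_fraction: float) -> float:
    """Map a probability on the subsampled scale back to the full-data scale."""
    return inv_logit(logit(p_sample) + math.log(keep_fraction))


p_true = 0.013         # event rate quoted in the post
keep_fraction = 0.10   # hypothetical: keep 10% of non-events

p_sample = subsampled_rate(p_true, keep_fraction)
p_back = corrected_probability(p_sample, keep_fraction)

print(f"event rate after subsampling: {p_sample:.4f}")
print(f"rate recovered after intercept correction: {p_back:.4f}")
```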

Thanks

Sachin

Super User

## Re: Does Event rate matter if you have enough Events volume- Logistic Reg

```
1) Try exact logistic regression + Monte Carlo.
2) Try a decision tree / random forest.
3) Try Poisson regression:

When p ~ 0,
log(p/(1-p)) ~ log(p) = log(r/n) = log(r) - log(n), so move log(n) to the right-hand side of the model (as an offset).

http://support.sas.com/kb/24/188.html
```
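To make the approximation in point 3 concrete: the gap between the logit and the log is exactly -log(1-p), which is small when p is near zero. The following is a minimal Python sketch, using the 1.3% event rate from the question plus a few other rates chosen purely for illustration; the function names are hypothetical.

```python
import math


def logit(p: float) -> float:
    """Log-odds, the link used by logistic regression."""
    return math.log(p / (1.0 - p))


def approximation_gap(p: float) -> float:
    """Difference logit(p) - log(p), which equals -log(1 - p)."""
    return logit(p) - math.log(p)


# 0.013 is the event rate from the question; the others are illustrative.
for p in (0.001, 0.013, 0.10, 0.50):
    print(f"p={p:<6} logit={logit(p):8.4f} log={math.log(p):8.4f} "
          f"gap={approximation_gap(p):.4f}")
```

At p = 0.013 the gap is roughly 0.013 on the log scale, while at p = 0.5 it grows to about 0.69, which is why the log-link (Poisson-style) approximation is only suggested for rare events.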
Obsidian | Level 7

## Re: Does Event rate matter if you have enough Events volume- Logistic Reg

Sorry, I tried to read this, but it did not answer my question. If it's about SAS showing significant results or confidence intervals, then yes, my results are showing that.

What I am trying to understand is:

If I have enough volume of events, does the event rate matter? I have 16K events out of 1.2 million observations and an event rate of approx. 1.3%.

Super User

## Re: Does Event rate matter if you have enough Events volume- Logistic Reg

```
No. It is not good for a logistic model because the event rate is too low.
So it doesn't matter how many observations you have.
```
Obsidian | Level 7

## Re: Does Event rate matter if you have enough Events volume- Logistic Reg

With all respect, I don't think this is true. I think that, given the number of independent variables I am testing (20 in this case), if I have enough volume of events (10k in this case), a low event rate does not matter because there are enough events for each variable. I think this is fine for logistic regression, but I am not sure about other algorithms.

Can you share any link or source that says, and explains why, the event rate has to be at a certain level for logistic regression, irrespective of the volume of events?

Again, these are just my views, so I am looking forward to someone else's reply as well.
Super User

## Re: Does Event rate matter if you have enough Events volume- Logistic Reg

```
Hi, actually I am not an expert in statistics.
proc logistic models the probability of the event, not the number of events.
It is called overdispersion.
The SAS documentation for proc logistic describes it:

Overdispersion
For a correctly specified model, the Pearson chi-square statistic and the deviance, divided by their degrees of
freedom, should be approximately equal to one. When their values are much larger than one, the assumption
of binomial variability might not be valid and the data are said to exhibit overdispersion. Underdispersion,
which results in the ratios being less than one, occurs less often in practice.
When fitting a model, there are several problems that can cause the goodness-of-fit statistics to exceed their
degrees of freedom. Among these are such problems as outliers in the data, using the wrong link function,
omitting important terms from the model, and needing to transform some predictors. These problems should
be eliminated before proceeding to use the following methods to correct for overdispersion.

```
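The ratio described in the quoted documentation can be sketched by hand for grouped binomial data. The Python example below is only an illustration: the group counts, fitted probabilities, and parameter count are made-up numbers, and the function name `pearson_dispersion` is hypothetical. It computes the Pearson chi-square statistic divided by its degrees of freedom, taken here as the number of groups minus the number of fitted parameters.

```python
def pearson_dispersion(events, trials, fitted_probs, n_params):
    """Pearson chi-square divided by degrees of freedom for grouped binomial data."""
    chi_sq = 0.0
    for y, n, p in zip(events, trials, fitted_probs):
        expected = n * p                  # expected events under the model
        variance = n * p * (1.0 - p)      # binomial variance under the model
        chi_sq += (y - expected) ** 2 / variance
    df = len(events) - n_params           # groups minus fitted parameters
    return chi_sq / df


# Made-up grouped data, for illustration only.
events = [12, 30, 9, 25, 14]
trials = [1000, 2000, 800, 1500, 1200]
fitted = [0.012, 0.014, 0.012, 0.016, 0.012]

ratio = pearson_dispersion(events, trials, fitted, n_params=2)
print(f"Pearson chi-square / df = {ratio:.3f}")
```

Per the quoted text, a ratio much larger than one would point toward overdispersion; this diagnostic concerns the binomial variance assumption rather than the event rate as such.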