Ujjawal
Quartz | Level 8

I am building a marketing model based on logistic regression. It's a customer attrition model. The event rate is very low, i.e. 0.1%, and I have more than 1000 predictors. I know there is a rule of thumb: a minimum of 10 events per predictor. Does this rule apply before dimensionality reduction (feature extraction) with PCA and Information Value? Should I apply it to my original 1500 variables, or only to the significant variables that remain after variable selection techniques such as stepwise regression, PCA, etc.?

14 replies
Reeza
Super User

This is a commonly asked question here; have none of the search results been useful?

Ujjawal
Quartz | Level 8

I understand this is a commonly asked question, but no one has clarified the background of this rule. Does the rule account for correlated predictors? Should it apply before removing multicollinearity, or after removing collinearity and extracting features?

Reeza
Super User

What's your source for the 'rule'?

Reeza
Super User

My nickel, and it's my [somewhat] educated opinion.

I would argue that the 10 per rule of thumb isn't always valid, it depends on the variability of the variables being measured.

If you're not using the outcome (the events) in your dimensionality reduction and variable selection, I would argue the 'rule' applies to the variables after reduction.

If you are using it, then the rule applies to the original variables.
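Reeza's opinion above can be written as a small decision rule. The sketch below only illustrates that reasoning and is not a standard procedure; the function names `predictors_to_count` and `meets_epv` are hypothetical, and the default of 10 events per predictor is simply the rule of thumb under discussion.

```python
def predictors_to_count(n_candidates: int, n_retained: int,
                        selection_uses_outcome: bool) -> int:
    """How many predictors to check the events-per-predictor rule against.

    Hypothetical helper encoding the opinion above: if the reduction step
    looked at the outcome (e.g. stepwise selection or Information Value
    screening), count every original candidate; if it did not (e.g. PCA on
    the predictors alone), count only the retained features.
    """
    return n_candidates if selection_uses_outcome else n_retained


def meets_epv(n_events: int, n_predictors: int, epv: int = 10) -> bool:
    """True if there are at least `epv` events per counted predictor."""
    return n_events >= epv * n_predictors


# 20 events, 1500 candidate variables reduced to 2 features.
print(meets_epv(20, predictors_to_count(1500, 2, selection_uses_outcome=False)))  # True
print(meets_epv(20, predictors_to_count(1500, 2, selection_uses_outcome=True)))   # False
```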

Ksharp
Super User

According to some statistical experts, you need to run exact logistic regression. Check the EXACT statement in PROC LOGISTIC, if I remember correctly.

Rick_SAS
SAS Super FREQ

Most variable selection techniques start by evaluating all one-variable models, then trying to add a second variable, a third variable, and so forth.  If you want to conform to the 10-events-per-predictor rule, then you should not try to build models that have more than NumEvents / 10 predictors. For example, if you have 51 events, you could limit the selection algorithm to consider only models that have up to 5 continuous variables.
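The cap Rick describes is just integer division of the event count by 10. A minimal sketch, with a hypothetical function name:

```python
def max_predictors(n_events: int, events_per_predictor: int = 10) -> int:
    """Largest model size allowed by the events-per-predictor rule of thumb."""
    if n_events < 0 or events_per_predictor <= 0:
        raise ValueError("n_events must be >= 0 and events_per_predictor > 0")
    # Integer division: 51 events support 5 predictors, not 5.1.
    return n_events // events_per_predictor


print(max_predictors(51))  # 5
print(max_predictors(9))   # 0: too few events for even one predictor
```

Rick's example says "continuous variables" because the rule is usually applied per estimated parameter; a categorical predictor with L levels typically uses L - 1 of them.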

Ujjawal
Quartz | Level 8

Thanks Xia and Rick for your reply. I am aware of Exact and Firth Logistic Regression. I am curious to know the background of this rule.

@Rick - Suppose I have 1000 predictors in my model. Do you mean that it requires at least 10k events before correcting for multicollinearity and doing feature extraction? I understand I can ignore this rule if I apply unsupervised learning (e.g. PCA or PROC VARCLUS), since those methods do not use the dependent variable. I am more curious about supervised methods for extracting important variables, by which I mean Information Value and chi-square methods. Does the model need sufficient events for feature extraction? Otherwise the feature extraction would be biased. Correct?
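To see why supervised screening is fragile at a 0.1% event rate, here is a minimal sketch of the usual Information Value calculation for a binned predictor. It assumes the common definition IV = sum over bins of (share of non-events - share of events) * ln(non-event share / event share); sign conventions for the weight of evidence vary, but IV comes out the same under either. The function name and the bin counts are made up for illustration.

```python
import math


def information_value(events_by_bin, nonevents_by_bin):
    """Information Value of a binned predictor (hypothetical helper).

    IV = sum over bins of (pct_nonevent - pct_event) * ln(pct_nonevent / pct_event).
    Returns math.inf if any bin has zero events or zero non-events, because the
    weight of evidence ln(pct_nonevent / pct_event) is undefined there.
    """
    if len(events_by_bin) != len(nonevents_by_bin):
        raise ValueError("bin lists must have the same length")
    total_events = sum(events_by_bin)
    total_nonevents = sum(nonevents_by_bin)
    if total_events == 0 or total_nonevents == 0:
        raise ValueError("need at least one event and one non-event")
    iv = 0.0
    for e, ne in zip(events_by_bin, nonevents_by_bin):
        if e == 0 or ne == 0:
            return math.inf
        pct_event = e / total_events
        pct_nonevent = ne / total_nonevents
        iv += (pct_nonevent - pct_event) * math.log(pct_nonevent / pct_event)
    return iv


# 20,000 records split into 10 bins of 2,000, with 20 events in total (0.1%).
events_a = [3, 2, 2, 2, 2, 2, 2, 2, 2, 1]
events_b = [4, 2, 2, 2, 2, 2, 2, 2, 2, 0]  # one event moved to the first bin
for events in (events_a, events_b):
    nonevents = [2000 - e for e in events]
    print(information_value(events, nonevents))  # finite for events_a, inf for events_b
```

Moving a single event between bins turns a modest IV into an undefined one. Some practitioners add a small constant to empty cells, but then the ranking of predictors partly reflects that arbitrary constant, which is one concrete way screening becomes unreliable with so few events.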

SteveDenham
Jade | Level 19

Model building from 1000 predictors using 'supervised methods' will be biased. The question is how biased, and whether the model will adequately predict future data. It is well known that naive methods (stepwise, backward, forward, all possible subsets) lead to problematic results with standard regression models. See Flom and Cassell's paper on Stopping Stepwise: http://www.lexjansen.com/pnwsug/2008/DavidCassell-StoppingStepwise.pdf

The problem is exacerbated for logistic regression. However, PROC HPGENSELECT in SAS/STAT 14.1 does offer selection=LASSO, which gets around a lot of the difficulties with the other methods. Still, consider the result of putting things on a logit link, and what might happen with fewer than 10 events per predictor: you are going to have some points with very small logits that have a lot of influence on the fit.

Steve Denham
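Steve's point about bias can be made concrete with a back-of-the-envelope calculation. Assume, purely for illustration, 10 events and 1000 predictors that are balanced (50/50) binary flags, independent across records and unrelated to the outcome (pure noise). The sketch counts how many of those noise predictors would still place at least 9 of the 10 events in one level, which a univariate screen would tend to flag as strong. The function name is hypothetical.

```python
from math import comb


def p_extreme_split(n_events: int, threshold: int) -> float:
    """Chance that at least `threshold` of `n_events` events land in the same
    level of a balanced (50/50) binary predictor unrelated to the outcome."""
    if not n_events / 2 < threshold <= n_events:
        raise ValueError("threshold must exceed half of n_events")
    one_level = sum(comb(n_events, k) for k in range(threshold, n_events + 1))
    # Either level can hold the lopsided share; the two cases cannot overlap
    # because threshold > n_events / 2.
    return 2 * one_level / 2 ** n_events


n_predictors, n_events = 1000, 10
p = p_extreme_split(n_events, threshold=9)
print(n_predictors * p)             # 21.484375 noise predictors expected
print(1 - (1 - p) ** n_predictors)  # chance at least one shows up: essentially 1
```

Roughly 21 pure-noise predictors look like near-perfect separators of the events. A supervised screen would carry some of them forward, and the final model's p-values would not reflect that search.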

Rick_SAS
SAS Super FREQ

No, I said that if you apply this rule, then in going from 1,000 potential explanatory variables to the k that you want in your final model, (number of events)/10 bounds the value of k.

Ujjawal
Quartz | Level 8

Thanks Steve and Rick. @Rick - regarding "then in going from 1,000 potential explanatory variables to the k that you want in your final model, that the (number of events)/10 will bound the value for k": would each of these 1000 variables have sufficient events to explain its variable importance? I suspect univariate analysis of these variables against the dependent variable would fail. I am sorry to bug you again.

SteveDenham
Jade | Level 19

Here is a concrete example.  Suppose in your training dataset you have 10,000 records with an event rate of 0.1%.  That would be 10 events.  Using the bounded value for k of events/10, you could adequately fit 1 variable to the data.  If you had 20,000 records with the same event rate, you could adequately fit 2 variables, and so forth.

Of course, you will need additional records to validate your model against.

Steve Denham
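Steve's arithmetic, run in reverse, gives the training-set size needed for k predictors. This is a minimal sketch assuming a fixed 0.1% event rate and the 10-events-per-predictor rule; it works with expected event counts (the realized count in any sample will vary) and uses `Fraction` so the ceiling is not thrown off by floating-point rounding. The function names are hypothetical.

```python
import math
from fractions import Fraction


def expected_events(n_records: int, event_rate: Fraction) -> Fraction:
    """Expected number of events in a sample (exact arithmetic via Fraction)."""
    return n_records * event_rate


def records_needed(n_predictors: int, event_rate: Fraction,
                   events_per_predictor: int = 10) -> int:
    """Smallest training-set size whose expected event count meets the rule."""
    return math.ceil(events_per_predictor * n_predictors / event_rate)


rate = Fraction("0.001")  # 0.1% event rate
print(expected_events(10_000, rate))  # 10
print(records_needed(2, rate))        # 20000
print(records_needed(1000, rate))     # 10000000
```

At 0.1%, keeping 1000 predictors in the final model would take about 10 million training records (10,000 events), before setting aside the validation records Steve mentions.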

Ujjawal
Quartz | Level 8

Thank you so much, Steve, for being so patient in replying to this thread. 🙂 My question is still about your explanation. I understand I can fit only 2 variables with 20k records at an event rate of 0.1%. My question: can I perform an INITIAL feature extraction (important-variable selection with supervised methods) to arrive at the 2 FINAL significant variables? Or do I need more events to perform the initial feature extraction step?

Reeza
Super User

I think my original comment stands: if the feature extraction doesn't depend on the outcome, you can use derived features as your variables, so you can use 2 derived features with 20K records and an event rate of 0.1%.

