Dear all,
I am developing a predictive model for a data set with a highly imbalanced dependent variable: the ratio between its two categories is 47500:1. To correct for the bias this imbalance creates, I am exploring SMOTE and adaptive synthetic sampling (ADASYN) before fitting the models. I mostly use SAS Enterprise Guide, but I am also comfortable with SAS Enterprise Miner. Has anyone used these sampling algorithms in SAS? I would appreciate help with coding them, and I would also welcome recommendations for any classification techniques suited to this problem. Thanks in advance.
Regards
SMOTE is described here:
http://support.sas.com/resources/papers/proceedings15/3282-2015.pdf
and the ZIP file containing the SAS code is here:
http://support.sas.com/resources/papers/proceedings15/3282-2015.zip
Two recent SAS papers from customers apply SMOTE:
SMOTE = Synthetic Minority Over-sampling TEchnique
Paper 3483-2015
Data sampling improvement by developing SMOTE technique in SAS
Lina Guzman, DIRECTV
http://support.sas.com/resources/papers/proceedings15/3483-2015.pdf
Paper 3282-2015
A Case Study: Improve Classification of Rare Events with SAS® Enterprise Miner™
Ruizhe Wang, GuideWell Connect; Novik Lee, GuideWell Connect; Yun Wei, GuideWell Connect
A rather novel technique called SMOTE (Synthetic Minority Over-sampling TEchnique), which has achieved the best result in our comparison, is discussed.
http://support.sas.com/resources/papers/proceedings15/3282-2015.pdf
Koen
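For readers who want to see the core idea before opening the papers, here is a minimal sketch of SMOTE-style oversampling. It is written in plain Python rather than SAS, and it is not the code from either paper or from the ZIP file. It is a simplified variant that assumes numeric features only, uses brute-force Euclidean nearest neighbours, and picks the base row at random instead of looping over every minority row. The function name `smote` and its parameters `n_synthetic`, `k` and `seed` are made up for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new rows interpolated between minority rows.

    Simplified SMOTE-style sketch (an assumption, not the papers' code):
    numeric features only, brute-force Euclidean neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other rows

    # For each minority row, find the indices of its k nearest minority rows.
    neighbours = []
    for i, row in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(row, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # random base row
        j = rng.choice(neighbours[i])      # one of its nearest neighbours
        gap = rng.random()                 # position along the segment, in [0, 1)
        synthetic.append(
            [a + gap * (b - a) for a, b in zip(minority[i], minority[j])]
        )
    return synthetic


if __name__ == "__main__":
    rare = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for row in smote(rare, n_synthetic=3, k=2, seed=1):
        print(row)
```

Each synthetic row lies on the line segment between a minority row and one of its nearest minority neighbours, so the new rows stay inside the region the rare class already occupies. Only the training data should be oversampled this way; the validation and test data should keep the original class ratio.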
Thanks, Koen, for the papers. I will go through them.
Hello,
I haven't tested the code accompanying the paper.
It's best to turn to the authors.
See the last page of the paper. It says:
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author at:
Cheers,
Koen