Dear all,
I am developing a predictive model for a data set with a highly imbalanced dependent variable: the ratio between its two categories is 47,500:1. Before fitting the models, I am exploring SMOTE and adaptive synthetic sampling (ADASYN) to correct for the bias created by the imbalance. I mostly use SAS Enterprise Guide but am also comfortable with SAS Enterprise Miner. Has anyone implemented these sampling algorithms in SAS? I would appreciate help with coding them, and I would also welcome recommendations for classification techniques suited to this problem. Thanks in advance.
Regards
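For readers new to adaptive synthetic sampling (ADASYN, He et al., 2008): it creates more synthetic minority records around minority records whose nearest neighbours are mostly from the majority class, i.e. the ones that are harder to learn. Below is a minimal, unoptimised Python sketch of that idea. It is not SAS code and is not taken from any paper in this thread. It assumes numeric feature tuples and Euclidean distance, and the function name `adasyn` and its parameters are illustrative only.

```python
import math
import random


def adasyn(minority, majority, beta=1.0, k=5, seed=None):
    """Illustrative ADASYN sketch for numeric feature tuples (hypothetical helper).

    beta=1.0 aims for a fully balanced sample; smaller values balance partially.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority records to interpolate")
    rng = random.Random(seed)
    # Total number of synthetic records to create (never negative).
    total_new = max(0, int((len(majority) - len(minority)) * beta))
    labelled = [(p, 1) for p in minority] + [(p, 0) for p in majority]
    k_all = min(k, len(labelled) - 1)
    k_min = min(k, len(minority) - 1)

    ratios, minority_nn = [], []
    for i, p in enumerate(minority):
        # r_i: share of majority records among the k nearest neighbours overall.
        nearest = sorted(
            (math.dist(p, q), label)
            for j, (q, label) in enumerate(labelled) if j != i
        )[:k_all]
        ratios.append(sum(label == 0 for _, label in nearest) / k_all)
        # Minority-only neighbours, used as interpolation partners.
        nearest_min = sorted(
            (math.dist(p, q), j) for j, q in enumerate(minority) if j != i
        )[:k_min]
        minority_nn.append([j for _, j in nearest_min])

    total_ratio = sum(ratios)
    if total_ratio == 0:
        return []  # no minority record has majority neighbours
    synthetic = []
    for i, r in enumerate(ratios):
        # Harder-to-learn records (higher r_i) receive more synthetic records.
        g_i = round(r / total_ratio * total_new)
        for _ in range(g_i):
            q = minority[rng.choice(minority_nn[i])]
            gap = rng.random()  # random position on the segment minority[i] -> q
            synthetic.append(
                tuple(a + gap * (b - a) for a, b in zip(minority[i], q))
            )
    return synthetic
```

With `beta=1.0` the number of synthetic records is the majority count minus the minority count, so at a 47,500:1 ratio full balancing means on average about 47,499 synthetic records per original minority record; a smaller `beta` balances only partially. The brute-force neighbour search is quadratic in the number of records, so this sketch is only meant for small illustrations.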
SMOTE is described here:
http://support.sas.com/resources/papers/proceedings15/3282-2015.pdf
and a ZIP file containing the SAS code is here:
http://support.sas.com/resources/papers/proceedings15/3282-2015.zip
Two recent SAS papers by customers discuss and apply SMOTE.
SMOTE = Synthetic Minority Over-sampling TEchnique
Paper 3483-2015
Data sampling improvement by developing SMOTE technique in SAS
Lina Guzman, DIRECTV
http://support.sas.com/resources/papers/proceedings15/3483-2015.pdf
Paper 3282-2015
A Case Study: Improve Classification of Rare Events with SAS® Enterprise Miner™
Ruizhe Wang, GuideWell Connect; Novik Lee, GuideWell Connect; Yun Wei, GuideWell Connect
A rather novel technique called SMOTE (Synthetic Minority Over-sampling TEchnique), which has achieved the best result in our comparison, is discussed.
http://support.sas.com/resources/papers/proceedings15/3282-2015.pdf
Koen
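The core interpolation step of SMOTE (Chawla et al., 2002) can be sketched as follows: for each minority record, pick one of its k nearest minority neighbours and create a new record at a random point on the line segment between the two. This is a minimal Python illustration under stated assumptions (numeric feature tuples, Euclidean distance, cycling through the minority records), not the SAS code in the ZIP file linked in this reply; the name `smote` and its parameters are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=None):
    """Illustrative SMOTE sketch for numeric feature tuples (hypothetical helper)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority records to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # k nearest minority neighbours (Euclidean) of every minority record.
    neighbours = []
    for i, p in enumerate(minority):
        nearest = sorted(
            (math.dist(p, q), j) for j, q in enumerate(minority) if j != i
        )[:k]
        neighbours.append([j for _, j in nearest])

    synthetic = []
    for s in range(n_synthetic):
        i = s % len(minority)  # cycle through the minority records
        q = minority[rng.choice(neighbours[i])]
        gap = rng.random()  # random position on the segment minority[i] -> q
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(minority[i], q)))
    return synthetic


# Toy minority records, made up for illustration only.
rare = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_rows = smote(rare, n_synthetic=8, k=2, seed=42)
print(len(new_rows))  # 8
```

Because every synthetic record lies between two existing minority records, SMOTE fills in the minority region rather than duplicating rows, which is what distinguishes it from plain random oversampling.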
Thanks, Koen, for the papers. I will go through them.
Hello,
I haven't tested the code accompanying the paper.
For questions about it, it's best to contact the authors.
See the last page of the paper. It says:
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author at:
Cheers,
Koen