03-17-2016 04:09 PM
I am developing a predictive model for a data set that has a very imbalanced dependent variable. The ratio between the two categories of the dependent variable is 47500:1. I am exploring SMOTE and adaptive synthetic sampling (ADASYN) to correct for the bias created by the imbalance before fitting the models. I mostly use SAS Enterprise Guide but am also comfortable with SAS Enterprise Miner. Has anyone used these sampling algorithms in SAS? I would appreciate help with coding these sampling techniques, and I would also be glad if anyone could recommend classification techniques that suit this problem. Thanks in advance.
03-17-2016 06:03 PM
SMOTE described here
and ZIP containing SAS code
Two recent papers from SAS customers discuss or apply SMOTE.
SMOTE = Synthetic Minority Over-sampling TEchnique
Data sampling improvement by developing SMOTE technique in SAS
Lina Guzman, DIRECTV
A Case Study: Improve Classification of Rare Events with SAS® Enterprise Miner™
Ruizhe Wang, GuideWell Connect; Novik Lee, GuideWell Connect; Yun Wei, GuideWell Connect
A rather novel technique called SMOTE (Synthetic Minority Over-sampling TEchnique), which has achieved the best result in our comparison, is discussed.
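For anyone who wants to see the core idea before porting it to SAS, here is a minimal sketch of SMOTE-style interpolation in plain Python. It is not the code from either paper, and the function name `smote`, its parameters, and the sample rows are all made up for illustration. It assumes numeric features only, uses Euclidean distance, and picks the base row at random for each synthetic point rather than looping over every minority row as the original algorithm does.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic rows by interpolating between a minority row
    and one of its k nearest minority neighbours (illustrative sketch)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other rows
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority row as the starting point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest minority neighbours (excluding itself).
        others = [row for j, row in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        neighbour = rng.choice(neighbours)
        # Place the new row at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class rows with two numeric features.
minority_rows = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.2], [0.9, 1.9]]
new_rows = smote(minority_rows, n_synthetic=6, k=2)
```

The synthetic rows would be appended to the training data only, with the minority label; the validation and test data should keep the original class ratio so that the reported performance reflects the real 47500:1 imbalance.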
03-23-2016 08:32 AM
03-24-2016 12:32 PM - edited 03-24-2016 12:33 PM
I haven't tested the code accompanying the paper.
It's best to turn to the authors.
See the last page of the paper. It says:
Your comments and questions are valued and encouraged. Contact the author at: