Assistance with SAS code for SMOTE and adaptive synthetic sampling algorithms

New Contributor
Posts: 3

Dear all,

I am developing a predictive model for a data set whose dependent variable is highly imbalanced: the ratio between its two categories is 47500:1. Before fitting the models, I am exploring SMOTE and adaptive synthetic sampling to correct for the bias this imbalance creates. I mostly use SAS Enterprise Guide but am also comfortable with SAS Enterprise Miner. Has anyone used these sampling algorithms in SAS? I would appreciate help with coding them, and I would also welcome recommendations for classification techniques suited to this problem. Thanks in advance.

Regards


Accepted Solutions
Solution
07-03-2017 09:52 AM
SAS Employee
Posts: 51

Re: Assistance with SAS code for SMOTE and adaptive synthetic sampling algorithms

SMOTE is described here:

http://support.sas.com/resources/papers/proceedings15/3282-2015.pdf

and here is a ZIP file containing the SAS code:

http://support.sas.com/resources/papers/proceedings15/3282-2015.zip

 

Two recent SAS papers from customers discuss and apply SMOTE.

SMOTE = Synthetic Minority Over-sampling TEchnique

 

Paper 3483-2015

Data sampling improvement by developing SMOTE technique in SAS

Lina Guzman, DIRECTV

http://support.sas.com/resources/papers/proceedings15/3483-2015.pdf

 

Paper 3282-2015

A Case Study: Improve Classification of Rare Events with SAS® Enterprise Miner™

Ruizhe Wang, GuideWell Connect; Novik Lee, GuideWell Connect; Yun Wei, GuideWell Connect

A rather novel technique called SMOTE (Synthetic Minority Over-sampling TEchnique), which has achieved the best result in our comparison, is discussed.

http://support.sas.com/resources/papers/proceedings15/3282-2015.pdf
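
For readers who want to see the core idea before opening the papers, here is a minimal, language-neutral sketch of SMOTE-style interpolation, written in Python rather than SAS. It is not the code from the ZIP file above and has not been checked against it. The function name `smote`, its parameters, and the toy data are all hypothetical. It assumes numeric inputs with no missing values, uses brute-force Euclidean distance, and picks the base case at random instead of looping over every minority case.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic cases by interpolating between minority neighbours.

    Hypothetical helper for illustration only; assumes numeric inputs
    with no missing values.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority cases")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority case as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority cases by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        # Choose one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Place the new case at a random position on the segment
        # between the base point and the chosen neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two numeric inputs (made-up values).
minority_cases = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_cases = smote(minority_cases, n_synthetic=6, k=2)
for row in new_cases:
    print([round(v, 3) for v in row])
```

Because each synthetic case lies on a line segment between two real minority cases, it always stays within the range of the observed minority values. Class (categorical) inputs would need separate handling, which this sketch does not attempt.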

 

Koen

 



All Replies

New Contributor
Posts: 3

Re: Assistance with SAS code for SMOTE and adaptive synthetic sampling algorithms

Thanks, Koen, for the papers. I will go through them.

New Contributor
Posts: 3

Re: Assistance with SAS code for SMOTE and adaptive synthetic sampling algorithms

Hi Koen,

I am testing the SAS code on my data set. When I get to the DATA _NULL_ step (generating random cases with the lookup table), all the new cases generated by this step have missing values. Is this unusual? How do I go about fixing it? Thanks.

SAS Employee
Posts: 51

Re: Assistance with SAS code for SMOTE and adaptive synthetic sampling algorithms

Hello,

 

I haven't tested the code accompanying the paper.

It's best to turn to the authors.

 

See the last page of the paper. It says:

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

 

Cheers,

Koen

 
