mnasila
Calcite | Level 5

Dear all,

I am developing a predictive model for a dataset with a highly imbalanced dependent variable: the ratio between its two categories is 47,500:1. Before fitting the models, I am exploring SMOTE and adaptive synthetic (ADASYN) sampling to correct for the bias created by the imbalance. I mostly use SAS Enterprise Guide but am also comfortable with SAS Enterprise Miner. Has anyone used these sampling algorithms in SAS? I would appreciate help with coding these sampling techniques, and I would also welcome recommendations for classification techniques that suit this problem. Thanks in advance.

Regards
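For readers unfamiliar with adaptive synthetic sampling (ADASYN): it is a SMOTE variant that decides how many synthetic rows to create around each minority row, giving more to rows whose neighborhood is dominated by the majority class. The Python sketch below shows only that allocation step. It is not SAS code and not taken from the papers cited in this thread. It assumes numeric features, Euclidean distance and a brute-force neighbor search, and the names `adasyn_allocation`, `k` and `beta` are hypothetical; `beta` plays the role of ADASYN's balance-level parameter, where 1.0 aims for a fully balanced training set.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def adasyn_allocation(X, y, minority_label, k=5, beta=1.0):
    """Return {minority row index: number of synthetic rows to generate}.

    Hypothetical helper illustrating ADASYN's weighting step: minority rows
    whose k nearest neighbors (searched over ALL rows) are mostly majority
    rows are treated as harder to learn and receive more synthetic rows.
    """
    minority = [i for i, label in enumerate(y) if label == minority_label]
    n_min = len(minority)
    n_maj = len(y) - n_min
    # Total synthetic rows to create; beta=1.0 targets a fully balanced set.
    total_new = (n_maj - n_min) * beta

    ratios = {}
    for i in minority:
        # Brute-force k-nearest-neighbor search (only practical on small data).
        nearest = sorted((j for j in range(len(X)) if j != i),
                         key=lambda j: euclidean(X[i], X[j]))[:k]
        n_majority = sum(1 for j in nearest if y[j] != minority_label)
        ratios[i] = n_majority / k

    ratio_sum = sum(ratios.values())
    if ratio_sum == 0:
        # Assumption of this sketch (not specified by ADASYN): if no minority
        # row has majority neighbors, split the total evenly.
        return {i: round(total_new / n_min) for i in minority}
    # Normalize the ratios into a distribution, then scale by the total.
    # Python's round() rounds halves to even, and the counts may not sum
    # exactly to total_new.
    return {i: round(r / ratio_sum * total_new) for i, r in ratios.items()}


# Toy data: eight majority rows near the origin, two minority rows in their
# own cluster far away, and one minority row surrounded by majority rows.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 2), (2, 0), (0, 2), (2, 1),
     (10, 10), (10.5, 10), (0.5, 0.5)]
y = [0] * 8 + [1] * 3
print(adasyn_allocation(X, y, minority_label=1, k=2))  # {8: 1, 9: 1, 10: 2}
```

The minority row embedded among majority rows (index 10) receives the largest share. The synthetic rows themselves are then produced by SMOTE-style interpolation between each minority row and its minority neighbors. With a 47,500:1 ratio and `beta=1.0` the total would be very large, so a smaller `beta`, or undersampling the majority class first, may be worth considering.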

1 ACCEPTED SOLUTION

sbxkoenk
SAS Super FREQ

SMOTE is described here:

http://support.sas.com/resources/papers/proceedings15/3282-2015.pdf

and a ZIP file containing the SAS code is here:

http://support.sas.com/resources/papers/proceedings15/3282-2015.zip

Two recent papers by SAS customers discuss and apply SMOTE.

SMOTE = Synthetic Minority Over-sampling TEchnique

Paper 3483-2015

Data sampling improvement by developing SMOTE technique in SAS

Lina Guzman, DIRECTV

http://support.sas.com/resources/papers/proceedings15/3483-2015.pdf

 

Paper 3282-2015

A Case Study: Improve Classification of Rare Events with SAS® Enterprise Miner™

Ruizhe Wang, GuideWell Connect; Novik Lee, GuideWell Connect; Yun Wei, GuideWell Connect

A rather novel technique called SMOTE (Synthetic Minority Over-sampling TEchnique), which has achieved the best result in our comparison, is discussed.

http://support.sas.com/resources/papers/proceedings15/3282-2015.pdf
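As a language-neutral illustration of the technique these papers discuss, here is a minimal Python sketch of SMOTE's core interpolation step. It is not a port of the SAS code in the 3282-2015 ZIP file. It assumes numeric features and Euclidean distance, uses a brute-force neighbor search, and picks base rows at random instead of cycling through every minority row a fixed number of times as the original SMOTE paper does. The names `smote`, `n_synthetic`, `k` and `seed` are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def smote(minority, n_synthetic, k=5, seed=None):
    """Create n_synthetic rows by interpolating between minority-class rows.

    Hypothetical helper: each synthetic row is x + u * (z - x), where x is a
    randomly chosen minority row, z is one of x's k nearest minority
    neighbors, and u is drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority rows")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Brute-force k nearest minority neighbors of every minority row.
    neighbors = []
    for i, x in enumerate(minority):
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: euclidean(x, minority[j]))
        neighbors.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        z = minority[rng.choice(neighbors[i])]
        u = rng.random()
        # The new row lies on the line segment between x and z.
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, z)))
    return synthetic


minority_rows = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
new_rows = smote(minority_rows, n_synthetic=4, k=2, seed=42)
print(len(new_rows))  # 4
```

Because every new row lies between two existing minority rows, SMOTE only fills in the region the minority class already occupies. Synthetic rows are generally created from the training partition only, so that validation data still reflect the real class ratio.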

 

Koen

 

mnasila
Calcite | Level 5

Thanks for the papers, Koen. I will go through them.

mnasila
Calcite | Level 5
Hi Koen,

I am testing the SAS code on my dataset. When I get to the data _NULL_ step (generating random cases with a lookup table), all the new cases generated in this step have missing values. Is this unusual? How do I go about fixing it? Thanks.
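The step in question is not shown here, but one general way a lookup-based generation step ends up with all-missing rows is a silent lookup failure: if the neighbor IDs never match the keys in the lookup table (for example, character IDs against numeric keys), the neighbor's values may never be filled in, and in SAS an arithmetic expression such as `x + u*(z - x)` with a missing operand is itself missing. The Python sketch below only illustrates checking for unmatched keys before interpolating; it is not a diagnosis of the paper's code, and `generate_from_lookup`, `pairs` and `lookup` are hypothetical names.

```python
import random


def generate_from_lookup(pairs, lookup, seed=None):
    """Create one synthetic row per (base_id, neighbor_id) pair.

    Hypothetical helper: `lookup` maps a row ID to its numeric feature tuple.
    IDs that are not in the lookup are reported before any rows are built,
    instead of silently producing empty (missing) synthetic rows.
    """
    unmatched = sorted({repr(key) for pair in pairs for key in pair
                        if key not in lookup})
    if unmatched:
        raise KeyError("IDs not found in lookup table: " + ", ".join(unmatched))

    rng = random.Random(seed)
    rows = []
    for base_id, neighbor_id in pairs:
        x, z = lookup[base_id], lookup[neighbor_id]
        u = rng.random()
        # Same interpolation as SMOTE: a point between x and z.
        rows.append(tuple(a + u * (b - a) for a, b in zip(x, z)))
    return rows


lookup = {101: (0.0, 0.0), 102: (4.0, 2.0)}
print(generate_from_lookup([(101, 102)], lookup, seed=0))
# Character IDs do not match the numeric keys, so this call would raise
# KeyError instead of returning rows full of missing values:
# generate_from_lookup([("101", "102")], lookup)
```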
sbxkoenk
SAS Super FREQ

Hello,

 

I haven't tested the code accompanying the paper.

It's best to turn to the authors.

 

See the last page of the paper. It says:

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

 

Cheers,

Koen

 
