mnasila
Calcite | Level 5

Dear all,

I am developing a predictive model for a data set that has a very imbalanced dependent variable. The ratio between the two categories of the dependent variable is 47500:1. I am exploring SMOTE sampling and adaptive synthetic sampling techniques to apply before fitting the models, to correct for the bias created by the imbalance. I mostly use SAS Enterprise Guide but am also comfortable with SAS Enterprise Miner. Has anyone used these sampling algorithms in SAS? I would appreciate help with coding these sampling techniques, and I would also be glad if anyone could recommend classification techniques that would suit this problem. Thanks in advance.

regards

1 ACCEPTED SOLUTION

Accepted Solutions
sbxkoenk
SAS Super FREQ

SMOTE described here

http://support.sas.com/resources/papers/proceedings15/3282-2015.pdf

and ZIP containing SAS code

http://support.sas.com/resources/papers/proceedings15/3282-2015.zip  

 

Two recent SAS papers from customers deal with / apply SMOTE.

SMOTE = Synthetic Minority Over-sampling TEchnique
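
For readers who want to see the idea before opening the papers, here is a minimal sketch of the interpolation step behind SMOTE. It is written in Python rather than SAS, and it is not the code from the ZIP above or from either paper. The function name `smote`, its parameters, and the toy data are all made up for illustration. It is also a simplified variant: it picks the base case at random instead of looping over every minority case, and it assumes purely numeric inputs with no missing values.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic cases by interpolating between a minority case
    and one of its k nearest minority neighbours (simplified sketch)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority cases")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than cases
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base case (Euclidean distance)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical minority-class cases with two numeric inputs each
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_cases = smote(minority, n_synthetic=6, k=2)
for case in new_cases:
    print(case)
```

Each synthetic case lies on the line segment between two real minority cases, so it stays inside the region the minority class already occupies. Categorical inputs and missing values need separate handling that this sketch does not attempt.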

 

Paper 3483-2015

Data sampling improvement by developing SMOTE technique in SAS

Lina Guzman, DIRECTV

http://support.sas.com/resources/papers/proceedings15/3483-2015.pdf

 

Paper 3282-2015

A Case Study: Improve Classification of Rare Events with SAS® Enterprise Miner™

Ruizhe Wang, GuideWell Connect; Novik Lee, GuideWell Connect; Yun Wei, GuideWell Connect

A rather novel technique called SMOTE (Synthetic Minority Over-sampling TEchnique), which has achieved the best result in our comparison, is discussed.

http://support.sas.com/resources/papers/proceedings15/3282-2015.pdf

 

Koen

 


4 REPLIES 4
mnasila
Calcite | Level 5

Thanks, Koen, for the papers. I will go through them.

mnasila
Calcite | Level 5
Hi Koen,

I am testing the SAS code on my dataset. When I get to the data _NULL_ step ("Generating random cases with look up table"), all the new cases generated by this step have missing values. Is this unusual? How do I go about fixing it? Thanks.
sbxkoenk
SAS Super FREQ

Hello,

 

I haven't tested the code accompanying the paper.

It's best to turn to the authors.

 

See the last page of the paper. It says:

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

 

Cheers,

Koen

 
