Fabio
Calcite | Level 5

Hi,

I need to oversample in Enterprise Miner with a fixed proportion of the rare event.

Starting from a database in which the rare event makes up 0.5% of the records, I need to specify the proportion of the rare event in the sample. A 50:50 split is not an option, because the database is too small for that.

Thank you very much.

F

1 ACCEPTED SOLUTION

WarrenSarle
SAS Employee

Fabio, use the "Level Based" options in the Sampling node as demonstrated in the video that Miguel mentioned. You can set any sampling proportions that you want.

Also see the section on "Detecting Rare Classes" in the documentation for Enterprise Miner.


6 REPLIES
M_Maldonado
Barite | Level 11

Hi Fabio,

This usage note details the steps on how to model a rare event using oversampling.

Link here: 24205 - Rare event oversampling for model fitting in SAS® Enterprise Miner(tm)

It includes a video too!

Thanks,

Miguel

Fabio
Calcite | Level 5

Thank you, Miguel.

The note helps to create a 50-50 sample, but that proportion of the rare event is too big for me. I need a sample in which the rare event makes up 10%.

Is it possible?

WarrenSarle
SAS Employee

Fabio, use the "Level Based" options in the Sampling node as demonstrated in the video that Miguel mentioned. You can set any sampling proportions that you want.

Also see the section on "Detecting Rare Classes" in the documentation for Enterprise Miner.
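
For readers who want to check the arithmetic behind a level-based sample, here is a minimal Python sketch. It is not Enterprise Miner code and does not reproduce the Sampling node's options. It assumes every rare-event row is kept and only the non-events are sampled down. The function names and the 200,000-row data set size are hypothetical; the 0.5% and 10% rates come from Fabio's posts.

```python
def nonevent_sampling_rate(prior_event_rate: float, target_event_rate: float) -> float:
    """Fraction of non-events to keep so that, with every event kept,
    events make up target_event_rate of the sample."""
    if not 0 < prior_event_rate < target_event_rate < 1:
        raise ValueError("expected 0 < prior_event_rate < target_event_rate < 1")
    # Non-events needed per event in the sample.
    needed = (1 - target_event_rate) / target_event_rate
    # Non-events available per event in the full data.
    available = (1 - prior_event_rate) / prior_event_rate
    return needed / available


def sample_counts(n_rows: int, prior_event_rate: float, target_event_rate: float):
    """Return (events kept, non-events kept) for a data set of n_rows rows."""
    n_events = round(n_rows * prior_event_rate)
    n_nonevents = round(n_events * (1 - target_event_rate) / target_event_rate)
    return n_events, n_nonevents


# Hypothetical data set size; rates taken from the thread.
rate = nonevent_sampling_rate(0.005, 0.10)
events, nonevents = sample_counts(200_000, 0.005, 0.10)
print(f"keep {rate:.2%} of non-events: {events} events, {nonevents} non-events")
```

Under these assumptions, a 200,000-row data set would give 1,000 events and 9,000 non-events, so roughly 4.5% of the non-events are kept, compared with about 0.5% for a 50:50 sample.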

Fabio
Calcite | Level 5

Thank you everybody for your precious help

M_Maldonado
Barite | Level 11

Fabio,

I added a detailed example this morning on how to adjust probabilities for a 50/50 oversample using a Decisions node.

Take a look here:.

I hope it helps,

Miguel
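
As a rough illustration of what adjusting probabilities after oversampling means, the sketch below applies the standard prior-correction formula to a predicted probability. It is an assumption about the general technique, not a description of what the Decisions node does internally. The function name is hypothetical.

```python
def adjust_posterior(p_sample: float, prior_event_rate: float, sample_event_rate: float) -> float:
    """Rescale a probability predicted on an oversampled data set
    back to the event rate of the original population."""
    # Weight for each class: population share divided by sample share.
    w_event = prior_event_rate / sample_event_rate
    w_nonevent = (1 - prior_event_rate) / (1 - sample_event_rate)
    numerator = p_sample * w_event
    return numerator / (numerator + (1 - p_sample) * w_nonevent)


# A score of 0.5 from a model trained on a 50/50 sample, with a 0.5% true event rate.
print(adjust_posterior(0.5, prior_event_rate=0.005, sample_event_rate=0.5))
```

A score equal to the sample event rate maps back to the population event rate, which is a quick sanity check for any sampling proportion, including the 10% sample discussed earlier in the thread.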

jsienna
Calcite | Level 5

Fabio

I am not sure if it will be of some help, but you may take a look at this paper:

http://gking.harvard.edu/files/0s.pdf

The authors study the opposite of the problem you are trying to solve. They suggest an adjustment procedure for data sets with a huge number of observations but relatively few events. Nevertheless, you may be able to apply the same sampling design principle to your data set and estimation procedure. I have used their methodology for large data sets with a relatively low number of cases, and it is quite an effective and slick approach.

Good luck.

