NicolasC
Fluorite | Level 6

Hi there,

I am trying to build a classifier with Enterprise Miner, and my issue comes from unbalanced data. My dataset has 109,194 records, of which 1,379 have target=1 and the remaining 107,815 have target=0, a 98.74%/1.26% split. My 30 predictors are all numeric.

I have tested three ways to handle this unbalanced data. In the first one, I do not sample at all, as per the following diagram:

[Diagram: method1 (raw)]

In the second one, I oversample the minority class 1 so that it represents about 30% of the dataset, using the Sampling node (criterion property set to level-based).

[Diagram: method2 (over sampling)]
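As a rough illustration of what a 30% minority share means for the record counts quoted above, here is a minimal Python sketch. It assumes every class-1 record is kept and class-0 records are drawn at random without replacement. That is only one way to reach the target share and is not a description of what the Sampling node does internally. The function name `rebalance_by_level` is hypothetical.

```python
import random

def rebalance_by_level(labels, target_share, seed=0):
    """Return row indices of a sample where class 1 is roughly target_share.

    Assumption: all class-1 rows are kept, and class-0 rows are drawn
    without replacement until the requested share is reached.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    # Number of class-0 rows needed so that class 1 makes up target_share.
    n_major = round(len(minority) * (1 - target_share) / target_share)
    n_major = min(n_major, len(majority))
    kept = minority + rng.sample(majority, n_major)
    rng.shuffle(kept)
    return kept

# Class counts taken from the post: 1,379 events and 107,815 non-events.
labels = [1] * 1379 + [0] * 107815
sample_idx = rebalance_by_level(labels, target_share=0.30)
share = sum(labels[i] for i in sample_idx) / len(sample_idx)
print(len(sample_idx), round(share, 3))
```

Under these assumptions the rebalanced sample is much smaller than the original data, because most class-0 records are left out.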

In the last one, I do not oversample. Instead, I change the diagonal values in the Decision Weights tab of the Input node options and use, as the weight for the rare event, the ratio of the common-event probability to the rare-event probability, namely 98.74/1.26=78.36.

[Diagram: method3 (Decision Weights)]
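To make the effect of such a weight concrete, the following Python sketch computes the ratio from the raw class counts and the probability cutoff it implies. It assumes a decision matrix with the weights on the diagonal and zeros elsewhere, and a rule that picks the decision with the higher expected profit. Both are assumptions for illustration, not a statement of how Enterprise Miner applies the weights. The function names are hypothetical.

```python
def weight_ratio(n_common, n_rare):
    """Ratio of common-event to rare-event frequency."""
    return n_common / n_rare

def cutoff_from_weights(w_rare, w_common=1.0):
    """Posterior probability above which predicting the rare class wins.

    Assumption: diagonal decision matrix (w_rare for a correct rare
    prediction, w_common for a correct common prediction, 0 otherwise).
    Predict rare when p * w_rare > (1 - p) * w_common.
    """
    return w_common / (w_rare + w_common)

# Class counts taken from the post.
n_rare, n_common = 1379, 107815
w = weight_ratio(n_common, n_rare)        # about 78.18 from the raw counts
cutoff = cutoff_from_weights(w)
prior = n_rare / (n_rare + n_common)      # share of rare events in the data
print(round(w, 2), round(cutoff, 4), round(prior, 4))
```

Under these assumptions, weighting the rare event by the common/rare ratio is equivalent to classifying a record as an event whenever its predicted probability exceeds the event's share in the data, rather than 0.5. The ratio from the raw counts (about 78.18) differs slightly from 78.36 because the latter starts from rounded percentages.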

The results are as follows:

[Screenshots: Method1 results, Method2 results, Method3 results]

I do not find the results tremendously convincing (and I am still confused as to why the false/true positive counts are non-integer for method2). Am I doing anything wrong? I know there is a lot written about unbalanced data, but I do not seem to find a way to apply any solution to my case. Thanks

Nicolas

1 ACCEPTED SOLUTION

M_Maldonado
Barite | Level 11

Hi Nicolas,

Maybe this thread can help you while someone takes a second look at what you did?

https://communities.sas.com/t5/SAS-Data-Mining-and-Machine/Oversampling-in-Enterprise-Miner-with-a-r...

When I oversample, I usually test the model on a hold-out test data set that I saved somewhere else and didn't use for modeling. That gives me some confidence that I didn't fool myself 🙂
Would that be an option for you?

Best,
-Miguel
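The hold-out idea in the reply above can be sketched in Python as follows. This is a minimal illustration, assuming a stratified random split made before any oversampling so that the hold-out set keeps the original event rate. The function name `stratified_holdout` and the 20% hold-out fraction are hypothetical choices, not something stated in the thread.

```python
import random

def stratified_holdout(labels, holdout_fraction=0.2, seed=0):
    """Split row indices into (train, holdout), preserving class shares.

    The hold-out rows are set aside before any oversampling, so they
    keep the original event rate and are never used for modeling.
    """
    rng = random.Random(seed)
    train, holdout = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_hold = round(len(idx) * holdout_fraction)
        holdout.extend(idx[:n_hold])
        train.extend(idx[n_hold:])
    return train, holdout

# Class counts taken from the original question.
labels = [1] * 1379 + [0] * 107815
train_idx, holdout_idx = stratified_holdout(labels)

# Only train_idx would be passed to an oversampling step;
# holdout_idx is scored once, at the original 1.26% event rate.
overall_rate = sum(labels) / len(labels)
holdout_rate = sum(labels[i] for i in holdout_idx) / len(holdout_idx)
print(len(train_idx), len(holdout_idx), round(holdout_rate, 4))
```

Scoring the final model on `holdout_idx` gives confusion-matrix counts at the real class balance, which makes the three methods directly comparable.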
