SAS Data Science

Building models with SAS Enterprise Miner, SAS Factory Miner, SAS Viya (Machine Learning), SAS Visual Text Analytics, with point-and-click interfaces or programming
NicolasC
Fluorite | Level 6

Hi there

 

I am trying to build a classifier with Enterprise Miner, and my issue comes from unbalanced data. My dataset has 109,194 records, of which 1,379 have target=1 and the remaining 107,815 have target=0, a 98.74%/1.26% split. My 30 predictors are all numeric.

I have tested three ways to handle the imbalance. In the first, I do not sample at all, as per the following diagram:

Figure: method1 (raw)

In the second, I oversample the minority class (target=1) so that it represents about 30% of the dataset, using the Sampling node (Criterion property set to level-based).

Figure: method2 (over sampling)
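For readers following along: a model trained on data oversampled to about 30% events produces probabilities on the oversampled scale, not the 1.26% population scale. Below is a minimal Python sketch of the standard prior-correction formula, offered only as an illustration. The function name `adjust_posterior` is hypothetical, and whether Enterprise Miner applies an equivalent adjustment depends on how priors are configured in the project, which the post does not show.

```python
def adjust_posterior(p_sample, pop_rate, sample_rate):
    """Map a probability estimated on oversampled data back to the population scale.

    p_sample    -- event probability predicted by the model trained on the sample
    pop_rate    -- event rate in the original population
    sample_rate -- event rate in the oversampled training data
    """
    event = p_sample * pop_rate / sample_rate
    non_event = (1.0 - p_sample) * (1.0 - pop_rate) / (1.0 - sample_rate)
    return event / (event + non_event)


pop_rate = 1379 / 109194  # event rate in the original data (counts from the post)
sample_rate = 0.30        # approximate event rate after oversampling (from the post)

for p in (0.10, 0.30, 0.50, 0.90):
    adjusted = adjust_posterior(p, pop_rate, sample_rate)
    print(f"sample-scale p={p:.2f} -> population-scale p={adjusted:.4f}")
```

A quick sanity check on the formula: a prediction equal to the sample event rate (0.30) maps back to the population event rate.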

In the last, I do not oversample but change the diagonal values in the Decision Weights tab of the Input node options, setting the weight for the rare event to the ratio of the probability of the common event to that of the rare event, namely 98.74/1.26=78.36.

Figure: method3 (Decision Weights)
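The weight arithmetic above can be reproduced in a few lines of Python. This is only a sketch using the counts quoted in the post; the variable names are made up for illustration. It also shows that the ratio computed from the raw counts (about 78.2) differs slightly from the one computed from the rounded percentages (about 78.4).

```python
n_event = 1379         # records with target=1 (from the post)
n_non_event = 107815   # records with target=0 (from the post)
n_total = n_event + n_non_event

event_rate = n_event / n_total

# Inverse-prior weight for the rare event, computed two ways
weight_from_counts = n_non_event / n_event   # exact class counts
weight_from_rounded = 98.74 / 1.26           # rounded percentages quoted in the post

print(f"total records:         {n_total}")
print(f"event rate:            {event_rate:.4%}")
print(f"weight (from counts):  {weight_from_counts:.2f}")
print(f"weight (from rounded): {weight_from_rounded:.2f}")
```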

The results are as follows:

Figures: Method1 results, Method2 results, Method3 results

I do not find the results tremendously convincing (and I am still confused as to why the false/true positive counts are non-integer for method 2). Am I doing anything wrong? I know there is a lot written about unbalanced data, but I cannot seem to find a way to apply any of the solutions to my case. Thanks

Nicolas

1 ACCEPTED SOLUTION

Accepted Solutions
M_Maldonado
Barite | Level 11

Hi Nicolas,

Maybe this thread can help you while someone takes a second look into what you did?

 

https://communities.sas.com/t5/SAS-Data-Mining-and-Machine/Oversampling-in-Enterprise-Miner-with-a-r...

 

When I oversample, I usually test the model on a hold-out test data set that I saved somewhere else and didn't use for modeling. That gives me some confidence that I didn't fool myself 🙂
Would that be an option for you?

Best,
-Miguel
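The hold-out idea in the reply above can be sketched outside Enterprise Miner in plain Python. This is a rough illustration under stated assumptions, not the Sampling node's actual behavior: the helper names `split_holdout` and `oversample_events`, the 30% test fraction, and the synthetic rows are all hypothetical, and the sketch oversamples by duplicating event rows, whereas the Sampling node may instead keep all events and subsample non-events. The point is only the ordering: set the hold-out aside first, then rebalance the training part.

```python
import random


def split_holdout(rows, test_fraction=0.3, seed=42):
    """Stratified split so both parts keep the original class ratio."""
    rng = random.Random(seed)
    train, test = [], []
    for label in (0, 1):
        group = [r for r in rows if r["target"] == label]
        rng.shuffle(group)
        cut = int(round(len(group) * test_fraction))
        test.extend(group[:cut])
        train.extend(group[cut:])
    return train, test


def oversample_events(train, event_share=0.30, seed=42):
    """Duplicate event rows (with replacement) until they are ~event_share of the rows."""
    rng = random.Random(seed)
    events = [r for r in train if r["target"] == 1]
    non_events = [r for r in train if r["target"] == 0]
    n_events_needed = int(round(event_share * len(non_events) / (1.0 - event_share)))
    n_extra = max(0, n_events_needed - len(events))
    extra = [rng.choice(events) for _ in range(n_extra)]
    return non_events + events + extra


# Synthetic stand-in for the real data: same class counts as in the question
rows = [{"id": i, "target": 1 if i < 1379 else 0} for i in range(109194)]

train, test = split_holdout(rows)          # hold-out is set aside first
train_balanced = oversample_events(train)  # only the training part is rebalanced

test_rate = sum(r["target"] for r in test) / len(test)
train_rate = sum(r["target"] for r in train_balanced) / len(train_balanced)
print(f"hold-out event rate: {test_rate:.4f}")
print(f"training event rate: {train_rate:.4f}")
```

Because the hold-out keeps the original event rate and shares no rows with the rebalanced training data, metrics computed on it reflect what the model would see in production.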
