Hi there
I am trying to build a classifier with Miner, and my issue comes from unbalanced data. My dataset has 109,194 records, of which 1,379 have target=1 and the remaining 107,815 have target=0, a 98.74%/1.26% split. My 30 predictors are all numeric.
I have tested three ways to handle this unbalanced data. In the first one, I do no sampling at all, as per the following diagram.
In the second one, I oversample the minority class 1 to represent about 30% of the dataset using the Sampling node (Criterion property set to level-based).
In the last one, I do not oversample but change the diagonal values in the Decision Weights tab from the Input node options, and use as the weight for the rare event the ratio of the probability of the common event to that of the rare event, namely 98.74/1.26=78.36.
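For reference, the arithmetic behind the second and third approaches can be checked with a short Python sketch. It uses only the counts given above. The oversampling part assumes that "oversampling to 30%" means replicating class-1 rows while keeping every class-0 row; the Sampling node may instead reach that share by dropping class-0 rows, so treat those figures as an illustration rather than a description of what the node does.

```python
# Class counts as stated in the post.
n_total = 109_194
n_rare = 1_379                 # target = 1
n_common = n_total - n_rare    # target = 0, i.e. 107,815

p_rare = n_rare / n_total
p_common = n_common / n_total

# Weight computed from the rounded percentages, as in the post.
weight_rounded = 98.74 / 1.26
# Same ratio computed from the raw counts (slightly smaller, since the
# percentages were rounded before dividing).
weight_exact = n_common / n_rare

# ASSUMPTION: oversampling keeps all class-0 rows and replicates class-1
# rows until class 1 makes up target_share of the sample.
target_share = 0.30
n_rare_needed = round(target_share * n_common / (1 - target_share))
replication_factor = n_rare_needed / n_rare

print(f"rare share:            {p_rare:.4%}")
print(f"common share:          {p_common:.4%}")
print(f"weight (rounded %):    {weight_rounded:.2f}")
print(f"weight (raw counts):   {weight_exact:.2f}")
print(f"class-1 rows for 30%:  {n_rare_needed}")
print(f"replication factor:    {replication_factor:.1f}")
```

The two weights differ only in the second significant decimal, so the choice between them is unlikely to change the model much.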
The results are as follows.
I do not find the results tremendously convincing (and I am still confused as to why the false/true positive counts are non-integer for method 2). Am I doing anything wrong? I know there is a lot written about unbalanced data, but I cannot seem to find a way to apply any of the solutions to my case. Thanks
Nicolas
Hi Nicolas,
Maybe this thread can help you while someone takes a second look into what you did?
When I oversample, I usually test the model on a hold-out test data set that I saved somewhere else and didn't use for modeling. That gives me some confidence that I didn't fool myself 🙂
Would that be an option for you?
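As a rough illustration of what I mean, here is a minimal Python sketch (not Miner code) that sets the hold-out aside first and oversamples only the training part. The 30% hold-out fraction, the seed, and the helper names `stratified_holdout` and `oversample_minority` are my own assumptions for the example; the labels are synthetic and only reproduce your class counts.

```python
import random

def stratified_holdout(labels, test_fraction, rng):
    """Split row indices into train/test, preserving the class ratio in both."""
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return train_idx, test_idx

def oversample_minority(train_idx, labels, target_share, rng):
    """Replicate class-1 training rows (with replacement) up to target_share."""
    rare = [i for i in train_idx if labels[i] == 1]
    common = [i for i in train_idx if labels[i] == 0]
    n_rare_needed = round(target_share * len(common) / (1 - target_share))
    n_extra = max(0, n_rare_needed - len(rare))
    extra = [rng.choice(rare) for _ in range(n_extra)]
    return common + rare + extra

rng = random.Random(0)                    # fixed seed, arbitrary choice
labels = [1] * 1_379 + [0] * 107_815      # synthetic labels, counts from the post

# 1) Set the hold-out aside BEFORE any oversampling.
train_idx, test_idx = stratified_holdout(labels, 0.30, rng)
# 2) Oversample the training rows only; the hold-out keeps the real ratio.
train_over = oversample_minority(train_idx, labels, 0.30, rng)

train_share = sum(labels[i] for i in train_over) / len(train_over)
test_share = sum(labels[i] for i in test_idx) / len(test_idx)
print(f"class-1 share in oversampled training set: {train_share:.2%}")
print(f"class-1 share in untouched hold-out set:   {test_share:.2%}")
```

The point of the ordering is that no replicated row can end up in both sets, so the hold-out metrics reflect the original 1.26% event rate.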
Best,
-Miguel