
Investigating the Effect of Severe Class Imbalances in ML Classification Scenarios

Started ‎06-20-2022 by
Modified ‎06-20-2022 by

We often face classification problems in which the class frequencies are severely imbalanced. If one class is severely underrepresented in a two-class problem, the results may be biased towards the majority class, because a model can achieve high overall accuracy simply by predicting that class. For example, in a fraud problem the incidence of fraudulent transactions may be very low, say well under 1%. In such a situation, feeding a naive stratified sample, which preserves that imbalance, to a classification algorithm may simply result in the algorithm choosing 'non-fraudulent' nearly all of the time. The result is false negatives: fraudulent transactions that go undetected because legitimate transactions overwhelmingly dominate the data.
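The arithmetic behind this concern can be sketched in a few lines of Python. This is an illustration only: the counts (100,000 transactions, 500 of them fraudulent, i.e. 0.5%) and the function name `majority_baseline_metrics` are assumptions chosen for the example, not figures from our paper.

```python
def majority_baseline_metrics(n_total, n_fraud):
    """Metrics for a classifier that always predicts 'non-fraudulent'.

    Hypothetical helper for illustration; the counts passed in are assumed.
    """
    n_legit = n_total - n_fraud
    true_pos = 0            # no fraud case is ever flagged
    false_pos = 0           # no legitimate case is ever flagged
    true_neg = n_legit      # every legitimate case is labelled correctly
    false_neg = n_fraud     # every fraud case is missed
    accuracy = (true_pos + true_neg) / n_total
    recall = true_pos / (true_pos + false_neg) if n_fraud else 0.0
    return {"accuracy": accuracy, "recall": recall, "false_negatives": false_neg}


# Assumed example: 0.5% fraud incidence.
metrics = majority_baseline_metrics(n_total=100_000, n_fraud=500)
print(metrics)
```

With these assumed counts the always-'non-fraudulent' rule is 99.5% accurate yet detects none of the 500 fraudulent transactions, which is why accuracy alone is a misleading measure when the classes are this imbalanced.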
One remedy is simply to oversample the known fraudulent transactions, in the expectation that a classification algorithm will then define the boundary between 'fraud' and 'non-fraud' transactions more accurately. We explore this approach in our paper, "Hybrid Rare Event Sampling Technique", for which we wrote the %HYRES macro to generate samples containing specified frequencies or percentages of events and nonevents. For that paper, we selected a number of datasets with different characteristics and different frequencies of events and nonevents.
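To make the idea of sampling to a specified event percentage concrete, here is a minimal Python sketch of plain random oversampling with replacement. It is not the %HYRES macro and makes no claim about how that macro works; the function name `oversample_events`, its parameters, and the toy data are all hypothetical.

```python
import random


def oversample_events(rows, is_event, target_event_fraction, seed=0):
    """Duplicate event rows at random until they make up roughly the target share.

    Hypothetical helper: simple random oversampling with replacement,
    shown only to illustrate the idea of a specified event percentage.
    """
    if not 0 < target_event_fraction < 1:
        raise ValueError("target_event_fraction must be between 0 and 1")
    events = [r for r in rows if is_event(r)]
    nonevents = [r for r in rows if not is_event(r)]
    if not events or not nonevents:
        raise ValueError("need at least one event and one nonevent")

    # Solve n_events / (n_events + n_nonevents) = target for n_events.
    needed = round(target_event_fraction * len(nonevents) / (1 - target_event_fraction))
    if needed <= len(events):
        return list(rows)  # events already meet or exceed the target share

    rng = random.Random(seed)
    extra = rng.choices(events, k=needed - len(events))  # sample with replacement
    sample = nonevents + events + extra
    rng.shuffle(sample)
    return sample


# Assumed toy data: 5 fraudulent rows among 1,000 (0.5% events).
rows = [{"id": i, "fraud": i < 5} for i in range(1000)]
balanced = oversample_events(rows, lambda r: r["fraud"], target_event_fraction=0.20)
share = sum(r["fraud"] for r in balanced) / len(balanced)
print(len(balanced), round(share, 3))
```

All nonevents are kept and only event rows are duplicated, so the sketch changes the class mix without discarding any legitimate transactions. Undersampling the nonevents instead, or combining the two, would be an equally valid design choice.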

 

The %HYRES macro is included as an attachment to this post for the SAS community to use as a tool for creating specific datasets with class imbalances.

Version history
Last update:
‎06-20-2022 09:58 AM
