
Investigating the Effect of Severe Class Imbalances in ML Classification Scenarios

Started ‎06-20-2022 by
Modified ‎06-20-2022 by

Classification problems often involve a severe imbalance in the frequencies of the categories. If one class is severely underrepresented in a two-class problem, the results can be biased towards the majority class: a model can achieve high overall accuracy simply by predicting the majority class most of the time. For example, in a fraud problem the incidence of fraudulent transactions may be very low, say well under 1%. In such a situation, feeding a classification algorithm a sample that simply preserves the original class proportions (a naive stratified sample) may result in the algorithm choosing 'nonfraud' the majority of the time. Because legitimate transactions are so overwhelmingly common, this produces false negatives: fraudulent transactions that are classified as legitimate.
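The effect described above can be made concrete with a little arithmetic. The following is a minimal sketch in Python, not taken from the paper; the function name and the transaction counts (100,000 transactions, 500 of them fraudulent) are hypothetical values chosen only for illustration. It computes the accuracy and recall of a classifier that always predicts the majority class.

```python
def majority_baseline_metrics(n_total, n_events):
    """Metrics for a classifier that labels every record as a nonevent.

    n_total and n_events are hypothetical counts used for illustration.
    """
    if n_events <= 0 or n_events >= n_total:
        raise ValueError("need at least one event and one nonevent")
    true_negatives = n_total - n_events   # every legitimate record is labeled correctly
    false_negatives = n_events            # every event (fraud case) is missed
    accuracy = true_negatives / n_total
    recall = 0 / n_events                 # no event is ever detected
    return accuracy, recall, false_negatives


# Assumed example: 0.5% fraud incidence.
accuracy, recall, missed = majority_baseline_metrics(n_total=100_000, n_events=500)
print(f"accuracy={accuracy:.3f} recall={recall:.3f} missed events={missed}")
```

Under these assumed counts the always-'nonfraud' rule is right on 99.5% of transactions while detecting none of the fraud, which is why overall accuracy alone is a misleading yardstick for rare events.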
One remedy is to oversample the known fraudulent transactions, in the expectation that a classification algorithm will then define the boundary between 'fraud' and 'nonfraud' transactions more accurately. We explore this approach in our paper, "Hybrid Rare Event Sampling Technique", for which we wrote the %HYRES macro to generate samples containing specified frequencies or percentages of events and nonevents. In that paper, we selected a number of datasets with different characteristics and different frequencies of events and nonevents.
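To illustrate the general idea of building a sample with a specified percentage of events, here is a minimal sketch of plain random resampling in Python. It is not the %HYRES macro and makes no claim about how that macro works internally; the function `resample_to_event_rate`, its parameters, and the toy records are all hypothetical. Events are drawn with replacement (oversampling), and nonevents are drawn without replacement when enough of them are available.

```python
import random


def resample_to_event_rate(records, is_event, target_event_rate, n_out, seed=0):
    """Return n_out records in which about target_event_rate are events.

    Hypothetical helper for illustration only; not the %HYRES macro.
    """
    if not 0 < target_event_rate < 1:
        raise ValueError("target_event_rate must be strictly between 0 and 1")
    rng = random.Random(seed)
    events = [r for r in records if is_event(r)]
    nonevents = [r for r in records if not is_event(r)]
    if not events or not nonevents:
        raise ValueError("both classes must be present in the input")

    n_events_out = round(n_out * target_event_rate)
    n_nonevents_out = n_out - n_events_out

    # Rare events are sampled with replacement, so they can be oversampled.
    sampled_events = rng.choices(events, k=n_events_out)
    # Nonevents are sampled without replacement when there are enough of them.
    if n_nonevents_out <= len(nonevents):
        sampled_nonevents = rng.sample(nonevents, n_nonevents_out)
    else:
        sampled_nonevents = rng.choices(nonevents, k=n_nonevents_out)

    sample = sampled_events + sampled_nonevents
    rng.shuffle(sample)
    return sample


# Toy data (assumed): 10,000 transactions, 50 of them fraudulent (0.5%).
transactions = [{"id": i, "fraud": i < 50} for i in range(10_000)]
sample = resample_to_event_rate(
    transactions, lambda r: r["fraud"], target_event_rate=0.20, n_out=1_000
)
n_fraud = sum(r["fraud"] for r in sample)
print(f"sample size={len(sample)} fraud records={n_fraud}")
```

Because the events are drawn with replacement, the same fraudulent record can appear several times in the output. That is the price of raising the event rate by oversampling, and it is one reason any model trained on such a sample should be evaluated on data that keeps the original class proportions.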

 

The %HYRES macro is included as an attachment to this post so that the SAS community can use it as a tool for creating datasets with specified class imbalances.
