
Investigating the Effect of Severe Class Imbalances in ML Classification Scenarios

Started ‎06-20-2022 by
Modified ‎06-20-2022 by

We often face classification problems in which the class frequencies are severely imbalanced. If one class is severely underrepresented in a two-class problem, the results can be biased toward the majority class, because a classifier that favors the majority class can achieve high overall accuracy while rarely identifying the minority class. For example, in a fraud problem the incidence of fraudulent transactions may be very low, say well under 1%. In such a situation, using a naive stratified sample as input to a classification algorithm may simply result in the algorithm choosing 'non-fraudulent' nearly all of the time, which produces false negatives (missed fraud) because of the overwhelming incidence of legitimate transactions.
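
To make the accuracy problem concrete, here is a minimal Python sketch. The data are hypothetical (1,000 transactions, 5 of them fraudulent) and the "classifier" simply predicts the majority class every time; none of this comes from the paper or from the %HYRES macro.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a two-class problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred == positive:
            fp += 1
        elif truth == positive and pred != positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Hypothetical data: 5 fraudulent (1) and 995 legitimate (0) transactions.
y_true = [1] * 5 + [0] * 995

# A degenerate classifier that always predicts the majority class.
y_pred = [0] * len(y_true)

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
# Recall (sensitivity) for the fraud class; guard against division by zero.
recall = tp / (tp + fn) if (tp + fn) else 0.0

print(f"accuracy={accuracy:.3f}  recall={recall:.3f}  false negatives={fn}")
```

Overall accuracy looks excellent here even though every fraudulent transaction is missed, which is why accuracy alone is a poor yardstick for rare-event problems.
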
One remedy is to oversample the known fraudulent transactions, in the expectation that a classification algorithm will then define the boundary between 'fraud' and 'nonfraud' transactions more accurately. We explore this approach in our paper, "Hybrid Rare Event Sampling Technique", for which we wrote the %HYRES macro to generate samples containing specified frequencies or percentages of events and nonevents. For the paper, we selected a number of datasets with different characteristics and different frequencies of events and nonevents.
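
The general idea of oversampling events up to a specified percentage can be sketched in a few lines of Python. This is only an illustration of simple random oversampling with replacement, not the %HYRES macro or the hybrid technique from the paper; the function name `oversample_events`, its parameters, and the toy data are all assumptions made for the example.

```python
import math
import random


def oversample_events(rows, is_event, target_event_frac, seed=0):
    """Keep every row and add resampled events until the event share
    reaches at least target_event_frac (hypothetical helper)."""
    if not 0 < target_event_frac < 1:
        raise ValueError("target_event_frac must be strictly between 0 and 1")
    events = [r for r in rows if is_event(r)]
    nonevents = [r for r in rows if not is_event(r)]
    if not events:
        raise ValueError("no events available to oversample")
    # Smallest event count e such that e / (e + n_nonevents) >= target.
    # The tiny epsilon guards against floating-point round-up.
    needed = math.ceil(
        target_event_frac * len(nonevents) / (1 - target_event_frac) - 1e-9
    )
    extra = max(0, needed - len(events))  # never drop original events
    rng = random.Random(seed)
    # Draw the additional events with replacement from the observed events.
    sample = nonevents + events + rng.choices(events, k=extra)
    rng.shuffle(sample)
    return sample


# Hypothetical data: 1,000 transactions, 5 of them fraudulent (0.5%).
rows = [{"id": i, "fraud": 1 if i < 5 else 0} for i in range(1000)]

balanced = oversample_events(rows, lambda r: r["fraud"] == 1, target_event_frac=0.20)
n_events = sum(r["fraud"] for r in balanced)
event_frac = n_events / len(balanced)
print(f"{n_events} events out of {len(balanced)} rows ({event_frac:.1%})")
```

In practice, resampling of this kind is usually applied only to the training partition, so that the model is still evaluated on data with the original event rate.
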

 

The %HYRES macro is included as an attachment to this post for the SAS community to use as a tool for creating specific datasets with class imbalances.

