Solly7
Pyrite | Level 9

Hi,

I'm working on a binary classification model using logistic regression in SAS Base, but my data is extremely imbalanced. I need help balancing the data, or strategies for working with this kind of imbalanced data in SAS Base. See the screenshot below for my data.

Solly7_0-1623065192016.png

 

PaigeMiller
Diamond | Level 26

Let's say you want to have twice as many 0s as 1s (so 1/3 of the data is now 1). You can randomly select records with 0 to be removed so that you have 4572 0s and 2286 1s. Or if you want 1/2 0s and 1/2 1s, you can modify the selection process to produce 2286 0s and 2286 1s.
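A minimal sketch of that selection, under some assumptions: the input table is named full_data, the 0/1 response is a numeric variable named target (both names are hypothetical), and PROC SURVEYSELECT from SAS/STAT is available. It keeps every 1 and draws a simple random sample of 4572 of the 0s.

```sas
/* Assumed names: full_data (input table), target (numeric 0/1 response). */

/* Separate the rare events (1s) from the non-events (0s). */
data events nonevents;
    set full_data;
    if target = 1 then output events;
    else if target = 0 then output nonevents;
run;

/* Simple random sample of 4572 non-events; SEED= makes the draw repeatable. */
proc surveyselect data=nonevents out=nonevents_sample
                  method=srs sampsize=4572 seed=12345;
run;

/* Stack all events with the sampled non-events: 2 zeros for every 1. */
data balanced;
    set events nonevents_sample;
run;

/* Check the resulting class counts. */
proc freq data=balanced;
    tables target;
run;
```

For a 1:1 split, change sampsize=4572 to sampsize=2286.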

 

The method is called "oversampling" (the 1s end up over-represented relative to the original data), and this SAS note describes how to adjust a logistic regression that was fit to oversampled data: https://support.sas.com/kb/22/601.html
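As a hedged illustration of one such adjustment, the PRIOREVENT= option on the SCORE statement of PROC LOGISTIC rescales predicted probabilities to a stated population event rate. The sketch assumes a down-sampled table named balanced, the original table full_data, a table to score named new_data, a numeric 0/1 response named target, and predictors x1 and x2. All of these names are hypothetical.

```sas
/* Event rate in the original, unsampled data.
   Assumes target is numeric 0/1, so its mean is the proportion of 1s. */
proc sql noprint;
    select mean(target) into :prior_event trimmed
    from full_data;
quit;

/* Fit on the oversampled data, then score with the original event rate
   so the predicted probabilities are adjusted back to the population scale. */
proc logistic data=balanced;
    model target(event='1') = x1 x2;
    score data=new_data out=scored priorevent=&prior_event;
run;
```

Without an adjustment like this, probabilities predicted from the balanced data are higher than the event rate in the original data would justify.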

--
Paige Miller
Solly7
Pyrite | Level 9
Hi, thanks for your prompt response. Let's say I have sample data with 20000 samples, and let's call it full_data. Do I need to split full_data into training and testing, then oversample the training data? Or am I not understanding?
PaigeMiller
Diamond | Level 26

I would oversample first (reduce the imbalance), and then split that data randomly into training and validation.
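A sketch of the random split step, assuming the oversampled table is named balanced, the response is named target, and a 70/30 split is wanted. The names and the split ratio are assumptions, not something stated in this thread. Stratifying on target keeps the 0/1 ratio the same in both pieces.

```sas
/* STRATA requires the input to be sorted by the stratum variable. */
proc sort data=balanced out=balanced_sorted;
    by target;
run;

/* OUTALL keeps every row and adds a Selected flag (1 = sampled, 0 = not). */
proc surveyselect data=balanced_sorted out=balanced_flagged
                  method=srs samprate=0.7 seed=2021 outall;
    strata target;
run;

/* Route flagged rows to training and the rest to validation. */
data train valid;
    set balanced_flagged;
    if Selected = 1 then output train;
    else output valid;
run;
```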

--
Paige Miller
Ksharp
Super User

Here are three ways you could go:
1) Oversample to 1:1, 1:2, 1:3, or 1:4

or
2) Use exact logistic regression, but because your sample size is big, that could be mission impossible.

or
3) Use penalized logistic regression via the FIRTH option:
proc logistic.......
model ............ / firth ;
run;
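Filled in with placeholder names, the FIRTH call might look like the sketch below. The table full_data, the response target, and the predictors x1 and x2 are hypothetical stand-ins for the parts elided above.

```sas
/* Firth's penalized likelihood, fit on the original (unsampled) data.
   CLPARM=PL requests profile-likelihood confidence limits for the parameters. */
proc logistic data=full_data;
    model target(event='1') = x1 x2 / firth clparm=pl;
run;
```

Because nothing is resampled here, the predicted probabilities stay on the scale of the original event rate.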
