Hi,
I'm working on a binary classification model using logistic regression in Base SAS, but my data is extremely imbalanced. I need help balancing the data, or strategies for working with this kind of imbalanced data in Base SAS. See the screenshot below for my data.
Let's say you want twice as many 0s as 1s (so one third of the records have a response of 1). You can randomly remove records with a response of 0 so that you have 4572 0s and 2286 1s. Or, if you want half 0s and half 1s, you can modify the selection so that you get 2286 0s and 2286 1s.
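A minimal sketch of this in SAS, assuming a hypothetical data set WORK.HAVE with a numeric 0/1 response named TARGET and predictors X1 and X2 (the real names were only in the screenshot, so the step below simulates stand-in data). It keeps every 1 and draws a simple random sample of the 0s with PROC SURVEYSELECT (part of SAS/STAT, like PROC LOGISTIC). The hypothetical macro variable RATIO sets how many 0s to keep per 1: 2 gives the 2:1 split above, 1 gives a 1:1 split.

```sas
/* Hypothetical stand-in for the real data: WORK.HAVE with a rare 0/1 response TARGET. */
data work.have;
   call streaminit(12345);
   do id = 1 to 100000;
      x1 = rand('normal');
      x2 = rand('normal');
      p = 1 / (1 + exp(4 - 0.8*x1 - 0.5*x2));   /* event probability, a few percent on average */
      target = rand('bernoulli', p);
      output;
   end;
   drop p;
run;

%let ratio = 2;   /* hypothetical: number of 0s to keep per 1 (1, 2, 3, 4, ...) */

/* Count the 1s and 0s in the full data. */
proc sql noprint;
   select sum(target = 1), sum(target = 0)
      into :n1 trimmed, :n0 trimmed
      from work.have;
quit;

/* Number of 0s to keep: RATIO times the number of 1s, capped at the number of 0s available. */
%let nkeep0 = %sysfunc(min(&n0, %eval(&ratio * &n1)));

/* PROC SURVEYSELECT needs the input sorted by the STRATA variable. */
proc sort data=work.have out=work.have_sorted;
   by target;
run;

/* Stratified simple random sampling without replacement.
   SAMPSIZE= gives one size per stratum in sorted order (TARGET=0, then TARGET=1),
   so &nkeep0 of the 0s are drawn at random and all &n1 of the 1s are kept. */
proc surveyselect data=work.have_sorted out=work.balanced
                  method=srs sampsize=(&nkeep0 &n1) seed=2024;
   strata target;
run;

/* Check the new class distribution. */
proc freq data=work.balanced;
   tables target;
run;
```

Sampling the 0s at random (rather than, say, taking the first 4572) keeps the retained non-events representative of all non-events.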
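This method is called "oversampling" (the rare event is over-represented in the sample compared with the population). Because the model is fit on data with an artificially high event rate, its predicted probabilities are too high unless they are adjusted. Here is a way to handle oversampled data in your logistic regression in SAS: https://support.sas.com/kb/22/601.html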
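As a hedged sketch of one such adjustment (not necessarily the exact method in the usage note), the SCORE statement in PROC LOGISTIC has a PRIOREVENT= option that rescales predicted probabilities to a stated prior event probability, here the event rate of the full data. The data set and variable names (WORK.HAVE, TARGET, X1, X2) and the 6% keep rate for 0s are hypothetical; the data are simulated so the example runs on its own.

```sas
/* Hypothetical stand-in data: WORK.HAVE with a rare 0/1 response TARGET. */
data work.have;
   call streaminit(12345);
   do id = 1 to 100000;
      x1 = rand('normal');
      x2 = rand('normal');
      target = rand('bernoulli', 1 / (1 + exp(4 - 0.8*x1 - 0.5*x2)));
      output;
   end;
run;

/* Population (full-data) event rate, used as the prior event probability. */
proc sql noprint;
   select mean(target) format=best16. into :pi1 trimmed from work.have;
quit;

/* Oversample the event: keep every 1 and roughly 6% of the 0s (hypothetical rate). */
data work.sample;
   set work.have;
   if _n_ = 1 then call streaminit(2024);
   if target = 1 or rand('uniform') < 0.06 then output;
run;

/* Fit on the oversampled data, then score the full data.
   PRIOREVENT= adjusts the predicted probabilities (P_1, P_0) from the
   sample's event rate back to the population event rate &pi1. */
proc logistic data=work.sample;
   model target(event='1') = x1 x2;
   score data=work.have out=work.scored priorevent=&pi1;
run;

/* The average adjusted P_1 should be close to the population event rate. */
proc means data=work.scored mean;
   var p_1;
run;
```

Only the intercept is distorted by sampling on the response, which is why a prior-based rescaling of the probabilities is enough here; the slopes estimated on the oversampled data are used as they are.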
I would oversample first (to reduce the imbalance), and then randomly split the resulting data into training and validation sets.
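A minimal sketch of the random split, assuming a hypothetical already-balanced data set WORK.BALANCED with a 0/1 response TARGET (simulated here with the 2286/4572 counts from above) and a hypothetical 70/30 split. PROC SURVEYSELECT with OUTALL keeps every record and adds a SELECTED flag; stratifying by TARGET is a design choice that keeps the same 1:2 ratio in both parts.

```sas
/* Hypothetical stand-in for the balanced data: 2286 ones and 4572 zeros. */
data work.balanced;
   call streaminit(7);
   do i = 1 to 6858;
      target = (i <= 2286);
      x1 = rand('normal') + 0.5*target;
      output;
   end;
   drop i;
run;

/* STRATA requires the data sorted by TARGET. */
proc sort data=work.balanced;
   by target;
run;

/* Flag about 70% of each TARGET level for training (SELECTED=1); OUTALL keeps the rest. */
proc surveyselect data=work.balanced out=work.split
                  method=srs samprate=0.7 outall seed=2025;
   strata target;
run;

/* Write the flagged records to TRAIN and the others to VALID. */
data work.train work.valid;
   set work.split;
   if selected then output work.train;
   else output work.valid;
run;
```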
Here are three approaches you could take:
1) Oversample to an event:non-event ratio of 1:1, 1:2, 1:3, or 1:4
or
2) Use exact logistic regression (the EXACT statement in PROC LOGISTIC), but because your sample size is large, that could be computationally infeasible.
or
3) Use penalized logistic regression with the FIRTH option:
proc logistic data=... ;   /* your data set and options */
   model ... / firth;      /* response = predictors; FIRTH requests Firth's penalized likelihood */
run;
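A self-contained sketch of option 3, assuming hypothetical names (WORK.HAVE, TARGET, X1, X2) and simulated rare-event data. The FIRTH option fits the model by Firth's penalized likelihood, which reduces the small-sample bias of the maximum likelihood estimates that rare events can cause; CLODDS=PL requests profile-likelihood confidence intervals for the odds ratios, which are commonly paired with Firth estimates. Option 2 (an EXACT statement) is not shown, since exact methods are unlikely to finish on data this size.

```sas
/* Hypothetical stand-in data with a rare 0/1 response TARGET. */
data work.have;
   call streaminit(12345);
   do id = 1 to 100000;
      x1 = rand('normal');
      x2 = rand('normal');
      target = rand('bernoulli', 1 / (1 + exp(4 - 0.8*x1 - 0.5*x2)));
      output;
   end;
run;

/* Firth-penalized logistic regression on the original (unbalanced) data.
   EVENT='1' models the probability that TARGET = 1. */
proc logistic data=work.have;
   model target(event='1') = x1 x2 / firth clodds=pl;
   ods output ParameterEstimates=work.pe;   /* save the estimates to a data set */
run;

proc print data=work.pe;
run;
```

Unlike options 1 and 2 of resampling, this keeps all of the original records, so the predicted probabilities stay on the population scale without a prior adjustment.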