akmsharif
Fluorite | Level 6

I am trying to fit logistic regression, decision tree, KNN and neural network models to a dataset with 9,800 rows. The target is binary and 98% of its values are 0. I have 1,000 interval predictors; all of them contain many zeros and are not normally distributed. How should I handle the imbalanced data in SAS Miner for each of these models? Can somebody please help?

6 REPLIES
Ksharp
Super User
Oversample so that good:bad is about 3:1 or 4:1.
akmsharif
Fluorite | Level 6
Could you please explain in a bit more detail?
janex
Calcite | Level 5

@Ksharp, could you please take a look at the question I posted on my page? Thank you very much.

Ksharp
Super User

Because the event probability is very small, no model fitted to the raw data can be trusted.

Oversampling here means raising the event proportion in the training data. For example, if you have 1000 observations and only 10 of them are 1, keep all 10 events and randomly sample 30 or 40 of the remaining 990 observations that are 0 to form the training data. That is, the ratio of 1 to 0 is about 1:3 or 1:4.
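The sampling step described above can be sketched in a few lines. This is only an illustration of the logic in plain Python, not SAS code and not what SAS Miner does internally; the function name `undersample`, the `target` key, the fixed seed and the toy data are all assumptions made for the example.

```python
import random

def undersample(rows, target_key="target", ratio=3, seed=42):
    """Keep every event (target == 1) and draw `ratio` non-events per event.

    Hypothetical helper for illustration only.
    """
    events = [r for r in rows if r[target_key] == 1]
    non_events = [r for r in rows if r[target_key] == 0]
    # Never ask for more non-events than are available.
    n_keep = min(len(non_events), ratio * len(events))
    rng = random.Random(seed)
    sampled = rng.sample(non_events, n_keep)  # sampling without replacement
    train = events + sampled
    rng.shuffle(train)
    return train

# Toy data matching the example: 1000 observations, 10 of them events.
rows = [{"id": i, "target": 1 if i < 10 else 0} for i in range(1000)]
train = undersample(rows, ratio=3)
n_events = sum(r["target"] for r in train)
print(len(train), n_events)  # 40 10
```

With `ratio=4` the same call would return 50 rows, 10 of them events, which corresponds to the 1:4 case.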

If you are using PROC LOGISTIC, don't forget to use PEVENT=0.01 (the event rate in the original data in this example, 10 out of 1000) to adjust the predicted probabilities.
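To see why such an adjustment matters, here is a minimal Python sketch of the usual Bayes-rule prior correction for a model trained on a resampled dataset. It is an assumption that this is the kind of correction intended here; the thread does not confirm how SAS computes it, and the function name `adjust_probability` and its parameters are hypothetical.

```python
def adjust_probability(p_sample, sample_event_rate, true_event_rate):
    """Rescale a probability predicted on resampled data to the original event rate.

    Hypothetical helper: weights events and non-events by the ratio of
    their true prior to their share in the training sample.
    """
    event_w = true_event_rate / sample_event_rate
    non_event_w = (1 - true_event_rate) / (1 - sample_event_rate)
    num = p_sample * event_w
    return num / (num + (1 - p_sample) * non_event_w)

# Training sample built at 1:3, so 25% events; original data has 1% events.
p = adjust_probability(0.25, sample_event_rate=0.25, true_event_rate=0.01)
print(p)  # approximately 0.01
```

A prediction equal to the sample base rate (0.25) maps back to the original base rate (about 0.01), which is the sanity check one would expect from any such correction. Without it, probabilities from the resampled model overstate how likely the event is.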

 

@Rick_SAS may also have some good ideas.

 

akmsharif
Fluorite | Level 6
@Ksharp, @Rick_SAS:
I am using SAS Miner. How should I apply the PEVENT option there?
Thanks.
Ksharp
Super User
If I remember correctly, left-click the 'logistic' node, and in the left panel you should find a PEVENT option.

