I am trying to fit logistic regression, decision tree, KNN and neural network models to a dataset with 9,800 rows. The target is binary and 98% of its values are 0. I have 1,000 interval predictors; all of them contain many zeros and are not normally distributed. How should I handle the imbalanced data in SAS Miner for each of these models? Can somebody please help?
@Ksharp, could you please take a look at the question I posted on my page? Thank you very much.
Because the event probability is very small, no model fitted to the data as-is can be trusted.
Oversampling means raising the event proportion in the training data. For example, if you have 1000 obs and only 10 of them are 1, randomly sample 30 or 40 of the remaining 990 obs that are 0 and combine them with the 10 events to form the training data, i.e. a 1:0 ratio of about 1:3 or 1:4.
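The sampling step described above can be sketched in a few lines. This is only an illustration in plain Python, not SAS Miner syntax; the function name `undersample`, the `target` key, and the toy data are assumptions made up for the example.

```python
import random

def undersample(rows, target_key="target", ratio=3, seed=0):
    """Keep every event (1) and a random sample of ratio * n_events non-events (0)."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    events = [r for r in rows if r[target_key] == 1]
    nonevents = [r for r in rows if r[target_key] == 0]
    # Never ask for more non-events than are available.
    n_keep = min(len(nonevents), ratio * len(events))
    sample = events + rng.sample(nonevents, n_keep)
    rng.shuffle(sample)  # mix events and non-events
    return sample

# Toy data (hypothetical): 1000 obs, the first 10 are events.
rows = [{"id": i, "target": 1 if i < 10 else 0} for i in range(1000)]

train = undersample(rows, ratio=3)
n_events = sum(r["target"] for r in train)
print(len(train), n_events)  # 40 10
```

With `ratio=3` the training data holds all 10 events plus 30 sampled non-events, so the event rate rises from 1% to 25%.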
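If you are using PROC LOGISTIC, don't forget to use PEVENT=0.01 (the event rate in the original data for the example above) so that the results are adjusted back to the true event probability.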
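Why an adjustment is needed: a model trained on the resampled data predicts probabilities that are too high for the original population. A minimal sketch of the usual Bayes prior correction is below. It is plain Python for illustration, the function name `adjust_probability` is made up, and I am not claiming this is exactly how PROC LOGISTIC applies PEVENT= internally, so check the SAS documentation for that.

```python
def adjust_probability(p_sample, sample_rate, true_rate):
    """Rescale a probability predicted on resampled data to the original event rate.

    p_sample    : probability predicted by the model trained on the sample
    sample_rate : event proportion in the training sample (e.g. 0.25 for 1:3)
    true_rate   : event proportion in the original data (e.g. 0.01)
    """
    # Weight the event and non-event sides by true/sample prior ratios.
    event_side = p_sample * true_rate / sample_rate
    nonevent_side = (1 - p_sample) * (1 - true_rate) / (1 - sample_rate)
    return event_side / (event_side + nonevent_side)

# A prediction equal to the sample event rate maps back to the true event rate.
baseline = adjust_probability(0.25, sample_rate=0.25, true_rate=0.01)

# A 50% prediction on the 1:3 sample is far lower on the original scale.
adjusted = adjust_probability(0.50, sample_rate=0.25, true_rate=0.01)
print(round(baseline, 4), round(adjusted, 4))
```

The ordering of observations by predicted probability does not change; only the scale does, which matters when you pick a cutoff or report expected event counts.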
@Rick_SAS may also have some good ideas.