Hello,
I am working on an imbalanced dataset in which 15% of the cases belong to the class of interest. I drew a stratified sample with equal class sizes for the training and validation sets to counter the model's strong bias towards the majority class.
I am now having trouble scoring my model, as I would like to use a test set with the original proportions (15%-85%). I tried editing the target profile to assign those prior proportions from the "Score using test dataset" node in the attached figure, but when I run the "Score[apply]" node it still uses a score set with equal probabilities (I can easily see this from the Insight node).
Does anybody know how to overcome this problem? I am using SAS Enterprise Miner in SAS 9.3.
All help greatly appreciated.
I'm just a touch confused and possibly out of my depth here, but from what I understand of scoring, the prior probabilities do not come into play, even in a tree diagram. The rules are applied the same way regardless of the proportions in the sample.
Did you manually create your partitioned data, or did you use a prior probability setting to set up the 50/50 data sets?
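For what it's worth, here is a minimal Python sketch of the usual way a predicted probability from a model trained on a rebalanced sample can be rescaled to the original event rate. It is only an illustration of the general prior-correction formula: the function name `adjust_posterior` and its arguments are made up for this example, and I am not claiming this is exactly what the Enterprise Miner nodes do internally.

```python
def adjust_posterior(p_event, sample_prior, true_prior):
    """Rescale a predicted event probability to the true event rate.

    p_event      -- probability of the event predicted by a model that was
                    trained on the rebalanced sample
    sample_prior -- event proportion in the training sample (0.5 for 50-50)
    true_prior   -- event proportion in the original data (0.15 here)
    """
    # Reweight each class by the ratio of its true prior to its sample prior.
    event = p_event * true_prior / sample_prior
    non_event = (1.0 - p_event) * (1.0 - true_prior) / (1.0 - sample_prior)
    # Renormalise so the two adjusted probabilities sum to 1.
    return event / (event + non_event)


# A score of 0.5 on the 50-50 sample maps back to the original base rate.
print(adjust_posterior(0.5, sample_prior=0.5, true_prior=0.15))
```

The adjustment is monotone, so the ranking of cases does not change; only the probability values (and therefore any fixed cutoff such as 0.5) are affected. That fits the point above that the rules themselves are applied the same way whatever the proportions in the scored data.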
Thank you Reeza.
I created a 50-50 sample from the Input Node by going to Stratification-->Options--> Equal Size.
This appears to work fine when a Tree is run. However, problems arise when I try the scoring, as I would like to use a test set with the original proportions.
In the end I created two files in SQL: one for training and validation (with 50-50 proportions) and one for testing with the original proportions. I simply create a different partition, since I take my test set from another file.
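In case it helps anyone reading later, the manual split described above can be sketched in Python roughly as follows. This is not the SQL I actually ran; the function `balanced_and_holdout`, its parameters, and the toy data are all hypothetical, and the held-out test set keeps the original proportions only approximately because it is a simple random draw.

```python
import random


def balanced_and_holdout(rows, is_event, test_fraction=0.3, seed=0):
    """Return a 50-50 modelling sample and a test set at the original event rate."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)

    # Hold out the test set first so it keeps (approximately) the original proportions.
    n_test = int(len(shuffled) * test_fraction)
    test, rest = shuffled[:n_test], shuffled[n_test:]

    # Downsample the larger class in the remainder to the size of the smaller one.
    events = [r for r in rest if is_event(r)]
    non_events = [r for r in rest if not is_event(r)]
    n = min(len(events), len(non_events))
    modelling = rng.sample(events, n) + rng.sample(non_events, n)
    rng.shuffle(modelling)
    return modelling, test


# Hypothetical data: 1 marks the class of interest, at a 15% rate.
data = [1] * 150 + [0] * 850
modelling, test = balanced_and_holdout(data, is_event=lambda r: r == 1)
print(sum(modelling) / len(modelling))  # event share in the modelling sample
print(sum(test) / len(test))            # event share in the test set
```

The modelling sample would then be partitioned into training and validation, while the test set is only ever used for scoring.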
I also noticed that, when I created the 50-50 sample from the original file, the decision tree worked fine and gave me a nice confusion matrix, but neural nets and regression ignored the Equal Size option, resulting in a confusion matrix with 14% of the observations belonging to the class of interest. I no longer have this problem, as I solved it manually, but do you know whether Stratification-->Options--> Equal Size normally presents such issues?
Many thanks,
a_bloch