a_bloch
Calcite | Level 5


Hello,
I am working on an imbalanced dataset in which 15% of the cases belong to the class of interest. To overcome the model's strong bias towards the majority class, I drew a stratified sample with equal class sizes for the training and validation sets.

 

I am now having trouble scoring my model, as I would like to use a test set with the original proportions (15%-85%). I have tried editing the target profile to assign those prior proportions from the "Score using test dataset" node in the attached figure, but when I run the "Score[apply]" node it still uses a score set with equal probabilities (I can easily see this from the Insight node).

Does anybody know how to overcome this problem? I am using SAS Enterprise Miner in SAS 9.3.

 

All help greatly appreciated.
 
Capture.JPG
Reeza
Super User

I'm just a touch confused and possibly out of my depth here, but from what I understand of scoring, the prior probabilities do not come into play, even in a tree diagram. The rules are applied the same way regardless of the proportions in the sample.
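To make the point about priors concrete, here is a minimal Python sketch of the standard prior-correction formula for posterior probabilities from a model trained on a resampled set. It is an illustration only, not a description of what Enterprise Miner does internally: the function name `adjust_posterior` and its parameters are hypothetical, and the 0.5 and 0.15 defaults are simply the sample and original event proportions mentioned in this thread.

```python
def adjust_posterior(p_sample, sample_prior=0.5, true_prior=0.15):
    """Rescale P(event) from a resampled training set to the true prior.

    p_sample     -- posterior probability produced by the model
    sample_prior -- event proportion in the training sample (assumed 0.5)
    true_prior   -- event proportion in the population (assumed 0.15)
    """
    # Weight each class by the ratio of its true prior to its sample prior.
    event = p_sample * true_prior / sample_prior
    non_event = (1.0 - p_sample) * (1.0 - true_prior) / (1.0 - sample_prior)
    # Renormalise so the two adjusted probabilities sum to 1.
    return event / (event + non_event)


# Hypothetical scores from a model trained on the 50-50 sample.
scores = [0.2, 0.5, 0.8]
adjusted = [adjust_posterior(p) for p in scores]
```

Under these assumptions a score of 0.5 maps to 0.15, and the ordering of the cases is preserved, so the splitting rules and the ranking stay the same. What changes is the probability scale, and therefore which cases fall above a fixed cutoff such as 0.5.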

 

Did you manually create your partitioned data or use a prior probability setting to set up the 50/50% data sets?

a_bloch
Calcite | Level 5

Thank you Reeza.
I created a 50-50 sample from the Input Node by going to Stratification-->Options-->Equal Size.
This appears to work fine when a Tree is run; however, problems arise when I try the scoring, as I would like to use a test set with the original proportions.
In the end I created 2 files in SQL: one for training and validation (with 50-50 proportions) and one for test with the original proportions. I simply create a different partition, as I take my test set from another file.
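The manual split described above can be sketched in Python as follows. This is a rough illustration, not the SQL actually used: the function name `balanced_train_natural_test`, the `target` key, and the 20% hold-out fraction are all assumptions made for the example.

```python
import random


def balanced_train_natural_test(rows, label_key="target", test_fraction=0.2, seed=0):
    """Return (train, test): train is 50-50, test keeps the original class mix."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)

    events = [r for r in shuffled if r[label_key] == 1]
    non_events = [r for r in shuffled if r[label_key] == 0]

    # Hold out the same fraction of each class so the test set keeps
    # the original event rate.
    n_event_test = round(len(events) * test_fraction)
    n_non_event_test = round(len(non_events) * test_fraction)
    test = events[:n_event_test] + non_events[:n_non_event_test]

    # Undersample the larger class in the remainder to get equal sizes.
    events_rest = events[n_event_test:]
    non_events_rest = non_events[n_non_event_test:]
    n = min(len(events_rest), len(non_events_rest))
    train = events_rest[:n] + non_events_rest[:n]
    return train, test


# Hypothetical data: 1000 cases with a 15% event rate.
data = [{"id": i, "target": 1 if i < 150 else 0} for i in range(1000)]
train, test = balanced_train_natural_test(data)
```

The training and validation partitions can then be drawn from `train`, while `test` is only ever used for scoring.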

I also noticed that, when I created the 50-50 sample from the original file, the decision tree worked fine and gave me a nice confusion matrix, but neural nets and regression ignored the Equal Size option, resulting in a confusion matrix with 14% of the observations belonging to the class of interest. I no longer have this problem, as I solved it manually, but do you know if Stratification-->Options-->Equal Size normally presents such issues?

Many thanks,

a_bloch
