a_bloch
Calcite | Level 5


Hello,
I am working on an imbalanced dataset in which 15% of the cases belong to the class of interest. I used a stratified sample with equal class sizes for the training and validation sets to overcome the model's strong bias towards the majority class.

 

I am now having trouble scoring my model, as I would like to use a test set with the original proportions (15%-85%). I tried editing the target profile to assign those prior proportions in the "Score using test dataset" node in the attached figure, but when I run the "Score[apply]" node it still uses a score set with equal probabilities (I can easily see this from the Insight node).

Does anybody know how to overcome this problem? I am using SAS Enterprise Miner in SAS 9.3.

 

All help greatly appreciated.
 
Capture.JPG
Reeza
Super User

I'm just a touch confused and possibly out of my depth here, but as I understand scoring, the prior probabilities do not come into play, even in a tree diagram. The rules are applied the same way regardless of the proportions in the sample.
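
To illustrate the distinction: the split rules do not change with the sample proportions, but the predicted probabilities from a model trained on a 50/50 sample can be rescaled to the original 15%-85% priors afterwards. Below is a minimal Python sketch of that standard prior correction. It is not SAS Enterprise Miner code; the function name `adjust_posterior` and its parameters are hypothetical and used only for illustration.

```python
def adjust_posterior(p_balanced, sample_prior=0.5, true_prior=0.15):
    """Rescale P(event) from a model fit on a resampled set to the original prior.

    p_balanced   : event probability predicted by the model (hypothetical input)
    sample_prior : event share in the training sample (0.5 for a 50/50 sample)
    true_prior   : event share in the original data (0.15 in this thread)
    """
    # Weight each class by (true prior / sampling prior), then renormalize.
    event = p_balanced * true_prior / sample_prior
    non_event = (1.0 - p_balanced) * (1.0 - true_prior) / (1.0 - sample_prior)
    return event / (event + non_event)
```

With these defaults, a prediction of 0.5 on the balanced sample maps back to the 15% base rate, while predictions of 0 and 1 stay unchanged.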

 

Did you manually create your partitioned data or use a prior probability setting to set up the 50/50% data sets?

a_bloch
Calcite | Level 5

Thank you Reeza.
I created a 50-50 sample from the Input Node by going to Stratification-->Options--> Equal Size.
This appears to work fine when a Tree is run; however, problems arise when I try the scoring, as I would like to use a test set with the original proportions.
In the end I created 2 files in SQL, one for training and validation (with 50-50 proportions) and one for test with the original proportions. The test set now comes from a separate file, so it is simply a different partition.
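
For anyone who wants to reproduce the same two-file setup outside SQL, here is a rough Python sketch of the idea, assuming the rows fit in memory. The function name `split_balanced_and_test`, the `is_event` callback, and the 30% test fraction are illustrative assumptions, not what I actually ran.

```python
import random

def split_balanced_and_test(rows, is_event, test_fraction=0.3, seed=0):
    """Hold out a test set with the original class mix, then balance the rest 50-50."""
    rng = random.Random(seed)
    events = [r for r in rows if is_event(r)]
    non_events = [r for r in rows if not is_event(r)]
    rng.shuffle(events)
    rng.shuffle(non_events)

    # Take the same fraction from each class so the test set keeps the original mix.
    n_test_ev = round(len(events) * test_fraction)
    n_test_non = round(len(non_events) * test_fraction)
    test = events[:n_test_ev] + non_events[:n_test_non]

    # Downsample the larger class in the remainder to get a 50-50 modeling sample.
    rest_ev, rest_non = events[n_test_ev:], non_events[n_test_non:]
    n = min(len(rest_ev), len(rest_non))
    modeling = rest_ev[:n] + rest_non[:n]
    rng.shuffle(modeling)
    return modeling, test
```

The modeling sample can then be split into training and validation as usual, while the test rows are never seen during model building.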

I also noticed that, when I created the 50-50 sample from the original file, the decision tree worked fine and gave me a nice confusion matrix, but neural nets and regression ignored the Equal Size option, resulting in a confusion matrix with 14% of the observations belonging to the class of interest. I no longer have this problem, as I solved it manually, but do you know whether Stratification-->Options--> Equal Size normally presents such issues?

Many thanks,

a_bloch
