Honest_Abe
Calcite | Level 5

I'm following the Getting Started with SAS Enterprise Miner example: https://support.sas.com/documentation/onlinedoc/miner/.  If I do not adjust the prior probabilities to 0.05/0.95 as suggested, but use 0.25/0.75 instead, the regression and decision tree nodes produce models whose ROC curves appear to be y = x.  In other words, they are like flipping a coin.  It seems that the models place every observation in the class with the larger posterior probability.

It would seem to me that adjusting the prior probabilities to 0.05/0.95 would make things worse.  The help states, "Increasing the prior probability of a class increases the posterior probability, moving the classification boundary so that more cases are classified into the class."  However, when you do that, the decision tree actually has splits, and both ROC curves are concave(ish), bowing above the diagonal.
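
For concreteness, the adjustment the help describes is the standard Bayes re-weighting of a model's posteriors for new priors. Below is a minimal DATA step sketch; the data set SCORED, the column p_old, and the 0.25 training proportion are hypothetical stand-ins, and Enterprise Miner's internal handling may differ in detail.

   /* Re-weight posteriors scored under the training event      */
   /* proportion (rho1) to reflect a new prior (pi1).           */
   data adjusted;
      set scored;              /* assumes a posterior column p_old */
      rho1 = 0.25;             /* event proportion the model saw   */
      pi1  = 0.05;             /* new prior for the event class    */
      num   = p_old * (pi1 / rho1);
      den   = num + (1 - p_old) * ((1 - pi1) / (1 - rho1));
      p_new = num / den;       /* adjusted posterior               */
   run;

Raising pi1 raises p_new for every case, which is exactly the boundary shift the help describes.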

 

Why do the models produced with 0.05/0.95 priors produce "better" results than the models produced with 0.25/0.75 priors?

5 REPLIES
JasonXin
SAS Employee
Hello,

When you set the priors to 0.25/0.75 (assuming 0.25 for target=1 and 0.75 for target=0), you are telling the software (Enterprise Miner here; the same applies if you supply a weight variable to that effect) that the effective event rate on the incoming target is 25%. In marketing terms, you already have a historical response rate of 25%. Many, if not all, marketing managers would ask why you need a response model at that point; normally it is when the past response rate is at or below 5% that people think building a response model makes sense, as a way to boost it. When you set the priors to 0.05/0.95, you are telling EM that the incoming historical event/response rate is 5%.

So with 0.25 vs 0.75 your model is OK; there is just little room for improvement, which is why the ROC curve sits near the 45-degree random-toss line. With 0.05 vs 0.95, the curve appears 'normal'. This is, of course, the case if you hold everything else unchanged.

Hope this helps? Thanks for using SAS.

Jason Xin
dtk
SAS Employee

Hi Honest_Abe,

 

On the page where the instructions call out the edits to the "Prior Probabilities" tab, there are also instructions about making changes on the "Decision Weights" tab.  If you made those changes, then the model assessment criterion is based on those decision weights.  If you use the numbers in the book, that means each row in your data set represents a 25% chance of making $14.50 and a 75% chance of losing $0.50 (or making -$0.50, equivalently).  Think about betting $0.50 to win a $15.00 jackpot.

 

If your probability of winning that bet (in the population, no models, no predictions) was 25%, then you should probably take that bet all day, every day!  Your expected "winnings" are (.25 x $14.50) + (.75 x -$0.50) = $3.25.  Don't build a model, just take that bet 🙂  [This is what I suspect happened in your case.  Enterprise Miner built a series of trees and a series of regressions, but none of them could beat this average profit figure, so it "Occam's Razor"-ed you and took the simplest model that gave the best results.  It's hard to get a simpler model than "mail everybody, all of the time," so that's what the tree and the regression gave you.]
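
The same arithmetic as a quick sanity check, using the book's decision weights in a throwaway DATA step:

   data _null_;
      /* Expected profit per row of "take the bet every time" */
      profit_25 = 0.25*14.50 + 0.75*(-0.50);   /* = 3.25 at a 25% response rate */
      profit_05 = 0.05*14.50 + 0.95*(-0.50);   /* = 0.25 at a  5% response rate */
      put profit_25= profit_05=;
   run;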

 

But!  What if you lived in a world where the baseline "success" rate (the probability that TARGET_B=1) is closer to 5% than 25%? Then for each trial, your expected profit is (0.05 x $14.5)+(0.95 x -$0.50) = $0.25.  Now we're talking about betting $0.50 to try and win $0.25.  (In many real world cases, the expectation is negative, and you're worse off than that.)  So how do we gain an advantage?  Build a model, and target the sub-population that has a favorable expected value:  everyone gets a predicted probability (p), and you can choose to mail only if (p x $14.5)+[(1-p) x -$0.50] comes out to be "large enough."    "Large enough" could mean "positive," or it could be subject to some other constraints/considerations.
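
Where "large enough" begins, if the only constraint is positive expected profit, falls out of setting that expectation to zero: with these weights the breakeven posterior is $0.50 / ($14.50 + $0.50) = 1/30, about 3.3%. A quick check:

   data _null_;
      win  = 14.50;                  /* profit when TARGET_B=1 and we mail */
      lose =  0.50;                  /* loss   when TARGET_B=0 and we mail */
      p_star = lose / (win + lose);  /* solves p*win - (1-p)*lose = 0      */
      put p_star=;                   /* 0.0333...                          */
   run;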

 

If you don't know good numbers for those profit/loss values, or this expected-value argument doesn't meet your needs, then you can actually tell the model nodes to select the best model according to validation data average squared error and you'll probably get results more in line with your expectations.
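
For reference, validation average squared error is just the mean of (TARGET_B - p)**2 over the validation rows. Here is a minimal sketch of the same quantity computed outside EM, where valid_scored and the posterior column p are hypothetical names:

   proc sql;
      /* ASE: mean squared difference between the 0/1 target and the posterior */
      select mean((target_b - p)**2) as ASE
      from valid_scored;
   quit;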

 

If you didn't put any values in the "Decision Weight" tab, then I need to re-visit your question.  If you have any other details about steps you were experimenting with, that might be a good clue.

 

Thanks!

Honest_Abe
Calcite | Level 5

I followed all of the instructions except for adjusting the prior probabilities.  So I did put the weights of $14.50 and -$0.50 in.   dtk's answer would make sense if all observations were placed in the positive group, but all observations are placed in the negative group when the prior probabilities are 0.25/0.75.

 

With both 0.25/0.75 and 0.05/0.95 priors, all observations are placed in the negative group according to the validation classification matrix.  But the ROC curve is y=x for 0.25/0.75, while it is more concave for 0.05/0.95.

 

My initial post should have read, "It seems that the models place every observation in the class with the larger *prior* probability."  

dtk
SAS Employee

Hi Honest_Abe,

 

Are you getting any indications in the model results that it's using the profit information?  Do you see anything like "Average Profit" or "Total Profit" in the assessment statistics?   Does the model that you're using have a property where you can direct it to choose the best model iteration based on "Validation Profit" or something comparable?

 

It sounds like the decisions coming from the modeling node are based on the "Misclassification," which would just assign an observation to the category with the highest predicted probability.
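
The two rules are easy to contrast directly. A minimal sketch (scored and the posterior column p are hypothetical names): misclassification assigns the most probable class, while a profit-based decision mails whenever mailing has positive expected profit.

   data decisions;
      set scored;                                 /* assumes a posterior column p */
      class_by_misclass = (p > 0.5);              /* most probable class          */
      exp_profit_mail   = p*14.50 + (1 - p)*(-0.50);
      decide_mail       = (exp_profit_mail > 0);  /* profit-based decision        */
   run;

With posteriors hovering near 0.25, the first rule puts every case in the negative class while the second mails every case, which would reconcile an all-negative classification matrix with the profit-based assessment statistics.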

 

 

Honest_Abe
Calcite | Level 5

Yes, for the Decision Tree node, Assessment Measure is set to its default "Decision."  Other settings are as the book indicates.  The same is true for Regression.  Profit is mentioned in the output at each of the model nodes and the Model Comparison node:

 

Fit Statistics
Model Selection based on Valid: Average Profit for TARGET_B (_VAPROF_)

Selected  Model  Model          Valid: Avg Profit  Train: Avg      Train: Misclass.  Valid: Avg      Valid: Misclass.
Model     Node   Description    for TARGET_B       Squared Error   Rate              Squared Error   Rate

Y         Tree   Decision Tree  3.24914            0.18752         0.25005           0.18747         0.24994
          Reg    Regression     3.24914            0.18752         0.25005           0.18747         0.24994
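
For what it's worth, the _VAPROF_ value of 3.24914 looks consistent with dtk's "mail everybody" reading: plugging in the validation misclassification rate (0.24994, which equals the event rate when every case is classified as a non-responder) reproduces the figure to rounding. A quick check (a sketch, not EM's internal computation):

   data _null_;
      r = 0.24994;                             /* validation event rate */
      avg_profit = r*14.50 + (1 - r)*(-0.50);  /* mail-everyone profit  */
      put avg_profit=;                         /* 3.2491, matching the
                                                  reported 3.24914 up to
                                                  rounding of r         */
   run;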

 

 

