09-05-2016 02:10 PM
I'm following the Getting Started with SAS Enterprise Miner example: https://support.sas.com/documentation/onlinedoc/miner/. If I do not adjust the prior probabilities to 0.05/0.95 as suggested, but use 0.25/0.75 instead, the regression and decision tree nodes produce models whose ROC curves appear to be y = x. In other words, they are like flipping a coin. It seems that the models place every observation in the class with the larger posterior probability.
It would seem to me that adjusting the prior probabilities to 0.05/0.95 would make things worse. The help states, "Increasing the prior probability of a class increases the posterior probability, moving the classification boundary so that more cases are classified into the class." However, when you do that, the decision tree has splits and both ROC curves are concave(ish).
Why do the models produced with 0.05/0.95 priors produce "better" results than the models produced with 0.25/0.75 priors?
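The help text's claim about priors can be made concrete. Here is a small sketch (assuming the standard Bayes prior-correction formula, not necessarily Enterprise Miner's exact internals) of how changing the priors reweights the posteriors and moves the classification boundary:

```python
# Sketch of standard Bayes prior correction (an assumption about the
# general technique, not confirmed Enterprise Miner internals):
# scale each class's posterior by new_prior / old_prior, then renormalize.
def adjust_posteriors(posteriors, old_priors, new_priors):
    """Reweight class posteriors to reflect a change of priors."""
    scaled = [p * new / old
              for p, old, new in zip(posteriors, old_priors, new_priors)]
    total = sum(scaled)
    return [s / total for s in scaled]

# A case scored [0.30, 0.70] under 0.25/0.75 priors. Raising the first
# class's prior to 0.50 raises its posterior from 0.30 to 0.5625, so the
# case crosses the 0.5 boundary and changes its assigned class.
adjusted = adjust_posteriors([0.30, 0.70], [0.25, 0.75], [0.50, 0.50])
print(adjusted)
```

This is why increasing a class's prior pushes more cases into that class, as the help describes.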
09-13-2016 11:20 AM
09-13-2016 12:41 PM
On the page where the instructions call out the edits to the "Prior Probabilities" tab, there are also instructions about making changes on the "Decision Weights" tab. If you made those changes, then the model assessment criterion is based on those decision weights. If you use the numbers in the book, each row in your data set represents a 25% chance of making $14.50 and a 75% chance of losing $0.50 (or making -$0.50, equivalently). Think of it as betting $0.50 to win a $15.00 jackpot.
If your probability of winning that bet (in the population, with no models and no predictions) was 25%, then you should probably take that bet all day, every day! Your expected "winnings" are (0.25 x $14.50) + (0.75 x -$0.50) = $3.25. Don't build a model; just take that bet! This is what I suspect happened in your case: Enterprise Miner built a series of trees and a series of regressions, but none of them could beat this average profit figure, so it "Occam's Razor"-ed you and took the simplest model that gave the best results. It's hard to get a simpler model than "mail everybody, all of the time," so that's what the tree and the regression gave you.
But! What if you lived in a world where the baseline "success" rate (the probability that TARGET_B = 1) is closer to 5% than 25%? Then for each trial, your expected profit is (0.05 x $14.50) + (0.95 x -$0.50) = $0.25. Now we're talking about betting $0.50 to try to win $0.25. (In many real-world cases, the expectation is negative, and you're worse off than that.) So how do we gain an advantage? Build a model and target the sub-population that has a favorable expected value: everyone gets a predicted probability p, and you choose to mail only if (p x $14.50) + [(1 - p) x -$0.50] comes out "large enough." "Large enough" could mean "positive," or it could be subject to some other constraints/considerations.
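The expected-value arithmetic above can be sketched in a few lines (this is plain Python illustrating the reasoning, not SAS Enterprise Miner code; the function names are made up for the illustration):

```python
# Expected profit of mailing one person, using the book's decision
# weights: $14.50 for a responder, -$0.50 for a non-responder.
def expected_profit(p_respond, win=14.50, loss=-0.50):
    """Expected profit per mailed person with response probability p_respond."""
    return p_respond * win + (1 - p_respond) * loss

# With a 25% baseline response rate, mailing everyone is already profitable:
print(round(expected_profit(0.25), 4))  # 3.25

# With a 5% baseline rate, the blanket strategy barely breaks even:
print(round(expected_profit(0.05), 4))  # 0.25

# A model earns its keep by ranking people: mail only those whose
# predicted probability makes the expected value "large enough."
def should_mail(p_respond, threshold=0.0):
    return expected_profit(p_respond) > threshold

print(should_mail(0.02))  # False: 0.02*14.5 + 0.98*(-0.5) = -0.20
print(should_mail(0.10))  # True:  0.10*14.5 + 0.90*(-0.5) =  1.00
```

The break-even point is p = 0.50/15.00 = 1/30, roughly 3.3%, so even a weak model that separates 2% responders from 10% responders creates value at a 5% baseline.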
If you don't know good numbers for those profit/loss values, or this expected-value argument doesn't meet your needs, you can tell the model nodes to select the best model according to average squared error on the validation data, and you'll probably get results more in line with your expectations.
If you didn't put any values in the "Decision Weights" tab, then I need to revisit your question. If you have any other details about the steps you were experimenting with, that might be a good clue.
10-16-2016 06:46 PM
I followed all of the instructions except for adjusting the prior probabilities, so I did put the weights of $14.50 and -$0.50 in. dtk's answer would make sense if all observations were placed in the positive group, but all observations are placed in the negative group when the prior probabilities are 0.25/0.75.
With both 0.25/0.75 and 0.05/0.95 priors, all observations are placed in the negative group according to the validation classification matrix. But the ROC curve is y = x for 0.25/0.75, while it is more concave for 0.05/0.95.
My initial post should have read, "It seems that the models place every observation in the class with the larger *prior* probability."
10-17-2016 08:42 AM
Are you getting any indications in the model results that it's using the profit information? Do you see anything like "Average Profit" or "Total Profit" in the assessment statistics? Does the model that you're using have a property where you can direct it to choose the best model iteration based on "Validation Profit" or something comparable?
It sounds like the decisions coming from the modeling node are based on "Misclassification," which simply assigns each observation to the category with the highest predicted probability.
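The distinction matters because the two decision rules can disagree. A minimal sketch in plain Python (not SAS code; the profit matrix below assumes the book's decision weights, and the function names are invented for the illustration):

```python
# Decision weights from the example: mailing a responder (class 1) earns
# $14.50, mailing a non-responder (class 0) loses $0.50; not mailing
# earns $0 either way. This is an assumed encoding of the book's numbers.
PROFIT = {
    "mail":    {1: 14.50, 0: -0.50},
    "no_mail": {1: 0.0,   0: 0.0},
}

def misclassification_decision(p_respond):
    """Assign to the class with the larger posterior probability."""
    return "mail" if p_respond >= 0.5 else "no_mail"

def profit_decision(p_respond):
    """Choose the action with the larger expected profit."""
    ev = {a: p_respond * PROFIT[a][1] + (1 - p_respond) * PROFIT[a][0]
          for a in PROFIT}
    return max(ev, key=ev.get)

# A 10% responder lands in class 0 under misclassification, yet the
# asymmetric payoffs make mailing that person clearly worthwhile:
print(misclassification_decision(0.10))  # no_mail
print(profit_decision(0.10))             # mail
```

If the node is using misclassification, almost everyone in a 5% or 25% response-rate population falls below the 0.5 cutoff, which matches the "everything in the negative group" classification matrix you're seeing.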
10-17-2016 10:14 PM
Yes, for the Decision Tree node, Assessment Measure is set to its default "Decision." Other settings are as the book indicates. The same is true for Regression. Profit is mentioned in the output at each of the model nodes and the Model Comparison node:
Model Selection based on Valid: Average Profit for TARGET_B (_VAPROF_)
Selected   Model   Model           Valid: Average    Train: Average   Train:              Valid: Average   Valid:
Model      Node    Description     Profit for        Squared Error    Misclassification   Squared Error    Misclassification
                                   TARGET_B                           Rate                                  Rate

Y          Tree    Decision Tree   3.24914           0.18752          0.25005             0.18747          0.24994
           Reg     Regression      3.24914           0.18752          0.25005             0.18747          0.24994