
SAS EM Prior Probabilities

New Contributor

Hi,

 

I'm trying to understand how Enterprise Miner calculates the number of events vs. non-events in each ranked demi-decile after adjusting for prior probabilities.

 

In my original data, I have 1% events and 99% non-events.

In my sample data for model development, I have 20% events and 80% non-events.

 

I apply a random forest to my sample data.  In my 1st bin (i.e. the demi-decile with the highest scores), the model gives 343 true events and 23 true non-events.

 

After applying the decision node to my model results, my 1st bin (i.e. the demi-decile with the highest ADJUSTED scores) now contains 36 true events and 332 true non-events.  How was this actually determined?  I understand how the posterior probabilities are adjusted, but I don't understand how the number of true events and non-events is adjusted.

 

I'd appreciate it if someone could help explain this.


Accepted Solutions
SAS Employee

Re: SAS EM Prior Probabilities

There are two different issues involved here -- the first is obtaining probabilities centered near your population estimates, and the second is determining how to classify each observation based on that probability (adjusted for priors or not) and on a decision weight if you have incorporated one.   By default, SAS Enterprise Miner generates a misclassification chart for the Train and Validate data sets based on two variables which have the form

 

     F_<target variable name> : the actual target level

     I_<target variable name> : the predicted target level

 

SAS Enterprise Miner will compute a predicted probability (adjusted for priors if requested) for each level of the target of the form

 

     P_<target variable name><target variable level>     

 

So for a target variable named 'BAD' with levels 0 or 1, it will generate 

 

     P_BAD1 :  the predicted probability that BAD=1

     P_BAD0 :  the predicted probability that BAD=0

 

Using my example, the variable F_BAD is simply the actual target level (0 or 1), and the variable I_BAD takes the level associated with the higher of the predicted probabilities P_BAD1 and P_BAD0.   It is reasonable to assign observations to the most likely target level, but this presents problems in rare event scenarios.
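As a quick sketch of that logic (the variable names follow the SAS Enterprise Miner convention, but the probabilities below are hypothetical):

```python
# Sketch of how I_BAD follows from P_BAD1 / P_BAD0: assign the level with
# the highest predicted probability. The data here is hypothetical.
def classify(p_bad1):
    """Return I_BAD: the level with the higher predicted probability."""
    p_bad0 = 1.0 - p_bad1
    return 1 if p_bad1 > p_bad0 else 0   # equivalent to: 1 iff p_bad1 > 0.5

scored = [0.72, 0.41, 0.55]              # hypothetical P_BAD1 values
i_bad = [classify(p) for p in scored]    # -> [1, 0, 1]
```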

In your oversampled data, your target level of interest occurred 20% of the time overall.  Using my example, suppose that BAD=1 occurs 20% of the time in the sample.   To have P_BAD1 > P_BAD0, the observation had to have P_BAD1 > 50%, which represents someone at least (50%) / (20%) = 2.5 times as likely to have the event as the overall average.   After adjusting for the prior probabilities so that the overall average is only 1%, you would now need someone who is at least (50%) / (1%) = 50 times as likely as average to have the rare event.    Since there are far fewer people in this category, there are far fewer people (possibly none!) classified as having the rare event according to I_BAD (using my example).   This is why the number of predicted events changes so dramatically in your example.
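To make the adjustment concrete, here is a sketch of the standard prior-adjustment formula (scale each posterior by population prior over sample prior, then renormalize), using the 20% sample / 1% population priors from the question; the function name is mine:

```python
def adjust_for_priors(p_sample, prior_pop=0.01, prior_sample=0.20):
    """Rescale a sample-based event probability to the population priors.

    Standard prior adjustment: multiply each level's posterior by
    (population prior / sample prior) for that level, then renormalize
    so the two levels sum to 1.
    """
    event = p_sample * prior_pop / prior_sample
    nonevent = (1 - p_sample) * (1 - prior_pop) / (1 - prior_sample)
    return event / (event + nonevent)

# The sample break-even observation (P = 0.5) drops far below 0.5 after
# adjustment, so it is no longer classified as an event:
print(round(adjust_for_priors(0.5), 4))   # -> 0.0388

# Only observations with a sample probability above roughly 0.96 stay
# above 0.5 after adjustment -- hence far fewer classified events.
```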

 

In these situations, you can consider using a target weight to put more weight on the rare event.   If you add decision weights (either in the Decisions node or in the Input Data Source node), SAS Enterprise Miner will also generate a D_<target variable name> variable which contains the 'decision' outcome based on the 'most profitable' or 'least costly' outcome.  In this situation, each decision weight is multiplied by the adjusted probability to get the 'expected value' of that decision, and the outcome with the best expected value is assigned.
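A sketch of that expected-value rule (the weight and probabilities below are hypothetical, not defaults):

```python
# Sketch of the D_<target> logic: choose the level whose expected value
# (decision weight x probability) is largest.
def decide(p_event, w_event, w_nonevent=1.0):
    ev_event = p_event * w_event          # expected value of calling it an event
    ev_nonevent = (1 - p_event) * w_nonevent
    return 1 if ev_event > ev_nonevent else 0

# With a hypothetical weight of 10 on the rare event, an observation only
# needs p_event > 1/11 (about 0.09) to receive the event decision:
print(decide(0.15, w_event=10))   # -> 1
print(decide(0.05, w_event=10))   # -> 0
```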

 

Assigning outcomes based on putting extra decision weight on rare events can also pose challenges since those outcomes will be predicted to occur more often than they actually do.   If you click on the button 'Default with Inverse Prior Weights', SAS Enterprise Miner will take the specified prior and divide it into 1 to obtain the weight.  Suppose the prior probabilities were specified as 20% and 80%.   Then using the 'Default with Inverse Prior Weights' button would yield weights of  1 / 0.2 = 5 for the rare event and 1 / 0.8 = 1.25 for the common event.  You will notice that the ratio of weights

 

   5 / 1.25 = 4

 

is in the same ratio as the prior probabilities

 

    80% / 20% = 4

 

so simply leaving the weight on the common event at 1 and changing the rare event to a weight of 4 will have the same impact. Notice now that for the 'average' observation, whose probability of the rare event is 20% (0.2) and whose probability of the common event is 80% (0.8), the expected value is the same using the weights described above:

 

     Level           Prior     Weight     Expected Value
     rare event       0.2        4        0.2 * 4 = 0.8
     common event     0.8        1        0.8 * 1 = 0.8

 

which suggests that using 'Default with Inverse Prior Weights' will assign the target event to anyone with a probability higher than 0.2 (in this scenario), i.e. to anyone with a higher predicted probability than average.   This will generate many more predicted events based on the D_<variable name> variable, since it is not unlikely that half or more of the observations have a predicted probability higher than average.
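The arithmetic above can be checked directly; under inverse-prior weights the break-even probability lands exactly at the prior (0.2 here). A small sketch:

```python
# 'Default with Inverse Prior Weights': weight = 1 / prior for each level.
priors = {"rare": 0.2, "common": 0.8}
weights = {level: 1 / p for level, p in priors.items()}
# weights -> {"rare": 5.0, "common": 1.25}; ratio 5 / 1.25 = 4, same as 0.8 / 0.2.

def decision_for(p_rare):
    """Pick the level with the larger expected value under these weights."""
    if p_rare * weights["rare"] > (1 - p_rare) * weights["common"]:
        return "rare"
    return "common"

# The 'average' observation (p = 0.2) is exactly the break-even point:
print(decision_for(0.21), decision_for(0.19))   # -> rare common
```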

 

So what do you do?  Understand that the overall misclassification rate of the data set is not what is critical.   Look at the rate in each percentile of the data and determine how deep you want to go.  Then you can choose your own Decision threshold (e.g. probability higher than 0.35) above which you get a satisfactory misclassification rate.  The approach taken by SAS Enterprise Miner is a reasonable one since it has no business knowledge to base the outcome on other than what is provided -- either pick the most likely outcome or the most valuable outcome based on your weights -- but your best decisions will always incorporate your analytical needs.
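One way to carry out that advice, sketched on a few hypothetical scored records: compute the event rate above candidate cutoffs and keep the shallowest cutoff that meets your target rate.

```python
# Hypothetical scored records: (adjusted probability, actual event flag).
scored = [(0.90, 1), (0.80, 1), (0.70, 0), (0.60, 1), (0.45, 0),
          (0.40, 0), (0.30, 1), (0.20, 0), (0.10, 0), (0.05, 0)]

def event_rate_above(threshold):
    """Event rate among observations scored above the chosen cutoff."""
    hits = [actual for p, actual in scored if p > threshold]
    return sum(hits) / len(hits) if hits else 0.0

# Sweep candidate thresholds and keep the lowest one meeting a target rate
# (target and candidates are illustrative choices):
target = 0.70
cutoff = min(t for t in (0.35, 0.5, 0.65, 0.75)
             if event_rate_above(t) >= target)   # -> 0.5
```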

 

For example, in some cases you might need an extremely low misclassification rate (e.g. maybe only looking at the top 1% or 2% of the scored data) because you are searching for fraud and don't want to annoy customers that are not acting fraudulently.  In other cases, you might be looking for a minimum response rate to make money (e.g. some direct mail advertisers only need a 2% response rate to be profitable).  Your best 'decision' should always incorporate your analytical and/or business objectives.  

 

I hope this helps!

Doug



