jeremyyuan
Calcite | Level 5

I have a data set with a rare event (2% bads and 98% goods), so I oversampled the data to 30% bads and 70% goods and let EM read in this oversampled data. After the data was read in, I set the Adjusted Prior to 0.02 vs 0.98 in the Prior Probabilities tab. Then I partitioned the data as 60% train and 40% validation. After that, I entered my 10 variables into a Decision Tree model; in the property panel under the Subtree section there is an Assessment Measure, for which I chose Decision (the default). The results came out with several nodes and looked as I expected. Then I changed the Assessment Measure to Misclassification and ran again; this time no nodes came out (no error message), just the root node!

Then I changed my strategy. I let EM read the data without setting the Adjusted Prior, so EM read the data as 30% bads and 70% goods, and the partition was the same. I ran this data through the Decision Tree. When I set the Assessment Measure to Decision (the default), there was no sub-node (just the root node, as before). But when I changed it to Misclassification, the results came out with several nodes, similar to the results above with the Decision choice.

 

You see, the different choices of Assessment Measure led to opposite results. My data has no missing values, my target is binary (1/0), and my data has no profit/loss information. My puzzles here are: 1) Do I need to set the Adjusted Prior to 0.02 vs 0.98? 2) If yes to the first question, is Misclassification the correct choice of Assessment Measure? 3) If yes to the second question, why did no results come out, while with Decision the expected results came out?

 

Any tips on my puzzle would be very much appreciated! Thanks a lot!!

 

Jeremy Yuan

 

Using EM 13.2 on the web-based client

DougWielenga
SAS Employee

Jeremy,

 

There are two different issues involved here -- the first is obtaining probabilities centered near your population estimates, and the other is determining how to classify each observation based on that probability (adjusted for priors or not) and on a decision weight if you have incorporated one. By default, SAS Enterprise Miner generates a misclassification chart for the Train and Validate data sets based on two variables which have the form

 

 

     F_<target variable name> : the actual target level

     I_<target variable name> : the predicted target level

 

SAS Enterprise Miner will compute a predicted probability (adjusted for priors if requested) for each level of the target of the form

 

     P_<target variable name><target variable level>     

 

So for a target variable named 'BAD' with levels 0 or 1, it will generate 

 

     P_BAD1 :  the predicted probability that BAD=1

     P_BAD0 :  the predicted probability that BAD=0

 

The variable F_BAD is simply the actual target level (0 or 1), and the variable I_BAD takes the level associated with the higher of the predicted probabilities P_BAD1 and P_BAD0. It is reasonable to assign observations to the target level which is most likely, but this presents problems in rare-event scenarios.

In your oversampled data, your target level of interest occurred 30% of the time overall. Using my example, suppose that BAD=1 occurs 30% of the time in the sample. To have P_BAD1 > P_BAD0, an observation must have P_BAD1 > 50%, which represents someone at least (50%) / (30%) = 1.67 times as likely to have the event as the overall average. After adjusting the prior probabilities so that the overall average is only 2%, you would need someone at least (50%) / (2%) = 25 times as likely to have the rare event as the overall average. Since there are far fewer people in this category, far fewer people (possibly none!) are classified as having the rare event according to I_BAD (using my example).
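
To make the arithmetic concrete, here is a minimal SAS sketch of the standard prior adjustment -- the values and data set name are hypothetical, not EM output -- showing why a high sample probability can still fall below 0.5 after adjustment:

   /* Hypothetical sketch: rho = sample proportions, pi = population priors */
   data prior_adjust;
      rho1 = 0.30; rho0 = 0.70;   /* oversampled proportions */
      pi1  = 0.02; pi0  = 0.98;   /* adjusted priors         */
      do p_bad1 = 0.5, 0.7, 0.9, 0.95;
         num    = p_bad1 * pi1 / rho1;
         den    = num + (1 - p_bad1) * pi0 / rho0;
         p_adj1 = num / den;            /* prior-adjusted P(BAD=1)      */
         i_bad  = (p_adj1 > 0.5);       /* classify by larger posterior */
         output;
      end;
   run;

A sample probability of 0.90 adjusts to 0.06 / (0.06 + 0.14) = 0.30, still classified as good; with these rates, only sample probabilities above about 0.955 cross the 0.5 line.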

 

In these situations, you can consider using a target weight to put more weight on the rare event. If you do add decision weights (either in the Decisions node or in the Input Data Source node), SAS Enterprise Miner will also generate a D_<target variable name> variable which contains the 'decision' outcome based on the 'most profitable' or 'least costly' alternative. In this situation, each decision weight is multiplied by the adjusted probability to get the 'expected value' of that decision, and the outcome with the best expected value is assigned.

 

Assigning outcomes by putting extra decision weight on rare events can also pose challenges, since those outcomes will be predicted to occur more often than they actually do. If you click the button 'Default with Inverse Prior Weights', SAS Enterprise Miner will take each specified prior and divide it into 1 to obtain the weight. Suppose the prior probabilities were specified as 20% and 80%. Then the 'Default with Inverse Prior Weights' button would yield weights of 1 / 0.2 = 5 for the rare event and 1 / 0.8 = 1.25 for the common event. You will notice that the ratio of the weights

 

   5 / 1.25 = 4

 

is in the same ratio as the prior probabilities

 

    80% / 20% = 4

 

so simply leaving the weight on the common event at 1 and giving the rare event a weight of 4 will have the same impact. Notice now that for the 'average' observation, who has a 20% (0.2) probability of the rare event and an 80% (0.8) probability of the common event, the expected value is the same using the weights described above:

 

     Level           Prior   Weight   Expected Value
     rare event       0.2      4      0.2 * 4 = 0.8
     common event     0.8      1      0.8 * 1 = 0.8

 

which means that using 'Default with Inverse Prior Weights' will assign the target event to anyone with a probability higher than 0.2 (in this scenario) -- that is, to anyone with a higher predicted probability than average. This will generate a lot more predicted events in the D_<target variable name> variable, since it is not unlikely that half or more of the observations have a predicted probability higher than average.
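
As a small sketch of that decision rule (hypothetical names; EM computes the D_ variable for you):

   /* Hypothetical sketch: decide by largest expected value, with      */
   /* weights 5 and 1.25 from 'Default with Inverse Prior Weights'     */
   data decide;
      w1 = 5;  w0 = 1.25;                /* 1/0.2 and 1/0.8 */
      do p_bad1 = 0.10, 0.20, 0.25, 0.60;
         ev1   = p_bad1 * w1;            /* expected value of deciding 1 */
         ev0   = (1 - p_bad1) * w0;      /* expected value of deciding 0 */
         d_bad = (ev1 > ev0);            /* D_BAD: pick the larger EV    */
         output;
      end;
   run;

With these weights, d_bad flips to 1 as soon as p_bad1 exceeds 0.2, since 5p > 1.25(1 - p) exactly when p > 0.2.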

 

So what do you do?  Understand that the overall misclassification rate of the data set is not what is critical.   Look at the rate in each percentile of the data and determine how deep you want to go.  Then you can choose your own Decision threshold (e.g. probability higher than 0.35) above which you get a satisfactory misclassification rate.  The approach taken by SAS Enterprise Miner is a reasonable one since it has no business knowledge to base the outcome on other than what is provided -- either pick the most likely outcome or the most valuable outcome based on your weights -- but your best decisions will always incorporate your analytical needs.
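
For example, once you have exported the scored data, applying your own cutoff is a one-line DATA step; the data set name scored and the 0.35 threshold below are assumptions for illustration:

   /* Hypothetical sketch: apply your own decision threshold */
   data my_decision;
      set scored;                     /* exported scored data            */
      my_flag = (p_bad1 > 0.35);      /* your chosen cutoff, not I_/D_   */
   run;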

 

For example, in some cases you might need an extremely low misclassification rate (e.g. maybe only looking at the top 1% or 2% of the scored data) because you are searching for fraud and don't want to annoy customers that are not acting fraudulently.  In other cases, you might be looking for a minimum response rate to make money (e.g. some direct mail advertisers only need a 2% response rate to be profitable).  Your best 'decision' should always incorporate your analytical and/or business objectives.  

 

I hope this helps!

Doug

jeremyyuan
Calcite | Level 5

Thanks Doug!

That helps me a lot in understanding the logic behind the Decision Tree module in EM.

Based on what you said, I notice I had a misunderstanding: I thought that after I adjusted the prior probabilities to 0.02 vs 0.98, the Decision Tree module would process the oversampled data and, at the final stage, automatically assign probabilities based on the actual 0.02 vs 0.98, so that rare events would NOT "be predicted to occur more often than they actually do". But actually, even this way, they are still predicted to occur more often than they actually do.

 

I have heard (but not seen) that when using oversampled data, the adjustment should be made at the scoring stage, not at the Decision Tree modeling stage, using a certain formula. Do you have any idea?

My purpose is to achieve a low misclassification rate. I will use the actual data file to try and see which assessment measure fits.

Thanks a lot!

 

Jeremy

DougWielenga
SAS Employee

The way in which the priors are handled differs among modeling nodes and even, optionally, within modeling nodes. For example, you can choose Use Priors = Yes or Use Decisions = Yes in the Split Search section of the Decision Tree properties to request that priors or decisions be used in determining the best split, but both are set to No by default. Regardless of those settings, the Decision Tree will display adjusted counts in the Tree output found inside the node results, so that the overall proportions in each node reflect the population values. For regression models, the adjustment is done as a posterior adjustment at the end of score processing, so you would get the same model parameters for the same regression model but the predicted probabilities might be different.
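
For reference, the standard textbook correction that maps a sample-scale probability back to the population scale can be written as an offset on the logit. This sketch uses the 30% sample rate and 2% prior from this thread and is equivalent to the posterior adjustment shown earlier; it is illustrative, not a dump of EM's internal code:

   /* Hypothetical sketch: prior correction as a logit offset */
   data offset_adj;
      rho1 = 0.30; pi1 = 0.02;
      offset = log( (pi1 * (1 - rho1)) / (rho1 * (1 - pi1)) );
      do p_sample = 0.5, 0.9, 0.96;
         logit_adj = log(p_sample / (1 - p_sample)) + offset;
         p_adj     = 1 / (1 + exp(-logit_adj));  /* population-scale probability */
         output;
      end;
   run;

For p_sample = 0.9 this gives p_adj = 0.30, matching the direct adjustment above.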

 

It is not possible to give a universal answer for the best settings to use, since each data set and each business/analytical need might be different. Having said that, one of the Assessment settings in the Decision Tree node properties that can be useful in rare-event scenarios is Lift, since you can also specify the Assessment Fraction, which is the proportion of the data on which to evaluate the Decision Tree models. Please note that Decision Tree models have terminal nodes in which everyone receives the same score. Suppose you wish to obtain the best lift in the top 5% of the data. How can you determine the top 5% of the data if your best node contains 3% of the data and the second-best node contains 4% of the data? You are unlikely to have your best node contain exactly 5% of the data, so the algorithm still needs to handle ties. The Decision Tree node does this for you.

It seems reasonable to consider lift in the top X% where X is at least 3 times as big as the background rate in rare-event scenarios; if the data has at least a 15% response rate, this might not be as critical. In practice, it is likely better to consider additional modeling strategies or ensemble models to help differentiate the observations in each terminal node. You can then look at finer gradations of performance (say, in each 1% grouping of the ordered observations) to help determine where the best cutpoint is.
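
As a rough sketch of computing lift in the top 5% outside of EM (the data set scored and the target bad are hypothetical names):

   /* Hypothetical sketch: lift in the top 5% of observations,     */
   /* ranked by predicted probability                              */
   proc sort data=scored out=ranked;
      by descending p_bad1;
   run;

   data top5;
      set ranked nobs=n;
      if _n_ <= ceil(0.05 * n);   /* keep the top 5%; ties fall in sort order */
   run;

   proc sql;
      select t.rate / p.rate as lift_top5
      from (select mean(bad) as rate from top5)   as t,
           (select mean(bad) as rate from scored) as p;
   quit;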

 

I hope this helps!

Doug

PadraicGNeville
SAS Employee

 Hi, Jeremy.

 

I suspect that when the tree consists of only the root node, several nodes are created but they are being pruned away. When the Tree thinks the proportions are 2% bad and 98% good, all such nodes might be classifying the observations as good. The misclassification method of assessment will then prune all the nodes, because no node changes the misclassification rate. When the Tree thinks the proportions are 30% bad and 70% good, it is more likely that some nodes classify observations as bad, and the misclassification rate can improve if those nodes are included in the Tree. With Adjusted Priors, the Tree thinks there are 2% bads.
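
For example (illustrative numbers only): suppose a split produces children with 1% and 4% bads. Under 2% vs 98% priors, both children still classify every observation as good, so the misclassification rate is 2% with or without the split, and misclassification-based pruning removes it. Under 30% vs 70%, a child with, say, 55% bads classifies its observations as bad and lowers the misclassification rate, so the split survives.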

 

I would not specify Adjusted Priors in this case.

 

I do not know what the DECISION method of assessment is doing that is different from misclassification.

-Padraic

jeremyyuan
Calcite | Level 5
Thanks so much, Doug and Padraic for your very helpful explanations and opinions!!
