kewldude
Calcite | Level 5

Hi. I'm a first-time poster and a newbie in the world of data mining. To be honest, I'm working on an assessment that requires me to produce different decision trees using SAS Enterprise Miner by coming up with different configurations. I'm also quite new to Enterprise Miner and have just gone through the Intro to SAS Enterprise Miner PDF. My dataset (an imported CSV) is composed of bank data (e.g. age, education, job, housing loan, personal loan, consumer index, previous outcome of a marketing campaign, etc.) that is meant to predict whether a client will sign up for a loan or not, so I have a signup class with yes and no as the expected result. I noticed that in my whole dataset, which is around 40k rows, most of the results are No; only around 10% are Yes. Based on my readings, this is called an unbalanced dataset and I won't get a good model out of it. I did some more reading, and I read somewhere that I need to undersample the "No" results in order to overcome this over-representation. I raised this with my professor and asked him how I can do it in Enterprise Miner, but he ignored me, so I'm left to figure it out on my own. My question is: how do I go about this using SAS Enterprise Miner 13.1?

1 ACCEPTED SOLUTION

Accepted Solutions
DougWielenga
SAS Employee

I'm a first-time poster and a newbie in the world of data mining.

 

You might want to check out an "old" paper of mine that some people have found helpful. You can find it by searching on my last name and 073 at http://support.sas.com, or by clicking the link below:

 

  http://www2.sas.com/proceedings/forum2007/073-2007.pdf

 

I'm working on an assessment that requires me to produce different decision trees using SAS Enterprise Miner by coming up with different configurations.

 

You might want to consider building parallel paths which use different Data Partition nodes to create trees from different partitions of the data.  Trees are highly 'unstable' in the sense that you can get a very different looking tree for only slight changes in the data.  Even changing the sort order of the observations might impact how the tree grows.

 

I noticed that in my whole dataset, which is around 40k rows, most of the results are No; only around 10% are Yes. Based on my readings, this is called an unbalanced dataset and I won't get a good model out of it. I did some more reading, and I read somewhere that I need to undersample the "No" results in order to overcome this over-representation. My question is: how do I go about this using SAS Enterprise Miner 13.1?

 

You are dealing with a rare event, and that is actually more common than not in predictive modeling. In reality, 10% is not even all that rare. It is not necessary to have a balanced data set to build a 'good' model, although certain metrics (e.g. misclassification rate) do not look 'good' when evaluating models fit to unbalanced data. A model that assigns every observation to the most common event is highly 'accurate' even though it is useless: with 90% No, predicting No for everyone is 90% accurate. In practice, I typically don't oversample unless the response rate is well below 5% for a binary event.
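To see why accuracy alone is misleading here, consider a minimal Base SAS sketch of that "useless" model (the data set and variable names, BANK and SIGNUP, are assumptions based on the original post):

data null_model;
  set bank;
  length predicted $3;
  predicted = 'no';   /* predict the majority class for everyone */
  correct = (lowcase(signup) = predicted);   /* 1 when the "model" is right */
run;

proc means data=null_model mean;
  var correct;   /* mean is the accuracy: about 0.90, with no signal used */
run;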

 

Having said that, you can oversample in SAS Enterprise Miner by using the Sample node.  To achieve an oversampled proportion (say, 20%) in the final data set, proceed as follows:

1 - Change the Criterion property in the Stratified section of the Sample node's properties from Proportional (the default) to Level Based.

2 - Change the Level Selection property in the Level Based Options to Rarest Level (assuming this is the level you want in 20% of the sampled observations).

3 - Set the Level Proportion property to the proportion of rare-event observations you want to sample (specify 100 if you want all of the observations that have the rare event).

4 - Set the Sample Proportion property to 20 (reflecting this scenario, where you want 20% of the sampled observations to have the rarest level).

5 - Make sure the Adjust Frequency property in the Oversampling section is set to No. Otherwise, a frequency variable will be used to reweight the observations, making it appear as if the sampled data has the same proportion of the target event as the original population.
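If it helps to see the same idea outside the GUI, here is a minimal Base SAS sketch using PROC SURVEYSELECT. The data set and variable names (BANK, SIGNUP) and the stratum counts are assumptions based on the original post, not Enterprise Miner output; adjust the counts to your actual data.

proc sort data=bank;
  by signup;   /* PROC SURVEYSELECT requires sorted strata */
run;

proc surveyselect data=bank out=bank_sample method=srs seed=1234
                  sampsize=(16000 4000);   /* strata in sorted order: no, yes */
  strata signup;
run;

This keeps all of the (roughly) 4,000 yes rows and 16,000 of the no rows, so yes makes up about 20% of the sample.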

 

 

 

Another approach is to use Decision weights to allow your rare event to be chosen as a predicted value.   You should be able to accomplish what you want by using Decision Processing without oversampling.  To do so, proceed as follows:   
1 - Click on the Input Data Source 
2 - Click on the ... to the right of the Decisions property 
3 - Click on Build to create a target profile 
4 - Click on the Decisions tab 
5 - Click on the button labeled Default with Inverse Prior Weights 
     Note: this will make sure that you are able to find variables that are useful predictors 
6 - Click on the Decision weights tab, and verify that you have values other than 1 on the diagonals. 
7 - Click on OK. 
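To make the effect of the weights concrete, here is a minimal sketch of the weighted decision rule applied to a scored data set. The names SCORED and P_SIGNUPYES follow Enterprise Miner's usual P_<target><level> convention for posterior variables, but they are assumptions here, not output from this project.

data decided;
  set scored;
  length decision $3;
  weight_yes = 9;   /* 0.9/0.1, from the worked example below */
  /* choose the level with the larger weighted posterior */
  if p_signupyes * weight_yes >= (1 - p_signupyes) * 1 then decision = 'yes';
  else decision = 'no';
run;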

To better understand what is happening, consider the following example: 

If an event happens 1% of the time, a person who is 10 times as likely to have the event still has only a 10% chance of having it, which means a 90% chance of the non-event. Therefore, they will be predicted into the non-event class unless additional weight is put on a correct prediction of the event of interest.

To determine the amount of weight to put on the rare event, divide the probability of the common event by the probability of the rare event, and change the weight on the rare event to that ratio. You can edit this in either a Decisions node or an Input Data Source node.

For example, if you have a binary event where Prob(Yes)=0.1 and Prob(No)=0.9, the ratio of the common event to the rare event is 0.9/0.1 = 9. Change the weight on the rare event from the default 1 to 9 on the Decision Weights tab. If your rare event is much rarer, say 2%, the calculation remains the same: the weight should be 0.98/0.02 = 49. If you have an event which happens far less than 1% of the time, you may get better results by oversampling and then adjusting the probabilities later.
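For that last case, "adjusting the probabilities later" means the standard prior correction. A minimal sketch, assuming the model was fit after oversampling to a 20% event rate when the true population rate is 1% (SCORED and P_SIGNUPYES are assumed names, as above):

data adjusted;
  set scored;
  pi1  = 0.01;   /* event rate in the original population */
  rho1 = 0.20;   /* event rate in the oversampled training data */
  num = p_signupyes * (pi1 / rho1);
  den = num + (1 - p_signupyes) * ((1 - pi1) / (1 - rho1));
  p_adjusted = num / den;   /* posterior on the original population scale */
run;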

Note: Enterprise Miner does this by applying inverse priors. In a case where Prob(Yes)=0.2 and Prob(No)=0.8, inverse priors would assign the weight (1/0.2)=5 to Yes and (1/0.8)=1.25 to No, but this is the same ratio obtained by the method I described, since 5/1.25 = 4/1. I prefer my method because it makes everything relative to the common event, which always has weight 1.

It is not typically necessary for the calculation to be exact. If, in your example, you hypothesized a 5% target rate, the calculated weight would be 0.95/0.05 = 19. Using a weight of 19 means the model will choose variables that help identify people who are more likely than average to churn. Suppose you wanted the model to choose only people who are twice as likely as that to churn. In this case, you want people with a probability of 0.10 to be selected as churners, which makes the calculation 0.9/0.1 = 9. Therefore, increasing the decision weight on the rare event to 9 implies a decision rule where only people who are at least twice as likely to churn are predicted as churners; at exactly p = 0.10, the weighted posteriors break even (0.10 × 9 = 0.90 × 1).

Please note that you can always apply a different decision rule after the model is built.  The extra weight is needed to get the model to identify variables that can help separate people.  If the product of the probability and the weight is too unbalanced, you may end up with the null model again. 
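As a sketch of applying a different rule after the fact (same assumed names as above), any cutoff between 0 and 1 defines a new decision rule without refitting the model:

data rescored;
  set scored;
  cutoff = 0.05;   /* hypothetical cutoff; tune it to your costs */
  decision = ifc(p_signupyes >= cutoff, 'yes', 'no');
run;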

 

Hope this helps!

Doug


2 REPLIES 2

majeedk
Calcite | Level 5

Hi Doug, I have an imbalanced data set where the binary target variable is highly imbalanced, i.e. the number of True cases is ~1% and the number of False cases is ~99%. The other limitation I have is that I can only use a decision tree, because the target system where this model will be used can only take rules (IF-THEN-ELSE conditions), so I can't use more complex models like random forests.

 

I understood your comments on how to perform oversampling / adjust the decision weights. However, I didn't understand very well what you meant by:

If you have an event which happens far less than 1% of the time, you may get better results by oversampling and then adjusting the probabilities later.   

 

In my case, after I oversample my rare event, where do I adjust the probabilities? Do you mean that I have to combine both approaches that you mentioned?

 

A small example would help me a lot.

 

Many thanks

K

 
