majeedk
Calcite | Level 5

Hi

I am using a Decision Tree model to solve a classification problem. My target variable is highly imbalanced: 60212 (99%) non-events and 633 (1%) events. I am using sampling to balance the classes of the target variable, as shown in the picture below. In addition to using cross-validation to tune the model, I am also trying different sampling ratios to see which one works best for me.

 

Tree_Model.PNG
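
For reference, the arithmetic behind a given sampling ratio can be sketched as follows. This is only an illustration, assuming all 633 events are kept and non-events are down-sampled; the function name and the list of target shares are hypothetical and are not taken from the diagram.

```python
N_EVENTS, N_NONEVENTS = 633, 60212  # counts from the original data


def nonevents_to_keep(n_events: int, event_share: float) -> int:
    """Non-events to sample so that events make up `event_share` of the sample.

    Assumes every event is kept and only non-events are down-sampled.
    """
    if not 0 < event_share < 1:
        raise ValueError("event_share must be strictly between 0 and 1")
    return round(n_events * (1 - event_share) / event_share)


# Hypothetical target event shares to compare.
for share in (0.05, 0.10, 0.25, 0.50):
    # Cannot keep more non-events than the data contains.
    keep = min(nonevents_to_keep(N_EVENTS, share), N_NONEVENTS)
    print(f"event share {share:.0%}: keep {N_EVENTS} events and {keep} non-events")
```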

After sampling I am using inverse prior probabilities to adjust decision weights. This was suggested in this post on SAS forums.

My question is which "inverse prior probabilities" I should use, those from:

  1. the original data before sampling & partitioning or
  2. the data from output of sample node or
  3. the data from output of partition node (train data)

 

If I use the "Decision Processing" Node after partitioning it allows me to use "Inverse Prior Weights". 

 

decision.PNG

 

When I click this, the weights are automatically adjusted, as shown in the screenshot below.

Weights.PNG

 

Based on my calculations (below), these are the weights from the training data coming out of the Partition node.

Is this correct? Or should I override them with the inverse priors from my original unsampled data, shown in row 1 below (i.e. 1.01051 and 96.12164)?

 

Calculations.PNG
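
The inverse priors quoted for the original data can be reproduced from the class counts alone. A minimal sketch, assuming the inverse prior of a class is simply the total count divided by that class's count; the function and dictionary names are hypothetical:

```python
def inverse_prior_weights(counts: dict) -> dict:
    """Return total / class_count for each target level (the inverse of its prior)."""
    total = sum(counts.values())
    return {level: total / n for level, n in counts.items()}


# Class counts of the original, unsampled data.
original = {"non_event": 60212, "event": 633}

weights = inverse_prior_weights(original)
for level, w in weights.items():
    print(f"{level}: prior = {1 / w:.5f}, inverse prior = {w:.5f}")
```

Rounded to five decimals this gives 1.01051 for the non-event and 96.12164 for the event. The same function applied to the counts coming out of the Sample or Partition node would give the weights for options 2 and 3.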

 

 

1 REPLY 1
DougWielenga
SAS Employee

After sampling I am using inverse prior probabilities to adjust decision weights. This was suggested in this post on SAS forums.

 

My question is which "inverse prior probabilities" I should use, those from:

  1. the original data before sampling & partitioning or
  2. the data from output of sample node or
  3. the data from output of partition node (train data)

 

I'm not sure I understand the diagram you have built, since you are using the data in many different ways. Those models are probably not directly comparable, since they reflect different proportions of events and non-events.

 

Nonetheless, the thing to keep in mind is that, unless you have specified the priors, Inverse Prior Weights in a Decisions node uses the original probabilities from the population, even if you have sampled the data so that the data passed to the Decisions node has different proportions.

Consider the following scenarios:

    * If you want to adjust every branch of the flow back to the original population values, set up the Decision profile in the Input Data node rather than using multiple Decisions nodes.   You can open the dialog by clicking on the ... to the right of Decisions for the Input Data node.  Specify the priors in the Input Data node (even if they are the same as the values in the data) so that those probabilities will be retained through sampling (Note: Do not specify Adjust Frequency=Yes in the Sample node).  You can then specify Inverse Prior Weights in the Input Data node and these priors and decision weights will be carried through each branch of your flow -- the Decisions nodes are unnecessary.

     * If you want to specify different priors for each branch of the flow, you can use your current diagram and first specify the adjusted priors you want to use (Note: Make sure to click on the Yes radio button after "Do you want to enter new prior probabilities?" in the Decisions node).  Then when you click on Default with Inverse Prior Weights in the Decisions node, it will use the priors you specified rather than any previous priors that might have been used. 
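
To see why the choice of priors matters, here is a minimal sketch of the arithmetic involved. It assumes the standard prior-correction formula for posteriors estimated on a resampled training set and a decision rule that picks the class with the larger weighted posterior. It is not taken from the software's internals, and the 50/50 sample proportion and the function names are hypothetical.

```python
N_NONEVENT, N_EVENT = 60212, 633
POP_PRIOR = N_EVENT / (N_NONEVENT + N_EVENT)  # event prior in the original population
SAMPLE_SHARE = 0.5  # hypothetical event share after sampling


def adjust_posterior(p_event: float, sample_share: float, pop_prior: float) -> float:
    """Rescale a posterior from the sampled data back to the population priors."""
    event = p_event * pop_prior / sample_share
    nonevent = (1 - p_event) * (1 - pop_prior) / (1 - sample_share)
    return event / (event + nonevent)


def decide_event(p_event_adjusted: float, pop_prior: float) -> bool:
    """Pick 'event' when its posterior times its inverse prior weight is larger."""
    w_event = 1 / pop_prior
    w_nonevent = 1 / (1 - pop_prior)
    return p_event_adjusted * w_event > (1 - p_event_adjusted) * w_nonevent


# Two hypothetical posteriors as scored on the balanced sample.
for p in (0.4, 0.6):
    adjusted = adjust_posterior(p, SAMPLE_SHARE, POP_PRIOR)
    print(f"sample posterior {p:.2f} -> adjusted {adjusted:.5f}, "
          f"event decision: {decide_event(adjusted, POP_PRIOR)}")
```

With inverse prior weights, this rule is equivalent to classifying a case as an event whenever its adjusted posterior exceeds the event prior, so a different set of priors moves the cutoff accordingly.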

 

Hope this helps!

Doug
