About ajosh

DougWielenga · 08-15-2017

There are several issues involved here that need to be separated in order to provide a clearer understanding. For categorical target variables, SAS Enterprise Miner by default assigns each observation to the most likely target level, based on the predicted values stored in variables of the form P_<target variable name><target variable level>. Using the SAMPSIO.HMEQ data set as an example (it is available by clicking Help --> Generate Sample Data Sources... inside Enterprise Miner and adding the Home Equity data), there is a categorical target variable named BAD with levels 1 and 0. SAS Enterprise Miner generates several variables from any modeling node; in this case it creates the variables
P_BAD1 - the probability BAD=1
P_BAD0 - the probability BAD=0
These probabilities reflect the training data by default, so the probability of the rare event will be inflated if you oversample, that is, if the sample has a higher proportion of observations with BAD=1 (the rare event) than the population does. If you are only concerned about the predicted outcome, you can simply adjust the cutoff probability later using a Cutoff node to get the desired proportion of the data classified as events.
If you are more interested in the actual probabilities themselves (rather than just the ordering of the observations from most likely to least likely) and want the probability scores to reflect the original population rather than the training data, you can create a target profile in the Input Data Source node. A target profile allows you to adjust the prior probabilities and the weight/value attached to correctly predicting each outcome. Adjusting the prior probability for an oversampled target shifts the probability scores so that they are centered closer to the overall population average you provide.
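To make the prior adjustment concrete, here is a minimal Python sketch of the standard Bayes-rule correction for an oversampled binary target: each class probability is reweighted by (population prior / sample proportion) and the two terms are renormalized. It assumes Enterprise Miner's prior adjustment is equivalent to this textbook formula; it is not a reproduction of EM internals, and the function name `adjust_for_priors` and the rates below are hypothetical.

```python
def adjust_for_priors(p_train: float, sample_rate: float, population_rate: float) -> float:
    """Rescale P(event) from a model trained on an oversampled sample.

    p_train         -- predicted P(event), e.g. P_BAD1, on the training-sample scale
    sample_rate     -- proportion of events in the training sample
    population_rate -- proportion of events in the population (the prior)
    """
    if not (0.0 < sample_rate < 1.0 and 0.0 < population_rate < 1.0):
        raise ValueError("rates must be strictly between 0 and 1")
    # Reweight each class by (population prior / sample proportion) ...
    event = p_train * population_rate / sample_rate
    non_event = (1.0 - p_train) * (1.0 - population_rate) / (1.0 - sample_rate)
    # ... then renormalize so the two adjusted probabilities sum to 1.
    return event / (event + non_event)


if __name__ == "__main__":
    # Hypothetical: events are 2% of the population but were oversampled to 20%.
    for p in (0.05, 0.20, 0.50, 0.90):
        adjusted = adjust_for_priors(p, sample_rate=0.20, population_rate=0.02)
        print(f"P_BAD1 {p:.2f} on training scale -> {adjusted:.4f} on population scale")
```

A score equal to the sample event rate maps back to the population event rate, and the ordering of observations is unchanged, which is why a cutoff-only workflow can skip this step.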
Depending on which criterion you use to choose the model, it might also be useful to apply additional weight/value to correctly predicting the rare event. By default, SAS Enterprise Miner defines two variables for a categorical (class) target:
F_<variable name>: the target level for each observation
I_<variable name>: the predicted target level based on the fitted model (the most likely outcome)
When you request decision weights in your target profile, SAS Enterprise Miner also creates a decision variable of the form D_<target variable name>, which holds the predicted outcome obtained by choosing the most profitable (or least costly) outcome from the product of the predicted probability and the decision weight for each level. I_<variable name> and D_<variable name> are reasonable approaches in many situations, but in rare event scenarios the I_ variable will likely predict too few observations as having the event and the D_ variable will predict too many. As a result, I generally advise people to take their business objectives into consideration when choosing a cutoff for their particular data set. Without decision weights, you might end up with a tree with no branches if none of the leaves has a higher probability for the rare event than for the common one. It is therefore often helpful to specify your priors and decision weights. You can do this by following the instructions in Usage Note 47965: Using priors and decision weights in SAS® Enterprise Miner(tm), available at http://support.sas.com/kb/47/965.html, which shows the following:
/*** BEGIN USAGE NOTE 47965 EXCERPT ***/
Data mining problems routinely involve situations where one target level is more "rare" than others. By default, SAS Enterprise Miner assigns the most likely outcome as the predicted outcome. This assignment results in decision rules that strongly favor the common outcome, which is usually not of interest.
The assignment often generates models with no predicted events of interest. If you specify priors, then the posterior probabilities are adjusted, but the adjustment might lead to no variables being selected. Even if a model is successfully fit, the predicted outcome might be the common target level. Example: an event occurs 1% of the time. A person who is 10 times as likely to have the event still has only a 10% chance of having the event. You can change this predicted outcome by modifying the default decision weights. Edit the default decision weights either in a Decisions node or in an Input Data node. To edit the default decision weights in the Input Data node, follow these steps:
1. Click the Input Data node.
2. Click the ellipsis (...) to the right of the Decisions property.
3. Click Build to create a target profile.
4. Click the Decisions tab.
5. Click Default with Inverse Prior Weights. This selection enables you to find variables that are useful predictors.
6. Click the Decision Weights tab to see that the values changed from their default values.
7. Click OK.
To determine the amount of weight to assign to the rare event in a binary target, calculate this ratio:
ratio = (probability of the common event) / (probability of the rare event)
Specify the weight on the rare event to be equal to this ratio. For example, if you have a binary event where Prob(Yes)=0.1 and Prob(No)=0.9, then the ratio of the common event to the rare event is 0.9/0.1 = 9. Change the weight for Yes from the default of 1 to the value 9 in the Decision Weights tab. If your rare event is much rarer, for example 2%, then the ratio is 0.98/0.02 = 49. If you have an event that occurs much less than 1% of the time, then you might get better results by oversampling and then adjusting the probabilities later. Even if you oversample, the priors adjust the probabilities, but the predicted outcome is the common event (if you do not modify the decision weights).
The choice of the predicted-probability value to use as the cutoff for predicting an event or non-event relies on business expertise. In the case of a rare event, it is common to focus only on the predictions in the small range of values for which action is taken. A model that always predicts the common outcome is correct as often as the common event occurs in the data (for example, 95% of the time). SAS Enterprise Miner provides an automated choice that is based on the decision weights that you provide. If these weights do not represent how you expect to implement the results, then focus on the ordering of the probabilities and choose your own threshold for action. For more information, see the chapter "Predictive Modeling" in SAS Enterprise Miner Help. Note: you might be able to apply this technique to a target variable that contains more than two levels. In that case, you need to specify how you want the levels to be weighted with respect to each other.
/*** END USAGE NOTE 47965 EXCERPT ***/
You might also consider reviewing the paper Identifying and Overcoming Common Data Mining Mistakes, available at http://www2.sas.com/proceedings/forum2007/073-2007.pdf; the bottom of page 6 discusses handling target variable event levels that occur in different proportions. I hope this helps! Doug
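The relationship between the I_ variable, the D_ variable, and the Usage Note's inverse-prior weights can be sketched in a few lines of Python. This assumes a diagonal decision matrix (a weight only for correctly predicting each level) and probabilities already on the population scale; the function names are hypothetical illustrations, not Enterprise Miner APIs. It also shows why D_ tends to flag many observations: with the event weight set to P(common)/P(rare), the rule p * w > (1 - p) simplifies to p > P(rare), so any observation scoring above the prior is classified as an event.

```python
from __future__ import annotations


def most_likely_level(probs: dict[str, float]) -> str:
    """I_-style prediction: the level with the highest posterior probability."""
    return max(probs, key=probs.get)


def most_profitable_level(probs: dict[str, float], weights: dict[str, float]) -> str:
    """D_-style prediction under an assumed diagonal decision matrix:
    the level whose probability * decision weight is largest."""
    return max(probs, key=lambda level: probs[level] * weights[level])


def inverse_prior_weight(p_rare: float) -> float:
    """Usage Note rule of thumb: weight on the rare level = P(common) / P(rare)."""
    return (1.0 - p_rare) / p_rare


def implied_cutoff(w_event: float, w_nonevent: float = 1.0) -> float:
    """Probability above which the weighted rule picks the event:
    p * w_event > (1 - p) * w_nonevent  <=>  p > w_nonevent / (w_event + w_nonevent)."""
    return w_nonevent / (w_event + w_nonevent)


if __name__ == "__main__":
    # Hypothetical population with P(BAD=1) = 0.1, so the rare-level weight is about 9.
    w = inverse_prior_weight(0.1)
    weights = {"1": w, "0": 1.0}
    print(f"weight on BAD=1: {w:.1f}, implied cutoff: {implied_cutoff(w):.2f}")
    for p1 in (0.05, 0.15, 0.40):
        probs = {"1": p1, "0": 1.0 - p1}
        print(p1, "I_BAD =", most_likely_level(probs),
              "D_BAD =", most_profitable_level(probs, weights))
```

With these hypothetical numbers, an observation with P(BAD=1)=0.15 goes to BAD=0 under the most-likely rule but to BAD=1 under the weighted rule.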

yeliu · 08-10-2017

Hi Aditya, I would suggest you filter the rules you get at a low support level by using a metric called "interest", defined as follows. According to probability theory, X and Y are independent if P(X and Y) = P(X)P(Y). In association rule notation, supp(X∪Y) is the fraction of transactions that contain all items of both X and Y, so the rule X⇒Y is not interesting if supp(X∪Y) ≈ supp(X)·supp(Y); that is, a rule is not interesting if its antecedent and consequent are approximately independent. Wu et al. introduce the function interest(X,Y) = |supp(X∪Y) − supp(X)·supp(Y)|. If interest(X,Y) ≥ min_interest, where min_interest is a predefined threshold, then the itemset X∪Y is referred to as a potentially interesting itemset. Hope it helps, Ye
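A minimal Python sketch of the interest filter described above, assuming transactions are represented as sets of item names. The basket data and the `MIN_INTEREST` value are hypothetical; in practice the threshold has to be chosen for your own data.

```python
from __future__ import annotations


def support(itemset: frozenset[str], transactions: list[frozenset[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def interest(x: frozenset[str], y: frozenset[str], transactions: list[frozenset[str]]) -> float:
    """interest(X, Y) = |supp(X u Y) - supp(X) * supp(Y)|.

    supp(X u Y) is the support of the combined itemset, i.e. the fraction of
    transactions containing all items of both X and Y.
    """
    return abs(support(x | y, transactions) - support(x, transactions) * support(y, transactions))


if __name__ == "__main__":
    baskets = [frozenset(t) for t in (
        {"bread", "milk"}, {"bread", "milk"}, {"bread"}, {"milk"}, {"eggs"},
    )]
    MIN_INTEREST = 0.03  # hypothetical threshold
    score = interest(frozenset({"bread"}), frozenset({"milk"}), baskets)
    print(f"interest = {score:.2f}", "-> keep" if score >= MIN_INTEREST else "-> discard")
```

Here supp({bread, milk}) = 0.4 while supp({bread})·supp({milk}) = 0.36, so the rule is kept; for two items that co-occur exactly as often as independence predicts, the interest is 0.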

DougWielenga · 07-10-2017

ajosh, Modeling rare events (which is actually quite common) is often challenging for several reasons:
* The null model is highly accurate (a 2% response rate means any model assigning all observations to the nonevent is 98% accurate).
* Failing to put any additional weight on correctly predicting the rare event can lead to a null model (for the reasons above).
* Increasing the weight on correctly predicting the rare event results in picking far more observations as having the event than actually do.
It might be helpful to separate the tasks of modeling an outcome and taking action on the outcome. When modeling a rare event, you must often oversample the rare event, add weight to correctly predicting the rare event, choose a model selection criterion that is not based on the classification, or use some combination of these. For the reasons stated above, misclassification is typically not a good selection criterion for modeling. SAS Enterprise Miner always provides a classification based on which outcome is most likely. When a target profile is created and decision weights are employed, SAS Enterprise Miner also creates variables containing the most profitable outcome based on the target profile you created. The meaningfulness of that prediction is directly related to the applicability of the target profile weights.
In general, modeling itself is more clear-cut, in that each analyst can pick and choose their criteria for building the 'best' model and then build the model. The resulting probabilities can then be used to order the observations. Unfortunately, for decision tree models all of the observations in a single node are given the same score, which is why some people run additional models within each terminal node to further separate the observations. The choice of what to do with the ordered observations typically involves business decisioning.
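The null-model point above can be checked with a few lines of Python. This is a minimal sketch on hypothetical data (2 events among 100 observations); the helper names are illustrative only.

```python
def accuracy(actual, predicted):
    """Share of observations whose predicted level matches the actual level."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def event_recall(actual, predicted, event=1):
    """Share of actual events that are predicted as events (true-positive rate)."""
    predictions_for_events = [p for a, p in zip(actual, predicted) if a == event]
    return sum(p == event for p in predictions_for_events) / len(predictions_for_events)


if __name__ == "__main__":
    # Hypothetical 2% response rate: 2 events among 100 observations.
    actual = [1] * 2 + [0] * 98
    null_model = [0] * 100  # assigns every observation to the non-event
    print("accuracy:", accuracy(actual, null_model))
    print("share of events found:", event_recall(actual, null_model))
```

The null model scores 98% accuracy while finding none of the events, which is why misclassification rate alone is a poor selection criterion here.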
The choice to investigate fraud can be costly, particularly if the person investigated is an honest, loyal customer who just had an unusual situation. The amount of money at stake, the customer's longevity/profitability with the business, and the future expected value of the customer are just a few things that might be considered. This business decisioning usually creates far more complex criteria than can be reduced to a misclassification matrix, which does not take the amount of money at risk into account. Simply put, whether you take the default decision based on the most likely outcome (typically inappropriate for a rare event), use the decision-weighted predicted outcome (assuming the decision profile accurately represents the business decisioning), or use some other strategy for selecting cases to investigate (based on available resources, amount at risk, likelihood of fraud, etc.), the true positives (TP) and false positives (FP) come from the strategy you employ. I clearly advocate business decisioning in determining how to proceed, because the simple classification rate itself is not meaningful enough for rare events. Even looking at the expected value of money at risk (e.g. the product of the probability of fraud and the amount at risk) will yield a different ordering of observations. So there isn't a great answer to the question of which cutoff to use without fully understanding the business objectives and priorities. I tend to use some oversampling (but not to 50/50, because that under-represents the non-event) and decision weights with priors to allow variable selection and to get reasonable probabilities, but then combine those probabilities with other information to determine the final prioritization/action for observations based on some more complex rules.
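As a sketch of how the ordering changes, the Python below ranks hypothetical fraud cases either by predicted probability or by expected money at risk (probability times amount) and keeps only as many cases as investigators can handle. The `Case` fields, the amounts, and the capacity of 2 are all invented for illustration; a real strategy would fold in the other business factors mentioned above.

```python
from __future__ import annotations

from typing import NamedTuple


class Case(NamedTuple):
    case_id: str
    p_fraud: float         # model's predicted probability of fraud
    amount_at_risk: float  # money exposed if the case is fraudulent


def prioritise(cases: list[Case], key, capacity: int) -> list[str]:
    """Rank cases by `key` (highest first) and keep the top `capacity` of them."""
    ranked = sorted(cases, key=key, reverse=True)
    return [c.case_id for c in ranked[:capacity]]


if __name__ == "__main__":
    # Hypothetical cases; the numbers are invented for illustration.
    cases = [
        Case("A", 0.60, 200.0),
        Case("B", 0.30, 5_000.0),
        Case("C", 0.10, 1_000.0),
        Case("D", 0.05, 40_000.0),
    ]
    print("top 2 by probability:   ", prioritise(cases, key=lambda c: c.p_fraud, capacity=2))
    print("top 2 by expected value:", prioritise(
        cases, key=lambda c: c.p_fraud * c.amount_at_risk, capacity=2))
```

Ranking by probability selects A and B, while ranking by expected money at risk selects D and B, so the resulting TP and FP counts depend on which strategy is used.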

M_Maldonado · 05-15-2015

Aditya, Take a look at this article by a professor at UT Dallas. Definitely worth reading. http://www.utdallas.edu/~nkumar/FactorExample.PDF

ajosh · 06-19-2014

Hi Miguel, Please find attached the precision-recall curve and the cutoff graphical outputs. Do let me know your views on them. Thanks, Aditya.

M_Maldonado · 05-30-2014

Hi Aditya, Correct, I sent you code for the from/into classification matrix. If you want the decision variable, use d_%EM_TARGET. The macro %EM_TARGET resolves to the name of your target. The actual target is built by the procedure that your EM modeling node uses; sometimes it is a dummy variable. Use the variable name in code similar to the PROC TABULATE code in this thread and you should be good to go. Good luck! Miguel
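For readers without the PROC TABULATE code at hand, this is a hedged Python analogue of a from/into classification matrix: it counts observations for each (actual, predicted) pair. The column names F_BAD and I_BAD follow the HMEQ naming pattern but the rows are hypothetical; passing a D_ column instead of the I_ column would tabulate the decision variable.

```python
from collections import Counter


def from_into_table(from_levels, into_levels):
    """Count observations for every (actual FROM, predicted INTO) pair."""
    counts = Counter(zip(from_levels, into_levels))
    levels = sorted(set(from_levels) | set(into_levels))
    # Include zero cells so every FROM x INTO combination is present.
    return {(f, i): counts[(f, i)] for f in levels for i in levels}


if __name__ == "__main__":
    # Hypothetical scored rows: F_BAD is the actual level, I_BAD the predicted level.
    F_BAD = ["1", "1", "0", "0", "0"]
    I_BAD = ["1", "0", "0", "0", "1"]
    table = from_into_table(F_BAD, I_BAD)
    levels = sorted({f for f, _ in table})
    print("FROM/INTO", *levels, sep="\t")
    for f in levels:
        print(f, *(table[(f, i)] for i in levels), sep="\t")
```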

AnnaBrown · 05-15-2014

Hi Aditya, The tip Miguel posted on the cutoff node might be useful to you: Tip: Use the Cutoff Node in SAS® Enterprise Miner™ to Consume the Posterior Probabilities of Your Models Efficiently Anna

JThompson · 03-27-2014

Aditya, It sounds like you may be in the credit scoring line of work, but that is a guess. Binning, such as with Weight of Evidence (WOE), is popular in this area. I do not have much first-hand experience in credit modeling, but as you have heard, WOE can be a great thing to do. I have often seen binned versions of variables used as inputs in models. Your last statement is why, in my opinion, prediction is easier than inference: when building a predictive model, we have the benefit of a holdout data set used for honest assessment. There is no one-size-fits-all recipe for building a model. The best approach is to try out a few ideas, assess them on holdout data, and see which works best. This is the way to address the order in which actions could be done when modeling. Jeff
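For reference, a minimal Python sketch of Weight of Evidence for an already-binned input, using the common definition WOE = ln(share of all goods in the bin / share of all bads in the bin). Some practitioners flip the ratio (bads over goods), which only changes the sign. The bin counts are hypothetical, and the sketch assumes every bin contains at least one good and one bad.

```python
import math


def weight_of_evidence(bins):
    """Compute WOE per bin from a mapping of bin label -> (goods, bads).

    WOE = ln((goods in bin / all goods) / (bads in bin / all bads)).
    Assumes no zero counts; real scorecards usually merge or smooth such bins.
    """
    total_good = sum(good for good, _ in bins.values())
    total_bad = sum(bad for _, bad in bins.values())
    return {
        label: math.log((good / total_good) / (bad / total_bad))
        for label, (good, bad) in bins.items()
    }


if __name__ == "__main__":
    # Hypothetical counts for a binned input variable.
    bins = {"low": (100, 50), "mid": (300, 30), "high": (600, 20)}
    for label, woe in weight_of_evidence(bins).items():
        print(f"{label:>4}: WOE = {woe:+.3f}")
```

Replacing each raw value by its bin's WOE gives a single numeric input whose relationship to the log-odds is monotone across bins, which is one reason binned WOE inputs pair well with logistic regression.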

WayneThompson · 01-09-2014

Bagging (Breiman 1996) is a common ensemble algorithm, in which you do the following:
1. Develop separate models on k random samples of the data of about the same size.
2. Fit a classification or regression tree to each sample. I tend to bag only trees, but the Start Groups and End Groups nodes allow other algorithms.
3. Average or vote to derive the final predictions or classifications.
Boosting (Freund and Schapire 1996), also supported through Start Groups > Decision Tree > End Groups, goes one step further and weights observations that were misclassified in the previous models more heavily for inclusion in subsequent samples. The successive samples are adjusted to accommodate previously computed inaccuracies.
Gradient boosting (Friedman 2001) resamples the training data several times to generate results that form a weighted average of the resampled data sets. Each tree in the series is fit to the residual of the prediction from the earlier trees in the series. The residual is defined in terms of the derivative of a loss function. For squared error loss and an interval target, the residual is simply the target value minus the predicted value. Because each successive sample is weighted according to the classification accuracy of previous models, this approach is sometimes called stochastic gradient boosting.
Random forests is my favorite data mining algorithm, especially when I have little subject knowledge of the application. You grow many large decision trees at random and vote over all trees in the forest. The algorithm works as follows:
1. You develop random samples of the data and grow k decision trees. The size of k is large, usually greater than or equal to 100. A typical sample size is about two-thirds of the training data.
2. At each split point for each tree, you evaluate a random subset of candidate inputs (predictors). You hold the size of the subset constant across all trees.
3. You grow each tree as large as possible without pruning.
In a random forest, you are perturbing not only the data but also the variables that are used to construct each tree. The error rate is measured on the remaining holdout data not used for training; this remaining one-third of the data is called the out-of-bag sample. Variable importance can also be inferred from how often an input was used in the construction of the trees.
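A minimal stdlib Python sketch of bagging with an out-of-bag error estimate. To stay short, the base learner is a one-split "stump" on a single numeric input rather than a full tree, and it does not sample a random subset of inputs at each split, so it illustrates bagging and the out-of-bag idea rather than a complete random forest. Function names and data are hypothetical. Because each bootstrap sample draws n rows with replacement, roughly 63% of the distinct rows land in a given sample and about 37% (close to e^-1) are out of bag, which is where the two-thirds/one-third figures come from.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit a one-split classifier: try each observed value as a threshold,
    keep the split with the fewest training errors, predict majority labels."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label


def bag_with_oob(xs, ys, n_models=100, seed=0):
    """Fit one stump per bootstrap sample; return a voting predictor and the OOB error."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    oob_votes = [Counter() for _ in range(n)]  # votes from models that never saw row i
    for _ in range(n_models):
        sample = [rng.randrange(n) for _ in range(n)]  # bootstrap: n draws with replacement
        model = fit_stump([xs[i] for i in sample], [ys[i] for i in sample])
        models.append(model)
        in_bag = set(sample)
        for i in range(n):
            if i not in in_bag:  # row i is out of bag for this model
                oob_votes[i][model(xs[i])] += 1
    scored = [i for i in range(n) if oob_votes[i]]
    oob_error = (sum(oob_votes[i].most_common(1)[0][0] != ys[i] for i in scored) / len(scored)
                 if scored else float("nan"))

    def predict(x):
        # Majority vote over all bagged models
        return Counter(m(x) for m in models).most_common(1)[0][0]

    return predict, oob_error


if __name__ == "__main__":
    # Hypothetical 1-D data: label 1 tends to occur for larger x, with noise.
    rng = random.Random(42)
    xs = [rng.uniform(0, 10) for _ in range(100)]
    ys = [1 if x + rng.gauss(0, 1.5) > 6 else 0 for x in xs]
    predict, oob_error = bag_with_oob(xs, ys)
    print("prediction at x=2:", predict(2.0), " at x=9:", predict(9.0))
    print(f"out-of-bag error: {oob_error:.3f}")
```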

jwexler · 01-03-2014

And... just as I wrote that, I received more info from R&D:
1. You need to use the "Decisions" node (not the Decision Tree node) after the Sampling node. There you can specify the adjusted prior as your original prior (before sampling), and you will keep your data prior from the oversampling. If you don't use the Decisions node and you specify your adjusted prior as the original prior in the Input Data Source node, there will not be any adjustment of the predicted probabilities by the prior, because the ratio is always 1. The decision matrix is related to calculating profit and loss; it is applied separately, after the prior adjustment.
2. For rare event modeling, oversampling is usually required, but it is not necessary to make the sample balanced. However, it depends on your data and analysis.
3. Take a look at the PROC ARBOR documentation; it has the details. The option DECSEARCH corresponds to "Use Decisions" and the option PRIORSSEARCH corresponds to "Use Priors" in "Split Search".
4. The Cutoff node does not affect the Decision Tree node itself. The Cutoff node just creates the EM_CUTOFF variable, which is the classification that results from the new cutoff value. For example, if your new cutoff is 0.06, a piece of cutoff score code like the following is added to the end of the previous score code:
IF P_good_badgood > 0.06 THEN EM_CUTOFF = 1; ELSE EM_CUTOFF = 0;
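The score-code line quoted above is simple enough to mirror in Python, which makes two points explicit: the comparison is strict (a probability exactly at 0.06 is not flagged), and only a new classification is produced while the posterior probability is left unchanged. The function name `em_cutoff` and the sample scores are hypothetical.

```python
def em_cutoff(p_event: float, cutoff: float = 0.06) -> int:
    """Return 1 when the event probability is strictly above the cutoff, else 0.

    Mirrors: IF P_good_badgood > 0.06 THEN EM_CUTOFF = 1; ELSE EM_CUTOFF = 0;
    The probability itself is not modified.
    """
    return 1 if p_event > cutoff else 0


if __name__ == "__main__":
    # Hypothetical posterior probabilities for five scored rows.
    scores = [0.01, 0.05, 0.06, 0.07, 0.30]
    flags = [em_cutoff(p) for p in scores]
    print(list(zip(scores, flags)))
    print("share classified as events:", sum(flags) / len(flags))
```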

anna_holland · 12-06-2013

Thank you, JasonXin, for all your help! Ajosh, I think you're all set. If you could, please mark this thread as answered correctly. If another user runs into the same issue, they can use this thread as a reference. Regards, -Anna-Marie

Date Last Visited: 09-01-2015 07:12 AM

Questions on exploratory factor analysis..

Identification of infrequent aka suspicious association rules.

Re: Is cut off node should still be used when boosting/ensemble models...

Re: Is cut off node should still be used when boosting/ensemble models...

Re: Is cut off node should still be used when boosting/ensemble models...

Is cut off node should still be used when boosting/ensemble models are...

Re: interpretation of from, into and decision columns in exported data...

interpretation of from, into and decision columns in exported data set...

Re: Using Cut Off Node and Interpreting Predicted Probabilities.

Re: Using Cut Off Node and Interpreting Predicted Probabilities.

Re: Some questions on data manipulation before using model node in SAS...

Re: Use of oversampling and cut-off node results interpretation in SAS...

Re: Identification of infrequent aka suspicious association rules.

Re: Using Cut Off Node and Interpreting Predicted Probabilities.

Re: Questions on exploratory factor analysis..

Re: Is cut off node should still be used when boosting/ensemble models...

Re: interpretation of from, into and decision columns in exported data...

Re: Regarding use of original prior probabilities in class imbalance p...

Re: Some questions on data manipulation before using model node in SAS...

Re: Difference between boosting through start groups and gradient boos...

Re: Use of cut off node in SAS EMiner 7.1

Re: Deriving patterns (if then rules) from boosted trees in SAS Enterp...