Hi Miguel,

Thanks for your quick response. Yes, you guessed right: I am working on a binary classification problem with imbalanced proportions of Y and N.

Class imbalance alone is not responsible for the poor classification performance of the models. It is coupled with overlap among the target classes (or rare-event instances occurring in small disjuncts/islands), which further complicates the rare-event classification problem. SMOTE + Tomek Links is just one of a handful of techniques aimed at producing a "balanced" dataset, on which traditional classifiers work well.

I would rephrase my original question as follows: if I create a balanced dataset with Y:N almost equal, should I still use adjusted priors (the decision processing settings for input data in SAS EM) before running any model? Later on, I would still use my original dataset for scoring, just to check how many instances fall under TP, TN, etc. I think this should be correct, as I have seen a few examples (PVK'97 Donor dataset or similar) that start with a balanced dataset but then set the adjusted priors equal to the original priors before running decision tree models and so on.

I shall go through the link you shared in detail once again, as I find it very useful.

Earlier, I tried boosting and gradient boosting on a similar dataset, with the following results:

1) Boosting: I embedded a decision tree node between the Start Groups and End Groups nodes, selected boosting with 5 iterations, and ran the process flow. SAS EM gives good TP, but there are also a large number of FP (needless to say, interpreting these results as patterns was a daunting task, which I somehow managed). Increasing the number of boosting iterations, however, led to diminished performance.

2) Gradient Boosting: the SAS EM Gradient Boosting node didn't yield any results for me, and I am not really sure why. I assume gradient boosting works for binary targets as well.

Do share your thoughts and experiences on the same. Hope this information is useful to you.
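To make the adjusted-priors question concrete, here is a minimal Python sketch (not SAS EM code) of the standard prior-correction formula as I understand it: probabilities from a model trained on the balanced data are rescaled by the ratio of the original prior to the training prior for each class, then renormalised. The function name `adjust_posterior` and the 5% event rate are hypothetical, chosen only for illustration, and I am not claiming this is exactly what SAS EM does internally.

```python
def adjust_posterior(p_balanced, train_prior, true_prior):
    """Rescale an event probability from a model trained on resampled data
    so that it reflects the event rate of the original population.

    p_balanced  : predicted P(Y) from the model fitted on the balanced data
    train_prior : proportion of Y in the balanced training data (e.g. 0.5)
    true_prior  : proportion of Y in the original data (assumed value)
    """
    if not (0.0 < train_prior < 1.0 and 0.0 < true_prior < 1.0):
        raise ValueError("priors must lie strictly between 0 and 1")

    # Weight each class by (original prior) / (training prior).
    w_event = true_prior / train_prior
    w_nonevent = (1.0 - true_prior) / (1.0 - train_prior)

    # Renormalise so the two adjusted class probabilities sum to 1.
    numerator = p_balanced * w_event
    return numerator / (numerator + (1.0 - p_balanced) * w_nonevent)


if __name__ == "__main__":
    # Balanced training data (50% Y), hypothetical 5% Y in the original data.
    for p in (0.2, 0.5, 0.9):
        print(p, round(adjust_posterior(p, 0.5, 0.05), 4))
```

If this is right, a 0.5 cutoff on the adjusted probability is much stricter than a 0.5 cutoff on the balanced-model score, which would presumably change the TP/FP counts when scoring the original dataset.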
Regards, Aditya.