Hi Miguel, I really appreciate your advice. Below are the results from your suggestions:

A) Gradient Boosting (Data -> Gradient Boosting)

Fit statistics:

Model | Misclassification Rate | ASE      | ROC | Gini Coef.
Boost (Gradient Boosting) | 0.005701 | 0.005669 | 0.5 | 0

Classification table (TRAIN, target = RESPONSE_IND):

False Negative | True Negative | False Positive | True Positive
6445           | 1124016       | 0              | 0

This model classified all cases as non-response, just like the other models I built, because my response rate is only 0.0057 (6,445 cases) versus 0.9943 non-response (1,124,016 cases); both True Positives and False Positives are zero.

B) Ensemble of Reg2 and Reg3 (using 10% oversampled data)

Model            | Valid: Avg. Profit | Train: ASE | Train: Misclass. | Valid: ASE | Valid: Misclass.
Reg2 (selected)  | 1.71               | 0.090012   | 0.10002          | 0.089988   | 0.099984
Reg3             | 1.71               | 0.090012   | 0.10002          | 0.089988   | 0.099984
Ensmbl           | 1.71               | 0.098908   | 0.10002          | 0.098877   | 0.099984

C) Bagging, Boosting, and Rotation Forest

The paper you suggested is great!! I am still working on this part, since I have not used these methods before; I will let you know how it goes.

Findings from my search (I am not sure how reliable they are):

I was reading the SAS support page, which says: "Over-weighting or under-sampling can improve predictive accuracy when there are three or more classes, including at least one rare class and two or more common classes." My data has only two classes, so I suspect this explains my case: my oversampling (omitting cases from the common class) is not helping me obtain a better model, regardless of sample size. However, when I do not apply adjusted priors (i.e., I keep the priors of the sample data), misclassification increases as I increase the sample size, but so does the number of True Positives. I care more about the number of True Positives than about the error rate. Is it wrong to keep the current prior probabilities (not using adjusted priors)?
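To sanity-check the prior question for myself, here is a small sketch (in Python rather than SAS; `adjust_posterior` is my own illustrative name) of the standard prior-correction formula, which I believe is what the adjusted-prior option applies, using my population response rate of 0.0057 and a 10% oversampled training set:

```python
def adjust_posterior(p_sample, pop_prior, samp_prior):
    """Map a posterior estimated on an oversampled training set
    back to the population scale (standard prior-correction formula)."""
    # p_sample:   P(response) scored on the oversampled data
    # pop_prior:  population proportion of responders (e.g. 0.0057)
    # samp_prior: responder proportion in the training sample (e.g. 0.10)
    num = p_sample * pop_prior / samp_prior
    den = num + (1.0 - p_sample) * (1.0 - pop_prior) / (1.0 - samp_prior)
    return num / den

# A case scored at 0.50 on the oversampled scale falls well below
# a 0.50 cutoff once mapped back to the population scale:
print(round(adjust_posterior(0.50, 0.0057, 0.10), 3))  # 0.049
```

If this is right, then with adjusted priors and a 0.50 cutoff almost every case drops below the cutoff, which would explain why the True Positives vanish. Keeping the sample priors is mathematically equivalent to classifying at a much lower cutoff on the population scale, which seems like a defensible choice when True Positives matter more than overall error.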
Below is the link to the website: http://support.sas.com/documentation/cdl/en/emxndg/64759/HTML/default/viewer.htm#p1w6fewo0jhzxdn1rytuk1kt0pqj.htm

The other question is how to evaluate models when you need to detect a rare event:

Detection rate (recall): the ratio between the number of correctly detected rare events and the total number of rare events.

False alarm (false positive) rate: the ratio between the number of majority-class records misclassified as rare events and the total number of majority-class records.

The ROC curve shows the trade-off between detection rate and false alarm rate, so am I right that Misclassification, ASE, Average Profit, etc. are not sufficient metrics for evaluation when the event is rare?

Here is the title of the article: "Data Mining for Analysis of Rare Events: A Case Study in Security, Financial and Medical Applications"

Thanks, Miguel! I really appreciate your help.
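As a quick illustration of why overall error is misleading here, this minimal Python sketch (function name is my own) computes the two rates from the gradient boosting confusion counts in section A:

```python
def rare_event_metrics(tp, fn, fp, tn):
    """Detection rate (recall), false-alarm rate, and plain accuracy."""
    detection = tp / (tp + fn)         # caught rare events / all rare events
    false_alarm = fp / (fp + tn)       # majority records flagged / all majority records
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return detection, false_alarm, accuracy

# Counts from the gradient boosting model above (every case called non-response):
det, fa, acc = rare_event_metrics(tp=0, fn=6445, fp=0, tn=1124016)
print(det, fa, round(acc, 4))  # 0.0 0.0 0.9943
```

A misclassification rate of 0.0057 (accuracy 0.9943) looks excellent even though the detection rate is literally zero, which is exactly why recall and false-alarm rate (or the ROC curve built from them) seem like the right lens for a rare event.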