02-22-2017 05:59 PM
I am quite new to scorecards. I am building a scorecard model to predict fraudulent applications using the Scorecard node in EM. The result obtained for one input variable is as follows:
Variable        Value        Group   Scorecard points   Weight of evidence   Event rate (fraud)
delivery_date   1            1       24                 -1.82                52.75%
delivery_date   0, missing   2       4                  0.06                 14.56%
delivery_date   2+           3       32                 -2.63                71.38%
My understanding is that scorecard points indicate how likely an application is to be fraudulent, with low points indicating a risky application. However, that is not what the example above shows: delivery_date = 2+ receives a very high score of 32 (meaning low risk) at a fraud rate of 71.38%, while delivery_date = 0 or missing receives a very low score of 4 (a high-risk application) even though only 14.56% of its applications are fraudulent.
If we have an application with delivery_date = 0, it will be given a total score of 4 (assuming the delivery_date WOE is the only input variable).
Would that misclassify the application as high risk?
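A quick sanity check on the numbers above, as a minimal sketch: it assumes the WOE is defined as ln(share of non-events / share of events), which is my assumption about the convention and not something the output states. Under that assumption, each group's WOE should equal its log odds of non-fraud minus one constant shared by all groups. If that holds, the WOE values themselves are consistent with the fraud rates, and only the points run in the unexpected direction.

```python
import math

# Event (fraud) rates, WOE values and points copied from the table above.
groups = {
    "1":          {"event_rate": 0.5275, "woe": -1.82, "points": 24},
    "0, missing": {"event_rate": 0.1456, "woe": 0.06,  "points": 4},
    "2+":         {"event_rate": 0.7138, "woe": -2.63, "points": 32},
}


def log_odds_non_event(event_rate):
    """Log odds of non-fraud versus fraud inside one group."""
    return math.log((1.0 - event_rate) / event_rate)


# Assumed convention: WOE = group log odds (non-event : event) + constant.
# If that is right, the offset below is roughly the same for every group.
offsets = {
    name: g["woe"] - log_odds_non_event(g["event_rate"])
    for name, g in groups.items()
}

# Rank the groups from riskiest to safest according to each column.
by_event_rate = sorted(groups, key=lambda n: -groups[n]["event_rate"])
by_woe = sorted(groups, key=lambda n: groups[n]["woe"])
by_points = sorted(groups, key=lambda n: groups[n]["points"])

for name, g in groups.items():
    print(
        f"{name:>10}  log-odds={log_odds_non_event(g['event_rate']):6.2f}"
        f"  woe={g['woe']:6.2f}  offset={offsets[name]:6.2f}"
    )
print("riskiest first by event rate:", by_event_rate)
print("lowest WOE first:            ", by_woe)
print("lowest points first:         ", by_points)
```

If low points are supposed to mean high risk, the "lowest points first" ordering should match the "riskiest first" ordering. Here the WOE ordering matches it and the points ordering is exactly reversed.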
02-24-2017 12:36 PM
Typically, when you see this (a positive regression coefficient, and thus scorecard points going in the opposite direction from what you would expect), it is due to collinearity among the inputs. So there might be another input highly correlated with delivery_date. Are you using model selection for the regression model? That could help eliminate similar inputs.
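To illustrate how the sign of the coefficient flips the points, here is a rough sketch using a common scorecard scaling in which an attribute's points are a base value minus factor * coefficient * WOE, with factor = (points to double the odds) / ln 2. The value of 20 for points to double the odds, the function names, and the `base` term are all assumptions for illustration; check the scaling settings in your own Scorecard node. The WOE values and points are the ones from the question.

```python
import math

PDO = 20.0                     # assumed points to double the odds
FACTOR = PDO / math.log(2)     # about 28.85 under that assumption

# WOE values and scorecard points copied from the question.
woe = {"1": -1.82, "0, missing": 0.06, "2+": -2.63}
reported_points = {"1": 24, "0, missing": 4, "2+": 32}


def attribute_points(woe_value, beta, base=0.0):
    """Points for one attribute; `base` is a hypothetical constant
    (intercept and offset share) that is identical for every group."""
    return base - FACTOR * beta * woe_value


def implied_beta(a, b):
    """Coefficient implied by the point difference between two groups.
    The base term cancels out, so only the assumed FACTOR matters."""
    return (reported_points[a] - reported_points[b]) / (
        -FACTOR * (woe[a] - woe[b])
    )


beta_1 = implied_beta("1", "0, missing")
beta_2 = implied_beta("2+", "0, missing")
print(f"implied coefficient: {beta_1:.2f} and {beta_2:.2f}")

# Compare the point ordering for a negative and a positive coefficient.
expected = {g: attribute_points(w, beta=-1.0) for g, w in woe.items()}
flipped = {g: attribute_points(w, beta=beta_1) for g, w in woe.items()}
print("negative coefficient:", {g: round(p, 1) for g, p in expected.items()})
print("positive coefficient:", {g: round(p, 1) for g, p in flipped.items()})
```

Under these assumptions, both pairs of groups imply roughly the same positive coefficient, which fits the explanation above. With a negative coefficient the 2+ group would get the fewest points, as you expected.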