Thank you for your reply, and sorry for the confusion. I left out some important information.

The sample dataset comes from credit card transactions. Some of them may be fraudulent and some may be normal. We implemented these rules to flag highly suspicious transactions. GB=1 means an actual fraud transaction; GB=0 means a non-fraud transaction. R1 to R5 are the different rules we use to flag suspicious transactions. For instance, R1=1 means rule R1 labeled the transaction as fraudulent, and R1=0 means rule R1 labeled it as legitimate. So a confusion matrix can be used here to select effective rules.

Our major concerns for these rules are TPR (true positive rate) and PV+ (positive predictive value):

TPR = true positive / total actual positive = d/(c+d)
PV+ = true positive / total predicted positive = d/(b+d)

Because our rule pool is almost full, I'd like to select a sequence of effective rules from the pool and implement only those in the system, which should relieve the load on our server.

             Predicted: 1               Predicted: 0
Actual: 1    d, True Positive           c, False Negative          c+d, Actual Positive
Actual: 0    b, False Positive          a, True Negative           a+b, Actual Negative
             b+d, Predicted Positive    a+c, Predicted Negative

I'd like to get a rule list like r2,r1,r4,r3, as follows:

Obs  rule   ruleselected  accuracy  errorate  Tpr  Pvplus   Tnr   PvMinus
1    RuleX  r2,r1,r4,r3   0.55      0.45      1    0.45455  0.28  1

In the first round, R2 is selected because it has the highest TPR in the rule list. R2 then becomes part of the selected set. In the second round, I need to calculate the combinations R2R1, R2R3, R2R4 and R2R5, pick the one with the highest TPR, and add that second rule (for example R1) to the selected set. The process continues until adding another rule no longer increases the TPR of the selected set. Then the iteration stops.

I don't know if I have made my point clear. If you have any questions, please leave a comment. Thank you for your time; I really appreciate it.
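In case it helps to make the procedure concrete, here is a rough Python sketch of the round-by-round selection described above. It is only an illustration and rests on assumptions that are not stated in the post: a combination such as R2R1 is taken to flag a transaction when either rule flags it (logical OR), ties in TPR are broken by the order of the rule list, and the eight rows of data are made-up toy values, not the real sample. The function names `confusion_metrics` and `greedy_select` are hypothetical.

```python
def confusion_metrics(actual, predicted):
    """Return TPR and PV+ using the a/b/c/d cells of the confusion matrix."""
    d = sum(1 for y, p in zip(actual, predicted) if y == 1 and p == 1)  # true positive
    c = sum(1 for y, p in zip(actual, predicted) if y == 1 and p == 0)  # false negative
    b = sum(1 for y, p in zip(actual, predicted) if y == 0 and p == 1)  # false positive
    tpr = d / (c + d) if (c + d) else 0.0
    pv_plus = d / (b + d) if (b + d) else 0.0
    return {"tpr": tpr, "pv_plus": pv_plus}


def greedy_select(rows, rule_names, target="GB"):
    """Add one rule per round, keeping the rule that gives the highest combined TPR.

    Assumption: rules are combined with logical OR.
    Stops when no remaining rule increases the TPR of the selected set.
    """
    actual = [row[target] for row in rows]
    combined = [0] * len(rows)   # prediction of the selected set so far
    selected = []
    best_tpr = 0.0
    remaining = list(rule_names)
    while remaining:
        candidates = []
        for name in remaining:
            trial = [max(cur, row[name]) for cur, row in zip(combined, rows)]
            candidates.append((confusion_metrics(actual, trial)["tpr"], name, trial))
        # max() keeps the first candidate on ties, i.e. rule-list order
        tpr, name, trial = max(candidates, key=lambda t: t[0])
        if tpr <= best_tpr:
            break                # no increase in TPR: stop iterating
        best_tpr, combined = tpr, trial
        selected.append(name)
        remaining.remove(name)
    return selected, confusion_metrics(actual, combined)


RULES = ["R1", "R2", "R3", "R4", "R5"]
# Made-up toy data: (GB, R1, R2, R3, R4, R5) per transaction
toy = [
    (1, 0, 1, 0, 0, 0),
    (1, 0, 1, 0, 0, 0),
    (1, 1, 0, 0, 0, 0),
    (1, 0, 0, 0, 1, 0),
    (0, 0, 1, 1, 0, 0),
    (0, 0, 0, 0, 0, 1),
    (0, 0, 0, 0, 0, 0),
    (0, 1, 0, 0, 0, 0),
]
rows = [dict(zip(["GB"] + RULES, t)) for t in toy]

selected, metrics = greedy_select(rows, RULES)
print(selected, metrics)
```

On this toy data the sketch picks R2 first, then R1, then R4, and stops because neither R3 nor R5 raises the TPR any further. If the combinations are meant to be something other than OR, only the line that builds `trial` would need to change.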