EC189QRW
Obsidian | Level 7

I've got 300 rules on my rule list. Many of them haven't been updated for a long time. I'd like to select a subset of the rules based on metrics derived from the confusion matrix, e.g. TPR (true positive rate) and PV+ (positive predictive value), computed from the transaction table. Maybe 60 out of the 300 do 99% of the job in terms of TPR. Those 60 rules would then become my new pool of rules, and I would drop the remaining 240 and refresh the rule list. My basic logic is that the rule with the highest TPR on the transactions enters the pool first. For example, if R2 gives the highest TPR, it comes in first. After that, each round adds the remaining rule that gives the highest combined TPR together with the rules already in the pool.

Because the rules may overlap, the TPR has to be recalculated in each round so that the best choice can be made each time. The iteration would go on until the difference between the TPR of Pool A and the TPR of Pool B (two successive pools) is small, say 0.01. I mean the TPR would converge at some point.

At present, I can create a table of TPR and PV+ for each rule from the transaction table. But I don't know how to dynamically build the sequence of rules and pull out, as variables, the ones that give the largest increase in TPR. I hope someone can help and give me some clues on how to tackle this problem. Thanks in advance.

Here is some sample transaction data. seq stands for sequence, gb stands for GOOD/BAD, and r1-r5 stand for Rule1-Rule5.

data trx;
input seq gb r1 r2 r3 r4 r5;
cards;
1 1 0 0 0 1 0
2 0 0 0 1 0 1
3 1 1 0 0 0 1
4 0 0 0 0 0 1
5 0 1 0 0 1 0
6 0 0 1 0 0 0
7 1 0 1 0 0 1
8 0 0 0 0 0 0
9 0 0 0 1 0 0
10 1 1 0 0 0 0
11 0 0 0 0 1 0
12 1 1 1 1 1 0
13 1 0 1 1 0 0
14 0 1 0 0 1 1
15 1 1 0 0 0 1
16 1 0 0 0 1 0
17 0 0 0 0 0 1
18 1 0 1 0 1 0
19 0 0 0 0 0 1
20 0 0 1 1 1 0
21 0 1 1 0 1 1
22 0 0 1 0 0 0
23 1 1 0 1 1 1
24 0 0 1 0 0 0
25 1 0 1 1 0 1
26 0 0 0 0 1 0
27 0 0 0 1 1 0
28 0 0 0 0 0 1
29 0 0 0 0 0 0
30 1 0 1 1 1 1
31 0 1 0 0 0 0
32 1 0 1 0 1 1
33 0 1 0 0 0 0
34 1 0 0 1 0 1
35 0 1 0 0 1 0
36 0 0 0 0 1 0
37 0 0 0 0 0 1
38 0 1 0 0 1 0
39 1 1 1 1 0 0
40 0 1 1 0 1 0
;

2 REPLIES
ballardw
Super User

I've read this three times now. I have to say I haven't a clue of what you actually want.

How do you get TPR (true positive rate) and PV+ (positive predictive value) from that data? You also say "So we need to calculate the TPR at each time". What indicates "each time" in that data set?

What do you want the final dataset to look like?

How do you apply any of the "rules"?

EC189QRW
Obsidian | Level 7

Thank you for your reply. Sorry for the confusion. I left out some important information.

The sample dataset is from credit card transactions. Some of them are fraudulent and some are normal. We implemented these rules to label highly suspicious transactions. GB=1 means an actual fraudulent transaction; GB=0 means a non-fraudulent transaction. R1 to R5 stand for the different rules we use to label suspicious transactions. For instance, R1=1 means the rule labeled the transaction as fraudulent, and R1=0 means the rule labeled it as genuine. So a confusion matrix can be used here to select effective rules. Our major concerns for these rules are TPR (true positive rate) and PV+ (positive predictive value): TPR = true positive / total actual positive = d/(c+d), and PV+ = true positive / total predicted positive = d/(b+d). As our pool of rules is almost full, I'd like to select a sequence of effective rules out of the pool and implement only those in the system, which might give some relief to our server.

            Predicted:1               Predicted:0
actual:1    d, True Positive          c, False Negative          c+d, Actual Positive
actual:0    b, False Positive         a, True Negative           a+b, Actual Negative
            b+d, Predicted Positive   a+c, Predicted Negative
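As an illustration of the definitions above, here is a minimal sketch of how the per-rule counts and rates could be computed from the TRX sample in a single DATA step. The output data set name rule_stats and the helper arrays tpc/fpc are hypothetical names chosen for this example, and the step assumes the rules are named r1-r5 and that gb contains at least one 1.

```sas
/* Sketch: confusion-matrix counts, TPR and PV+ for each rule in TRX.   */
/* rule_stats, tpc and fpc are illustrative names, not from the post.   */
data rule_stats(keep=rule tp fp fn tn tpr pvplus);
  set trx end=last;
  array r{*} r1-r5;
  array tpc{5} _temporary_ (5*0);   /* true positives per rule (d)  */
  array fpc{5} _temporary_ (5*0);   /* false positives per rule (b) */
  length rule $32;

  npos + (gb = 1);                  /* running count of actual frauds     */
  nneg + (gb = 0);                  /* running count of actual non-frauds */

  do i = 1 to dim(r);
    tpc{i} = tpc{i} + (r{i} = 1 and gb = 1);
    fpc{i} = fpc{i} + (r{i} = 1 and gb = 0);
  end;

  /* After the last transaction, write one row per rule */
  if last then do i = 1 to dim(r);
    rule = vname(r{i});
    tp = tpc{i};
    fp = fpc{i};
    fn = npos - tp;                 /* c in the table above */
    tn = nneg - fp;                 /* a in the table above */
    tpr = tp / npos;                /* d/(c+d) */
    if tp + fp > 0 then pvplus = tp / (tp + fp);   /* d/(b+d) */
    else pvplus = .;                /* rule never fired */
    output;
  end;
run;
```

Sorting rule_stats by descending tpr would give the candidate for the first round of selection.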

 

 

 

I'd like to get a rule list like r2,r1,r4,r3 as follows.

Obs  rule   ruleselected  accuracy  errorate  Tpr  Pvplus   Tnr   PvMinus
1    RuleX  r2,r1,r4,r3   0.55      0.45      1    0.45455  0.28  1

The first round selects R2 because it has the highest TPR in the rule list, so R2 becomes part of the pool. In the second round I need to calculate the TPR of R2R1, R2R3, R2R4 and R2R5, pick the combination with the highest TPR, and add that second rule to the pool, for example R1. The process continues until adding another rule no longer increases the TPR of the pool. Then the iteration stops. I don't know if I have made my point clear. If you have any questions, please leave a comment. Thank you for your time, I really appreciate it.
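The round-by-round selection described above is a greedy forward selection, and one possible way to sketch it is a DATA step that loads TRX into temporary arrays and repeatedly adds the rule that catches the most not-yet-caught frauds. This is only a sketch under stated assumptions: a transaction counts as flagged by the pool when any rule in the pool fires, ties go to the lower-numbered rule, and the macro variables nrules and mingain, the data set greedy, and the array names are all hypothetical names introduced for this example.

```sas
%let nrules  = 5;   /* assumption: rules are named r1-r&nrules        */
%let mingain = 0;   /* stop when the TPR gain is <= this (e.g. 0.01) */

/* Row count of TRX, needed to size the temporary arrays */
proc sql noprint;
  select count(*) into :nobs trimmed from trx;
quit;

data greedy(keep=step rule ruleselected gain tpr pvplus);
  array r{&nrules} r1-r&nrules;
  array flag{&nobs, &nrules} _temporary_;      /* rule hits per transaction */
  array bad{&nobs}    _temporary_;             /* gb per transaction        */
  array hit{&nobs}    _temporary_;             /* 1 if the pool flags row   */
  array used{&nrules} _temporary_ (&nrules*0); /* 1 if rule is in the pool  */
  length rule $32 ruleselected $200;

  /* Load the whole table into memory */
  do n = 1 to &nobs;
    set trx;
    bad{n} = gb;
    hit{n} = 0;
    do j = 1 to &nrules;
      flag{n, j} = r{j};
    end;
    npos + (gb = 1);
  end;

  do step = 1 to &nrules;
    best = 0;
    best_tp = 0;
    do j = 1 to &nrules;
      if used{j} then continue;
      /* Frauds the pool would newly catch if rule j were added */
      add_tp = 0;
      do n = 1 to &nobs;
        add_tp = add_tp + (bad{n} = 1 and hit{n} = 0 and flag{n, j} = 1);
      end;
      if add_tp > best_tp then do;   /* strict >: ties keep the earlier rule */
        best = j;
        best_tp = add_tp;
      end;
    end;

    gain = best_tp / npos;           /* increase in pool TPR this round */
    if best = 0 or gain <= &mingain then leave;

    /* Add the winning rule and recompute the pool's TP and FP */
    used{best} = 1;
    tp = 0;
    fp = 0;
    do n = 1 to &nobs;
      if flag{n, best} = 1 then hit{n} = 1;
      tp = tp + (hit{n} = 1 and bad{n} = 1);
      fp = fp + (hit{n} = 1 and bad{n} = 0);
    end;
    rule = vname(r{best});
    ruleselected = catx(',', ruleselected, rule);
    tpr = tp / npos;
    pvplus = tp / (tp + fp);
    output;
  end;
  stop;
run;
```

Tracing this logic by hand on the 40 sample rows, with ties going to the lower-numbered rule, should give the order r2, r1, r4, r3 and a final Tpr of 1 with Pvplus 15/33 = 0.45455, which agrees with the example row above. Holding everything in temporary arrays is fine for a sample like this; for 300 rules and a large transaction table the array sizes and the triple loop would need to be reconsidered.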
