waleye
Calcite | Level 5

Provider  Charge(Total)  Fraud  %Fraud  Fraud*%Fraud
A         1000           600    0.6     360
B         100            70     0.7     49
C         10             8      0.8     6.4
D         1              1      1.0     1

Which provider should be given an alert?
I am building a model and setting an alert threshold on the data above. Which provider should be given an alert? If %Fraud is used, provider D would be flagged first, which is obviously unreasonable because D has only one case. If I use Fraud*%Fraud instead, the results look much more reasonable.
Questions: 1) Is there any statistical theory behind Fraud*%Fraud? If so, what is it?
2) If Fraud*%Fraud is unreasonable, is there a better alternative?
Thanks in advance.
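A minimal SAS sketch of the comparison described in the question, assuming the four rows above are typed in by hand. The dataset names providers, by_pct and by_score and the variable names pct_fraud and score are hypothetical. It computes %Fraud and Fraud*%Fraud for each provider and sorts by each measure.

```sas
/* Hypothetical dataset: the four providers from the table above */
data providers;
  input provider $ total fraud;
  pct_fraud = fraud / total;      /* share of the provider's total that is fraud */
  score     = fraud * pct_fraud;  /* Fraud times the fraud share, i.e. fraud**2 / total */
  datalines;
A 1000 600
B 100 70
C 10 8
D 1 1
;
run;

/* Rank by the fraud share alone */
proc sort data=providers out=by_pct;
  by descending pct_fraud;
run;

/* Rank by the fraud count weighted by the fraud share */
proc sort data=providers out=by_score;
  by descending score;
run;

proc print data=by_pct;   title 'Ranked by pct_fraud'; run;
proc print data=by_score; title 'Ranked by score';     run;
```

Sorted by pct_fraud, the order is D, C, B, A, so the one-case provider D comes first. Sorted by score, the order is A, B, C, D, matching the 360, 49, 6.4 and 1 values in the table.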

Reeza
Super User

Fraud*%Fraud = Fraud*Fraud/Amount

 

I might be wrong, but isn't it the same measurement of fraud at the end of the day? It will flag the same records as the fraud amount will.
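Writing F for the fraud count and T for the provider's total charge (notation introduced here for illustration), the identity above expands as follows, with the four rows of the original table as a check:

```latex
\begin{aligned}
F \times \frac{F}{T} &= \frac{F^{2}}{T} \\
A:\ \frac{600^{2}}{1000} = 360, \quad
B:\ \frac{70^{2}}{100} &= 49, \quad
C:\ \frac{8^{2}}{10} = 6.4, \quad
D:\ \frac{1^{2}}{1} = 1
\end{aligned}
```

For these four providers, ranking by F²/T gives the same order (A, B, C, D) as ranking by F alone. The score still depends on T, though: two providers with the same fraud count are ordered by their totals, with the smaller total scoring higher.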

 

I'm having a hard time picturing that curve, so I tried to graph it, but the effect doesn't seem to be that significant:

 

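/* Grid of totals (amount) and fraud counts to visualize Fraud*Fraud/Amount */
data have;
  do amount = 1 to 1000;
    do fraud = 0 to 800 by 10;
      /* fraud cannot exceed the total, so skip impossible combinations */
      if fraud <= amount then do;
        fraud2 = fraud * fraud / amount;  /* Fraud*Fraud/Amount */
        output;
      end;
    end;
  end;
run;

/* Heat map: color shows fraud2 over the amount/fraud grid */
proc sgplot data=have;
  heatmapparm x=amount y=fraud colorresponse=fraud2;
run;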

I'm not sure what a better measurement would be, though; it's worth checking the literature, I suppose. This is a well-known problem: whether to use the amount or the percentage as the detector.

 

 

 

waleye
Calcite | Level 5

Hi  

Thank you for your response.

Yes, you are totally right. Most models use the amount or the percentage as the detector.

But I want to try using Fraud*%Fraud as the detector.

So do you think Fraud*%Fraud is reasonable?

And do you know of any literature that supports this?

Thanks.

Reeza
Super User

No, I don't think this is reasonable, and no, I don't have literature to back this up.

My rationale: why rescale the number into something you don't already have? This adds complexity to a model.

Without knowing your full model, it's hard to make a guess. Have you looked into two-stage models?

rayIII
SAS Employee

Can you say more about the model you are creating? Depending on what you are trying to accomplish, you may not need to compute percentages. For example, if you are trying to identify characteristics of fraudulent cases, you can use fraud/no fraud as a binary target.

waleye
Calcite | Level 5

Hi rayIII 

Thank you for your response.

For example, KX means that the maximum payment has been reached.

Provider  Total_bene  KX_Bene  %KX_Bene  KX_Bene*%KX_Bene
A         1000        600      0.6       360
B         100         70       0.7       49
C         10          8        0.8       6.4
D         1           1        1.0       1

Most models use %KX_Bene as the detector, but I want to use KX_Bene*%KX_Bene as the detector instead.

Do you think it is reasonable? If so, is there any statistical theory to support it?

Thanks.

ballardw
Super User

Please explain exactly what you think KX_Bene*%KX_Bene tells you that %KX_Bene does not.

waleye
Calcite | Level 5

Hi,

This is a good question.

If %KX_Bene is used, small providers can end up at the top. Provider D, for example, has only one beneficiary, and that beneficiary is a KX (fraud) beneficiary, so D's %KX_Bene is 100%, the highest of all. But obviously D is not the provider we want to catch.

If KX_Bene*%KX_Bene is used, the number of fraud beneficiaries is also taken into account. KX_Bene*%KX_Bene acts like a "weight" or an "expectation".

Actually, when I used KX_Bene*%KX_Bene, the model performed better, but I cannot find any theory to support it.
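One way to make the "weight" reading concrete, writing k for KX_Bene, n for Total_bene and p = k/n for %KX_Bene (notation introduced here for illustration, not a named statistical method):

```latex
k \cdot p = k \cdot \frac{k}{n} = \frac{k^{2}}{n} = n\,p^{2}
```

So the score is the rate weighted by the KX count, or equivalently the squared rate scaled by provider size: at a fixed rate, the score grows linearly with n. Provider D (n = 1, p = 1.0) scores 1 × 1.0² = 1, while provider A (n = 1000, p = 0.6) scores 1000 × 0.36 = 360.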

 

 

 
